Updated 1 year ago
Performance evaluation for working with large binary files with version history.
Updated 1 year ago
Repo for running benchmarks against Git LFS, DVC, and LakeFS.
Updated 3 months ago
Blog Authorship Corpus: over 600,000 posts from more than 19,000 bloggers. Obtained from Kaggle.
Updated 9 months ago
Updated 1 year ago
Updated 3 months ago
Updated 8 months ago
Try Meta's Code Llama models on your laptop or cloud VM in seconds.
Updated 8 months ago
Add custom views to your repository by following the instructions in this template.
Updated 2 months ago
An app to visually summarize any CSV data files stored in the data folder.
Updated 1 month ago
Falcon RefinedWeb is a massive English web dataset built by TII and released under an ODC-By 1.0 license.
Updated 8 months ago
19k+ players and 110 attributes extracted from the latest edition of FIFA. Obtained from Kaggle.
Updated 6 months ago
Simplify the LLM finetuning workflow in Google Colab with XetHub!
Updated 3 months ago
Stream the Flickr30k image dataset on XetHub in seconds. Flickr30k is the benchmark for sentence-based image description, containing 31,000 images collected from Flickr alongside annotations. Obtained from Kaggle.
Updated 2 weeks ago
URL and caption metadata for the LAION-400M dataset - 400M English (image, text) pairs built for research purposes, to enable testing model training at larger scale for the broad research community and other interested communities.
Updated 9 months ago