Sam Horradarn
61e1e57908 3 months ago 4 commits

Mount and explore the Flickr30k image dataset

Remember the last time you downloaded a 4GB file? How long did that take? Try mounting this repository instead for instant access.

Before beginning, make sure that git-xet is installed and set up.

Mount

Our read-only mount feature is the fastest way to access a repository. Click the mount button, copy the mount command, and run it in your terminal.

[Screenshot: selecting the mount button]

Running the command will result in output that looks like this:

    Mounting to "/mydir/Flickr30k"
    Cloning into temporary directory "/var/folders/jy/1px5ktln3nd4sftv1bjxx2vc0000gn/T/.tmpsIykqt"
    Mounting as a background task...
    Setting up mount point...
    4.14 GiB in 31879 objects mounted
    Mount at "/mydir/Flickr30k" successful. Unmount with 'umount "/mydir/Flickr30k"'
    Mount complete in 4.076202s
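If a script reads from the mount, it can help to confirm the mount is still active before opening files, since after unmounting the path no longer exposes the repository contents. The sketch below uses only Python's standard `os.path.ismount`; the helper name `require_mount` is hypothetical, and `/mydir/Flickr30k` is simply the example mount point from the sample output.

```python
import os


def require_mount(path: str) -> str:
    """Return `path` unchanged if it is an active mount point, else raise.

    Hypothetical helper: os.path.ismount reports whether `path` is a mount
    point (it lives on a different device than its parent directory).
    """
    if not os.path.ismount(path):
        raise RuntimeError(f"{path!r} is not mounted; run the mount command again")
    return path
```

For example, call `require_mount("/mydir/Flickr30k")` once at the top of a notebook before reading any files.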

Explore

Our Flickr30k dataset includes a 13.9 MB results.csv file that lists five annotations per image, plus around 32k images organized into folders named after the first two digits of each file name. Phew.

Use your favorite local file browser to navigate to 13/131090759.jpg, which shows how we all feel when we have to lug our big files around to do our jobs.
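The layout described above can be sketched in Python: one helper builds an image's path from its file name using the two-digit prefix folder, and another groups the annotations in results.csv by image. The column layout (a header row, then image name, annotation number, and caption text separated by `|`) is an assumption based on a common distribution of Flickr30k, so check the first line of your copy and adjust `delimiter` if it differs. Both function names are hypothetical, and depending on the repository layout, `images_root` may need to point at the `flickr30k_images` folder rather than the mount point itself.

```python
import csv
import os
from collections import defaultdict


def image_path(images_root: str, file_name: str) -> str:
    """Images sit in a folder named after the first two digits of the file name."""
    return os.path.join(images_root, file_name[:2], file_name)


def captions_by_image(csv_path: str, delimiter: str = "|") -> dict[str, list[str]]:
    """Group caption text by image file name.

    Assumed row layout: image name, annotation number, caption text,
    with a single header row first.
    """
    captions: dict[str, list[str]] = defaultdict(list)
    with open(csv_path, newline="", encoding="utf-8") as f:
        # QUOTE_NONE treats quote characters in captions as plain text
        # (assumes the file does not use CSV quoting).
        reader = csv.reader(f, delimiter=delimiter, quoting=csv.QUOTE_NONE)
        next(reader, None)  # skip the header row
        for row in reader:
            if len(row) < 3:
                continue  # skip blank or malformed lines
            name = row[0].strip()
            # Rejoin any extra cells in case a caption itself contains the delimiter.
            text = delimiter.join(row[2:]).strip()
            captions[name].append(text)
    return dict(captions)
```

With the images under a hypothetical `/mydir/Flickr30k/flickr30k_images`, `image_path("/mydir/Flickr30k/flickr30k_images", "131090759.jpg")` yields `/mydir/Flickr30k/flickr30k_images/13/131090759.jpg`.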

Summary

Want to easily browse or use a big repository that doesn't fit on your machine? Mount is the tool for you. You can work directly with any repository (read-only) from any local tool, whether that's a local notebook, your own code, or a Finder window.
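Because the mount behaves like an ordinary read-only directory, standard file APIs work on it unchanged. As an illustration, the hypothetical helper below uses `os.scandir` to count `.jpg` files in each prefix folder, which you could point at the mounted images folder to check the roughly 32k total. It only inspects directory entries and never opens image contents; where the images live inside your mount is an assumption you should adjust.

```python
import os
from collections import Counter


def count_images_by_prefix(images_root: str, suffix: str = ".jpg") -> Counter:
    """Count files ending in `suffix` (case-insensitive) in each subfolder of images_root.

    Only directory entries are inspected; image contents are never opened.
    Files directly inside images_root are ignored.
    """
    suffix = suffix.lower()
    counts: Counter = Counter()
    with os.scandir(images_root) as folders:
        for folder in folders:
            if not folder.is_dir():
                continue
            with os.scandir(folder.path) as entries:
                counts[folder.name] = sum(
                    1 for e in entries if e.is_file() and e.name.lower().endswith(suffix)
                )
    return counts
```

`sum(count_images_by_prefix(root).values())` then gives the total image count under `root`.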

Need edit access? Clone the full repository with git xet clone, or use the --no-smudge option to download only specific files.

Try the extra credit section below, or return to the Quick Start to push changes to your guided tutorial repository!

Extra credit

  • Check out our Laion400M repository to see how Xet mount can be used with DuckDB and Pandas for quick exploration of a 54 GB repository.
  • This dataset is not perfect. Check out the pull requests on this repository to see what it looks like to collaboratively review dataset updates using XetHub.
File list (4 items)

Name               Last commit      Size      Last modified
flickr30k_images   Adding data      -         10 months ago
.gitattributes     Initial commit   79 B      10 months ago
README.md          Adding data      2.3 KiB   10 months ago
results.csv        add test         13 MiB    3 months ago
