Mount and explore the Flickr30k image dataset
Remember the last time you downloaded a 4GB file? How long did that take? Try mounting this repository instead for instant access.
Before beginning, make sure that git-xet is installed and set up.
Mount
Our read-only mount feature is the fastest way to access a repository. Click the mount button, click to copy the mount command, and run it from your terminal.
Running the command will result in output that looks like this:
Mounting to "/mydir/Flickr30k"
Cloning into temporary directory "/var/folders/jy/1px5ktln3nd4sftv1bjxx2vc0000gn/T/.tmpsIykqt"
Mounting as a background task...
Setting up mount point...
4.14 GiB in 31879 objects mounted
Mount at "/mydir/Flickr30k" successful. Unmount with 'umount "/mydir/Flickr30k"'
Mount complete in 4.076202s
Explore
Our Flickr30k dataset includes a 13.9MB results.csv file that lists 5 annotations per image, as well as around 32k images organized into directories named after the first two digits of each file name. Phew.
Use your favorite local file browser to navigate to 13/131090759.jpg, which shows how we all feel when we have to lug our big files around to do our jobs.
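If you would rather explore from code than from a file browser, the two-digit directory layout can be sketched in a few lines of Python. This is a minimal sketch, not part of the repository: the helper names `sharded_path` and `count_images_by_prefix` are hypothetical, and `images_root` is an assumption that you should point at whichever directory in your mount actually contains the two-digit folders.

```python
from collections import Counter
from pathlib import Path


def sharded_path(images_root, file_name):
    """Build the path of an image stored under a directory named
    after the first two characters of its file name."""
    return Path(images_root) / file_name[:2] / file_name


def count_images_by_prefix(images_root):
    """Count the .jpg files inside each prefix directory under images_root."""
    counts = Counter()
    for image in Path(images_root).glob("*/*.jpg"):
        counts[image.parent.name] += 1
    return counts


if __name__ == "__main__":
    # Assumption: the mount point from the sample output above;
    # adjust it to wherever the prefix directories live in your mount.
    root = Path("/mydir/Flickr30k")
    print(sharded_path(root, "131090759.jpg"))
```

Because the mount is read-only, helpers like these only read directory entries and never write to the repository.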
Summary
Want to easily browse or use a big repository that doesn't fit on your desktop? Mount is the tool for you. You can work directly with any repository (read-only) from any local tool, whether you're using local notebooks, code, or your Finder window.
Need edit access? Clone the full repository with git xet clone, or use the --no-smudge option to only download specific files.
Try the extra credit section below, or return to the Quick Start to push changes to your guided tutorial repository!
Extra credit
- Check out our Laion400M repository to see how Xet mount can be used with DuckDB and Pandas for quick exploration of a 54GB repository.
- This dataset is not perfect. Check out the pull requests on this repository to see what it looks like to collaboratively review dataset updates using XetHub.