Lazy clone
Some Git concepts like cloning don't always make sense for working with large repositories, especially when you only need to interact with a small set of files. To make it easier to develop with large repositories without having to download every file, we built lazy clone.
Ideal use cases:
- Selectively downloading files
- Directory structure reorganization
- Saving a default clone config for future development
Lazy cloning a repository runs a clone but keeps every Xet-managed file (any file that is not UTF-decodable or is larger than 256KB) as a pointer file, so the initial clone is fast regardless of the overall size of the repository.
To get the command for any repository, open the Access dropdown and select Clone. Check the "Lazy clone" checkbox and copy the lazy clone command.
Most typical Git commands (e.g., git clone) work interchangeably with their Git-Xet counterparts (e.g., git xet clone). Lazy clone is a Git-Xet only concept that requires using the full git xet clone --lazy command.
Basic usage
Let's try it out on a copy of the 5.5GB FIFA23 dataset, which holds large CSVs of EA Games FIFA23 player data.
- Go to the repository and click Duplicate in the top right corner to get your own copy of this repository to play with.
- Clone your duplicated repository using the Access dropdown, clicking the lazy clone option. From your terminal, run the command and enter the directory.
    git xet clone --lazy xet://<username>/FIFA23_dataset
    cd FIFA23_dataset
- List the directory contents and note that all the files are either tiny pointer files or small text files under 256KB.
    ls -l
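To double-check the listing, a small shell helper can flag any file above the size threshold. This is a minimal sketch rather than part of Git-Xet: `large_files` is a hypothetical helper name, it relies only on the `find` utility, and it assumes that 256KB means 256 KiB. Right after a lazy clone it would be expected to print nothing.

```shell
# Hypothetical helper: list regular files larger than 256 KiB under a
# directory (default: current directory), skipping the .git directory.
# Assumption: the 256KB threshold is 256 KiB. The "k" size suffix is a
# GNU/BSD find extension.
large_files() {
  dir="${1:-.}"
  find "$dir" -name .git -prune -o -type f -size +256k -print
}
```

Calling `large_files` from the repository root after materializing a large file should list that file.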
Selectively downloading files
We want to make a minor correction to a single file. To interact with a full file, run the materialize command to fetch it locally.
- Download the female teams file and see how the file size updates:
    git xet materialize female_teams.csv
    ls -l
- Edit female_teams.csv with your CSV editor of choice to add Estadio Nacional de Chile as Chile W's home stadium.
- Commit your change and push it.
    git commit -am "Add Chile's national stadium"
    git push
- (Optional) Dematerialize the file once you're done working with it, returning it to a pointer to keep your local directory lean.
    git xet dematerialize female_teams.csv
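For repeated single-file edits, the steps above can be wrapped in one function. This is only a sketch: `edit_lazy_file` is a hypothetical helper name, it uses just the commands shown in this section, and it assumes `$EDITOR` (falling back to `vi`) can open the file. The chain stops at the first failing step, so a file is only dematerialized after the push succeeds.

```shell
# Hypothetical helper: materialize one file, edit it, commit and push,
# then return it to a pointer file.
# Usage: edit_lazy_file <path> <commit message>
edit_lazy_file() {
  file="$1"
  message="$2"
  git xet materialize "$file" &&
    ${EDITOR:-vi} "$file" &&
    git commit -am "$message" &&
    git push &&
    git xet dematerialize "$file"
}
```

For example: `edit_lazy_file female_teams.csv "Add Chile's national stadium"`.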
Reorganizing directories
There are two files marked "legacy" in the folder, which refer to older versions of the game. Since lazy clone allows us to see the full structure of the repository, we can easily move them to a directory of their own:
mkdir old
mv *legacy* old/
# Stage both the removed paths and the new paths under old/
git add -A
git commit -m "Move legacy files into their own folder"
git push
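The same move can also be recorded with `git mv`, which moves a tracked file and stages the rename in one step. The function below is a sketch, not a Git-Xet feature: `move_tracked` is a hypothetical helper name, and it assumes `git mv` treats pointer files like any other tracked file.

```shell
# Hypothetical helper: move tracked files into a directory with git mv,
# so each rename is staged as it happens.
# Usage: move_tracked <destination-dir> <file>...
move_tracked() {
  dest="$1"
  shift
  mkdir -p "$dest" || return 1
  for f in "$@"; do
    git mv "$f" "$dest/" || return 1
  done
}
```

For example, `move_tracked old *legacy*` followed by a plain `git commit -m "..."` and `git push`.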
Saving lazy configurations for the future
If you use a repository frequently, you may want to save a certain configuration of materialized and unmaterialized files. This is easy to do!
- Materialize some files:
    git xet materialize male_teams.csv
    git xet materialize female_teams.csv
- Currently materialized files are listed in an automatically generated config file, located at .git/xet/lazyconfig, and can also be listed by running this command:
    git xet lazy show
- To preserve the current configuration, copy the lazyconfig file into a file at the repository level and check it in:
    cp .git/xet/lazyconfig my_lazyconfig
    git add my_lazyconfig
    git commit -m "Save my awesome lazy config"
- On your next lazy clone, copy your saved config to the default path and apply the configuration to your repository:
    cp my_lazyconfig .git/xet/lazyconfig
    git xet lazy apply
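The restore step can be scripted so it fails early when something is missing. This is a sketch under stated assumptions: `restore_lazy_config` is a hypothetical helper name, `my_lazyconfig` is the file name used in the example above, and the function is meant to be run from the root of a lazy clone where `.git/xet` already exists.

```shell
# Hypothetical helper: copy a checked-in lazy config to the default path
# and apply it. Run from the root of a lazy clone.
# Usage: restore_lazy_config [saved-config-file]
restore_lazy_config() {
  saved="${1:-my_lazyconfig}"
  if [ ! -f "$saved" ]; then
    echo "No saved lazy config found at $saved" >&2
    return 1
  fi
  if [ ! -d .git/xet ]; then
    echo "No .git/xet directory here; run this from the repository root" >&2
    return 1
  fi
  cp "$saved" .git/xet/lazyconfig && git xet lazy apply
}
```

Running `restore_lazy_config` right after `git xet clone --lazy` and `cd` into the repository would then bring back the saved set of materialized files.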