
Lazy clone

Some Git workflows, like cloning, don't always make sense for large repositories, especially when you only need to work with a small set of files. To make it easier to develop with large repositories without downloading every file, we built lazy clone.

Ideal use cases:

  • Selectively downloading files
  • Directory structure reorganization
  • Saving a default clone config for future development

Lazy cloning a repository runs a clone but keeps all Xet-managed files (files that can't be decoded as UTF-8, and files over 256KB) as pointer files, so the initial clone is fast regardless of the repository's overall size.
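To make the rule concrete, here is a rough shell approximation that reports whether a given file would stay a pointer after a lazy clone. It is a sketch under stated assumptions, not how Xet itself decides: we read "non-UTF decodable" as "not valid UTF-8", treat 256KB as 256 * 1024 bytes, and use iconv to test UTF-8 validity. The function name would_be_pointer is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (the name is ours): approximates the rule described
# above for which files lazy clone keeps as pointer files. It is not Xet's
# own logic. Assumes "256KB" means 256 * 1024 bytes.
XET_SIZE_THRESHOLD=$((256 * 1024))

would_be_pointer() {
  local path="$1" size
  # wc -c < file prints the byte count; tr strips BSD-style padding.
  size=$(wc -c < "$path" | tr -d ' ')
  if [ "$size" -gt "$XET_SIZE_THRESHOLD" ]; then
    echo yes    # larger than the size threshold
  elif ! iconv -f UTF-8 -t UTF-8 "$path" > /dev/null 2>&1; then
    echo yes    # contents cannot be decoded as UTF-8
  else
    echo no     # small UTF-8 text: cloned in full
  fi
}
```

For example, would_be_pointer female_teams.csv prints yes or no for a local copy of a file.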

To get the command for any repository, open the Access dropdown, select Clone, and tick the "Lazy clone" checkbox. Then copy the lazy clone command that appears.

Note: Most typical Git commands (e.g., git clone) work interchangeably with their Git-Xet counterparts (e.g., git xet clone). Lazy clone, however, exists only in Git-Xet, so you must use the full git xet clone --lazy command.

Basic usage

Let's try it out on a copy of the 5.5GB FIFA23 dataset, which holds large CSVs of EA Games FIFA23 player data.

  1. Go to the repository and click Duplicate in the top right corner to get your own copy of the repository to experiment with.

    Screenshot of cursor pointing at the Duplicate button

  2. Clone your duplicated repository: open the Access dropdown, select Clone, and tick the Lazy clone checkbox.

    Screenshot of the lazy clone option

    From your terminal, run the command and enter the directory.

    git xet clone --lazy xet://<username>/FIFA23_dataset
    cd FIFA23_dataset
  3. List the directory contents and note that all the files are either tiny pointer files or small text files under 256KB.

    ls -l

    List of files in the directory with file sizes
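If you'd rather check this from the terminal than read through ls -l, a find command can list working-tree files over 256KB while skipping the .git directory. Going by the rule above, we'd expect it to print nothing in a fresh lazy clone. The function name is hypothetical, and we assume 256KB means 256 KiB, which is what find's +256k size test checks.

```shell
# Hypothetical helper (the name is ours): print working-tree files larger
# than 256 KiB, skipping the .git directory. find's "+256k" means "more
# than 256 KiB", with sizes rounded up to whole KiB.
list_files_over_256k() {
  local dir="${1:-.}"
  find "$dir" -path "$dir/.git" -prune -o -type f -size +256k -print
}
```

Run it with no argument from inside FIFA23_dataset to check the current directory.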

Selectively downloading files

Suppose we want to make a small correction to a single file. To work with a file's full contents, run git xet materialize to fetch it locally.

  1. Download the female teams file and see how the file size updates:

    git xet materialize female_teams.csv
    ls -l

    List of files in the directory after fetching female_teams.csv

  2. Edit female_teams.csv with your CSV editor of choice to add Estadio Nacional de Chile as Chile W's home stadium.

  3. Commit your change and push it.

    git commit -am "Add Chile's national stadium"
    git push
  4. (Optional) Dematerialize the file once you're done with it, turning it back into a pointer file to keep your local directory lean.

    git xet dematerialize female_teams.csv
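If you repeat this edit cycle often, the steps can be wrapped in a small shell function. This is a sketch: the name xet_edit_file is hypothetical, it opens the file in $EDITOR (falling back to vi), and it commits only the named file instead of using git commit -am, so other modified files stay out of the commit.

```shell
# Hypothetical wrapper (the name is ours) around the steps above. Every
# git / git xet command in it is one this guide uses; it stops at the
# first command that fails.
xet_edit_file() {
  local file="$1" message="$2"
  git xet materialize "$file" || return           # download the full file
  "${EDITOR:-vi}" "$file" || return               # edit it locally
  git commit -m "$message" -- "$file" || return   # commit just this file
  git push || return                              # publish the change
  git xet dematerialize "$file"                   # back to a pointer file
}
```

Usage: xet_edit_file female_teams.csv "Add Chile's national stadium"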

Reorganizing directories

Two files in the repository have "legacy" in their names; they refer to older versions of the game. Because lazy clone shows the repository's full structure without downloading every file, we can easily move them into a directory of their own:

mkdir old
git mv *legacy* old/
git commit -am "Move legacy files into their own folder"
git push
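A lazy clone is still an ordinary Git working tree, so this reorganization is plain Git. The script below rehearses it in a throwaway local repository with no Xet involved; the CSV names are made-up stand-ins for the dataset's files. git mv stages both sides of the move, whereas a plain mv followed by git commit -a would record only the deletions and leave the files under old/ untracked.

```shell
#!/usr/bin/env bash
# Rehearse the legacy-file move in a scratch repository using plain Git.
# File names are hypothetical stand-ins for the dataset's CSVs.
set -euo pipefail

demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

printf 'id,name\n' > players_legacy.csv
printf 'id,name\n' > teams_legacy.csv
printf 'id,name\n' > male_players.csv
git add .
git commit -q -m "Initial files"

mkdir old
# git mv stages the removal and the addition together.
git mv *legacy* old/
git commit -q -m "Move legacy files into their own folder"

# Lists the tracked paths after the move.
git ls-files
```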

Saving lazy configurations for the future

If you use a repository frequently, you may want to save a particular set of materialized and unmaterialized files so you can restore it later. This is easy to do!

  1. Materialize some files:

    git xet materialize male_teams.csv
    git xet materialize female_teams.csv
  2. The currently materialized files are recorded in an automatically generated config file at .git/xet/lazyconfig. You can also list them by running:

    git xet lazy show
  3. To preserve the current configuration, copy the lazyconfig file into the repository's working tree, then commit and push it so future clones include it:

    cp .git/xet/lazyconfig my_lazyconfig
    git add my_lazyconfig
    git commit -m "Save my awesome lazy config"
    git push
  4. After your next lazy clone, copy your saved config back to its default path and apply it to your repository:

    cp my_lazyconfig .git/xet/lazyconfig
    git xet lazy apply
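The clone-and-restore sequence can also be scripted. This sketch assumes the config was committed as my_lazyconfig at the repository root, as in step 3, and that the clone directory is named after the last part of the repository path, as with FIFA23_dataset above. The function name lazy_clone_with_config is hypothetical.

```shell
# Hypothetical helper (the name is ours): lazily clone a repository, enter
# it, and apply a lazy config saved at the repository root, if present.
# It uses only the commands from the steps above and, like those steps,
# leaves you inside the cloned directory.
lazy_clone_with_config() {
  local repo="$1" saved="${2:-my_lazyconfig}" dir
  dir=$(basename "$repo")              # e.g. FIFA23_dataset
  git xet clone --lazy "$repo" || return
  cd "$dir" || return
  if [ -f "$saved" ]; then
    mkdir -p .git/xet                  # normally already created by the clone
    cp "$saved" .git/xet/lazyconfig    # restore the saved config
    git xet lazy apply                 # apply the saved configuration
  fi
}
```

Usage: lazy_clone_with_config xet://<username>/FIFA23_dataset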