
Versioning large files in GitHub

Caution: The XetData GitHub App is in beta and free to use. Help us improve by filing bugs!

If your code is hosted on GitHub, we make it easy to add Xet superpowers to an existing Git repository. When you install the XetData GitHub App and give it permission to access a GitHub repo, the app configures your repo to support large files by managing all non-UTF-decodable files and all files larger than 256 KB.

All normal Git commands will continue to work with your GitHub repo while the XetData app seamlessly stores managed files, providing rich views via links in commits and pull requests.
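
To get a rough idea of which files in a working tree would fall under these rules, the two criteria can be approximated locally. The sketch below is not the app's actual detection logic: the helper name would_be_xet_managed is hypothetical, "non-UTF decodable" is assumed to mean "not valid UTF-8", and 256 KB is assumed to mean 256 * 1024 bytes.

```shell
# Hypothetical helper: succeeds (exit 0) when a file matches either rule.
# This only approximates the rules; it is not the app's own check.
would_be_xet_managed() (
  file="$1"
  [ -f "$file" ] || exit 2

  # Rule 1 (assumed threshold): larger than 256 * 1024 bytes.
  size=$(( $(wc -c < "$file") ))
  if [ "$size" -gt $((256 * 1024)) ]; then
    exit 0
  fi

  # Rule 2 (assumed encoding): contents are not valid UTF-8.
  if ! iconv -f UTF-8 -t UTF-8 "$file" > /dev/null 2>&1; then
    exit 0
  fi

  exit 1
)
```

For example, `git ls-files | while IFS= read -r f; do would_be_xet_managed "$f" && echo "$f"; done` prints the tracked files that match either rule.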


Setup

XetData GitHub App

Get started with any existing GitHub repository:

  1. Install the Git-Xet client, returning to these instructions once Git-Xet has been installed.

  2. Go to the XetData GitHub App

    a. Click "Install"

    b. Select "Only select repositories" and choose the repositories you want.

    c. Click "Install" once more.

  3. Make changes to your repo.

    a. Clone or run git pull for a repository that has the app installed.

    b. git add and git commit some large files, then git push your changes.

    • Wait for the git-xet filter to complete between commands.
    • If you have a passphrase set, save it in your keychain to streamline prompts.
  4. Navigate to the GitHub UI and click on the latest commit or PR to see links to XetData views. Navigating to the file itself will only show a pointer file.

    • Install our browser extension to see XetHub views while browsing GitHub.
    • When sharing repos, reference this app so collaborators have the same experience.

That's it! If you navigate to the repository you specified, you should now see an initial commit from the XetData app to configure your repository. For subsequent repository edits, click "Configure" from the XetData GitHub App page and "Save" to persist your changes.

Browser Extension

We built a browser extension to enhance the experience while browsing GitHub. With this extension installed, navigate to any Xet-managed file to see a link to XetHub in file view.

GitHub File view with Extension

Install the Chrome browser extension now:

  1. Using Chrome, visit the XetData Chrome Extension Page
  2. Click Add to Chrome
  3. When prompted, click Add extension to add the extension to Chrome

Usage

Basic Git Usage

Use the Git commands you normally would with your XetData-enabled repo. When large files are detected, we automatically and securely store them. Behind the scenes, we deduplicate your files at the block level to improve file upload and download speeds. Every time you push a commit with changes to a large file from the command line, we display a link to a rich view hosted on XetHub.

File view link

Similarly, for pull requests, we add a link to a before-and-after difference view for each large file. This works for new and existing pull requests.

Diff view link

Mount for quick read access

XetHub provides a read-only mount feature that streams data as you need it, so you do not have to download the full repository. Mounting is supported for all XetData-enabled GitHub repos.

  1. Copy the repo clone path from the Code dropdown in your GitHub repo UI.

  2. From your terminal, run mount on the copied path:

    git xet mount <repo clone path>
  3. Use your local file browser or applications to access the mounted repository.

  4. When finished, unmount with:

    umount <repo name>

Known Limitations

The XetData GitHub App is currently in beta.

Processing Limits

Our beta system is currently limited to 20 requests per push and pull request. This means that if you create a push or pull request with more than 20 files in it, we will only process the first 20 files to add the view links. As a workaround, we suggest limiting pushes and pull requests to 20 files each to ensure that you can browse to every file you push.
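
One way to apply this workaround from the command line is to stage, commit, and push files in groups of at most 20. The following is a minimal sketch, not part of the XetData tooling: push_in_batches is a hypothetical helper, the commit message is a placeholder, and it assumes that pushing one commit at a time keeps each push within the limit.

```shell
# Hypothetical helper: add, commit, and push the given paths 20 at a time.
# The body runs in a subshell so its variables do not leak into the caller.
push_in_batches() (
  batch_size=20
  batch_number=1

  while [ "$#" -gt 0 ]; do
    count=0

    # Stage up to batch_size paths taken from the front of the argument list.
    while [ "$#" -gt 0 ] && [ "$count" -lt "$batch_size" ]; do
      git add -- "$1" || exit 1
      shift
      count=$((count + 1))
    done

    # One commit and one push per batch (placeholder commit message).
    git commit -m "Add large files (batch $batch_number)" || exit 1
    git push || exit 1
    batch_number=$((batch_number + 1))
  done
)
```

Called as `push_in_batches data/*.csv`, this produces one push per group of up to 20 files, so each push stays small enough for every file to receive a view link.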

Default branch protection

Our app requires access to the default branch in order to initialize repos. If your access is limited due to branch protections, add the XetData GitHub App to your list of exclusions as seen below, then re-add the repo to the app.

Adding app to branch protection exclusions

Alternatively, you can manually configure a repo in a branch with the following steps:

  1. cd to a clone of your repo
  2. Run git xet init
  3. Add, commit, and push
  4. Push your Git notes with git push origin "refs/notes/*"
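
The manual steps above can be collected into a small script. This is a sketch under stated assumptions: configure_xet_manually is a hypothetical helper name, the commit message is a placeholder, `git add .` is assumed to be an acceptable way to stage whatever `git xet init` changed, and the Git-Xet client is assumed to be installed with the target branch already checked out.

```shell
# Hypothetical helper wrapping the four manual steps.
# The body runs in a subshell so the caller's working directory is unchanged.
configure_xet_manually() (
  # Step 1: cd to a clone of your repo.
  cd "$1" || exit 1

  # Step 2: initialize Xet support in the clone.
  git xet init || exit 1

  # Step 3: add, commit, and push (placeholder commit message).
  git add . || exit 1
  git commit -m "Configure repository for XetData" || exit 1
  git push || exit 1

  # Step 4: push your Git notes.
  git push origin "refs/notes/*"
)
```

Run it as `configure_xet_manually path/to/clone`, where the path is a local clone of the repo you want to configure.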

Forking workflow

Forking normal Git repos works naturally. When forking a XetData-enabled repo, however, you will need to add the new fork to your XetData GitHub App's selected repository list to see large file view links on commits and pull requests.

Large file views

The XetHub viewer currently supports visualizing CSV files and image files, and offers the option to download before-and-after versions of any file. Files of up to 5 MB can be loaded in the UI; we recommend downloading or mounting larger files for local exploration.

Uninstalling the XetData GitHub app

There is no automated way to uninstall the XetData GitHub app. Follow the manual process below.

  1. Edit .gitattributes, removing the following code:
    # XET LOCK
    * filter=xet diff=xet merge=xet -text
    *.gitattributes filter=
    *.xet/** filter=
  2. Convert your XetData managed files back to raw data with the following command:
    git add --renormalize .
  3. Commit and push.

From that point on, XetData will no longer manage or store any newly added files.
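
Step 1 of the uninstall process can also be scripted. The sketch below assumes the Xet entries appear in .gitattributes exactly as listed in step 1, with no leading indentation or trailing whitespace; remove_xet_attributes is a hypothetical helper name, not a Git-Xet command.

```shell
# Hypothetical helper: delete the Xet lines from a .gitattributes file.
# Defaults to ./.gitattributes when no path is given.
remove_xet_attributes() (
  attributes_file="${1:-.gitattributes}"
  tmp_file="$attributes_file.tmp"

  # -F: fixed strings, -x: whole-line match, -v: keep non-matching lines.
  status=0
  grep -v -x -F \
    -e '# XET LOCK' \
    -e '* filter=xet diff=xet merge=xet -text' \
    -e '*.gitattributes filter=' \
    -e '*.xet/** filter=' \
    "$attributes_file" > "$tmp_file" || status=$?

  # grep exits with 1 when no lines remain, which is fine here;
  # anything above 1 is a real error (for example, a missing file).
  if [ "$status" -gt 1 ]; then
    rm -f "$tmp_file"
    exit 1
  fi

  mv "$tmp_file" "$attributes_file"
)
```

After running remove_xet_attributes in the repository root, continue with `git add --renormalize .`, then commit and push as in steps 2 and 3.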

XetHub compatibility

We currently do not support advanced PyXet or Xet CLI access patterns for GitHub repositories. Other XetHub-specific features that are not supported include Vega visualizations and Capsules.