Versioning large files in GitHub
The XetData GitHub App is currently in beta and free to use. Help us improve by filing bugs!
If your code is happily hosted on GitHub, we make it easy to add Xet superpowers to an existing Git repository. When you install the XetData GitHub App and give it permission to access a GitHub repo, the app configures your repo to support large files, managing all files that are not UTF-decodable and all files larger than 256KB.
All normal Git commands will continue to work with your GitHub repo. The app stores the managed files and provides rich views of them via links in commits and pull requests.
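To get a rough idea of which files in a working tree would fall under the rule above, a minimal sketch follows. It only approximates the stated rule and is not the app's actual detection logic. It assumes that "UTF" means UTF-8, that 256KB means 256 * 1024 bytes, and that `iconv` is available; the function name `classify_file` is made up for illustration.

```shell
#!/bin/sh
# Sketch only: approximates the rule described above. The real app may
# decide differently.

# Assumption: "256KB" means 256 * 1024 bytes.
XET_SIZE_THRESHOLD=262144

# Prints "managed" or "plain" for the file given as $1.
# The body runs in a subshell so its variables do not leak.
classify_file() (
    size=$(wc -c < "$1" | tr -d ' ')
    if [ "$size" -gt "$XET_SIZE_THRESHOLD" ]; then
        # Larger than the size threshold.
        echo "managed"
    elif ! iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1; then
        # Assumption: "UTF" means UTF-8; iconv fails on invalid sequences.
        echo "managed"
    else
        echo "plain"
    fi
)
```

For example, `classify_file data/train.csv` prints one of the two labels for a single (hypothetical) file, which can help predict which files will show up as pointer files in the GitHub UI.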
Setup
XetData GitHub App
Get started with any existing GitHub repository:
- Install the Git-Xet client.
- Go to the XetData GitHub App.
  a. Click "Install".
  b. Select "Only select repositories" and select the repositories you want.
  c. Click "Install" once more.
- Make changes to your repo.
  a. Clone or run git pull for a repository that has the app installed.
  b. git add and git commit some large files, then git push your changes.
    - Wait for the git-xet filter to complete between commands.
    - If you have a passphrase set, save it in your keychain to streamline prompts.
- Navigate to the GitHub UI and click on the latest commit or PR to see links to XetData views. Navigating to the file itself will only show a pointer file.
  - Install our browser extension to see XetHub views while browsing GitHub.
  - When sharing repos, reference this app so collaborators have the same experience.
That's it! If you navigate to the repository you specified, you should now see an initial commit from the XetData app to configure your repository. For subsequent repository edits, click "Configure" from the XetData GitHub App page and "Save" to persist your changes.
Browser Extension
We built a browser extension to enhance the experience while browsing GitHub. With this extension installed, navigate to any Xet-managed file to see a link to XetHub in file view.
- Chrome
- Firefox
Install the Chrome browser extension now:
- Using Chrome, visit the XetData Chrome Extension Page
- Click "Add to Chrome"
- When prompted, click "Add extension" to add the extension to Chrome
Install the Firefox browser extension now:
- Using Firefox, visit the XetData Firefox Extension Page
- Click "Add to Firefox"
- When prompted, click "Add" to add the extension to Firefox
Usage
Basic Git Usage
Use the Git commands you normally would with your XetData-enabled repo. When large files are detected, we automatically and securely store them. Behind the scenes, we deduplicate your files at the block level to improve file upload and download speeds. Every time you push a commit with changes to a large file from the command line, we will display a link to a rich view hosted on XetHub.
Similarly, for pull requests, we add a link to a before-and-after difference view for each large file. This works for new and existing pull requests.
Mount for quick read access
XetHub provides a read-only mount feature that streams data as you need it, so you do not have to download the full repository. Mount usage is supported for all XetData-enabled GitHub repos.
- Copy the repo clone path from the Code dropdown in your GitHub repo UI.
- From your terminal, run mount on the copied path:
  git xet mount <repo clone path>
- Use your local file browser or applications to access the mounted repository.
- When finished, unmount with:
  umount <repo name>
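The unmount step needs the repo name rather than the full clone path. The helper below is a small sketch for deriving one from the other. It assumes the mount directory is named after the last component of the clone path with any ".git" suffix removed; the function name and the example repository are hypothetical.

```shell
#!/bin/sh
# Sketch only. Assumption: the mount directory is named after the repository,
# i.e. the last path component of the clone path without a trailing ".git".

# Prints the repository name for a clone path given as $1.
# The body runs in a subshell so its variables do not leak.
repo_name_from_clone_path() (
    name=${1%/}          # drop a trailing slash, if any
    name=${name##*/}     # keep the last path component
    name=${name%.git}    # drop the ".git" suffix, if any
    echo "$name"
)

# Example with a hypothetical repository:
#   clone_path=https://github.com/example-org/example-repo.git
#   git xet mount "$clone_path"
#   ls "$(repo_name_from_clone_path "$clone_path")"
#   umount "$(repo_name_from_clone_path "$clone_path")"
```

The same helper also handles SSH-style clone paths, because only the text after the final slash is used.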
Known Limitations
The XetData GitHub App is currently in beta.
Processing Limits
Our beta system is currently limited to 20 requests per push and pull request. This means that if you create a push or pull request with more than 20 files in it, we will only process the first 20 files to add the view links. As a workaround, we suggest limiting pushes and pull requests to 20 files each to ensure that you can browse to every file you push.
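One way to follow this workaround is to split a large change into batches before pushing. The sketch below groups file names into batches of at most 20. The `batch_files` helper is hypothetical, and the commented usage assumes each batch becomes its own commit and push.

```shell
#!/bin/sh
# Sketch only: groups file names (one per line on stdin) into batches of at
# most $1 entries. Each batch is printed on one line, separated by tabs.
# Assumption: file names contain no tabs or newlines.
batch_files() {
    awk -v n="$1" '
        # Start a new batch without a separator; otherwise prefix a tab.
        { printf "%s%s", (NR % n == 1 || n == 1 ? "" : "\t"), $0 }
        # Close the batch after every n-th file name.
        NR % n == 0 { printf "\n" }
        # Close a final, partially filled batch.
        END { if (NR % n != 0) printf "\n" }
    '
}

# Example: commit and push changed files 20 at a time.
# The commit message is a placeholder.
#   git ls-files --modified --others --exclude-standard | batch_files 20 |
#   while IFS= read -r batch; do
#       printf '%s' "$batch" | tr '\t' '\0' | xargs -0 git add --
#       git commit -m "Add batch of files"
#       git push
#   done
```

Pushing after every commit keeps each push within the limit, at the cost of more round trips.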
Default branch protection
Our app requires access to the default branch in order to initialize repos. If your access is limited due to branch protections, add the XetData GitHub App to your list of exclusions, then re-add the repo to the app.
Alternatively, you can manually configure a repo in a branch with the following steps:
- cd to a clone of your repo
- Run git xet init
- Add, commit, and push
- Push your Git notes with git push origin "refs/notes/*"
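The manual steps above can be strung together as a single sequence. This is a sketch under assumptions: the branch name and commit message are placeholders, `git add -A` stands in for the unspecified "add" step (review `git status` first), and the `run` wrapper and function name are made up for illustration. Only `git xet init` and the notes push come from the steps themselves.

```shell
#!/bin/sh
# Sketch only. The branch name and commit message are hypothetical.

# Runs a command, or only prints it when DRY_RUN is set to a non-empty value.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: configure_xet_in_branch <path to clone> <branch name>
# The body runs in a subshell so the cd does not affect the caller.
configure_xet_in_branch() (
    cd "$1" || exit 1
    run git checkout -b "$2" &&
    run git xet init &&
    # Assumption: stage everything that git xet init changed.
    run git add -A &&
    run git commit -m "Configure repository for XetData" &&
    run git push -u origin "$2" &&
    run git push origin "refs/notes/*"
)
```

Setting `DRY_RUN=1` before calling `configure_xet_in_branch` prints the commands without running them, which is a cheap way to check the sequence before touching a real repo.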
Forking workflow
Forking normal Git repos works naturally. When forking a XetData-enabled repo, however, you will need to add the new fork to your XetData GitHub App's selected repository list to see large file view links on commits and pull requests.
Large file views
The XetHub viewer currently supports visualizing CSV files and image files, and offers the option to download before-and-after versions of any file. Files of up to 5MB can be loaded in the UI; we recommend downloading or mounting larger files for local exploration.
Uninstalling the XetData GitHub app
There are no automatic paths for uninstalling the XetData GitHub app. Follow the manual process below.
- Edit .gitattributes, removing the following code:
  # XET LOCK
  * filter=xet diff=xet merge=xet -text
  *.gitattributes filter=
  *.xet/** filter=
- Convert your XetData managed files back to raw data with the following command:
  git add --renormalize .
- Commit and push.
From that point on, XetData will no longer manage or store any newly added files.
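The .gitattributes edit in the first step can also be scripted. The sketch below removes the four lines listed above by exact match and leaves everything else in place. It assumes those lines appear in the file exactly as shown here; the function name and the commit message in the usage comment are hypothetical.

```shell
#!/bin/sh
# Sketch only: prints the contents of a .gitattributes file ($1) without the
# four XetData lines listed above. Lines are matched exactly and literally,
# so anything else in the file is kept.
strip_xet_attributes() {
    grep -vxF \
        -e '# XET LOCK' \
        -e '* filter=xet diff=xet merge=xet -text' \
        -e '*.gitattributes filter=' \
        -e '*.xet/** filter=' \
        "$1" || [ $? -eq 1 ]   # grep exits 1 when nothing is left; not an error
}

# Example, followed by the remaining steps from above:
#   strip_xet_attributes .gitattributes > .gitattributes.new
#   mv .gitattributes.new .gitattributes
#   git add --renormalize .
#   git commit -m "Stop managing files with XetData"
#   git push
```

Writing to a temporary file first avoids truncating .gitattributes while it is still being read.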
XetHub compatibility
We currently do not support advanced PyXet or Xet CLI access patterns for GitHub repositories. Other XetHub-specific features that are not supported include Vega visualizations and Capsules.