
Introduction

XetHub is a collaborative development platform built for versioning ML data, models, and artifacts. Use XetHub as a GitHub replacement to track the evolution of large files and repositories, or as a versioned storage backend whose flexible access patterns fit naturally into your workflow.

It's perfect for teams who want Git-backed reliability and reproducibility for every piece of their ML workflow, without the hassle of running additional commands or managing remote servers. With a per-repository limit of over 100TB and no limits on individual file size or on the number of files, XetHub brings scale and speed to Git versioning.

Why XetHub?

Existing software development tools were optimized to work with small code files and perform poorly when anything over a few megabytes shows up. If you've ever tried DVC and accidentally forgotten to dvc add a file, or used Git LFS only to wait hours for each command to complete, you know that traditional software tooling doesn't scale to work with large files. If your S3 buckets are full of fragile naming conventions and accidental overwrites, you know that object stores weren't made for affordable versioning.

XetHub bridges the gap between software versioning and object store scale.

ML teams have much better things to do than to learn yet another set of versioning commands, so XetHub is fully Git-integrated: all your normal Git commands will just work with larger files.

How does it work?

DVC and Git LFS replace large files with pointers to files stored on remotes, but do nothing to optimize the storage of the files themselves. XetHub uses pointers as well, but also invisibly chunks the files into blocks for more efficient storage and transfer with no hit to developer experience. Our compute backend also enables rich views and context on top of large files that no other versioning system can support.
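
To make the idea of pointers plus block-level storage concrete, here is a minimal Python sketch. It is an illustration only and not XetHub's actual implementation: the fixed 4-byte block size, the BlockStore class, and its method names are all hypothetical, chosen to keep the example small. It shows how a file can be represented as a list of block hashes (the pointer) while each unique block is stored only once.

```python
import hashlib

# Hypothetical block size, kept tiny so the example is easy to follow.
# This is an assumption for illustration, not XetHub's real chunking scheme.
BLOCK_SIZE = 4


class BlockStore:
    """Toy content-addressed store: each unique block is kept once."""

    def __init__(self):
        self.blocks = {}  # SHA-256 hex digest -> block bytes

    def add_file(self, data):
        """Split data into blocks, store unseen blocks, return the pointer."""
        pointer = []
        for start in range(0, len(data), BLOCK_SIZE):
            block = data[start:start + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            # setdefault leaves an already-stored block untouched.
            self.blocks.setdefault(digest, block)
            pointer.append(digest)
        return pointer

    def read_file(self, pointer):
        """Rebuild the original bytes from a pointer (list of digests)."""
        return b"".join(self.blocks[digest] for digest in pointer)

    def stored_bytes(self):
        """Total bytes held across all unique blocks."""
        return sum(len(block) for block in self.blocks.values())


store = BlockStore()
v1 = b"aaaabbbbcccc"
v2 = b"aaaabbbbccccdddd"  # v1 with one extra block appended
p1 = store.add_file(v1)
p2 = store.add_file(v2)
```

In this sketch the second version shares its first three blocks with the first, so only the appended block is stored again. A production system would likely pick block boundaries from the content itself rather than at fixed offsets, so that an insertion in the middle of a file does not shift every later block.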

XetHub is free for public use and for private repositories of up to 20GB. Install our Git extension now to get started with your first Xet repository.