Trusted by ML teams across industries
So you have an ML project.
Your code is versioned, your data lives in a warehouse, and your compute generates artifacts that are saved to a bucket. No problem!
Multiple teams and projects start referencing and modifying data. More teams lead to more files, more tools, and more security considerations. Understanding provenance is no longer a given, and tracking changes across data sources and tools is a nightmare.
XetHub brings software development best practices to ML by creating consolidated, versioned project views across all your tools — no workflow changes needed. Connect your sources and let XetHub provide fast access to assets across your stack while guaranteeing full reproducibility and lineage.
Your team's assets are spread across Git, object stores, and data lakes, while dependencies are tracked in yet another tool.
You review models, datasets, and notebooks with collaborators over Slack and email because there’s no easier way.
Your team’s datasets and models are constantly growing in size, leading to ever-longer transfer and training times.
Flexible features to fit your team’s needs
Instant access
Stream files and mount repos without waiting for downloads.
Diff tracking
Easily see how your work has evolved over time.
APIs
Programmatically interact with files from within your workflows.
Git-integrated
Use the Git commands you know to manage files of any size.
Apps
Deploy Streamlit and Gradio apps for interactive exploration.
Actions
Automate your workflows with triggers and schedules.
Deduplication
Save on storage and transfer with automatic block-level dedupe.
Issues
Track concerns and review changes with issues and pull requests.
Daniel Maturana
Co-founder and Chief ML Scientist
40%
reduction in repository size and transfer time
4
data silos eliminated by switching to XetHub
51%
cost savings over using EBS, Git LFS, and DVC