Pythonic access through PyXet
Python is the language of ML, and XetHub is here for it. Import PyXet, our fsspec-compatible interface, to access files directly from your Python interpreter with libraries such as Pandas, Polars and DuckDB. Use transactions to write back to XetHub repositories with commit messages.
Install PyXet
- Using a supported version of Python (3.7+), set up a virtualenv:
    python -m venv .venv
    source .venv/bin/activate
- Install PyXet with:
    pip install pyxet
- Create your personal access token (PAT) by clicking "Create Token" from your Settings page.
- Authenticate by copying the login string, starting with xet login, and running it from your terminal:
    xet login -u <username> -e <email> -p <PAT>
Importing and authenticating from Python
If you are working in a notebook or other non-persistent environment, you may need to authenticate from Python to access your repositories.
import pyxet
pyxet.login(<username>, <PAT>, <email>)
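Pasting a token into a notebook cell makes it easy to leak. A minimal sketch of one alternative, assuming the credentials are stored in environment variables: the variable names `XET_USERNAME`, `XET_TOKEN` and `XET_EMAIL` and both helper functions are hypothetical, and only the `pyxet.login` argument order is taken from the call shown above.

```python
import os


def read_xet_credentials(env=os.environ):
    """Return (username, token, email) from hypothetical environment variables."""
    names = ("XET_USERNAME", "XET_TOKEN", "XET_EMAIL")  # assumed names, not defined by PyXet
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise ValueError("missing environment variables: " + ", ".join(missing))
    return tuple(env[name] for name in names)


def login_from_env(env=os.environ):
    """Log in to XetHub with credentials read from the environment."""
    import pyxet  # imported lazily so read_xet_credentials works without pyxet installed

    username, token, email = read_xet_credentials(env)
    # Argument order follows the call shown above: username, PAT, email.
    pyxet.login(username, token, email)
```

Calling `login_from_env()` at the top of a notebook keeps the token out of the saved cells, provided the variables are set before the kernel starts.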
Common operations
Read a CSV file with Pandas
import pyxet # make xet:// protocol available
import pandas as pd # assumes pip install pandas has been run
df = pd.read_csv('xet://XetHub/titanic/main/titanic.csv')
Full repository downloads
import pyxet
fs = pyxet.XetFS()
fs.get('xet://user/repo/branch', <dest>, recursive=True)
File download
import pyxet
fs = pyxet.XetFS()
fs.get('xet://user/repo/branch/path/to/file', <dest>, recursive=False)
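Both download calls address content with a URL of the form `xet://user/repo/branch/path/to/file`. The sketch below shows how such a URL breaks into its parts. `split_xet_url` and `XetPath` are hypothetical names used for illustration and are not part of PyXet, and the sketch assumes the branch name contains no `/`.

```python
from typing import NamedTuple


class XetPath(NamedTuple):
    user: str
    repo: str
    branch: str
    path: str  # empty string when the URL points at the branch root


def split_xet_url(url: str) -> XetPath:
    """Split 'xet://user/repo/branch/path' into its four components."""
    prefix = "xet://"
    if not url.startswith(prefix):
        raise ValueError(f"not a xet:// URL: {url!r}")
    # At most three splits, so everything after the branch stays together as the path.
    parts = url[len(prefix):].strip("/").split("/", 3)
    if len(parts) < 3 or not all(parts[:3]):
        raise ValueError(f"expected xet://user/repo/branch[/path], got {url!r}")
    user, repo, branch = parts[:3]
    path = parts[3] if len(parts) == 4 else ""
    return XetPath(user, repo, branch, path)
```

A URL with an empty `path` refers to a whole branch, which is the case where `recursive=True` is used above.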
File upload
import pyxet
local_file = 'path/to/file'
fs = pyxet.XetFS()
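with fs.transaction as tr:
    tr.set_commit_message("Uploading a local file")
    # Read the local file, then write its contents to the repository.
    with open(local_file, 'r') as file:
        contents = file.read()
    # Closing the remote file explicitly ensures the write is flushed.
    with fs.open("<user_name>/<repo_name>/main/foo", 'w') as remote:
        remote.write(contents)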
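The same pattern extends to several files. The sketch below wraps it in a hypothetical helper, `upload_files`, which takes the filesystem object as a parameter. It assumes that all writes made inside one transaction are recorded under the single commit message, and that the filesystem accepts the binary mode `'wb'`, as fsspec filesystems generally do.

```python
def upload_files(fs, files, commit_message):
    """Write several local files to a repository inside one transaction.

    fs:     a filesystem object such as pyxet.XetFS()
    files:  mapping of local path -> remote path, e.g. '<user_name>/<repo_name>/main/foo'
    """
    with fs.transaction as tr:
        tr.set_commit_message(commit_message)
        for local_path, remote_path in files.items():
            # Copy bytes unchanged; both files are closed when the block exits.
            with open(local_path, "rb") as src, fs.open(remote_path, "wb") as dst:
                dst.write(src.read())
    return len(files)  # number of files written
```

Passing `fs` in, rather than creating it inside the helper, lets the caller reuse one `XetFS` instance and makes the function easy to exercise with a stand-in object.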
Reads and writes
import pyxet
fs = pyxet.XetFS()
files = fs.ls('xet://XetHub/Flickr30k/main')
f = fs.open('xet://XetHub/Flickr30k/main/results.csv')
contents = f.read()
with fs.transaction as tr:
    tr.set_commit_message("Writing things to a file")
    # Closing the remote file explicitly ensures the write is flushed.
    with fs.open("<user_name>/<repo_name>/main/foo", 'w') as remote:
        remote.write("Hello world!")
Time travel
Like the Xet CLI, PyXet supports time travel: files and repositories can be addressed at an earlier state using Git revision selection syntax.
import pyxet
fs = pyxet.XetFS()
fs.get('xet://user/repo/branch@{2.days.ago}/path/to/file', <dest>, recursive=False)
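In the call above, the revision selector is attached to the branch component of the URL. A small sketch of building such URLs programmatically follows. `at_revision` is a hypothetical helper, not a PyXet function, and it covers only the `@{...}` form shown above.

```python
def at_revision(url: str, revision: str) -> str:
    """Attach a revision selector such as '2.days.ago' to the branch of a xet:// URL."""
    prefix = "xet://"
    if not url.startswith(prefix):
        raise ValueError(f"not a xet:// URL: {url!r}")
    # Components are user, repo, branch, and (optionally) the remaining path.
    parts = url[len(prefix):].split("/", 3)
    if len(parts) < 3:
        raise ValueError(f"expected xet://user/repo/branch[/path], got {url!r}")
    parts[2] = f"{parts[2]}@{{{revision}}}"  # branch -> branch@{revision}
    return prefix + "/".join(parts)
```

The returned string can be passed to `fs.get` in place of a plain URL, for example `fs.get(at_revision('xet://user/repo/branch/path/to/file', '2.days.ago'), <dest>, recursive=False)`.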