
Pythonic access through PyXet

Python is the language of ML, and XetHub is here for it. Import PyXet, our fsspec-compatible interface, to access files directly from your Python interpreter with libraries such as Pandas, Polars, and DuckDB, and use transactions to write back to XetHub repositories with commit messages.


Install PyXet

  1. Using a supported version of Python (3.7+), set up a virtualenv:

    python -m venv .venv
    source .venv/bin/activate
  2. Install PyXet with:

    pip install pyxet
  3. Create your personal access token (PAT) by clicking "Create Token" from your Settings page.

  4. Authenticate by copying the login string, starting with xet login, and running it from your terminal:

    xet login -u <username> -e <email> -p <PAT>

Importing and authenticating from Python

In a notebook or other non-persistent environment, you may need to authenticate from Python before you can access your repositories.

import pyxet
pyxet.login(<username>, <PAT>, <email>)
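To keep a token out of notebook cells, the credentials can be read from the environment before calling pyxet.login. The following is a minimal sketch: the environment variable names (XETHUB_USERNAME, XETHUB_PAT, XETHUB_EMAIL) and both helper functions are hypothetical and chosen only for illustration. Only the pyxet.login(username, PAT, email) call and its argument order come from this page.

```python
import os

# Hypothetical variable names, used only for this example.
ENV_KEYS = ("XETHUB_USERNAME", "XETHUB_PAT", "XETHUB_EMAIL")


def credentials_from_env(env=None):
    """Return (username, pat, email) read from a mapping of environment variables."""
    env = os.environ if env is None else env
    missing = [key for key in ENV_KEYS if not env.get(key)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return tuple(env[key] for key in ENV_KEYS)


def login_from_env():
    """Authenticate with PyXet using credentials taken from the environment."""
    import pyxet  # imported here so credentials_from_env works without PyXet installed

    username, pat, email = credentials_from_env()
    # Argument order follows the call shown above: username, PAT, email.
    pyxet.login(username, pat, email)
```

Calling login_from_env() at the top of a notebook would then authenticate without the token appearing in the notebook itself.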

Common operations

Read a CSV file with Pandas

import pyxet         # make xet:// protocol available
import pandas as pd  # assumes pip install pandas has been run

df = pd.read_csv('xet://XetHub/titanic/main/titanic.csv')

Full repository downloads

import pyxet

fs = pyxet.XetFS()
fs.get('xet://user/repo/branch', <dest>, recursive=True)

File download

import pyxet

fs = pyxet.XetFS()
fs.get('xet://user/repo/branch/path/to/file', <dest>, recursive=False)
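The download calls above address content with URLs of the form xet://user/repo/branch, optionally followed by a path. The following sketch assembles such a URL from its parts. The helper name xet_url is hypothetical and not part of PyXet, and it does nothing beyond joining strings.

```python
def xet_url(user, repo, branch, path=""):
    """Build a xet:// URL of the form xet://user/repo/branch[/path]."""
    for name, value in (("user", user), ("repo", repo), ("branch", branch)):
        if not value:
            raise ValueError(f"{name} must not be empty")
    url = f"xet://{user}/{repo}/{branch}"
    # Drop leading and trailing slashes so the parts join with exactly one separator.
    path = path.strip("/")
    return f"{url}/{path}" if path else url


# Same URL as the Pandas example on this page.
titanic_csv = xet_url("XetHub", "titanic", "main", "titanic.csv")
```

The result can be passed wherever a xet:// URL is expected, for example as the first argument to fs.get.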

File upload

import pyxet

local_file = 'path/to/file'

fs = pyxet.XetFS()
with fs.transaction as tr:
    tr.set_commit_message("Uploading a local file")
    # Read the local file, then write its contents into the repository
    with open(local_file, 'r') as file:
        contents = file.read()
    fs.open("<user_name>/<repo_name>/main/foo", 'w').write(contents)

Reads and writes

import pyxet

fs = pyxet.XetFS()
files = fs.ls('xet://XetHub/Flickr30k/main')

f = fs.open('xet://XetHub/Flickr30k/main/results.csv')
contents = f.read()

with fs.transaction as tr:
    tr.set_commit_message("Writing things to a file")
    fs.open("<user_name>/<repo_name>/main/foo", 'w').write("Hello world!")
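The write pattern above can be wrapped in a small function that also closes the file handle before the transaction ends. This is a sketch only: write_text is a hypothetical helper, not part of PyXet. It assumes fs is a pyxet.XetFS instance (or any object with the same transaction, set_commit_message, and open interface used on this page) and that the handle returned by fs.open works as a context manager, as fsspec file objects generally do.

```python
def write_text(fs, path, text, message):
    """Write text to path inside one transaction, committed with message."""
    with fs.transaction as tr:
        tr.set_commit_message(message)
        # The inner with-block closes the handle before the transaction exits.
        with fs.open(path, "w") as f:
            f.write(text)
```

With PyXet installed and authenticated, a call might look like write_text(pyxet.XetFS(), "<user_name>/<repo_name>/main/foo", "Hello world!", "Writing things to a file").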

Time travel

Like the Xet CLI, PyXet supports time travel: append a Git revision selector to the branch name to read files and repositories as they were at an earlier point.

import pyxet

fs = pyxet.XetFS()
fs.get('xet://user/repo/branch@{2.days.ago}/path/to/file', <dest>, recursive=False)
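The revision selector sits directly after the branch name, inside the URL. The following sketch builds such a URL for a given number of days. The helper days_ago_url is hypothetical, and only the branch@{2.days.ago} form shown above is confirmed by this page; whether other day counts or other Git selectors are accepted is an assumption to verify against your repository.

```python
def days_ago_url(user, repo, branch, days, path=""):
    """Build a xet:// URL that pins branch to its state a number of days ago."""
    if days < 1:
        raise ValueError("days must be a positive integer")
    # Doubled braces in the f-string produce the literal @{N.days.ago} selector.
    revision = f"{branch}@{{{days}.days.ago}}"
    url = f"xet://{user}/{repo}/{revision}"
    path = path.strip("/")
    return f"{url}/{path}" if path else url


two_days_back = days_ago_url("user", "repo", "branch", 2, "path/to/file")
```

Here two_days_back matches the URL in the example above and could be passed to fs.get in the same way.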