Pythonic access through PyXet

Python is the language of ML and XetHub is here for it. Import PyXet, our fsspec-compatible interface, to access files directly from your Python interpreter with libraries such as Pandas, Polars and DuckDB, and take advantage of transactions to write back to XetHub repositories with commit messages.


Install PyXet

  1. Using a supported version of Python (3.7+), set up a virtualenv:

    python -m venv .venv
    source .venv/bin/activate
  2. Install PyXet with:

    pip install pyxet
  3. Create your personal access token (PAT) by clicking "Create Token" from your Settings page.

  4. Authenticate by copying the login string, starting with xet login, and running it from your terminal:

    xet login -u <username> -e <email> -p <PAT>
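
Before logging in, it can help to confirm that the package actually landed in the active virtualenv. The following is a minimal sketch using only the standard library; the helper name `is_installed` is illustrative and is not part of PyXet.

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Return True if `package` can be imported from the current environment."""
    # find_spec returns None when no top-level module of that name is found.
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    # Expected to report True inside a virtualenv where `pip install pyxet` succeeded.
    print("pyxet installed:", is_installed("pyxet"))
```

If this reports False, the most likely cause is that the virtualenv was not activated before running pip.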

Importing and authenticating from Python

If you are working in a notebook or another non-persistent environment, you may need to authenticate from Python to access your repositories.

import pyxet
pyxet.login(<username>, <PAT>, <email>)
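
In a shared notebook it is usually better not to paste the token into a cell. One possible approach, sketched below, is to read the three values from environment variables and pass them to `pyxet.login` in the same order as above. The variable names (`XET_DEMO_USER`, `XET_DEMO_TOKEN`, `XET_DEMO_EMAIL`) and both helper functions are hypothetical, chosen for this example only.

```python
import os


def credentials_from_env(env=os.environ):
    """Collect (username, token, email) from environment variables.

    The variable names are hypothetical and exist only for this sketch.
    """
    names = ("XET_DEMO_USER", "XET_DEMO_TOKEN", "XET_DEMO_EMAIL")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return tuple(env[name] for name in names)


def login_from_env():
    """Log in to XetHub using values taken from the environment."""
    import pyxet  # imported lazily so credentials_from_env works without PyXet

    user, token, email = credentials_from_env()
    # Same argument order as the call shown above: username, PAT, email.
    pyxet.login(user, token, email)
```

Calling `login_from_env()` at the top of a notebook keeps the PAT out of the saved cells.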

Common operations

Read a CSV file with Pandas

import pyxet         # make xet:// protocol available
import pandas as pd  # assumes pip install pandas has been run

df = pd.read_csv('xet://XetHub/titanic/main/titanic.csv')

Full repository downloads

import pyxet

fs = pyxet.XetFS()
fs.get('xet://user/repo/branch', <dest>, recursive=True)

Single file downloads

import pyxet

fs = pyxet.XetFS()
fs.get('xet://user/repo/branch/path/to/file', <dest>, recursive=False)

Reads and writes

import pyxet

fs = pyxet.XetFS()
files = fs.ls('xet://XetHub/Flickr30k/main')

f = fs.open('xet://XetHub/Flickr30k/main/results.csv')
contents = f.read()

with fs.transaction as tr:
    tr.set_commit_message("Writing things")
    fs.open("<user_name>/<repo_name>/main/foo", 'w').write("Hello world!")
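
fsspec file systems open files in binary mode by default, so if XetFS follows that default, `contents` in the read example is a `bytes` object. Below is a minimal sketch of turning such bytes into rows with the standard library. The sample bytes and the helper name `rows_from_bytes` are made up for illustration, and the real file may use a different delimiter or encoding.

```python
import csv
import io


def rows_from_bytes(data: bytes, encoding: str = "utf-8", delimiter: str = ","):
    """Decode raw file bytes and parse them as delimited text."""
    text = data.decode(encoding)
    # newline="" leaves line endings untouched so the csv module can handle them.
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    return [row for row in reader]


# Stand-in for the result of f.read(); not the actual repository contents.
sample = b"name,score\nalpha,1\nbeta,2\n"
rows = rows_from_bytes(sample)
```

For larger files, passing the `xet://` URL straight to `pd.read_csv`, as in the Pandas example, avoids holding the raw bytes and the parsed rows in memory at the same time.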

Time travel

import pyxet

fs = pyxet.XetFS()
fs.get('xet://user/repo/branch@{2.days.ago}/path/to/file', <dest>, recursive=False)
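
The download and time-travel examples all share one URL shape: `xet://user/repo/branch`, an optional `@{...}` suffix on the branch, and an optional path. A small helper can assemble these strings so the braces are not mistyped. This is a sketch only; `xet_url` and its parameter names are hypothetical and not part of PyXet, and the only time expression shown in this page is `2.days.ago`.

```python
from typing import Optional


def xet_url(user: str, repo: str, branch: str,
            path: str = "", when: Optional[str] = None) -> str:
    """Build a xet:// URL, optionally pinned to a point in time."""
    # Doubled braces in an f-string produce literal { and } around the value.
    ref = branch if when is None else f"{branch}@{{{when}}}"
    url = f"xet://{user}/{repo}/{ref}"
    path = path.strip("/")
    return f"{url}/{path}" if path else url


repo_url = xet_url("user", "repo", "branch")
past_file_url = xet_url("user", "repo", "branch", "path/to/file", when="2.days.ago")
```

The resulting strings can be passed as the first argument to `fs.get` exactly as in the examples above.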