Import from S3

Tired of switching between multiple tools to answer simple questions about your data and code? Move your data to XetHub for instant read access and efficient development iteration.

note

These instructions are written for S3, but they adapt readily to other object stores such as Google Cloud Storage and Azure Blob Storage.

Create a new Xet repository

Skip this section if you already have a Xet repository that you'd like to use.

Initialize a new empty Xet repository from the XetHub UI and clone it to your local machine.

Navigate to your newly cloned Xet repository and make sure that you're on the main branch:

cd <new-xet-repo>
git checkout main

Migrate data from S3

To bring data to your Xet repository, add it like you would any other directory or file, commit, and push the changes.

caution

If your S3 data is larger than your local disk, contact us for help ingesting your data in batches.
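To check whether the caution above applies, the size of the S3 folder can be compared with the free space on the local disk. The following is a minimal sketch, assuming awscli is installed and configured; the helper names `s3_prefix_bytes`, `free_bytes`, and `fits_locally` are hypothetical and not part of any tool.

```shell
# Total size in bytes of everything under an S3 prefix.
# Reads the "Total Size:" line printed by `aws s3 ls --summarize`.
s3_prefix_bytes() {
  aws s3 ls "$1" --recursive --summarize | awk '/Total Size:/ {print $3}'
}

# Free space, in bytes, on the filesystem holding the given directory.
free_bytes() {
  local avail_kb
  avail_kb=$(df -Pk "$1" | awk 'NR==2 {print $4}')
  echo $((avail_kb * 1024))
}

# Succeeds when the needed byte count is smaller than the free byte count.
fits_locally() {
  [ "$1" -lt "$2" ]
}

# Usage (placeholders as in the rest of this page):
#   needed=$(s3_prefix_bytes "s3://<my-bucket>/<my-folder>")
#   fits_locally "$needed" "$(free_bytes .)" && echo "fits" || echo "too large for local disk"
```

The comparison ignores any overhead from the repository itself, so it is safest to leave generous headroom.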

Import Method 1: Git-xet

If you are accessing a public S3 bucket, or have AWS credentials available in your local configuration, you can use git-xet to import data directly from S3 into your repository.

  1. Inside your repository, run this command to import the contents of your S3 folder into the data/ directory:

    git xet s3 import s3://<my-bucket>/<my-folder> data/

Import Method 2: AWS Command Line

  1. Install awscli on your local machine.

  2. Configure awscli with the appropriate AWS credentials.

  3. Create a data directory in your new Xet repository and download S3 data into it.

    cd <new-xet-repo>
    mkdir data
    aws s3 cp s3://<my-bucket>/<my-folder> data --recursive
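Before committing, it can be worth confirming that the download is complete. One rough check, sketched below, compares the number of objects under the S3 prefix with the number of files that landed in data/. The helper names `s3_object_count` and `local_file_count` are hypothetical, and the counts may differ slightly if the bucket contains empty "folder" placeholder keys.

```shell
# Number of objects under an S3 prefix, taken from the
# "Total Objects:" line printed by `aws s3 ls --summarize`.
s3_object_count() {
  aws s3 ls "$1" --recursive --summarize | awk '/Total Objects:/ {print $3}'
}

# Number of regular files under a local directory.
local_file_count() {
  find "$1" -type f | wc -l | tr -d ' '
}

# Usage (placeholders as in the steps above):
#   echo "S3:    $(s3_object_count "s3://<my-bucket>/<my-folder>")"
#   echo "local: $(local_file_count data)"
```

If the counts disagree, rerunning the copy step or using `aws s3 sync` with the same source and destination should fetch only what is missing.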

Add, Commit and Push the Data

After importing the data using git-xet or awscli, commit the data to the repository and push it to the XetHub remote.

  1. Stage and commit the data directory:

    git add data
    git commit -m "Adding data from S3"

  2. Push the commit to the XetHub remote:

    git push

  3. Navigate to the XetHub UI and find your Xet repository. Confirm that all your expected code and data are there, along with their full commit history.
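Alongside the check in the UI, a quick local check can confirm that nothing was left uncommitted. This is a minimal sketch using standard Git commands; the helper name `is_clean` is hypothetical.

```shell
# Succeeds when the repository at the given path has no uncommitted
# or untracked changes; returns 2 if `git status` itself fails.
is_clean() {
  local out
  out=$(git -C "$1" status --porcelain) || return 2
  [ -z "$out" ]
}

# Usage, from inside the repository:
#   is_clean . && echo "working tree clean" || echo "uncommitted changes remain"
#   git status -sb         # the first line shows "ahead N" if commits are unpushed
#   git log --oneline -n 3 # the most recent commits, including the S3 import
```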