Import from S3
Tired of switching between multiple tools to answer simple questions about your data and code? Move your data to XetHub for instant read access and efficient development iteration.
note
While these instructions are written for S3, they are easily adaptable to other object stores such as Google Cloud Storage and Microsoft Azure Blob Storage.
Create a new Xet repository
Skip this section if you already have a Xet repository that you'd like to use.
Initialize a new empty Xet repository from the XetHub UI and clone it to your local machine.
Navigate to your newly cloned Xet repository and make sure that you're on the main branch:
cd <new-xet-repo>
git checkout main
Migrate data from S3
To bring data to your Xet repository, add it like you would any other directory or file, commit, and push the changes.
caution
If your S3 data is larger than your local disk, please contact us for help ingesting your data in batches.
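The caution above depends on whether the S3 data fits on your local disk. The following is a minimal sketch of one way to check, assuming awscli is already installed and configured (see Import Method 2). The function names `s3_prefix_bytes`, `free_disk_bytes`, and `fits_on_disk` are hypothetical helpers, not part of git-xet or awscli. The sketch relies on the `Total Size:` line that `aws s3 ls --recursive --summarize` prints in bytes, and on the POSIX `df -Pk` output format.

```shell
# Hypothetical helpers for comparing S3 data size with free local disk space.

# Print the total size in bytes of all objects under an S3 prefix,
# parsed from the "Total Size:" summary line.
s3_prefix_bytes() {
  aws s3 ls "$1" --recursive --summarize | awk '/Total Size:/ {print $3}'
}

# Print the free space in bytes on the filesystem holding the current directory.
free_disk_bytes() {
  free_kb=$(df -Pk . | awk 'NR==2 {print $4}')
  echo $((free_kb * 1024))
}

# Succeed (exit status 0) when the first byte count is at most the second.
fits_on_disk() {
  [ "$1" -le "$2" ]
}
```

For example, `fits_on_disk "$(s3_prefix_bytes s3://<my-bucket>/<my-folder>)" "$(free_disk_bytes)" && echo "fits"` prints `fits` only when the data is no larger than the free space. Listing a very large bucket can take a while, since every object is enumerated.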
Import Method 1: Git-xet
If you are accessing a public S3 bucket, or have your AWS credentials available in your local configuration, you may use git-xet to import data directly from S3 into your repository.
Inside your repository, use this command to download your S3 bucket to the data/ directory:
git xet s3 import s3://<my-bucket>/<my-folder> data/
Import Method 2: AWS Command Line
Install awscli on your local machine.
Configure awscli with the appropriate AWS credentials.
Create a data directory in your new Xet repository and download S3 data into it:
cd <new-xet-repo>
mkdir data
aws s3 cp s3://<my-bucket>/<my-folder> data --recursive
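Before starting a large download, it may help to preview what the copy would transfer. The following is a minimal sketch, assuming the `--dryrun` option of `aws s3 cp`, which lists the operations without performing them. The wrapper name `preview_s3_copy` is hypothetical.

```shell
# Hypothetical wrapper: show what a recursive copy would download,
# without writing anything to the destination directory.
preview_s3_copy() {
  aws s3 cp "$1" "$2" --recursive --dryrun
}
```

For example, `preview_s3_copy s3://<my-bucket>/<my-folder> data` should list the objects that the copy command above would download.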
Add, Commit and Push the Data
After importing the data using git-xet or awscli, commit the data to the repository and push it to the XetHub remote.
Stage and commit the data directory:
git add data
git commit -m "Adding data from S3"
Push the data directory to the XetHub remote:
git push
Navigate to the XetHub UI and find your Xet repository. Confirm that all your expected code and data are there, along with their full commit history.
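In addition to checking the UI, a quick local check can confirm that every imported file is tracked by Git. The following is a minimal sketch using standard `git ls-files` and `find`. The function names are hypothetical, and the check assumes it is run from the root of the repository.

```shell
# Hypothetical helpers for comparing files on disk with files tracked by Git.

# Print the number of files under a directory that Git tracks.
count_tracked_files() {
  git ls-files -- "$1" | wc -l | tr -d ' '
}

# Print the number of regular files under a directory on disk.
count_local_files() {
  find "$1" -type f | wc -l | tr -d ' '
}
```

If `count_tracked_files data` and `count_local_files data` print the same number, every file in the data directory has been staged or committed. A mismatch may point to files that were skipped, for example because they match an ignore rule.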