Importing from S3
Move your large files into XetHub for efficiently versioned storage with streaming access and guaranteed reproducibility.
Import from the XetHub UI
Evaluating XetHub as an S3 replacement? Import an existing S3 bucket into XetHub as a new repository and optionally sync updates from S3 on a regular cadence for easy side-by-side comparison.
- Click the + button at the top right of the XetHub toolbar and select the "Import from S3" option.
- Configure your import.
  - The S3 bucket section covers the bucket location and credentials:
    - Bucket and prefix: Fill in your S3 URL and any optional prefix to import into the new repository, e.g. s3://myawsbucket/prefix/.
    - Region: AWS region for the bucket, e.g. us-west-2.
    - AWS Access Key: AWS access key for your bucket. The access key must have the following permissions: s3:ListBucket and s3:GetObject.
    - AWS Secret Key: AWS secret access key associated with your access key.
  - (Optional) Sync: Configure a background sync process to move data from your S3 bucket to the repository at a specified frequency.
- Name your repository and set its visibility, then click "Create Repository". XetHub will show each sync as new commits on the repository.
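The two permissions required for the access key, s3:ListBucket and s3:GetObject, can be granted through a narrowly scoped IAM policy. The sketch below prints one such policy for a single bucket. The helper name make_import_policy is hypothetical, and scoping s3:ListBucket to the bucket ARN and s3:GetObject to the objects inside it is an assumption based on how AWS applies these actions, not something this page specifies.

```shell
# Hypothetical helper (not part of XetHub or awscli): prints a read-only
# IAM policy document for one bucket.
make_import_policy() {
  bucket="$1"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::${bucket}"
    },
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::${bucket}/*"
    }
  ]
}
EOF
}

# Print the policy for the example bucket name used on this page.
make_import_policy myawsbucket
```

The printed JSON could be saved to a file and attached to the IAM user that owns the access key, for example through the AWS console.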
Move files with awscli
- Install awscli on your local machine.
- Configure awscli with the appropriate AWS credentials.
- Create a data directory in your new Xet repository and download S3 data into it.
  cd <new-xet-repo>
  mkdir data
  aws s3 cp s3://<my-bucket>/<my-folder> data --recursive
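For a large bucket, it may be worth previewing the transfer before copying anything. The sketch below wraps the copy step in a small function that adds the awscli --dryrun flag when requested. The function name download_s3_data and the DRY_RUN variable are hypothetical names chosen for illustration; the aws s3 cp command and its --recursive and --dryrun flags are the only parts that come from awscli itself.

```shell
# Hypothetical helper: copies an S3 folder into a local directory.
# Usage: download_s3_data <bucket> <folder> [dest]
# Set DRY_RUN=1 to list the planned transfers without downloading anything.
download_s3_data() {
  local bucket="$1" folder="$2" dest="${3:-data}"
  local dryrun_flag=""

  if [ "${DRY_RUN:-0}" = "1" ]; then
    dryrun_flag="--dryrun"
  fi

  # -p avoids an error when the directory already exists.
  mkdir -p "$dest"

  # $dryrun_flag is left unquoted on purpose so that it disappears when empty.
  aws s3 cp "s3://${bucket}/${folder}" "$dest" --recursive $dryrun_flag
}

# Example, run from inside the Xet repository:
#   DRY_RUN=1 download_s3_data <my-bucket> <my-folder> data   # preview
#   download_s3_data <my-bucket> <my-folder> data             # copy
```

Running the preview first makes it easier to catch a wrong prefix or missing permissions before any data is transferred.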
Add, commit, and push
After importing the data using git-xet or awscli, commit the data to the repository and push it to the XetHub remote.
- Stage and commit the data directory:
  git add data
  git commit -m "Adding data from S3"
- Push the data directory to the XetHub remote:
  git push
- Navigate to the XetHub UI and find your Xet repository. Confirm that all your expected code and data are there, along with their full commit history.
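Before checking the XetHub UI, a quick local check can confirm that nothing under the data directory was left uncommitted. The sketch below uses only standard git commands (git status --porcelain and git ls-files). The function name check_data_committed is hypothetical, and the check covers the local commit only; whether the push reached the remote is still confirmed in the UI.

```shell
# Hypothetical helper: reports whether everything under a directory is committed.
# Usage: check_data_committed [dir]   (defaults to "data")
check_data_committed() {
  dir="${1:-data}"

  # `git status --porcelain` prints one line per modified or untracked path,
  # so empty output means nothing under the directory is waiting to be committed.
  pending="$(git status --porcelain -- "$dir")"
  if [ -n "$pending" ]; then
    echo "Uncommitted or untracked files under $dir:" >&2
    echo "$pending" >&2
    return 1
  fi

  # Count the files git tracks under the directory.
  count="$(git ls-files -- "$dir" | wc -l | tr -d ' ')"
  echo "$count files tracked under $dir"
}

# Example, run from inside the Xet repository:
#   check_data_committed data
```

The reported file count can be compared against the number of objects expected from the S3 folder.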