Importing from S3

Move your large files into XetHub for efficiently versioned storage with streaming access and guaranteed reproducibility.

Import from the XetHub UI

Evaluating XetHub as an S3 replacement? Import an existing S3 bucket into XetHub as a new repository and optionally sync updates from S3 on a regular cadence for easy side-by-side comparison.

  1. Click the + button at the top right of the XetHub toolbar and select the "Import from S3" option.

  2. Configure your import.

    • The S3 bucket section covers important information and credentials:

      • Bucket and prefix: Fill in your S3 URL and any optional prefix to import into the new repository, e.g. s3://myawsbucket/prefix/.

      • Region: AWS region for the bucket, e.g. us-west-2.

      • AWS Access Key: AWS access key for your bucket. The access key must have the following permissions: s3:ListBucket and s3:GetObject.

      • AWS Secret Key: AWS secret access key associated with your access key.

    • (Optional) Sync: Configure a background sync process that moves data from your S3 bucket to the repository at a specified frequency.

  3. Name your repository and set its visibility, then click "Create Repository". XetHub will show each sync as new commits on the repository.
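The two permissions required of the access key in step 2 could be granted with a read-only IAM policy. The following is a minimal sketch that writes such a policy to a file, assuming the example bucket name myawsbucket from above; the output file name is a hypothetical choice, and XetHub does not require the policy to be created this way.

```shell
# Sketch: write a minimal read-only IAM policy for the import access key.
# "myawsbucket" and the output file name are placeholders, not XetHub requirements.
BUCKET="myawsbucket"
POLICY_FILE="xethub-import-policy.json"

# s3:ListBucket applies to the bucket itself;
# s3:GetObject applies to the objects inside it (hence the trailing /*).
cat > "$POLICY_FILE" <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::${BUCKET}"
    },
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::${BUCKET}/*"
    }
  ]
}
EOF
```

The resulting file can then be attached to the IAM user or role that owns the access key, using whichever AWS workflow you normally use to manage policies.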

Move files with awscli

  1. Install awscli on your local machine.

  2. Configure awscli with the appropriate AWS credentials.

  3. Create a data directory in your new Xet repository and download S3 data into it.

    cd <new-xet-repo>
    mkdir data
    aws s3 cp s3://<my-bucket>/<my-folder> data --recursive
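For large buckets, it may be worth previewing the copy before transferring anything. The sketch below wraps the download step in a small shell function that first runs the same aws s3 cp command with --dryrun and then performs the real copy. The function name download_s3_prefix is hypothetical and is not part of awscli.

```shell
# Sketch: preview, then download, an S3 prefix into a local directory.
# download_s3_prefix is a hypothetical helper name, not part of awscli.
download_s3_prefix() {
    src="$1"    # S3 URL of the bucket and folder to copy
    dest="$2"   # local directory inside the Xet repository, e.g. data

    mkdir -p "$dest" || return 1

    # --dryrun prints the operations that would run without copying any data.
    aws s3 cp "$src" "$dest" --recursive --dryrun || return 1

    # Perform the actual copy.
    aws s3 cp "$src" "$dest" --recursive
}
```

From the root of the new Xet repository, this would be called with the same S3 URL and destination as in step 3, for example with s3://<my-bucket>/<my-folder> as the first argument and data as the second.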

Add, commit, and push

After importing the data using git-xet or awscli, commit the data to the repository and push it to the XetHub remote.

  1. Stage and commit the data directory:

    git add data
    git commit -m "Adding data from S3"

  2. Push the data directory to the XetHub remote:

    git push

  3. Navigate to the XetHub UI and find your Xet repository. Confirm that all your expected code and data are there, along with their full commit history.
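Before pushing, a rough local check can confirm that every downloaded file made it into the commit. The sketch below compares the number of files on disk under a directory with the number of files recorded for that directory in the latest commit. The function name check_data_committed is hypothetical, and the check assumes it is run from the repository root after committing; files excluded by .gitignore would show up as a mismatch.

```shell
# Sketch: compare files on disk with files recorded in HEAD for a directory.
# check_data_committed is a hypothetical helper name.
check_data_committed() {
    dir="${1:-data}"

    # $(( ... )) strips any padding that wc adds on some platforms.
    on_disk=$(( $(find "$dir" -type f | wc -l) ))
    committed=$(( $(git ls-tree -r --name-only HEAD -- "$dir" | wc -l) ))

    echo "on disk: $on_disk, committed: $committed"

    # Succeeds only when the two counts match.
    [ "$on_disk" -eq "$committed" ]
}
```

Running check_data_committed data after the commit step returns a non-zero status if the counts differ, which is a hint to inspect git status before pushing.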