Quick Ubuntu Docker Image Analysis
This Xet repository contains a quick analysis of how well Ubuntu Docker images deduplicate.
Results: 9.6GiB -> 5.2GiB (~2x smaller, 46% deduplication)
Simply adding the Docker images to a Xet repository reduces their stored size by 46% relative to the size reported by Docker. The methodology was to download all Ubuntu Docker images to a local directory and then add those images to a Xet repository.
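The headline figures follow from simple arithmetic on the two sizes in the Results line. A minimal sketch, with variable names chosen for illustration rather than taken from the repository's scripts:

```python
# Sizes from the Results line, in GiB.
docker_reported_gib = 9.6
xet_stored_gib = 5.2

# Fraction of the Docker-reported bytes eliminated by deduplication.
dedup_fraction = 1 - xet_stored_gib / docker_reported_gib
# How many times smaller the stored data is.
shrink_factor = docker_reported_gib / xet_stored_gib

print(f"deduplication: {dedup_fraction:.0%}")
print(f"shrink factor: {shrink_factor:.2f}x")
```

This works out to about 46% and 1.85x, which the Results line rounds to ~2x.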
Setup
# daemon.json data-root configured to ~/docker-images/ dir
docker image pull --all-tags ubuntu
docker image ls ubuntu | tee data/ubuntu-tags.txt
Docker Reported Total Size - 9.6 GiB
# install pandas
python3 code/process.py
python3 code/size.py
> python3 code/size.py
Total size: 9773.3MB.
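The contents of code/process.py and code/size.py are not reproduced in this README. As a rough illustration only, a total like the one above could be computed by summing the SIZE column of the saved `docker image ls` output. The sketch below is hypothetical: the function names, the decimal unit table, and the sample rows are all assumptions, not the repository's actual code or data.

```python
import re

# Decimal multipliers, on the assumption that `docker image ls`
# prints sizes in powers of 1000 (kB, MB, GB).
UNITS = {"B": 1, "kB": 10**3, "MB": 10**6, "GB": 10**9}

def parse_size(text: str) -> float:
    """Convert a size string such as '77.8MB' to bytes."""
    match = re.fullmatch(r"([\d.]+)\s*(B|kB|MB|GB)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return float(match.group(1)) * UNITS[match.group(2)]

def total_size_mb(lines) -> float:
    """Sum the last column (SIZE) of each data row, in MB."""
    total = 0.0
    for line in lines:
        fields = line.split()
        # Skip blank lines and the header row.
        if not fields or fields[0] == "REPOSITORY":
            continue
        total += parse_size(fields[-1])
    return total / 10**6

# Made-up sample rows, not taken from data/ubuntu-tags.txt.
sample = [
    "REPOSITORY   TAG     IMAGE ID       CREATED        SIZE",
    "ubuntu       a       0123456789ab   2 weeks ago    77.8MB",
    "ubuntu       b       ba9876543210   3 years ago    1.5GB",
]
print(f"Total size: {total_size_mb(sample):.1f}MB.")
```

To try it on the real listing, the lines of data/ubuntu-tags.txt could be passed in place of `sample`.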
Added to Xet repository (this repo)
# git xet clone https://xethub.com/ylow/ubuntu-images-analysis.git ubuntu-image-analysis
cd ubuntu-image-analysis
mkdir images
cd images
cp -R ~/docker-images/overlay2/* .
# replace rajat:users with your own user:group
chown -R rajat:users *
# create tar files for each hash dir
for dir in *; do
test -d "$dir" && test ! -L "$dir" || continue
tar_file_name="$dir".tar
test -f "$tar_file_name" || tar -cf "$tar_file_name" "$dir"
done
cd ..
# add to repo
git add images
git commit -m'add a bunch of images'
git push
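The tar step in the shell loop above can also be written in Python, which may be easier to re-run or extend. This is a sketch of an equivalent, not part of the repository: `tar_each_directory` is a hypothetical helper, and the demo uses a temporary directory with a made-up `abc123` entry instead of the real `images` directory.

```python
import tarfile
import tempfile
from pathlib import Path

def tar_each_directory(root: Path) -> list:
    """Create <dir>.tar next to each real (non-symlink) directory in root."""
    created = []
    for entry in sorted(root.iterdir()):
        # Same filter as the shell loop: directories only, no symlinks.
        if not entry.is_dir() or entry.is_symlink():
            continue
        tar_path = root / (entry.name + ".tar")
        # Skip archives that already exist, so the step can be re-run.
        if tar_path.is_file():
            continue
        with tarfile.open(tar_path, "w") as archive:
            # arcname keeps paths inside the archive relative to root.
            archive.add(entry, arcname=entry.name)
        created.append(tar_path)
    return created

# Demo on a throwaway directory; "abc123" is a made-up layer name.
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    (demo_root / "abc123").mkdir()
    (demo_root / "abc123" / "layer.txt").write_text("demo")
    print([p.name for p in tar_each_directory(demo_root)])
```

One difference from the shell version: the `*` glob skips names that start with a dot, while `iterdir()` includes them.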