README.md

Quick Ubuntu Docker Image Analysis

This Xet repository contains a quick analysis of Ubuntu Docker images.

Results: 9.6 GiB -> 5.2 GiB (~2x smaller, 46% deduplication)

Simply adding the Docker images to a Xet repo reduces their stored size by 46% relative to the size reported by Docker. The methodology was to download all Ubuntu Docker images to a local directory and then add those images to a Xet repository.
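As a quick check of the arithmetic behind the headline numbers, here is a minimal sketch. The function name `dedup_stats` is made up for illustration and is not part of this repo; the inputs are the two sizes quoted in the results line.

```python
def dedup_stats(original_gib: float, stored_gib: float) -> tuple[float, float]:
    """Return (fraction of space saved, shrink factor) for two sizes in GiB."""
    saved = 1.0 - stored_gib / original_gib   # share of the original size no longer stored
    factor = original_gib / stored_gib        # how many times smaller the stored data is
    return saved, factor


# Sizes taken from the results line: 9.6 GiB reported by Docker, 5.2 GiB in the Xet repo.
saved, factor = dedup_stats(9.6, 5.2)
print(f"saved {saved:.0%}, {factor:.2f}x smaller")  # prints "saved 46%, 1.85x smaller"
```

The shrink factor comes out at about 1.85, which the results line rounds to ~2x.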

Setup

# Docker's daemon.json has data-root set to the ~/docker-images/ directory
docker image pull --all-tags ubuntu
docker image ls ubuntu | tee data/ubuntu-tags.txt

Docker Reported Total Size - 9.6 GiB

# install pandas first
python3 code/process.py

> python3 code/size.py
Total size: 9773.3MB.
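The scripts in code/ are not reproduced here. As a rough illustration of how a total like the one above could be computed, the following sketch sums the SIZE column of `docker image ls` output. It is an assumption about the approach, not the actual contents of code/size.py: it uses only the standard library rather than pandas, and the names `total_size_mb`, `UNIT_TO_MB`, and the sample rows are hypothetical.

```python
import re

# Decimal multipliers (expressed in MB) for the size suffixes Docker prints.
UNIT_TO_MB = {"B": 1e-6, "kB": 1e-3, "MB": 1.0, "GB": 1e3}

# Matches a size token such as "77.8MB" or "1.2GB".
SIZE_RE = re.compile(r"^(\d+(?:\.\d+)?)(B|kB|MB|GB)$")


def total_size_mb(listing: str) -> float:
    """Sum the last column of `docker image ls` style output, in MB."""
    total = 0.0
    for line in listing.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if not fields:
            continue
        match = SIZE_RE.match(fields[-1])  # SIZE is the final whitespace-separated field
        if match:
            total += float(match.group(1)) * UNIT_TO_MB[match.group(2)]
    return total


# Placeholder rows for illustration only; these are not real tags or image IDs.
sample = """\
REPOSITORY   TAG         IMAGE ID       CREATED        SIZE
ubuntu       example-a   000000000000   2 weeks ago    77.8MB
ubuntu       example-b   111111111111   3 months ago   1.2GB
"""
print(f"Total size: {total_size_mb(sample):.1f}MB.")
```

To try it on the real listing, the text saved in data/ubuntu-tags.txt could be read and passed to `total_size_mb`.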

Added to Xet repository (this repo)

# git xet clone https://xethub.com/ylow/ubuntu-images-analysis.git ubuntu-image-analysis
cd ubuntu-image-analysis
mkdir images
cd images
cp -R ~/docker-images/overlay2/* .
# take ownership of the copied files (replace rajat:users with your own user:group)
chown -R rajat:users *

# create tar files for each hash dir
for dir in *; do
    test -d "$dir" && test ! -L "$dir" || continue
    tar_file_name="$dir".tar
    test -f "$tar_file_name" || tar -cf "$tar_file_name" "$dir"
done

cd ..

# add to repo
git add images
git commit -m 'add a bunch of images'
git push