Querying docker image sizes via the command line

TL;DR
Querying and comparing the size of different docker image tags from the command line can be as easy as follows:

docker run --rm schnatterer/docker-image-size <docker image name> [<extended grep regex on docker tag>]

# e.g.
docker run --rm schnatterer/docker-image-size adoptopenjdk '^11.0.4' \
   | grep 'amd64 linux'

# A single image size can be queried faster like so
docker run --rm --entrypoint docker-image-size-curl.sh schnatterer/docker-image-size adoptopenjdk:11.0.4_11-jre-hotspot

Unfortunately, comparing several tags takes minutes.


When starting a new project, building a CI/CD pipeline or deploying an off-the-shelf app, we often face the challenge of selecting not only a suitable docker / OCI image but also the right tag of that image.

The trouble at Docker Hub

The first stop is usually hub.docker.com, where choosing among the different versions, variants and architectures can be confusing, e.g. for JDK, node, ruby or .net (which does not even have tags on docker hub).

reg to the rescue

In situations such as these, genuinetools/reg is very helpful: it prints a textual overview of all tags of a repo on the command line, which can easily be grepped for what is needed. For example:

reg tags adoptopenjdk | grep 11.0.4
# We can apply more complex regex
reg tags adoptopenjdk | grep -P '^(?!.*windows)11.0.4'

# Actually we don't even need to install reg, we can just use docker for distribution
docker run r.j3ss.co/reg tags openjdk | grep -e '^11.0.4-'

# BTW I use:
alias reg='docker run r.j3ss.co/reg'
# Every now and then I update like so
docker pull r.j3ss.co/reg
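
The -P (Perl regex) flag used above is not available in every grep; the BSD grep shipped with macOS lacks it, for example. A portable sketch of the same kind of filter, using only extended regexes. Note that filter_tags is a hypothetical helper name for illustration, not something reg provides:

```shell
# filter_tags INCLUDE_REGEX [EXCLUDE_REGEX]
# Hypothetical helper: reads tag names on stdin, keeps those matching
# INCLUDE_REGEX and, if given, drops those matching EXCLUDE_REGEX.
# Both arguments are extended regexes (grep -E), so no PCRE is needed.
filter_tags() {
  if [ -n "${2:-}" ]; then
    grep -E -- "$1" | grep -E -v -- "$2"
  else
    grep -E -- "$1"
  fi
}
```

It could be used like `reg tags adoptopenjdk | filter_tags '^11\.0\.4' windows`. Escaping the dots keeps the pattern from matching arbitrary characters in their place.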

Image size

One thing that is missing, though, is the image size. It’s not that important, but it hints at which tags point to the same image and can be another factor in choosing one image over another.

While preparing one of my docker training courses, I answered a question on stackoverflow on how Docker Hub sums up the image size. After that I found myself returning to my own answer every now and then because I wanted to reuse the script for finding out image sizes via the command line.
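
The basic idea can be sketched in a few lines. This is a minimal sketch under assumptions, not the script from that answer: it assumes a Docker image manifest (V2, schema 2) on stdin in which each entry under "layers" carries a "size" field in bytes, and that "layers" comes after "config" in the JSON. The names sum_layer_sizes and to_mb are hypothetical, and dividing by 1024*1024 is also an assumption (the real tool may round differently).

```shell
# Hypothetical helper: sums the "size" fields that follow the "layers" key
# of a manifest read from stdin and prints the total in bytes.
# Text-based parsing only; with jq available, `jq '[.layers[].size] | add'`
# would be the more robust choice.
sum_layer_sizes() {
  tr -d '\n' | awk '{
    start = index($0, "\"layers\"")
    if (start == 0) { print 0; exit }
    rest = substr($0, start)
    total = 0
    # Walk through every "size": <number> pair after the "layers" key.
    while (match(rest, /"size"[ \t]*:[ \t]*[0-9]+/)) {
      field = substr(rest, RSTART, RLENGTH)
      sub(/^[^0-9]*/, "", field)   # strip everything before the digits
      total += field
      rest = substr(rest, RSTART + RLENGTH)
    }
    printf "%.0f\n", total
  }'
}

# Hypothetical helper: converts bytes to whole MB (assumed 1024*1024 bytes).
to_mb() {
  awk -v bytes="$1" 'BEGIN { printf "%d MB\n", bytes / (1024 * 1024) }'
}
```

Given a manifest in a file, `to_mb "$(sum_layer_sizes < manifest.json)"` would print the summed compressed layer size.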

docker-image-size

Having done this too often, I decided to implement a script that does this more conveniently: schnatterer/docker-image-size.

It turned out to become four scripts, actually:

  • Three for the different ways of querying docker manifests (reg manifest, docker manifest and via curl)
  • and another one that uses the ones above and combines them with the reg tags command for comparing image sizes.

They also come neatly packed into a docker image, and can be used like so:

docker run --rm schnatterer/docker-image-size <docker image name> [<extended grep regex on docker tag>]

# If used more often, an alias provides more convenience:
alias docker-image-sizes='docker run --rm -e DIS_IMPL schnatterer/docker-image-size'

docker-image-sizes <docker image name> [<extended grep regex on docker tag>]

Different Implementations

While implementing, I completely underestimated the peculiarities of the docker distribution protocol (standardized as the OCI distribution spec), with all those different implementations in registries (Docker Hub, MCR, GCR, quay.io, etc.), multiple architectures, repo digests, manifest versions (V1 vs V2) and docker login. This led to the different implementations providing different features:

Docker Image Size features

Retrospectively, using docker manifest seems like the most versatile solution, being compatible with most repos. Unfortunately, querying a single manifest might take more than 10 seconds.

Still, we can now easily compare different tags of a docker image using the following commands, for example:


# Match all tags containing '11' (a whole lot. Will take ages!)
docker-image-sizes adoptopenjdk 11
# More accurate (and faster) output
docker-image-sizes adoptopenjdk '^11.0.4'
# Multi-arch results can be filtered using grep
docker-image-sizes adoptopenjdk '^11.0.4' | grep 'amd64 linux'

The results will take some minutes (as mentioned above) and look like this:

adoptopenjdk:11.0.4_11-jdk-hotspot amd64 linux : 235 MB
adoptopenjdk:11.0.4_11-jdk-hotspot-bionic amd64 linux : 235 MB
adoptopenjdk:11.0.4_11-jdk-openj9-0.15.1 amd64 linux : 236 MB
adoptopenjdk:11.0.4_11-jdk-openj9-0.15.1-bionic amd64 linux : 236 MB
adoptopenjdk:11.0.4_11-jre-hotspot amd64 linux : 80 MB
adoptopenjdk:11.0.4_11-jre-hotspot-bionic amd64 linux : 80 MB
adoptopenjdk:11.0.4_11-jre-openj9-0.15.1 amd64 linux : 79 MB
adoptopenjdk:11.0.4_11-jre-openj9-0.15.1-bionic amd64 linux : 79 MB
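
Since the point is comparing sizes, it can help to order such output by size. A small sketch, assuming every line ends in "<number> MB" as in the output above; sort_by_size is a hypothetical helper name and not part of docker-image-size:

```shell
# Hypothetical helper: sorts lines of the form "... : <number> MB"
# by that number, smallest image first.
sort_by_size() {
  # Prefix each line with its size (the second-to-last field),
  # sort numerically on that prefix, then strip the prefix again.
  awk 'NF >= 2 { print $(NF - 1) "\t" $0 }' | sort -k1,1n | cut -f2-
}
```

For example: `docker-image-sizes adoptopenjdk '^11.0.4' | grep 'amd64 linux' | sort_by_size`.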

Faster responses

If we don’t care about multi-arch anyway, we could use the curl implementation like so:

DIS_IMPL=curl docker-image-sizes adoptopenjdk '^11.0.4' 2>/dev/null

This results in a much faster response (about a minute), but fails for the windows variants (whose error messages we generously redirect to /dev/null).
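
For illustration, querying a manifest with curl can boil down to two HTTP requests. A minimal sketch, assuming a public repository on Docker Hub, an anonymous pull token from auth.docker.io and a single-line JSON token response; hub_repo and fetch_manifest are hypothetical helper names, and docker-image-size-curl.sh may well work differently. Multi-arch manifest lists are not handled here.

```shell
# Hypothetical helper: official images such as "adoptopenjdk" live under
# the "library/" namespace on Docker Hub; other repos are used as-is.
hub_repo() {
  case "$1" in
    */*) printf '%s\n' "$1" ;;
    *)   printf 'library/%s\n' "$1" ;;
  esac
}

# Hypothetical helper: fetch_manifest REPO [TAG]
# Prints the image manifest (JSON) of a public Docker Hub image.
fetch_manifest() {
  local repo tag token
  repo=$(hub_repo "$1")
  tag=${2:-latest}
  # Step 1: request an anonymous pull token for the repository.
  token=$(curl -fsSL "https://auth.docker.io/token?service=registry.docker.io&scope=repository:${repo}:pull" \
    | sed -n 's/.*"token" *: *"\([^"]*\)".*/\1/p')
  # Step 2: request the manifest, asking for the V2 schema 2 media type.
  curl -fsSL \
    -H "Authorization: Bearer ${token}" \
    -H "Accept: application/vnd.docker.distribution.manifest.v2+json" \
    "https://registry-1.docker.io/v2/${repo}/manifests/${tag}"
}
```

Calling `fetch_manifest adoptopenjdk 11.0.4_11-jre-hotspot` should print a manifest whose layer entries each carry a size in bytes.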

Querying the size of a single image

If you only care about the size of a single image, you can use the implementations directly; the curl implementation responds in about a second. Note that it takes only one parameter, similar to docker run:

$ docker-image-size-curl.sh adoptopenjdk:11.0.4_11-jre-hotspot
# Or using the docker image
$ docker run --rm --entrypoint docker-image-size-curl.sh schnatterer/docker-image-size adoptopenjdk:11.0.4_11-jre-hotspot
adoptopenjdk:11.0.4_11-jre-hotspot: 80 MB
# If used more often, an alias provides more convenience:
$ alias docker-image-size='docker run --rm --entrypoint docker-image-size-curl.sh schnatterer/docker-image-size'

The largest image ever

Fun fact at the end: Using docker-image-sizes, I found the largest image I have ever seen (?):

adoptopenjdk:11.0.4_11-jdk-hotspot amd64 windows 10.0.14393.3204: 6100 MB

It still makes me smirk that a “lightweight” docker image for Windows is 67x the size of a whole Linux VM (kernel and OS image combined, 91 MB):

$ docker-image-sizes weaveworks/ignite-kernel 4.19.47
weaveworks/ignite-kernel:4.19.47: 14 MB

$ docker-image-sizes weaveworks/ignite-ubuntu 19.04-v0.5.2
weaveworks/ignite-ubuntu:19.04-v0.5.2: 77 MB
