Mechanics of Docker Pull

Recently I’ve been building an offline repository of software for a project at work. The idea is that we’ll be able to install all required software while completely disconnected from the Internet. This has required me to do a lot of work understanding a variety of software sources. I plan on writing additional articles about various software sources, but today we’re going to be talking about Docker containers.

If you’re not familiar with the concept of “containers”, Docker describes them as “a lightweight, standalone, executable package of software that includes everything needed to run an application: code, runtime, system tools, system libraries, and settings.” You’re likely familiar with the concept of virtual machines, where one system runs on top of another but has its own independent operating system and resources. Likewise, a container runs on top of another system, but instead of having its own independent resources, it uses the host system’s kernel and some other resources. That said, the software within the container is able to run independently of any other software on the host system or in other containers.

A container image is made up of one or more layer files, with each layer building on top of the previous ones. To use a container, you need all of its parts, including the layers and the manifest. Normally a tool like the Docker client pulls all of the parts down, but it does so in a way that doesn’t lend itself to archiving everything. I came across a script that did a lot of what I needed, but not everything, which got me going on reverse engineering what was actually occurring on the back end and marrying that up with the script so I could modify it. Unfortunately, everything happens within an SSL connection, making it a lot harder to inspect. I ended up setting up a proxy so I could do a break and inspect on the traffic (that’s another article). For this example, I was working with the Sealed Secrets Controller hosted on quay.io. Normally, to get it, I would just issue the command docker pull quay.io/bitnami/sealed-secrets-controller:v0.17.1. So let’s see what happens.

Anatomy of a Pull

The first thing that happens is an HTTP GET request to quay.io/v2/. As you can see, we’re met with a status code of 401 UNAUTHORIZED. Even though this is a public repository and a username/password isn’t required to download from it, the registry still requires the session to authenticate before it will hand over any additional information. The final line of the reply is the WWW-Authenticate header. The Bearer realm tells us the URL that we have to authenticate against, and the service field tells us the name of the service that can be authenticated there.

GET /v2/ HTTP/1.1
Host: quay.io
User-Agent: docker/20.10.7 go/go1.13.8 git-commit/20.10.7-0ubuntu5~18.04.3 kernel/5.4.0-92-generic os/linux arch/amd64 UpstreamClient(Docker-Client/20.10.7 \(linux\))
Accept-Encoding: gzip
Connection: close

HTTP/1.1 401 UNAUTHORIZED
Server: nginx/1.12.1
Date: Wed, 12 Jan 2022 21:30:53 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 4
Connection: close
Docker-Distribution-API-Version: registry/2.0
WWW-Authenticate: Bearer realm="https://quay.io/v2/auth",service="quay.io"
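
To make the challenge concrete, here is a minimal Python sketch that splits a WWW-Authenticate header like the one above into its realm and service parts. The function name parse_bearer_challenge is my own, and the regular expression assumes every parameter is a simple quoted string, which is true of this capture but may not hold for every registry.

```python
import re


def parse_bearer_challenge(header: str) -> dict:
    """Split a WWW-Authenticate Bearer challenge into its key/value parameters."""
    scheme, _, params = header.partition(" ")
    if scheme.lower() != "bearer":
        raise ValueError(f"unsupported auth scheme: {scheme!r}")
    # Assumption: each parameter has the form key="value" with no escaped quotes.
    return dict(re.findall(r'(\w+)="([^"]*)"', params))


challenge = parse_bearer_challenge(
    'Bearer realm="https://quay.io/v2/auth",service="quay.io"'
)
print(challenge["realm"])    # https://quay.io/v2/auth
print(challenge["service"])  # quay.io
```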

The next thing that happens is another GET request, but this time to quay.io/v2/auth?scope=repository:bitnami/sealed-secrets-controller:pull&service=quay.io. You’ll notice that the base of this URL is the realm that we received in the first reply, with the scope and service added as query parameters. The scope specifies that we want to go to the repo bitnami/sealed-secrets-controller and that we want to pull the image. This time, we get a status code of 200 and have a JSON blob returned to us. In the token field is the authentication token that the repository has issued to us for this action. We’ll use that token throughout the rest of the process.

GET /v2/auth?scope=repository:bitnami/sealed-secrets-controller:pull&service=quay.io HTTP/1.1
Host: quay.io
User-Agent: docker/20.10.7 go/go1.13.8 git-commit/20.10.7-0ubuntu5~18.04.3 kernel/5.4.0-92-generic os/linux arch/amd64 UpstreamClient(Docker-Client/20.10.7 \(linux\))
Accept-Encoding: gzip
Connection: close

HTTP/1.1 200 OK
Server: nginx/1.12.1
Date: Wed, 12 Jan 2022 21:30:53 GMT
Content-Type: application/json
Content-Length: 909
Connection: close
Cache-Control: no-cache, no-store, must-revalidate
X-Frame-Options: DENY
Strict-Transport-Security: max-age=63072000; preload

{"token":"eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCIsImtpZCI6ImM1ZjNiMTU0ZGIyM2U0MTE4MjdlODNiZmU4NDQ4ZWUxNzdlZDRiZTIxOGNkYmU2MjMxZTM1M2VhMTc0NjlmYmIifQ.eyJhY2Nlc3MiOlt7InR5cGUiOiJyZXBvc2l0b3J5IiwibmFtZSI6ImJpdG5hbWkvc2VhbGVkLXNlY3JldHMtY29udHJvbGxlciIsImFjdGlvbnMiOlsicHVsbCJdfV0sImNvbnRleHQiOnsiY29tLmFwb3N0aWxsZS5yb290cyI6eyJiaXRuYW1pL3NlYWxlZC1zZWNyZXRzLWNvbnRyb2xsZXIiOiIkZGlzYWJsZWQifSwiY29tLmFwb3N0aWxsZS5yb290IjoiJGRpc2FibGVkIn0sImF1ZCI6InF1YXkuaW8iLCJleHAiOjE2NDIwMjY2NTMsImlzcyI6InF1YXkiLCJpYXQiOjE2NDIwMjMwNTMsIm5iZiI6MTY0MjAyMzA1Mywic3ViIjoiKGFub255bW91cykifQ.U8iCnAeDTGBYQnWV10FyyPDrvejj71k-WiN3x7AlRyYJ55_SgUm4a8wU1754UQpE-qQsd5ZYpxrTwEy8_mWwSr3aRNxzMrIdvkbP8_sr9I9B89O_gM_ZYMFD_CNFzDeB79Tvzty0scsMTzcArbKcozXemPgg1ZNIxIiCqCZtUPEosN_j0ushafUmQDqEqoHdq9diZTFHHH4RigeLiMF3K2GjdvY-3YVJGH2tL0GhWSbMh_StVvYPtyjtTNpnxdDb6uW9QRhUFGg8KyepJrzpgu09S4c0wLD8Z6Z6a6NG22mdClQIr9k7VTvzyOuiLyAFI4dzNHZ2c9zLJY8jcaFPEA"}
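
As a rough illustration of this exchange, the sketch below rebuilds the token URL from the realm and service values and pulls the token out of a reply body. The helper names build_token_url and extract_token are hypothetical, and the reply used in the example is a shortened stand-in, not the real token.

```python
import json
from urllib.parse import urlencode


def build_token_url(realm: str, service: str, repository: str) -> str:
    """Append the pull scope and the service name to the realm URL."""
    query = urlencode(
        {"scope": f"repository:{repository}:pull", "service": service},
        safe=":/",  # leave ':' and '/' unescaped, as in the captured request
    )
    return f"{realm}?{query}"


def extract_token(body: str) -> str:
    """Return the bearer token from the JSON reply."""
    return json.loads(body)["token"]


url = build_token_url(
    "https://quay.io/v2/auth", "quay.io", "bitnami/sealed-secrets-controller"
)
print(url)
# https://quay.io/v2/auth?scope=repository:bitnami/sealed-secrets-controller:pull&service=quay.io

print(extract_token('{"token": "stand-in-token"}'))  # stand-in-token
```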

The next thing that occurs is a HEAD request to quay.io/v2/bitnami/sealed-secrets-controller/manifests/v0.17.1. If you’re not familiar, a HEAD request tells the web server to return only the headers and no content. As you may notice, we have two new items attached to the HEAD request. The first is a list of Accept headers. These are the manifest formats that we’re capable of handling. You can find more information about these various specifications by looking at https://docs.docker.com/registry/spec/manifest-v2-2/. The second addition is the Authorization header, which contains the token that we received in the previous API call. Again, we get a 200 status code, but this time we receive the Docker-Content-Digest header, which contains the SHA256 hash of the manifest file that we need to download.

HEAD /v2/bitnami/sealed-secrets-controller/manifests/v0.17.1 HTTP/1.1
Host: quay.io
User-Agent: docker/20.10.7 go/go1.13.8 git-commit/20.10.7-0ubuntu5~18.04.3 kernel/5.4.0-92-generic os/linux arch/amd64 UpstreamClient(Docker-Client/20.10.7 \(linux\))
Accept: application/vnd.docker.distribution.manifest.v2+json
Accept: application/vnd.docker.distribution.manifest.v1+prettyjws
Accept: application/json
Accept: application/vnd.docker.distribution.manifest.list.v2+json
Accept: application/vnd.oci.image.index.v1+json
Accept: application/vnd.oci.image.manifest.v1+json
Authorization: Bearer eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCIsImtpZCI6ImM1ZjNiMTU0ZGIyM2U0MTE4MjdlODNiZmU4NDQ4ZWUxNzdlZDRiZTIxOGNkYmU2MjMxZTM1M2VhMTc0NjlmYmIifQ.eyJhY2Nlc3MiOlt7InR5cGUiOiJyZXBvc2l0b3J5IiwibmFtZSI6ImJpdG5hbWkvc2VhbGVkLXNlY3JldHMtY29udHJvbGxlciIsImFjdGlvbnMiOlsicHVsbCJdfV0sImNvbnRleHQiOnsiY29tLmFwb3N0aWxsZS5yb290cyI6eyJiaXRuYW1pL3NlYWxlZC1zZWNyZXRzLWNvbnRyb2xsZXIiOiIkZGlzYWJsZWQifSwiY29tLmFwb3N0aWxsZS5yb290IjoiJGRpc2FibGVkIn0sImF1ZCI6InF1YXkuaW8iLCJleHAiOjE2NDIwMjY2NTMsImlzcyI6InF1YXkiLCJpYXQiOjE2NDIwMjMwNTMsIm5iZiI6MTY0MjAyMzA1Mywic3ViIjoiKGFub255bW91cykifQ.U8iCnAeDTGBYQnWV10FyyPDrvejj71k-WiN3x7AlRyYJ55_SgUm4a8wU1754UQpE-qQsd5ZYpxrTwEy8_mWwSr3aRNxzMrIdvkbP8_sr9I9B89O_gM_ZYMFD_CNFzDeB79Tvzty0scsMTzcArbKcozXemPgg1ZNIxIiCqCZtUPEosN_j0ushafUmQDqEqoHdq9diZTFHHH4RigeLiMF3K2GjdvY-3YVJGH2tL0GhWSbMh_StVvYPtyjtTNpnxdDb6uW9QRhUFGg8KyepJrzpgu09S4c0wLD8Z6Z6a6NG22mdClQIr9k7VTvzyOuiLyAFI4dzNHZ2c9zLJY8jcaFPEA
Connection: close

HTTP/1.1 200 OK
Server: nginx/1.12.1
Date: Wed, 12 Jan 2022 21:30:53 GMT
Content-Type: application/vnd.docker.distribution.manifest.list.v2+json
Content-Length: 741
Connection: close
Docker-Content-Digest: sha256:6c10080d6cf09cc823b333a9085d80799c95ddbd1ccac9a5db04daa89285c374
X-Frame-Options: DENY
Strict-Transport-Security: max-age=63072000; preload
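
Here is a small sketch of how that HEAD request could be assembled with Python’s standard library. The function name manifest_request and the "<token>" placeholder are mine. One difference from the capture: Docker sends each format on its own Accept line, while this sketch joins them into a single comma-separated header, which HTTP treats as equivalent.

```python
import urllib.request

# Manifest formats taken from the Accept headers in the captured request.
MANIFEST_TYPES = [
    "application/vnd.docker.distribution.manifest.v2+json",
    "application/vnd.docker.distribution.manifest.v1+prettyjws",
    "application/json",
    "application/vnd.docker.distribution.manifest.list.v2+json",
    "application/vnd.oci.image.index.v1+json",
    "application/vnd.oci.image.manifest.v1+json",
]


def manifest_request(registry, repository, reference, token, method="HEAD"):
    """Build (but do not send) a manifest request for a tag or a digest."""
    url = f"https://{registry}/v2/{repository}/manifests/{reference}"
    headers = {
        "Accept": ", ".join(MANIFEST_TYPES),
        "Authorization": f"Bearer {token}",
    }
    return urllib.request.Request(url, headers=headers, method=method)


req = manifest_request(
    "quay.io", "bitnami/sealed-secrets-controller", "v0.17.1", token="<token>"
)
print(req.get_method(), req.full_url)
# HEAD https://quay.io/v2/bitnami/sealed-secrets-controller/manifests/v0.17.1
```

Passing the request to urllib.request.urlopen with a real token would send it, and the digest could then be read from the Docker-Content-Digest header of the response.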

Now that we know that we have a valid authentication and the hash of the manifest file that we’re looking for, Docker issues a new GET request for that specific hash. The request is the same as before, but instead of specifying the tag in the URL, we specify the hash (sha256:6c10080d6cf09cc823b333a9085d80799c95ddbd1ccac9a5db04daa89285c374 in this case). I will remove most of the header from the requesting side, as it doesn’t change from this point on. This time in reply, we get a JSON blob containing the manifest file that we requested.

GET /v2/bitnami/sealed-secrets-controller/manifests/sha256:6c10080d6cf09cc823b333a9085d80799c95ddbd1ccac9a5db04daa89285c374 HTTP/1.1

HTTP/1.1 200 OK
Server: nginx/1.12.1
Date: Wed, 12 Jan 2022 21:30:53 GMT
Content-Type: application/vnd.docker.distribution.manifest.list.v2+json
Content-Length: 741
Connection: close
Docker-Content-Digest: sha256:6c10080d6cf09cc823b333a9085d80799c95ddbd1ccac9a5db04daa89285c374
X-Frame-Options: DENY
Strict-Transport-Security: max-age=63072000; preload

{
   "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
   "schemaVersion": 2,
   "manifests": [
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "digest": "sha256:e121ad7f4863747e902d0398c096d2bbfab89e7ff1641b6cefedf1f14ea20ea2",
         "size": 739,
         "platform": {
            "architecture": "amd64",
            "os": "linux"
         }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "digest": "sha256:cf3af15f6b2300684804dcc64a63157c62a0ca98521f3a303145667371da4dae",
         "size": 739,
         "platform": {
            "architecture": "arm64",
            "os": "linux"
         }
      }
   ]
}

While the details of the manifest file can be found in the link I posted above, the important part is the manifests key. You’ll notice that it contains two entries, each pointing to another manifest. Each entry specifies the platform (architecture and operating system) that that particular image of the container is designed to run on. Docker identifies what type of system you’re running and will attempt to find an image that meets those requirements. Once it finds one, it uses the digest specified for that image to pull the complete manifest. For my test, I was running on a CentOS 7 VM, which would be considered Linux amd64.
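
The selection step can be sketched in a few lines of Python. This is only an illustration of the matching logic described above, not Docker’s actual code; select_manifest is a name I made up, and the platform is passed in by hand rather than detected.

```python
# The manifest list from the reply above, reduced to the fields used here.
MANIFEST_LIST = {
    "manifests": [
        {
            "digest": "sha256:e121ad7f4863747e902d0398c096d2bbfab89e7ff1641b6cefedf1f14ea20ea2",
            "platform": {"architecture": "amd64", "os": "linux"},
        },
        {
            "digest": "sha256:cf3af15f6b2300684804dcc64a63157c62a0ca98521f3a303145667371da4dae",
            "platform": {"architecture": "arm64", "os": "linux"},
        },
    ]
}


def select_manifest(manifest_list: dict, os_name: str, architecture: str) -> str:
    """Return the digest of the first entry matching the requested platform."""
    for entry in manifest_list["manifests"]:
        platform = entry["platform"]
        if platform["os"] == os_name and platform["architecture"] == architecture:
            return entry["digest"]
    raise LookupError(f"no manifest for {os_name}/{architecture}")


print(select_manifest(MANIFEST_LIST, "linux", "amd64"))
# sha256:e121ad7f4863747e902d0398c096d2bbfab89e7ff1641b6cefedf1f14ea20ea2
```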

Another GET is issued, this time for the manifest with the hash specified for our particular image. This manifest contains the details of that particular image, including the information for each of the layers.

GET /v2/bitnami/sealed-secrets-controller/manifests/sha256:e121ad7f4863747e902d0398c096d2bbfab89e7ff1641b6cefedf1f14ea20ea2 HTTP/1.1

HTTP/1.1 200 OK
Server: nginx/1.12.1
Date: Wed, 12 Jan 2022 21:30:53 GMT
Content-Type: application/vnd.docker.distribution.manifest.v2+json
Content-Length: 739
Connection: close
Docker-Content-Digest: sha256:e121ad7f4863747e902d0398c096d2bbfab89e7ff1641b6cefedf1f14ea20ea2
X-Frame-Options: DENY
Strict-Transport-Security: max-age=63072000; preload

{
   "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
   "schemaVersion": 2,
   "config": {
      "mediaType": "application/vnd.docker.container.image.v1+json",
      "digest": "sha256:abc40f456f761e253ac1392f87cf1cefeca55d0d9b758390e857ada8b8b03f29",
      "size": 1510
   },
   "layers": [
      {
         "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
         "digest": "sha256:9ff2acc3204b4093126adab3fed72de8f7bbfe332255b199c30b8b185fcf6923",
         "size": 655317
      },
      {
         "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
         "digest": "sha256:5208478a2ac4a7b3655148e156e510a7ebd3e168ac71e1ebb001f44ad90d8ecc",
         "size": 16460207
      }
   ]
}
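
The image manifest above names every blob that makes up the image. As a quick sketch, the Python below collects the config and layer descriptors into one list and adds up their sizes. The helper name blobs_to_fetch is hypothetical, and the manifest is trimmed to the digest and size fields.

```python
# The image manifest from the reply above, reduced to digests and sizes.
MANIFEST = {
    "config": {
        "digest": "sha256:abc40f456f761e253ac1392f87cf1cefeca55d0d9b758390e857ada8b8b03f29",
        "size": 1510,
    },
    "layers": [
        {
            "digest": "sha256:9ff2acc3204b4093126adab3fed72de8f7bbfe332255b199c30b8b185fcf6923",
            "size": 655317,
        },
        {
            "digest": "sha256:5208478a2ac4a7b3655148e156e510a7ebd3e168ac71e1ebb001f44ad90d8ecc",
            "size": 16460207,
        },
    ],
}


def blobs_to_fetch(manifest: dict) -> list:
    """Return the config descriptor followed by the layers in manifest order."""
    return [manifest["config"]] + list(manifest["layers"])


blobs = blobs_to_fetch(MANIFEST)
for blob in blobs:
    print(blob["digest"], blob["size"])

# Total download size in bytes (compressed, as stored by the registry).
print(sum(blob["size"] for blob in blobs))  # 17117034
```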

Now Docker will go through and issue a series of GET requests, one for each of the layer blobs as well as the config file. Again, these files are requested by their hash.

GET /v2/bitnami/sealed-secrets-controller/blobs/sha256:5208478a2ac4a7b3655148e156e510a7ebd3e168ac71e1ebb001f44ad90d8ecc HTTP/1.1
Host: quay.io
GET /v2/bitnami/sealed-secrets-controller/blobs/sha256:9ff2acc3204b4093126adab3fed72de8f7bbfe332255b199c30b8b185fcf6923 HTTP/1.1
Host: quay.io
GET /v2/bitnami/sealed-secrets-controller/blobs/sha256:abc40f456f761e253ac1392f87cf1cefeca55d0d9b758390e857ada8b8b03f29 HTTP/1.1
Host: quay.io
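
Because each blob is requested by its hash, a downloaded file can be checked against the digest it was requested under. The sketch below shows one way to do that in Python. The names blob_path and verify_blob are mine, it assumes every digest uses sha256 as in this capture, and the example runs on stand-in bytes rather than a real layer.

```python
import hashlib


def blob_path(repository: str, digest: str) -> str:
    """Build the request path for a blob, matching the GET lines above."""
    return f"/v2/{repository}/blobs/{digest}"


def verify_blob(data: bytes, digest: str) -> bool:
    """Check that the SHA256 of the downloaded bytes matches the digest."""
    algorithm, _, expected = digest.partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported digest algorithm: {algorithm!r}")
    return hashlib.sha256(data).hexdigest() == expected


# Stand-in payload; a real check would use the bytes returned by the registry.
payload = b"example layer bytes"
digest = "sha256:" + hashlib.sha256(payload).hexdigest()

print(blob_path("bitnami/sealed-secrets-controller", digest))
print(verify_blob(payload, digest))         # True
print(verify_blob(payload + b"x", digest))  # False
```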

Behind the scenes, Docker moves these various parts to different areas of the file system for ingest into the local library. Alternatively, you can package all of the parts into a single .tar file, which can be saved as an archive and easily imported into the library later.
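
To give a feel for the packaging step, here is a rough Python sketch that bundles a config file and layers into one tar archive together with a manifest.json index. The layout (a manifest.json listing Config, RepoTags, and Layers) is modeled on the archives that docker save produces; the file naming and the function name build_image_tar are my own assumptions, and I have not tested this exact output against docker load, so treat it as a starting point only.

```python
import io
import json
import tarfile


def build_image_tar(config_digest, config_bytes, layers, repo_tag) -> bytes:
    """Bundle a config and (digest, bytes) layers into a single tar archive."""

    def hex_part(digest):
        return digest.split(":", 1)[1]

    # Assumed naming scheme: <hash>.json for the config, <hash>/layer.tar per layer.
    config_name = hex_part(config_digest) + ".json"
    layer_names = [f"{hex_part(digest)}/layer.tar" for digest, _ in layers]
    manifest = [
        {"Config": config_name, "RepoTags": [repo_tag], "Layers": layer_names}
    ]

    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:

        def add(name, data):
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))

        add(config_name, config_bytes)
        for name, (_, data) in zip(layer_names, layers):
            add(name, data)
        add("manifest.json", json.dumps(manifest).encode())
    return buffer.getvalue()


# Stand-in content; real use would pass the blobs downloaded from the registry.
archive = build_image_tar(
    "sha256:aaaa",
    b"{}",
    [("sha256:bbbb", b"layer one"), ("sha256:cccc", b"layer two")],
    "quay.io/bitnami/sealed-secrets-controller:v0.17.1",
)
with tarfile.open(fileobj=io.BytesIO(archive)) as tar:
    print(tar.getnames())
```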
