NGC Replicator

Clones nvcr.io using either DGX (compute.nvidia.com) or NGC (ngc.nvidia.com) API keys.

The replicator will make an offline clone of the NGC/DGX container registry. In its current form, the replicator will download every CUDA container image as well as each Deep Learning framework image in the NVIDIA project.

Tarfiles will be saved in /output inside the container, so be sure to volume mount that directory. In the following example, we will collect our images in /tmp on the host.

Use --min-version to limit the number of versions to download. In the example below, we will only clone DL framework images at version 17.12 or later.

docker run --rm -it -v /var/run/docker.sock:/var/run/docker.sock -v /tmp:/output \
    deepops/replicator --project=nvidia --min-version=17.12
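
The --min-version cut-off can be pictured as a comparison on the YY.MM prefix of each tag. The sketch below is illustrative only: it assumes tags of the form YY.MM or YY.MM-suffix (for example 17.12-py3), and the helper name filter_min_version is hypothetical, not part of the replicator.

```shell
# Hypothetical helper, not part of the replicator: read tags on stdin and print
# those whose YY.MM prefix is at or above the given minimum (e.g. 17.12).
# Assumes tags look like "YY.MM" or "YY.MM-suffix"; anything else (e.g. "latest")
# gets a key of 0 and is dropped.
filter_min_version() {
    awk -v min="$1" '
        # Turn "17.12-py3" into the comparable integer 1712.
        function key(v,   parts) {
            split(v, parts, /[.-]/)
            return parts[1] * 100 + parts[2]
        }
        key($0) >= key(min) { print }
    '
}
```

For example, piping 17.10, 17.11, 17.12-py3 and 18.01-py2 through `filter_min_version 17.12` keeps only 17.12-py3 and 18.01-py2.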

You can also filter on specific images. If you only wanted TensorFlow, PyTorch, and TensorRT, you would add one --image flag per image, e.g.:

docker run --rm -it -v /var/run/docker.sock:/var/run/docker.sock -v /tmp:/output \
    deepops/replicator --project=nvidia --min-version=17.12 \
                       --image=tensorflow --image=pytorch --image=tensorrt

Note: the --dry-run option lets you see what will happen without committing to a lengthy download.

Note: a state.yml file will be created in the output directory. This saved state is used to avoid re-pulling images that were previously pulled. If you wish to re-pull and save an image, delete the entry in state.yml corresponding to the image_name and tag you wish to refresh.
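
The skip-if-already-pulled behavior can be illustrated with a much simpler stand-in. This sketch is hypothetical: it keeps one image:tag per line in a plain-text file instead of the YAML the replicator actually writes, and the names STATE_FILE, already_pulled, mark_pulled and forget are made up for illustration.

```shell
# Illustrative stand-in for state.yml: one "image:tag" entry per line.
# This is NOT the replicator's real state format.
STATE_FILE="${STATE_FILE:-./state.txt}"

# Succeeds if image:tag is recorded as already pulled.
already_pulled() {
    [ -f "$STATE_FILE" ] && grep -Fqx "$1:$2" "$STATE_FILE"
}

# Record image:tag after a successful pull and save.
mark_pulled() {
    printf '%s:%s\n' "$1" "$2" >> "$STATE_FILE"
}

# Drop image:tag so the next run pulls it again
# (the equivalent of deleting its entry from state.yml).
forget() {
    [ -f "$STATE_FILE" ] || return 0
    grep -Fvx "$1:$2" "$STATE_FILE" > "$STATE_FILE.tmp"
    mv "$STATE_FILE.tmp" "$STATE_FILE"
}
```

A run would call already_pulled before each pull and mark_pulled after it; forget mirrors the manual refresh step described above.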

Kubernetes Deployment

If you don't already have a deepops namespace, create one now.

kubectl create namespace deepops

Next, create a secret holding your NGC API key under the key apikey, which the CronJob below reads:

kubectl -n deepops create secret generic ngc-secret --from-literal=apikey=<your-ngc-api-key>

Next, create a persistent volume claim that will live outside the lifecycle of the CronJob. If you are using DeepOps, you can use a Rook/Ceph PVC similar to:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ngc-replicator-pvc
  namespace: deepops
  labels:
    app: ngc-replicator
spec:
  storageClassName: rook-raid0-retain  # <== Replace with your StorageClass
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 32Mi

Finally, create a ConfigMap holding the update script and a CronJob that executes the replicator on a schedule. This example runs the replicator once a day at 04:00 (schedule "0 4 * * *"). Note: this example uses Rook block storage to provide a persistent volume that holds state.yml between executions, which ensures that only new container images are downloaded. For more details, see the DeepOps project.

apiVersion: v1
kind: ConfigMap
metadata:
  name: replicator-config
  namespace: deepops
data:
  ngc-update.sh: |
      --min-version=$(date +"%y.%m" -d "1 month ago")     \
      --image=tensorflow --image=pytorch --image=tensorrt \
      --registry-url=registry.local  # <== Replace with your local repo
---
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: ngc-replicator
  namespace: deepops
  labels:
    app: ngc-replicator
spec:
  schedule: "0 4 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          nodeSelector:
            node-role.kubernetes.io/master: ""
          containers:
            - name: replicator
              image: deepops/replicator
              imagePullPolicy: Always
              command: [ "/bin/sh", "-c", "/ngc-update/ngc-update.sh" ]
              env:
              - name: NGC_REPLICATOR_API_KEY
                valueFrom:
                  secretKeyRef:
                    name: ngc-secret
                    key: apikey
              volumeMounts:
              - name: registry-config
                mountPath: /ngc-update
              - name: docker-socket
                mountPath: /var/run/docker.sock
              - name: ngc-replicator-storage
                mountPath: /output
          volumes:
            - name: registry-config
              configMap:
                name: replicator-config
                defaultMode: 0777
            - name: docker-socket
              hostPath:
                path: /var/run/docker.sock
                type: File
            - name: ngc-replicator-storage
              persistentVolumeClaim:
                claimName: ngc-replicator-pvc
          restartPolicy: Never
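
The ConfigMap derives --min-version from `date +"%y.%m" -d "1 month ago"`, which assumes GNU date. GNU date applies "1 month ago" by changing only the month field and then normalizing, so on 31 March the result is "31 February", which rolls over to early March and yields the current month rather than the previous one. The sketch below is an alternative we suggest, not taken from the original manifest: it anchors on the 15th of the month first. The function name previous_month_tag is hypothetical.

```shell
# Hypothetical helper (GNU date assumed): print YY.MM for the month before a
# reference date (YYYY-MM-DD, default: today). Stepping back from the 15th
# avoids the end-of-month rollover that "1 month ago" hits on the 29th-31st.
previous_month_tag() {
    ref="${1:-$(date +%Y-%m-%d)}"      # reference date
    mid=$(date -d "$ref" +%Y-%m-15)    # same month, pinned to the 15th
    date -d "$mid 1 month ago" +%y.%m
}
```

Inside ngc-update.sh, the flag would then read `--min-version=$(previous_month_tag)`, with the function defined at the top of the script.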

Developer Quickstart

make dev


  • Save Markdown READMEs for each image (these are not version controlled)

  • Test the local registry push service (coded; in beta testing)

  • Add templater to workflow





