ngc-container-replicator
Clones nvcr.io using either DGX (compute.nvidia.com) or NGC (ngc.nvidia.com) API keys.
The replicator will make an offline clone of the NGC/DGX container registry. In its current form, the replicator will download every CUDA container image as well as each Deep Learning framework image in the NVIDIA project.
Tarfiles will be saved in /output inside the container, so be sure to volume-mount
that directory. In the following example, we will collect our images in /tmp on the host.
Use --min-version to limit the number of versions to download. In the example
below, we clone only DL framework images at version 17.12 or later.
docker run --rm -it -v /var/run/docker.sock:/var/run/docker.sock -v /tmp:/output deepops/replicator --project=nvidia --min-version=17.12 --api-key=<your-dgx-or-ngc-api-key>
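To make "17.12 or later" concrete, here is a minimal sketch of how YY.MM tags order against a minimum version. It is not the replicator's own code: the helper name version_ge is hypothetical, the sample tags are illustrative, and it assumes GNU coreutils for `sort -V`.

```shell
#!/bin/sh
# Succeeds when version $1 is the same as or newer than version $2.
# Relies on `sort -V` (version sort): the older of the two sorts first,
# so $1 >= $2 exactly when $2 comes out on the first line.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Illustrative tags only; the real tag list comes from the registry.
min_version="17.12"
for tag in 17.10 17.12 18.01; do
  if version_ge "$tag" "$min_version"; then
    echo "$tag: kept"
  else
    echo "$tag: skipped"
  fi
done
```

Combining the real command with --dry-run remains the reliable way to see which versions the replicator itself would select.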
You can also filter on specific images. If you only want TensorFlow, PyTorch,
and TensorRT, add an --image option for each one, e.g.
docker run --rm -it -v /var/run/docker.sock:/var/run/docker.sock -v /tmp:/output deepops/replicator --project=nvidia --min-version=17.12 --image=tensorflow --image=pytorch --image=tensorrt --dry-run --api-key=<your-dgx-or-ngc-api-key>
Note: the --dry-run option lets you see what will happen without committing
to a lengthy download.
Note: a state.yml file will be created in the output directory. This saved state is used to
avoid pulling images that were previously pulled. If you wish to re-pull and save an image, just
delete the entry in state.yml corresponding to the image_name and tag you wish to refresh.
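Once the tarfiles have been copied to an offline host, they still need to be loaded into Docker there. The sketch below is one way to do that, under assumptions this README does not confirm: that the tarfiles are `docker save` archives and that their names end in .tar. The function name load_images and the DOCKER_CMD override are hypothetical helpers introduced for illustration.

```shell
#!/bin/sh
# Load every image tarball found in a directory into the local Docker daemon.
# ASSUMPTIONS: the tarfiles are `docker save` archives and end in .tar;
# adjust the glob if yours are named differently.
# DOCKER_CMD is a hypothetical override; set it to `echo` for a dry run.
load_images() {
  image_dir="${1:-/tmp}"
  for tarfile in "$image_dir"/*.tar; do
    [ -e "$tarfile" ] || continue   # the glob matched nothing
    echo "Loading $tarfile"
    "${DOCKER_CMD:-docker}" load -i "$tarfile"
  done
}

# Dry run: print the docker commands instead of executing them.
DOCKER_CMD=echo load_images /tmp
```

Drop the DOCKER_CMD=echo prefix to perform the load for real.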
If you don't already have a deepops namespace, create one now.
kubectl create namespace deepops
Next, create a secret with your NGC API key.
kubectl -n deepops create secret generic ngc-secret --from-literal=apikey=<your-api-key-goes-here>
Next, create a persistent volume claim that will live outside the lifecycle of the CronJob. If you are using DeepOps, you can use a Rook/Ceph PVC similar to:
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ngc-replicator-pvc
  namespace: deepops
  labels:
    app: ngc-replicator
spec:
  storageClassName: rook-raid0-retain  # <== Replace with your StorageClass
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 32Mi
Finally, create a CronJob that executes the replicator on a schedule. This
example runs the replicator once a day at 04:00 (schedule "0 4 * * *"). Note: This example uses Rook block storage to provide a persistent volume that holds state.yml between executions. This ensures you will only download new
container images. For more details, see our DeepOps
project.
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: replicator-config
  namespace: deepops
data:
  ngc-update.sh: |
    #!/bin/bash
    ngc_replicator \
      --project=nvidia \
      --min-version=$(date +"%y.%m" -d "1 month ago") \
      --py-version=py3 \
      --image=tensorflow --image=pytorch --image=tensorrt \
      --no-exporter \
      --registry-url=registry.local  # <== Replace with your local repo
---
apiVersion: batch/v1beta1  # Use batch/v1 on Kubernetes 1.21+; batch/v1beta1 was removed in 1.25
kind: CronJob
metadata:
  name: ngc-replicator
  namespace: deepops
  labels:
    app: ngc-replicator
spec:
  schedule: "0 4 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          nodeSelector:
            node-role.kubernetes.io/master: ""
          containers:
            - name: replicator
              image: deepops/replicator
              imagePullPolicy: Always
              command: [ "/bin/sh", "-c", "/ngc-update/ngc-update.sh" ]
              env:
                - name: NGC_REPLICATOR_API_KEY
                  valueFrom:
                    secretKeyRef:
                      name: ngc-secret
                      key: apikey
              volumeMounts:
                - name: registry-config
                  mountPath: /ngc-update
                - name: docker-socket
                  mountPath: /var/run/docker.sock
                - name: ngc-replicator-storage
                  mountPath: /output
          volumes:
            - name: registry-config
              configMap:
                name: replicator-config
                defaultMode: 0777
            - name: docker-socket
              hostPath:
                path: /var/run/docker.sock
                type: File
            - name: ngc-replicator-storage
              persistentVolumeClaim:
                claimName: ngc-replicator-pvc
          restartPolicy: Never
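The ngc-update.sh script in the ConfigMap derives --min-version from the current date, so each run keeps a rolling window of roughly one month. The sketch below prints what that expression evaluates to. It also shows a variant anchored on the 15th of the month, which the GNU date documentation suggests because "1 month ago" can land back in the current month when run on the 29th to 31st. The anchored variant is a suggestion, not part of the original script, and both forms require GNU date.

```shell
#!/bin/sh
# Requires GNU date (the -d option is not available in BSD/macOS date).

# Form used in the ConfigMap: one month before today, formatted as YY.MM.
simple="$(date +"%y.%m" -d "1 month ago")"

# Variant anchored on the 15th of the current month, so that month
# arithmetic cannot overflow into a non-existent day (e.g. "February 31").
anchored="$(date +"%y.%m" -d "$(date +%Y-%m-15) -1 month")"

echo "simple:   --min-version=${simple}"
echo "anchored: --min-version=${anchored}"
```

On most days the two values are identical; they can differ only near the end of a month.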
make dev
py.test
Save Markdown READMEs for each image; these are not version controlled.
Test the local registry push service (coded, in beta testing).
Add a templater to the workflow.