gpu-feature-discovery
NVIDIA GPU Feature Discovery for Kubernetes is a software component that automatically generates labels for the GPUs available on a node. It uses Node Feature Discovery from Kubernetes to label nodes.
This tool is in beta, and its API may change in incompatible ways. However, we will set up a deprecation policy.
The prerequisites for running NVIDIA GPU Feature Discovery are:
nvidia-docker version > 2.0 (see how to install it and its prerequisites)
Docker configured with nvidia as the default runtime
Kubernetes version >= 1.10
NVIDIA device plugin for Kubernetes (see how to set it up)
NFD deployed, with the local source configured, on each node you want to label (see how to set it up)
Available options:
gpu-feature-discovery:
Usage:
  gpu-feature-discovery [--oneshot | --sleep-interval=<seconds>] [--output-file=<file> | -o <file>]
  gpu-feature-discovery -h | --help
  gpu-feature-discovery --version

Options:
  -h --help                       Show this help message and exit
  --version                       Display version and exit
  --oneshot                       Label once and exit
  --sleep-interval=<seconds>      Time to sleep between labeling [Default: 60s]
  -o <file> --output-file=<file>  Path to output file [Default: /etc/kubernetes/node-feature-discovery/features.d/gfd]
You can also use environment variables:
Env Variable | Option | Example |
---|---|---|
GFD_ONESHOT | --oneshot | TRUE |
GFD_OUTPUT_FILE | --output-file | output |
GFD_SLEEP_INTERVAL | --sleep-interval | 10s |
Environment variables override the command line options if they conflict.
The first step is to make sure that Node Feature Discovery is running on every node you want to label. NVIDIA GPU Feature Discovery uses the local source, so be sure to mount the required volumes. See https://github.com/kubernetes-sigs/node-feature-discovery for more details.
You also need to configure Node Feature Discovery to expose only vendor IDs in the PCI source. To do so, please refer to the Node Feature Discovery documentation.
Be sure that nvidia-docker2 is installed on your GPU nodes and that Docker's default runtime is set to nvidia. See https://github.com/NVIDIA/nvidia-docker/wiki/Advanced-topics#default-runtime.
The next step is to run NVIDIA GPU Feature Discovery on each node, either as a DaemonSet or as a Job. To deploy it as a DaemonSet:
$ kubectl apply -f gpu-feature-discovery-daemonset.yaml
GPU Feature Discovery should now be running on each node and generating labels for Node Feature Discovery.
To run it as a Job instead, you must change the NODE_NAME value in the template to match the name of the node you want to label:
$ export NODE_NAME=<your-node-name>
$ sed "s/NODE_NAME/${NODE_NAME}/" gpu-feature-discovery-job.yaml.template > gpu-feature-discovery-job.yaml
$ kubectl apply -f gpu-feature-discovery-job.yaml
GPU Feature Discovery should now be running on the node and generating labels for Node Feature Discovery.
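If the Job manifest is generated from tooling rather than a shell, the substitution performed by the sed command can be sketched in Go as below. renderJobTemplate is a hypothetical helper, not part of the project, and the template excerpt in main is made up for illustration because the real template's contents are not shown here. Like sed without the g flag, it replaces only the first NODE_NAME on each line.

```go
package main

import (
	"fmt"
	"strings"
)

// renderJobTemplate mirrors sed "s/NODE_NAME/<node>/": on each line, only
// the first occurrence of the NODE_NAME placeholder is replaced.
func renderJobTemplate(tmpl, nodeName string) string {
	lines := strings.Split(tmpl, "\n")
	for i, line := range lines {
		lines[i] = strings.Replace(line, "NODE_NAME", nodeName, 1)
	}
	return strings.Join(lines, "\n")
}

func main() {
	// Hypothetical template excerpt, for illustration only.
	tmpl := "spec:\n  nodeName: NODE_NAME\n"
	fmt.Print(renderJobTemplate(tmpl, "gpu-node-1"))
}
```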
The labels generated by NVIDIA GPU Feature Discovery, and their meanings, are:
Label Name | Value Type | Meaning | Example |
---|---|---|---|
nvidia.com/cuda.runtime.major | Integer | Major version of CUDA | 10 |
nvidia.com/cuda.runtime.minor | Integer | Minor version of CUDA | 1 |
nvidia.com/cuda.driver.major | Integer | Major version of the NVIDIA driver | 418 |
nvidia.com/cuda.driver.minor | Integer | Minor version of the NVIDIA driver | 30 |
nvidia.com/cuda.driver.rev | Integer | Revision of the NVIDIA driver version | 40 |
nvidia.com/gpu.family | String | Architecture family of the GPU | kepler |
nvidia.com/gpu.machine | String | Machine type | DGX-1 |
nvidia.com/gpu.product | String | Model of the GPU | GeForce-GT-710 |
nvidia.com/gpu.memory | Integer | Memory of the GPU, in MB | 2048 |
nvidia.com/gpu.compute.major | Integer | Major version of the GPU compute capability | 3 |
nvidia.com/gpu.compute.minor | Integer | Minor version of the GPU compute capability | 3 |
nvidia.com/gfd.timestamp | Integer | Unix timestamp (in seconds) at which the labels were generated | 1555019244 |
Download the source code:
git clone https://github.com/NVIDIA/gpu-feature-discovery
Build the Docker image:
export GFD_VERSION=$(git describe --tags --dirty --always)
docker build . --build-arg GFD_VERSION=$GFD_VERSION -t gpu-feature-discovery:${GFD_VERSION}
Run it:
mkdir -p output-dir
docker run -v ${PWD}/output-dir:/etc/kubernetes/node-feature-discovery/features.d gpu-feature-discovery:${GFD_VERSION}
You should have set the default Docker runtime to nvidia on your host; alternatively, you can use the --runtime=nvidia option:
docker run --runtime=nvidia gpu-feature-discovery:${GFD_VERSION}
To build the binary directly with Go, download the source code:
git clone https://github.com/NVIDIA/gpu-feature-discovery
Get dependencies:
dep ensure
Build it:
export GFD_VERSION=$(git describe --tags --dirty --always)
go build -ldflags "-X main.Version=${GFD_VERSION}"
You can also use the Dockerfile.devel:
docker build . -f Dockerfile.devel -t gfd-devel
docker run -it gfd-devel go build -ldflags "-X main.Version=devel"
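Both build commands pass -ldflags "-X main.Version=...", which tells the Go linker to overwrite a string variable named Version in package main. A minimal sketch of such a declaration follows; the default value "unknown" is an assumption, not taken from the project.

```go
package main

import "fmt"

// Version is overwritten at link time, e.g. by
//   go build -ldflags "-X main.Version=${GFD_VERSION}"
// -X only works on package-level string variables (not constants); a plain
// `go build` keeps this assumed default.
var Version = "unknown"

func main() {
	fmt.Println(Version)
}
```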