
gpu-feature-discovery

2019-12-24

NVIDIA GPU feature discovery


Overview

NVIDIA GPU Feature Discovery for Kubernetes is a tool that automatically generates labels based on the GPUs available on a node. It uses the Node Feature Discovery from Kubernetes to apply those labels to the node.

Beta Version

This tool is in beta and the API may still change; however, a deprecation policy will be put in place.

Prerequisites

The prerequisites for running NVIDIA GPU Feature Discovery are:

  • nvidia-docker version > 2.0 (see how to install it and its prerequisites)

  • docker configured with nvidia as the default runtime.

  • Kubernetes version >= 1.10

  • NVIDIA device plugin for Kubernetes (see how to setup)

  • NFD deployed on each node you want to label with the local source configured (see how to setup)
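The version requirements above can be checked with a small shell helper before deploying. A minimal sketch, assuming GNU `sort -V` is available; the `version_ge` helper and the sample version strings are illustrative, not part of the tool:

```shell
#!/bin/sh
# version_ge A B: succeeds (exit 0) if version A >= version B,
# using sort -V for natural version ordering.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Example checks against the documented minimums (versions are samples).
version_ge "1.14.3" "1.10" && echo "Kubernetes version OK"
version_ge "2.2.2"  "2.0"  && echo "nvidia-docker version OK"
```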

Command line interface

Available options:

gpu-feature-discovery:
Usage:
  gpu-feature-discovery [--oneshot | --sleep-interval=<seconds>] [--output-file=<file> | -o <file>]
  gpu-feature-discovery -h | --help
  gpu-feature-discovery --version

Options:
  -h --help                       Show this help message and exit
  --version                       Display version and exit
  --oneshot                       Label once and exit
  --sleep-interval=<seconds>      Time to sleep between labeling [Default: 60s]
  -o <file> --output-file=<file>  Path to output file
                                  [Default: /etc/kubernetes/node-feature-discovery/features.d/gfd]

You can also use environment variables:

| Env Variable       | Option           | Example |
|--------------------|------------------|---------|
| GFD_ONESHOT        | --oneshot        | TRUE    |
| GFD_OUTPUT_FILE    | --output-file    | output  |
| GFD_SLEEP_INTERVAL | --sleep-interval | 10s     |

Environment variables override the command line options if they conflict.
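That precedence rule can be sketched in plain shell; the `resolve_sleep_interval` helper below is illustrative only, not the tool's actual implementation:

```shell
#!/bin/sh
# Illustrative precedence: an environment variable, when set,
# wins over the corresponding command-line value.
resolve_sleep_interval() {
    flag_value="$1"                            # value parsed from --sleep-interval
    echo "${GFD_SLEEP_INTERVAL:-$flag_value}"  # env var takes priority if set
}

GFD_SLEEP_INTERVAL=10s
resolve_sleep_interval "60s"   # env set: prints 10s
unset GFD_SLEEP_INTERVAL
resolve_sleep_interval "60s"   # env unset: falls back to 60s
```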

Quick Start

Node Feature Discovery

The first step is to make sure Node Feature Discovery is running on every node you want to label. NVIDIA GPU Feature Discovery uses the local source, so be sure to mount the corresponding volumes. See https://github.com/kubernetes-sigs/node-feature-discovery for more details.

You also need to configure the Node Feature Discovery to only expose vendor IDs in the PCI source. To do so, please refer to the Node Feature Discovery documentation.

Preparing your GPU Nodes

Make sure nvidia-docker2 is installed on your GPU nodes and that Docker's default runtime is set to nvidia. See https://github.com/NVIDIA/nvidia-docker/wiki/Advanced-topics#default-runtime.

Deploy NVIDIA GPU Feature Discovery

The next step is to run NVIDIA GPU Feature Discovery on each node, either as a DaemonSet or as a Job.

DaemonSet

$ kubectl apply -f gpu-feature-discovery-daemonset.yaml

GPU Feature Discovery should now be running on each node, generating labels for Node Feature Discovery.

Job

You must change the NODE_NAME value in the template to match the name of the node you want to label:

$ export NODE_NAME=<your-node-name>
$ sed "s/NODE_NAME/${NODE_NAME}/" gpu-feature-discovery-job.yaml.template > gpu-feature-discovery-job.yaml
$ kubectl apply -f gpu-feature-discovery-job.yaml

GPU Feature Discovery should now be running on the node, generating labels for Node Feature Discovery.

Labels

This is the list of labels generated by NVIDIA GPU Feature Discovery and their meaning:

| Label Name                    | Value Type | Meaning                                | Example        |
|-------------------------------|------------|----------------------------------------|----------------|
| nvidia.com/cuda.runtime.major | Integer    | Major of the CUDA version              | 10             |
| nvidia.com/cuda.runtime.minor | Integer    | Minor of the CUDA version              | 1              |
| nvidia.com/cuda.driver.major  | Integer    | Major of the NVIDIA driver version     | 418            |
| nvidia.com/cuda.driver.minor  | Integer    | Minor of the NVIDIA driver version     | 30             |
| nvidia.com/cuda.driver.rev    | Integer    | Revision of the NVIDIA driver version  | 40             |
| nvidia.com/gpu.family         | String     | Architecture family of the GPU         | kepler         |
| nvidia.com/gpu.machine        | String     | Machine type                           | DGX-1          |
| nvidia.com/gpu.product        | String     | Model of the GPU                       | GeForce-GT-710 |
| nvidia.com/gpu.memory         | Integer    | Memory of the GPU in MB                | 2048           |
| nvidia.com/gpu.compute.major  | Integer    | Major of the compute capability        | 3              |
| nvidia.com/gpu.compute.minor  | Integer    | Minor of the compute capability        | 3              |
| nvidia.com/gfd.timestamp      | Integer    | Timestamp of the generated labels      | 1555019244     |
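These labels can then be used to schedule workloads onto matching nodes. A minimal sketch of a pod spec with a nodeSelector on one of the labels above; the pod name and container image are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod-example                 # placeholder name
spec:
  nodeSelector:
    nvidia.com/gpu.family: "kepler"     # label generated by GFD
  containers:
  - name: cuda-container
    image: nvidia/cuda:10.1-base        # placeholder image
    resources:
      limits:
        nvidia.com/gpu: 1               # requires the NVIDIA device plugin
```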

Run locally

Download the source code:

git clone https://github.com/NVIDIA/gpu-feature-discovery

Build the docker image:

export GFD_VERSION=$(git describe --tags --dirty --always)
docker build . --build-arg GFD_VERSION=$GFD_VERSION -t gpu-feature-discovery:${GFD_VERSION}

Run it:

mkdir -p output-dir
docker run -v ${PWD}/output-dir:/etc/kubernetes/node-feature-discovery/features.d gpu-feature-discovery:${GFD_VERSION}
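The container writes its labels to the mounted directory as one `label=value` pair per line, the format the NFD local source consumes. A small sketch of inspecting such a file; the sample content below is illustrative (taken from the labels table), not real tool output:

```shell
#!/bin/sh
# Simulate a GFD output file: one "label=value" pair per line,
# as consumed by the NFD "local" source.
mkdir -p output-dir
cat > output-dir/gfd <<'EOF'
nvidia.com/gfd.timestamp=1555019244
nvidia.com/gpu.family=kepler
nvidia.com/gpu.memory=2048
EOF

# Extract a single label's value with awk.
awk -F= '$1 == "nvidia.com/gpu.family" { print $2 }' output-dir/gfd
```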

The default Docker runtime on your host should be set to nvidia; alternatively, you can pass the --runtime=nvidia option:

docker run --runtime=nvidia gpu-feature-discovery:${GFD_VERSION}

Building from source

Download the source code:

git clone https://github.com/NVIDIA/gpu-feature-discovery

Get the dependencies:

dep ensure

Build it:

export GFD_VERSION=$(git describe --tags --dirty --always)
go build -ldflags "-X main.Version=${GFD_VERSION}"

You can also use the Dockerfile.devel:

docker build . -f Dockerfile.devel -t gfd-devel
docker run -it gfd-devel
go build -ldflags "-X main.Version=devel"

