gpu-monitoring-tools

2019-12-24

NVIDIA GPU Monitoring Tools

NVML Go Bindings

NVIDIA Management Library (NVML) is a C-based API for monitoring and managing NVIDIA GPU devices. The NVML Go bindings are derived from nvidia-docker 1.0, with some improvements and additions. The NVML headers are bundled with the package so that it is easy to use and build.

NVML Samples

Three samples are included to demonstrate how to use the NVML API.

DCGM Go Bindings

NVIDIA Data Center GPU Manager (DCGM) is a set of tools for managing and monitoring NVIDIA GPUs in cluster environments. It is a low-overhead tool suite that performs a variety of functions on each host system, including active health monitoring, diagnostics, system validation, policies, power and clock management, group configuration, and accounting.

The DCGM Go bindings make it easy to administer and monitor containerized GPU applications.

DCGM Samples

DCGM can be run in different modes. Seven samples and a REST API are included to show how to use the DCGM API and how to run it in each mode.

DCGM Exporter

The GPU metrics exporter for Prometheus, built on NVIDIA Data Center GPU Manager (DCGM), is a simple shell script that starts nv-hostengine, reads GPU metrics every second, and converts them to the standard Prometheus format.

Find the installation and run instructions here.
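
To illustrate the conversion step the exporter performs, here is a minimal Go sketch that renders GPU readings in the Prometheus text exposition format. It is not the exporter's code (the exporter is a shell script). The `Sample` type, the `FormatPrometheus` function, and the metric and label names are all hypothetical, chosen only for this example.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Sample is one GPU reading. The type and its field names are illustrative
// assumptions, not the exporter's actual schema.
type Sample struct {
	Metric string            // hypothetical metric name, e.g. "gpu_temp_celsius"
	Labels map[string]string // e.g. {"gpu": "0"}
	Value  float64
}

// FormatPrometheus renders samples in the Prometheus text exposition format:
// one `name{label="value",...} value` line per sample.
func FormatPrometheus(samples []Sample) string {
	var b strings.Builder
	for _, s := range samples {
		// Sort label keys so the output is deterministic.
		keys := make([]string, 0, len(s.Labels))
		for k := range s.Labels {
			keys = append(keys, k)
		}
		sort.Strings(keys)

		pairs := make([]string, 0, len(keys))
		for _, k := range keys {
			// %q adds the surrounding quotes and escapes quotes and
			// backslashes, which is sufficient for plain ASCII label values.
			pairs = append(pairs, fmt.Sprintf("%s=%q", k, s.Labels[k]))
		}

		b.WriteString(s.Metric)
		if len(pairs) > 0 {
			b.WriteString("{" + strings.Join(pairs, ",") + "}")
		}
		fmt.Fprintf(&b, " %g\n", s.Value)
	}
	return b.String()
}

func main() {
	// Made-up readings for demonstration; a real exporter would obtain
	// these from DCGM once per second.
	samples := []Sample{
		{Metric: "gpu_temp_celsius", Labels: map[string]string{"gpu": "0"}, Value: 34},
		{Metric: "gpu_power_watts", Labels: map[string]string{"gpu": "0"}, Value: 61.5},
	}
	fmt.Print(FormatPrometheus(samples))
}
```

Sorting the label keys keeps the output stable between scrapes, which makes it easier to compare or test.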

Issues and Contributing

Check out the Contributing document!
