cuml

cuML - GPU Machine Learning Algorithms

cuML is a suite of libraries that implement machine learning algorithms and mathematical primitive functions, and that share compatible APIs with other RAPIDS projects.

cuML enables data scientists, researchers, and software engineers to run traditional tabular ML tasks on GPUs without going into the details of CUDA programming. In most cases, cuML's Python API matches the API from scikit-learn.
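As a rough illustration of that API parity, the sketch below imports DBSCAN from cuML when it is available, falls back to scikit-learn otherwise, and then runs the same constructor and `fit` call against whichever was found. The helper name `load_dbscan` is hypothetical, and passing a NumPy array directly to cuML is an assumption about the installed version; treat this as a sketch, not a recommended pattern.

```python
def load_dbscan():
    """Return (backend_name, DBSCAN class), preferring cuML when it imports cleanly."""
    try:
        from cuml.cluster import DBSCAN  # GPU implementation
        return "cuml", DBSCAN
    except Exception:  # cuML missing, or no usable CUDA setup
        pass
    try:
        from sklearn.cluster import DBSCAN  # CPU implementation
        return "sklearn", DBSCAN
    except ImportError:
        return None, None


backend, DBSCAN = load_dbscan()
labels = None
if DBSCAN is not None:
    import numpy as np

    # Three well-separated points in 3-D space (one row per sample).
    X = np.array([[1.0, 4.0, 4.0], [2.0, 2.0, 2.0], [5.0, 1.0, 1.0]])

    # The constructor arguments and fit() call are the same for both backends.
    model = DBSCAN(eps=1.0, min_samples=1)
    model.fit(X)
    labels = np.asarray(model.labels_).tolist()
    print(backend, labels)
```

Only the import line differs between the two backends; the rest of the code does not need to know which one is in use.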

For large datasets, these GPU-based implementations can complete 10-50x faster than their CPU equivalents. For details on performance, see the cuML Benchmarks Notebook.

As an example, the following Python snippet loads input and computes DBSCAN clusters, all on GPU:

    import cudf
    from cuml.cluster import DBSCAN

    # Create and populate a GPU DataFrame
    gdf_float = cudf.DataFrame()
    gdf_float['0'] = [1.0, 2.0, 5.0]
    gdf_float['1'] = [4.0, 2.0, 1.0]
    gdf_float['2'] = [4.0, 2.0, 1.0]

    # Setup and fit clusters
    dbscan_float = DBSCAN(eps=1.0, min_samples=1)
    dbscan_float.fit(gdf_float)

    print(dbscan_float.labels_)

Output:

    0    0
    1    1
    2    2
    dtype: int32
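To see why each of the three rows ends up in its own cluster, it can help to trace the algorithm by hand. The following is a simplified, CPU-only sketch of DBSCAN in plain Python; the function name `dbscan_labels` is hypothetical and the code makes no claim about how cuML implements the algorithm on the GPU. With `eps=1.0` no two rows are within range of each other, and with `min_samples=1` every row counts as a core point, so each row starts its own cluster.

```python
from math import dist


def dbscan_labels(points, eps, min_samples):
    """Label points with a simplified DBSCAN; -1 marks noise."""
    n = len(points)
    # Neighborhoods include the point itself, as in the usual DBSCAN definition.
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [-1] * n
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or len(neighbors[i]) < min_samples:
            continue  # already assigned, or not a core point
        # Grow a new cluster outward from core point i.
        labels[i] = cluster
        frontier = [i]
        while frontier:
            p = frontier.pop()
            if len(neighbors[p]) < min_samples:
                continue  # border point: joins the cluster but does not expand it
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    frontier.append(q)
        cluster += 1
    return labels


# Same data as the cuDF example: one tuple per DataFrame row.
rows = [(1.0, 4.0, 4.0), (2.0, 2.0, 2.0), (5.0, 1.0, 1.0)]
labels = dbscan_labels(rows, eps=1.0, min_samples=1)
print(labels)  # [0, 1, 2]
```

Raising `eps` above the pairwise distances (roughly 3.0 and 3.3 here) would merge the rows into fewer clusters.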

cuML also features multi-GPU and multi-node-multi-GPU operation, using Dask, for a growing list of algorithms. The following Python snippet reads input from a CSV file and performs a NearestNeighbors query across a cluster of Dask workers, using multiple GPUs on a single node:

    # Create a Dask CUDA cluster with one worker per device
    from dask_cuda import LocalCUDACluster
    cluster = LocalCUDACluster()

    # Read CSV file in parallel across workers
    import dask_cudf
    df = dask_cudf.read_csv("/path/to/csv")

    # Fit a NearestNeighbors model and query it
    from cuml.dask.neighbors import NearestNeighbors
    nn = NearestNeighbors(n_neighbors = 10)
    nn.fit(df)
    neighbors = nn.kneighbors(df)
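Conceptually, a nearest-neighbor query over data split across workers can be answered by having each partition find its own closest candidates and then merging those candidates into a global result. The plain-Python sketch below illustrates that idea only; the function names `local_knn` and `partitioned_knn` are hypothetical, and this is an assumption about the general technique, not a description of cuML's or Dask's internals.

```python
import heapq
from math import dist


def local_knn(partition, offset, query, k):
    """Return the k nearest (distance, global_index) pairs within one partition."""
    scored = [(dist(row, query), offset + i) for i, row in enumerate(partition)]
    return heapq.nsmallest(k, scored)


def partitioned_knn(partitions, query, k):
    """Merge per-partition candidates into the global k nearest neighbors."""
    candidates = []
    offset = 0
    for part in partitions:
        candidates.extend(local_knn(part, offset, query, k))
        offset += len(part)  # keep indices global across partitions
    best = heapq.nsmallest(k, candidates)
    distances = [d for d, _ in best]
    indices = [i for _, i in best]
    return distances, indices


partitions = [
    [(0.0, 0.0), (1.0, 0.0)],  # rows held by worker 0
    [(0.0, 3.0), (0.5, 0.0)],  # rows held by worker 1
]
distances, indices = partitioned_knn(partitions, query=(0.0, 0.0), k=2)
print(distances, indices)
```

Each partition only needs to report `k` candidates, so the merge step handles at most `k` times the number of partitions, regardless of how many rows each worker holds.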

For additional examples, browse our complete API documentation, or check out our introductory walkthrough notebooks. Finally, you can find complete end-to-end examples in the notebooks-contrib repo.

Supported Algorithms

Category	Algorithm	Notes
Clustering	Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
	K-Means	Multi-Node Multi-GPU
Dimensionality Reduction	Principal Components Analysis (PCA)
	Truncated Singular Value Decomposition (tSVD)	Multi-GPU version available (CUDA 10 only)
	Uniform Manifold Approximation and Projection (UMAP)
	Random Projection
	t-Distributed Stochastic Neighbor Embedding (TSNE)
Linear Models for Regression or Classification	Linear Regression (OLS)	Multi-GPU available in conda CUDA 10 package
	Linear Regression with Lasso or Ridge Regularization
	ElasticNet Regression
	Logistic Regression
	Stochastic Gradient Descent (SGD), Coordinate Descent (CD), and Quasi-Newton (QN) (including L-BFGS and OWL-QN) solvers for linear models
Nonlinear Models for Regression or Classification	Random Forest (RF) Classification	Experimental multi-node, multi-GPU version available via Dask integration
	Random Forest (RF) Regression	Experimental multi-node, multi-GPU version available via Dask integration
	K-Nearest Neighbors (KNN)	Multi-GPU; uses Faiss
	Support Vector Machine Classifier (SVC)
Time Series	Linear Kalman Filter
	Holt-Winters Exponential Smoothing

More ML algorithms in cuML and more ML primitives in ml-prims are planned for future releases, including: spectral embedding, spectral clustering, support vector machines, and additional time series methods. Future releases will also expand support for multi-node, multi-GPU algorithms.

Open GPU Data Science

The RAPIDS suite of open source software libraries aims to enable execution of end-to-end data science and analytics pipelines entirely on GPUs. It relies on NVIDIA® CUDA® primitives for low-level compute optimization, but exposes that GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.
