Analyzing Machine Learning Models with Yellowbrick

Visualization has a critical role to play throughout the analytical process and is, frankly, a must-have for any effective analysis, for model selection, and for evaluation. This article discusses Yellowbrick, a diagnostic platform that lets data scientists visualize the entire model selection process, steering us towards better, more explainable models and away from pitfalls and traps along the way.

Yellowbrick

Yellowbrick is an open source Python project that extends the scikit-learn API with visual analysis and diagnostic tools. The Yellowbrick API also wraps matplotlib to create interactive data explorations.

Its central addition to the scikit-learn API is a new core object: the Visualizer. Visualizers allow visual models to be fit and transformed as part of the scikit-learn pipeline process, providing visuals throughout the transformation of high-dimensional data.

Advantages

Yellowbrick isn’t a replacement for other data visualization libraries but helps to achieve the following:

Model visualization
Data visualization for machine learning
Visual diagnostics
Visual steering

Installation

Yellowbrick can be installed either through pip or through conda. For detailed instructions, refer to the documentation.

via pip

pip install yellowbrick

via conda

conda install -c districtdatalabs yellowbrick

Usage

The Yellowbrick API should feel familiar if you have worked with the scikit-learn interface.

The primary interface is a Visualizer: an object that learns from data to produce a visualization. To use a visualizer, import it, instantiate it, call its fit() method, and then call its poof() method to render the visualization, which does the magic! Note that Yellowbrick 1.0 and later name this method show().