Analyzing Machine Learning Models with Yellowbrick
Visualization has a critical role to play throughout the analytical
process and is, frankly, a must-have for any effective analysis, for
model selection, and for evaluation. This article discusses a
diagnostic platform called Yellowbrick that allows data scientists to
visualize the entire model selection process, steering us towards
better, more explainable models and away from pitfalls and traps along
the way.
Yellowbrick
Yellowbrick is an open source Python project that extends the
scikit-learn API with visual analysis and diagnostic tools. The
Yellowbrick API also wraps matplotlib to create interactive data
explorations.
The library's central addition is a new core object: the Visualizer.
Visualizers allow visual models to be fit and transformed as part of
the scikit-learn pipeline process, providing visuals throughout the
transformation of high-dimensional data.
Advantages
Yellowbrick isn’t a replacement for other data visualization libraries but helps to achieve the following:
Model Visualization
Data Visualization for Machine Learning
Visual Diagnostics
Visual Steering
Installation
Yellowbrick can be installed either through pip or through the conda
distribution. For detailed instructions, refer to the documentation.
via pip
pip install yellowbrick
via conda
conda install -c districtdatalabs yellowbrick
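After either command finishes, a quick check from Python can confirm that the package is visible to the interpreter you plan to use. The snippet below is only a minimal sketch: the helper name is_installed is made up for illustration, and it relies solely on the standard library's importlib rather than on anything Yellowbrick provides.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the named top-level package can be found on the import path.

    Hypothetical helper for illustration; it looks the package up without importing it.
    """
    return importlib.util.find_spec(package_name) is not None


if is_installed("yellowbrick"):
    print("yellowbrick is available")
else:
    print("yellowbrick not found; try one of the install commands above")
```

If the check reports that the package is missing, the usual cause is that pip or conda installed into a different environment from the one running the script.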
Usage
The Yellowbrick API should feel familiar if you already know the scikit-learn interface.
The primary interface is a Visualizer: an object that learns from
data to produce a visualization. To use a visualizer, import it,
instantiate it, call its fit() method, and then call its poof() method
to render the visualization, which does the magic!
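The import, instantiate, fit, and render steps can be sketched as follows. This is a hedged example rather than the canonical usage. The helper fit_and_render and the demo function are hypothetical names introduced here. RadViz and the iris dataset are assumptions chosen only because they need no extra setup. The fallback from show() to poof() reflects that Yellowbrick 1.0 renamed poof() to show(), while older releases only provide poof().

```python
def fit_and_render(visualizer, X, y=None):
    """Fit a visualizer, then render it with whichever draw method it exposes.

    Hypothetical convenience helper; not part of Yellowbrick itself.
    """
    visualizer.fit(X, y)
    # Newer Yellowbrick releases expose show(); older ones only have poof().
    render = getattr(visualizer, "show", None) or getattr(visualizer, "poof")
    render()
    return visualizer


def demo():
    """Run the workflow on the iris data (requires scikit-learn and yellowbrick)."""
    # Imports live inside the function so the helper above has no hard dependency.
    from sklearn.datasets import load_iris
    from yellowbrick.features import RadViz

    X, y = load_iris(return_X_y=True)
    # Instantiate the visualizer; the class names are used for the legend.
    visualizer = RadViz(classes=["setosa", "versicolor", "virginica"])
    fit_and_render(visualizer, X, y)
```

Calling demo() in an environment with both libraries installed should open a matplotlib window with the RadViz plot. Some visualizers need an extra step between fit() and rendering, such as transform() or score(), so the helper covers only the simplest case.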