Abstract

Dimensionality reduction is an essential step in high-dimensional data analysis. Many dimensionality reduction algorithms have been applied successfully to multi-class and multi-label problems. They are commonly applied as a separate data preprocessing step before classification algorithms. In this paper, we study a joint learning framework in which we perform dimensionality reduction and multi-label classification simultaneously. We show that when the least squares loss is used in classification, this joint learning decouples into two separate components, i.e., dimensionality reduction followed by multi-label classification. This analysis partially justifies the current practice of applying dimensionality reduction separately before classification. We extend our analysis to other loss functions, including the hinge loss and the squared hinge loss. We further extend the formulation to the more general case where the input data for different class labels may differ, overcoming a limitation of traditional dimensionality reduction algorithms. Experiments on benchmark data sets have been conducted to evaluate the proposed joint formulations.