Abstract
A fundamental task in machine learning is to compare the performance of multiple algorithms. This is usually performed by the frequentist Friedman test followed by multiple comparisons. This implies dealing with the well-known shortcomings of null hypothesis significance tests. We propose a Bayesian approach to overcome these problems. We provide three main contributions. First, we propose a nonparametric Bayesian version of the Friedman test using a Dirichlet process (DP) based prior. We show that, from a Bayesian perspective, the Friedman test is an inference for a multivariate mean based on an ellipsoid inclusion test. Second, we derive a joint procedure for the multiple comparisons which accounts for their dependencies and which is based on the posterior probability computed through the DP. The proposed approach allows verifying the null hypothesis, not only rejecting it. Third, as a practical application we show the results in our algorithm for racing, i.e. identifying the best algorithm among a large set of candidates sequentially assessed. Our approach consistently outperforms its frequentist counterpart.