Abstract Large-scale recognition problems with thousands of classes pose a particular challenge because applying the classififier requires more computation as the number of classes grows. The label tree model integrates classifification with the traversal of the tree so that complexity grows logarithmically. In this paper, we show how the parameters of the label tree can be found using maximum likelihood estimation. This new probabilistic learning technique produces a label tree with signifificantly improved recognition accuracy.