Abstract
Deep distributed decision trees and tree ensembles have grown in importance due to the need to model increasingly large datasets. However, PLANET, the standard distributed tree learning algorithm implemented in systems such as XGBOOST and Spark MLLIB, scales poorly as data dimensionality and tree depths grow. We present YGGDRASIL, a new distributed tree learning method that outperforms existing methods by up to 24×. Unlike PLANET, YGGDRASIL is based on vertical partitioning of the data (i.e., partitioning by feature), along with a set of optimized data structures to reduce the CPU and communication costs of training. YGGDRASIL (1) trains directly on compressed data for compressible features and labels; (2) introduces efficient data structures for training on uncompressed data; and (3) minimizes communication between nodes by using sparse bitvectors. Moreover, while PLANET approximates split points through feature binning, YGGDRASIL does not require binning, and we analytically characterize the impact of this approximation. We evaluate YGGDRASIL on the MNIST 8M dataset and a high-dimensional dataset at Yahoo; for both, YGGDRASIL is faster by up to an order of magnitude.
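To make the vertical-partitioning idea concrete, the following is a minimal, self-contained sketch, not YGGDRASIL's actual implementation: it assumes each worker owns whole feature columns, scores candidate splits with an illustrative variance-reduction criterion, and the owner of the winning feature encodes the resulting row partition as a bitvector (the only per-split payload that must cross the network). All names (`FeatureColumn`, `bestLocalSplit`, `splitBitvector`) are hypothetical.

```scala
// Sketch of vertically partitioned split search (hypothetical, single-process).
object VerticalPartitionSketch {
  // One worker's data: all row values for a single feature.
  final case class FeatureColumn(featureIndex: Int, values: Array[Double])

  // A candidate split on one feature, with its impurity-reduction score.
  final case class Split(featureIndex: Int, threshold: Double, score: Double)

  // Illustrative criterion: variance (SSE) reduction for a regression split.
  def scoreSplit(values: Array[Double], labels: Array[Double], threshold: Double): Double = {
    val pairs = values.zip(labels)
    val (left, right) = pairs.partition { case (v, _) => v < threshold }
    def sse(xs: Array[(Double, Double)]): Double =
      if (xs.isEmpty) 0.0
      else {
        val mean = xs.map(_._2).sum / xs.length
        xs.map { case (_, y) => (y - mean) * (y - mean) }.sum
      }
    sse(pairs) - (sse(left) + sse(right)) // larger is better
  }

  // Each worker searches only the columns it owns (no data movement).
  def bestLocalSplit(columns: Seq[FeatureColumn], labels: Array[Double]): Split =
    columns.flatMap { col =>
      col.values.distinct.map { t =>
        Split(col.featureIndex, t, scoreSplit(col.values, labels, t))
      }
    }.maxBy(_.score)

  // The winning worker marks each row with one bit: set = row goes left.
  def splitBitvector(col: FeatureColumn, threshold: Double): java.util.BitSet = {
    val bits = new java.util.BitSet(col.values.length)
    col.values.zipWithIndex.foreach { case (v, i) => if (v < threshold) bits.set(i) }
    bits
  }

  def main(args: Array[String]): Unit = {
    val labels = Array(1.0, 2.0, 10.0, 11.0)
    // Two "workers", each owning one feature column over the same four rows.
    val worker1 = Seq(FeatureColumn(0, Array(0.1, 0.2, 0.9, 0.8)))
    val worker2 = Seq(FeatureColumn(1, Array(5.0, 4.0, 4.5, 5.5)))

    // Driver gathers one candidate per worker and picks the global best.
    val candidates = Seq(worker1, worker2).map(bestLocalSplit(_, labels))
    val winner = candidates.maxBy(_.score)
    val owner = if (winner.featureIndex == 0) worker1.head else worker2.head
    println(s"best split: feature ${winner.featureIndex} < ${winner.threshold}")
    println(s"rows going left: ${splitBitvector(owner, winner.threshold)}")
  }
}
```

Under this scheme only split metadata and the per-row bitvector are communicated, rather than per-feature statistics for every node as in horizontally partitioned training; sparse encoding of the bitvector (as the abstract notes) shrinks this payload further for unbalanced splits.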