
Yggdrasil: An Optimized System for Training Deep Decision Trees at Scale

2020-02-05

Abstract 

Deep distributed decision trees and tree ensembles have grown in importance due to the need to model increasingly large datasets. However, PLANET, the standard distributed tree learning algorithm implemented in systems such as XGBoost and Spark MLlib, scales poorly as data dimensionality and tree depth grow. We present Yggdrasil, a new distributed tree learning method that outperforms existing methods by up to 24×. Unlike PLANET, Yggdrasil is based on vertical partitioning of the data (i.e., partitioning by feature), along with a set of optimized data structures to reduce the CPU and communication costs of training. Yggdrasil (1) trains directly on compressed data for compressible features and labels; (2) introduces efficient data structures for training on uncompressed data; and (3) minimizes communication between nodes by using sparse bitvectors. Moreover, while PLANET approximates split points through feature binning, Yggdrasil does not require binning, and we analytically characterize the impact of this approximation. We evaluate Yggdrasil on the MNIST 8M dataset and a high-dimensional dataset at Yahoo; for both, Yggdrasil is faster by up to an order of magnitude.
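To make the vertical-partitioning idea concrete, below is a minimal single-process sketch, not Yggdrasil's actual Spark implementation: each simulated worker owns a subset of feature columns, proposes its best split under a simple variance-reduction criterion, and the coordinator broadcasts the winning split as a per-row bitvector so every worker can route its rows left or right without exchanging raw feature data. The function names (`best_split_for_feature`, `train_one_split`) and the impurity criterion are illustrative assumptions, not the paper's API.

```python
import numpy as np

def best_split_for_feature(x, y):
    """Scan candidate thresholds of one feature column; return the
    (weighted impurity, threshold) pair with the lowest impurity.
    Uses variance reduction as a simple regression criterion."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best_imp, best_thr = np.inf, None
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # identical values cannot be separated
        left, right = ys[:i], ys[i:]
        imp = left.var() * len(left) + right.var() * len(right)
        if imp < best_imp:
            best_imp, best_thr = imp, (xs[i] + xs[i - 1]) / 2
    return best_imp, best_thr

def train_one_split(X, y, n_workers=2):
    """Simulate one round of vertically partitioned training: each
    'worker' owns a slice of columns and proposes its best local split;
    the coordinator keeps the global best and broadcasts it as a
    per-row bitvector."""
    feature_groups = np.array_split(np.arange(X.shape[1]), n_workers)
    proposals = []
    for group in feature_groups:          # one iteration = one worker
        for j in group:
            imp, thr = best_split_for_feature(X[:, j], y)
            if thr is not None:
                proposals.append((imp, j, thr))
    imp, j, thr = min(proposals)          # coordinator-side reduction
    # One bit per row tells every worker which child each row goes to,
    # so no raw feature data crosses the (simulated) network.
    bitvector = X[:, j] < thr
    return j, thr, bitvector

# Hypothetical usage: recover a split planted on feature 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 6))
y = (X[:, 2] > 0.5).astype(float) + rng.normal(scale=0.1, size=1000)
j, thr, bits = train_one_split(X, y, n_workers=3)
print(j, thr, bits.sum())  # expect j == 2 and thr near 0.5
```

The bitvector is the communication trick the abstract highlights: one bit per row per split is enough to keep all workers consistent, so network cost scales with the number of rows rather than with features × histogram bins as in PLANET-style horizontally partitioned aggregation; per the abstract, Yggdrasil further reduces this cost by encoding the bitvectors sparsely.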
