Abstract
Deep distributed decision trees and tree ensembles have grown in importance due to the need to model increasingly large datasets. However, PLANET, the standard distributed tree learning algorithm implemented in systems such as XGBOOST and Spark MLLIB, scales poorly as data dimensionality and tree depths grow. We present YGGDRASIL, a new distributed tree learning method that outperforms existing methods by up to 24×. Unlike PLANET, YGGDRASIL is based on vertical partitioning of the data (i.e., partitioning by feature), along with a set of optimized data structures to reduce the CPU and communication costs of training. YGGDRASIL (1) trains directly on compressed data for compressible features and labels; (2) introduces efficient data structures for training on uncompressed data; and (3) minimizes communication between nodes by using sparse bitvectors. Moreover, while PLANET approximates split points through feature binning, YGGDRASIL does not require binning, and we analytically characterize the impact of this approximation. We evaluate YGGDRASIL on the MNIST 8M dataset and a high-dimensional dataset at Yahoo; for both, YGGDRASIL is faster by up to an order of magnitude.
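To make the vertical-partitioning idea concrete, the following is a minimal, self-contained sketch, not YGGDRASIL's actual implementation: it assumes each worker owns whole feature columns, scores candidate splits with an illustrative variance-reduction criterion, and the owner of the winning feature encodes the resulting row partition as a bitvector (the only per-split payload that must cross the network). All names (`FeatureColumn`, `bestLocalSplit`, `splitBitvector`) are hypothetical.

```scala
// Sketch of vertically partitioned split search (hypothetical, single-process).
object VerticalPartitionSketch {
  // One worker's data: all row values for a single feature.
  final case class FeatureColumn(featureIndex: Int, values: Array[Double])

  // A candidate split on one feature, with its impurity-reduction score.
  final case class Split(featureIndex: Int, threshold: Double, score: Double)

  // Illustrative criterion: variance (SSE) reduction for a regression split.
  def scoreSplit(values: Array[Double], labels: Array[Double], threshold: Double): Double = {
    val pairs = values.zip(labels)
    val (left, right) = pairs.partition { case (v, _) => v < threshold }
    def sse(xs: Array[(Double, Double)]): Double =
      if (xs.isEmpty) 0.0
      else {
        val mean = xs.map(_._2).sum / xs.length
        xs.map { case (_, y) => (y - mean) * (y - mean) }.sum
      }
    sse(pairs) - (sse(left) + sse(right)) // larger is better
  }

  // Each worker searches only the columns it owns (no data movement).
  def bestLocalSplit(columns: Seq[FeatureColumn], labels: Array[Double]): Split =
    columns.flatMap { col =>
      col.values.distinct.map { t =>
        Split(col.featureIndex, t, scoreSplit(col.values, labels, t))
      }
    }.maxBy(_.score)

  // The winning worker marks each row with one bit: set = row goes left.
  def splitBitvector(col: FeatureColumn, threshold: Double): java.util.BitSet = {
    val bits = new java.util.BitSet(col.values.length)
    col.values.zipWithIndex.foreach { case (v, i) => if (v < threshold) bits.set(i) }
    bits
  }

  def main(args: Array[String]): Unit = {
    val labels = Array(1.0, 2.0, 10.0, 11.0)
    // Two "workers", each owning one feature column over the same four rows.
    val worker1 = Seq(FeatureColumn(0, Array(0.1, 0.2, 0.9, 0.8)))
    val worker2 = Seq(FeatureColumn(1, Array(5.0, 4.0, 4.5, 5.5)))

    // Driver gathers one candidate per worker and picks the global best.
    val candidates = Seq(worker1, worker2).map(bestLocalSplit(_, labels))
    val winner = candidates.maxBy(_.score)
    val owner = if (winner.featureIndex == 0) worker1.head else worker2.head
    println(s"best split: feature ${winner.featureIndex} < ${winner.threshold}")
    println(s"rows going left: ${splitBitvector(owner, winner.threshold)}")
  }
}
```

Under this scheme only split metadata and the per-row bitvector are communicated, rather than per-feature statistics for every node as in horizontally partitioned training; sparse encoding of the bitvector (as the abstract notes) shrinks this payload further for unbalanced splits.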