Gradient Boosted Decision Trees for High Dimensional Sparse Output

2020-03-10

Abstract

In this paper, we study gradient boosted decision trees (GBDT) when the output space is high dimensional and sparse. For example, in multilabel classification, the output is an L-dimensional 0/1 vector, where L is the number of labels, which can grow to millions and beyond in many modern applications. We show that vanilla GBDT can easily run out of memory or take prohibitively long to train in this regime, and we propose a new GBDT variant, GBDT-SPARSE, that resolves this problem by employing L0 regularization. We then discuss in detail how to exploit the resulting sparsity during GBDT training, including splitting the nodes, computing the sparse residual, and predicting in sublinear time. Finally, we apply our algorithm to extreme multilabel classification problems and show that GBDT-SPARSE achieves order-of-magnitude improvements in model size and prediction time over existing methods, while yielding similar performance.
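To make the role of the L0 constraint concrete, here is a minimal sketch, not the authors' implementation. It assumes a squared-error loss, under which the best leaf value with at most k nonzero entries is the mean residual of the samples in that leaf, truncated to its k largest-magnitude coordinates. The function name `sparse_leaf_value` and the dict-based sparse vector format are hypothetical choices made for this illustration.

```python
from collections import defaultdict


def sparse_leaf_value(residuals, k):
    """Return a leaf prediction with at most k nonzero entries.

    residuals: list of sparse residual vectors, one per sample in the leaf,
               each a dict {label_index: value} (assumed format).
    k:         maximum number of nonzero labels kept in the leaf (L0 budget).

    Assumes squared-error loss: minimizing sum_i ||r_i - h||^2 subject to
    ||h||_0 <= k keeps the k largest-magnitude entries of the mean residual.
    """
    if not residuals:
        return {}

    # Accumulate only the nonzero coordinates, so the cost depends on the
    # number of nonzeros rather than on the total number of labels L.
    sums = defaultdict(float)
    for r in residuals:
        for j, v in r.items():
            sums[j] += v

    n = len(residuals)
    mean = {j: s / n for j, s in sums.items()}

    # Keep the k coordinates with the largest absolute mean value
    # (ties broken by label index so the result is deterministic).
    top = sorted(mean.items(), key=lambda kv: (-abs(kv[1]), kv[0]))[:k]
    return {j: v for j, v in top if v != 0.0}


# Three samples in one leaf, labels drawn from a very large label space.
residuals = [{0: 1.0, 5: 1.0}, {0: 1.0, 7: 1.0}, {0: 1.0, 5: 1.0}]
leaf = sparse_leaf_value(residuals, k=2)
```

With k = 2, the leaf keeps labels 0 and 5 and drops label 7, so the stored model and the per-leaf prediction cost scale with k rather than with L.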
