A Communication-Efficient Parallel Algorithm for Decision Tree

资源分类

2020-02-05 |

93 |

105 |

Abstract

Decision tree (and its extensions such as Gradient Boosting Decision Trees and Random Forest) is a widely used machine learning algorithm, due to its practical effectiveness and model interpretability. With the emergence of big data, there is an increasing need to parallelize the training process of decision tree. However, most existing attempts along this line suffer from high communication costs. In this paper, we propose a new algorithm, called Parallel Voting Decision Tree (PV-Tree), to tackle this challenge. After partitioning the training data onto a number of (e.g., M ) machines, this algorithm performs both local voting and global voting in each iteration. For local voting, the top-k attributes are selected from each machine according to its local data. Then, globally top-2k attributes are determined by a majority voting among these local candidates. Finally, the full-grained histograms of the globally top-2k attributes are collected from local machines in order to identify the best (most informative) attribute and its split point. PV-Tree can achieve a very low communication cost (independent of the total number of attributes) and thus can scale out very well. Furthermore, theoretical analysis shows that this algorithm can learn a near optimal decision tree, since it can find the best attribute with a large probability. Our experiments on real-world datasets show that PV-Tree significantly outperforms the existing parallel decision tree algorithms in the trade-off between accuracy and efficiency.

上一篇：Orthogonal Random Features

下一篇：Improved Error Bounds for Tree Representations of Metric Spaces

用户评价

全部评价

还没有评论，说两句吧！

热门资源

Deep Cross-media ...

Cross-media retrieval is a research hotspot in ...
Regularizing RNNs...

Recently, caption generation with an encoder-de...
The Variational S...

Unlike traditional images which do not offer in...
Supervised Descen...

Many computer vision problems (e.
Visual Reinforcem...

For an autonomous agent to fulfill a wide range...

智能在线

400-630-6780
聆听.建议反馈

E-mail: support@tusaishared.com