
Fast Retraining

In this repo we compare two of the fastest boosted decision tree libraries: XGBoost and LightGBM. We evaluate them on datasets from several domains and of different sizes.

On July 25, 2017, we published a blog post evaluating both libraries and discussing the benchmark results. The post is Lessons Learned From Benchmarking Fast Machine Learning Algorithms.

Installation and Setup

The installation instructions can be found here.

Project

In the folder experiments you can find the different experiments of the project. We developed six experiments with the CPU and GPU versions of both libraries:

  • Airline

  • BCI

  • Football

  • Planet Kaggle

  • Fraud Detection

  • HIGGS

The folder experiments/libs contains the common code for the project.

Benchmark

The following table summarizes the training times (in seconds) and speed ratios measured in the experiments:

| Dataset | Experiment | Data size | Features | xgb time: CPU (GPU) | xgb_hist time: CPU (GPU) | lgb time: CPU (GPU) | ratio xgb/lgb: CPU (GPU) | ratio xgb_hist/lgb: CPU (GPU) |
|---|---|---|---|---|---|---|---|---|
| Football | Link CPU, Link GPU | 19673 | 46 | 2.27 (7.09) | 2.47 (4.58) | 0.58 (0.97) | 3.90 (7.26) | 4.25 (4.69) |
| Fraud Detection | Link CPU, Link GPU | 284807 | 30 | 4.34 (5.80) | 2.01 (1.64) | 0.66 (0.29) | 6.58 (19.74) | 3.04 (5.58) |
| BCI | Link CPU, Link GPU | 20497 | 2048 | 11.51 (12.93) | 41.84 (42.69) | 7.31 (2.76) | 1.57 (4.67) | 5.72 (15.43) |
| Planet Kaggle | Link CPU, Link GPU | 40479 | 2048 | 313.89 (-) | 2115.28 (2028.43) | 194.57 (317.68) | 1.61 (-) | 10.87 (6.38) |
| HIGGS | Link CPU, Link GPU | 11000000 | 28 | 2996.16 (-) | 121.21 (114.88) | 119.34 (71.87) | 25.10 (-) | 1.01 (1.59) |
| Airline | Link CPU, Link GPU | 115069017 | 13 | - (-) | 1242.09 (1271.91) | 1056.20 (645.40) | - (-) | 1.17 (1.97) |
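
The ratio columns are simply the quotient of the corresponding training times, e.g. for the Football experiment on CPU:

```python
# Reproducing the ratio columns from the Football row (CPU) of the table above:
# ratio = xgb time / lgb time. The results differ from the table in the last
# decimal because the reported ratios were computed from unrounded times,
# while the times shown in the table are rounded to two decimals.
xgb_time, xgb_hist_time, lgb_time = 2.27, 2.47, 0.58

ratio_xgb = xgb_time / lgb_time            # table reports 3.90
ratio_xgb_hist = xgb_hist_time / lgb_time  # table reports 4.25

print(round(ratio_xgb, 2), round(ratio_xgb_hist, 2))  # → 3.91 4.26
```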

The next table summarizes the performance results using the F1-score:

| Dataset | Experiment | Data size | Features | xgb F1: CPU (GPU) | xgb_hist F1: CPU (GPU) | lgb F1: CPU (GPU) |
|---|---|---|---|---|---|---|
| Football | Link, Link | 19673 | 46 | 0.458 (0.470) | 0.460 (0.472) | 0.459 (0.470) |
| Fraud Detection | Link, Link | 284807 | 30 | 0.824 (0.821) | 0.802 (0.814) | 0.813 (0.811) |
| BCI | Link, Link | 20497 | 2048 | 0.110 (0.093) | 0.142 (0.120) | 0.137 (0.138) |
| Planet Kaggle | Link, Link | 40479 | 2048 | 0.805 (-) | 0.822 (0.822) | 0.822 (0.821) |
| HIGGS | Link, Link | 11000000 | 28 | 0.763 (-) | 0.767 (0.767) | 0.768 (0.767) |
| Airline | Link, Link | 115069017 | 13 | - (-) | 0.741 (0.745) | 0.732 (0.745) |

The experiments were run on an Azure NV24 VM with 24 cores and 224 GB of memory; the machine has 4 NVIDIA M60 GPUs. In both the CPU and GPU experiments we used Ubuntu 16.04.

Contributing

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

