
Fast Retraining

In this repo we compare two of the fastest boosted decision tree libraries: XGBoost and LightGBM. We evaluate them on datasets from several domains and of different sizes.

On July 25, 2017, we published a blog post evaluating both libraries and discussing the benchmark results. The post is Lessons Learned From Benchmarking Fast Machine Learning Algorithms.

Installation and Setup

The installation instructions can be found here.

Project

In the folder experiments you can find the different experiments of the project. We developed six experiments with the CPU and GPU versions of both libraries:

  • Airline

  • BCI

  • Football

  • Planet Kaggle

  • Fraud Detection

  • HIGGS

The folder experiments/libs contains the common code for the project.

Benchmark

The following table summarizes the training times (in seconds) and speed ratios measured in the experiments:

| Dataset | Experiment | Data size | Features | xgb time: CPU (GPU) | xgb_hist time: CPU (GPU) | lgb time: CPU (GPU) | ratio xgb/lgb: CPU (GPU) | ratio xgb_hist/lgb: CPU (GPU) |
|---|---|---|---|---|---|---|---|---|
| Football | Link CPU, Link GPU | 19673 | 46 | 2.27 (7.09) | 2.47 (4.58) | 0.58 (0.97) | 3.90 (7.26) | 4.25 (4.69) |
| Fraud Detection | Link CPU, Link GPU | 284807 | 30 | 4.34 (5.80) | 2.01 (1.64) | 0.66 (0.29) | 6.58 (19.74) | 3.04 (5.58) |
| BCI | Link CPU, Link GPU | 20497 | 2048 | 11.51 (12.93) | 41.84 (42.69) | 7.31 (2.76) | 1.57 (4.67) | 5.72 (15.43) |
| Planet Kaggle | Link CPU, Link GPU | 40479 | 2048 | 313.89 (-) | 2115.28 (2028.43) | 194.57 (317.68) | 1.61 (-) | 10.87 (6.38) |
| HIGGS | Link CPU, Link GPU | 11000000 | 28 | 2996.16 (-) | 121.21 (114.88) | 119.34 (71.87) | 25.10 (-) | 1.01 (1.59) |
| Airline | Link CPU, Link GPU | 115069017 | 13 | - (-) | 1242.09 (1271.91) | 1056.20 (645.40) | - (-) | 1.17 (1.97) |
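
The ratio columns are simply the quotient of the corresponding training times, e.g. for the Football experiment on CPU:

```python
# Reproducing the ratio columns from the Football row (CPU) of the table above:
# ratio = xgb time / lgb time. The results differ from the table in the last
# decimal because the reported ratios were computed from unrounded times,
# while the times shown in the table are rounded to two decimals.
xgb_time, xgb_hist_time, lgb_time = 2.27, 2.47, 0.58

ratio_xgb = xgb_time / lgb_time            # table reports 3.90
ratio_xgb_hist = xgb_hist_time / lgb_time  # table reports 4.25

print(round(ratio_xgb, 2), round(ratio_xgb_hist, 2))  # → 3.91 4.26
```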

The next table summarizes the performance results using the F1-score:

| Dataset | Experiment | Data size | Features | xgb F1: CPU (GPU) | xgb_hist F1: CPU (GPU) | lgb F1: CPU (GPU) |
|---|---|---|---|---|---|---|
| Football | Link, Link | 19673 | 46 | 0.458 (0.470) | 0.460 (0.472) | 0.459 (0.470) |
| Fraud Detection | Link, Link | 284807 | 30 | 0.824 (0.821) | 0.802 (0.814) | 0.813 (0.811) |
| BCI | Link, Link | 20497 | 2048 | 0.110 (0.093) | 0.142 (0.120) | 0.137 (0.138) |
| Planet Kaggle | Link, Link | 40479 | 2048 | 0.805 (-) | 0.822 (0.822) | 0.822 (0.821) |
| HIGGS | Link, Link | 11000000 | 28 | 0.763 (-) | 0.767 (0.767) | 0.768 (0.767) |
| Airline | Link, Link | 115069017 | 13 | - (-) | 0.741 (0.745) | 0.732 (0.745) |

The experiments were run on an Azure NV24 VM with 24 cores and 224 GB of memory; the machine has 4 NVIDIA M60 GPUs. In both the CPU and GPU experiments we used Ubuntu 16.04.

Contributing

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

