
DeepRecommender

2019-12-24

Deep AutoEncoders for Collaborative Filtering

This is not an official NVIDIA product. It is a research project described in "Training Deep AutoEncoders for Collaborative Filtering" (https://arxiv.org/abs/1708.01715).

The model

The model is based on deep AutoEncoders.
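As a rough illustration of what "deep AutoEncoder" means here, the pure-Python sketch below runs one forward pass. The encoder maps a user's sparse rating vector of length n (one entry per item, 0 for unrated) through the hidden layers, and the decoder mirrors those layers back to length n. It uses the SELU activation that matches the `selu` option in the training command below. The random weights, tiny layer sizes, and helper names (`build_autoencoder`, `forward`, and so on) are illustrative assumptions. This is not the repository's PyTorch implementation. In this sketch every layer, including the last, applies SELU.

```python
import math
import random

# SELU constants from Klambauer et al. (2017), "Self-Normalizing Neural Networks"
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def selu(v):
    """Scaled exponential linear unit, applied element-wise."""
    return [SELU_SCALE * (x if x > 0 else SELU_ALPHA * (math.exp(x) - 1.0)) for x in v]


def dense(v, weights, bias):
    """Fully connected layer: `weights` has one row per output unit."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


def init_layer(n_in, n_out, rng):
    """Hypothetical small uniform initialization, for illustration only."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_autoencoder(n_items, hidden, seed=0):
    """Encoder n -> hidden[0] -> ... -> hidden[-1]; the decoder mirrors it back to n."""
    rng = random.Random(seed)
    sizes = [n_items] + list(hidden)
    enc_pairs = list(zip(sizes[:-1], sizes[1:]))
    dec_pairs = [(b, a) for a, b in reversed(enc_pairs)]
    return [init_layer(a, b, rng) for a, b in enc_pairs + dec_pairs]


def forward(layers, ratings):
    """Apply every layer followed by SELU; the output is a dense length-n vector."""
    h = ratings
    for weights, bias in layers:
        h = selu(dense(h, weights, bias))
    return h


model = build_autoencoder(n_items=6, hidden=[4, 3, 2])
user = [5.0, 0.0, 3.0, 0.0, 0.0, 4.0]  # 0.0 marks an unrated item
dense_prediction = forward(model, user)
print(len(dense_prediction))  # 6
```

The output is dense, so every unrated item also receives a predicted score. That is what makes the reconstruction usable for recommendation.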

Requirements

  • Python 3.6

  • PyTorch: pipenv install

  • CUDA (recommended version >= 8.0)

Training using mixed precision with Tensor Cores

Getting Started

Run unit tests first

The code is intended to run on a GPU. The last test can take a minute or two.

$ python -m unittest test/data_layer_tests.py
$ python -m unittest test/test_model.py

Tutorial

Check out this tutorial by miguelgfierro.

Get the data

Note: Run all these commands within your DeepRecommender folder.

Netflix prize

  • Download from here into your DeepRecommender folder

$ tar -xvf nf_prize_dataset.tar.gz
$ tar -xf download/training_set.tar
$ python ./data_utils/netflix_data_convert.py training_set Netflix
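For orientation, each per-movie file in the Netflix Prize `training_set` directory starts with a `MovieID:` line, followed by `CustomerID,Rating,Date` lines. The sketch below parses that layout into (user, item, rating, date) tuples. `parse_movie_file` is a hypothetical helper, not the code inside `netflix_data_convert.py`. It sticks to `strptime` so it also runs on the Python 3.6 listed in the requirements.

```python
import datetime
import io


def parse_movie_file(lines):
    """Yield (customer_id, movie_id, rating, date) from one Netflix Prize movie file.

    Assumed layout: a "<MovieID>:" header line, then "CustomerID,Rating,YYYY-MM-DD" lines.
    """
    movie_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.endswith(":"):
            movie_id = int(line[:-1])  # header line, e.g. "1:"
            continue
        customer, rating, date = line.split(",")
        day = datetime.datetime.strptime(date, "%Y-%m-%d").date()
        yield int(customer), movie_id, float(rating), day


# In-memory stand-in for a file such as training_set/mv_0000001.txt
sample = io.StringIO("1:\n1488844,3,2005-09-06\n822109,5,2005-05-13\n")
records = list(parse_movie_file(sample))
print(records[0])
```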

Data stats

Dataset          | Netflix 3 months         | Netflix 6 months         | Netflix 1 year           | Netflix full
Ratings train    | 13,675,402               | 29,179,009               | 41,451,832               | 98,074,901
Users train      | 311,315                  | 390,795                  | 345,855                  | 477,412
Items train      | 17,736                   | 17,757                   | 16,907                   | 17,768
Time range train | 2005-09-01 to 2005-11-30 | 2005-06-01 to 2005-11-30 | 2004-06-01 to 2005-05-31 | 1999-12-01 to 2005-11-30
-----------------------------------------------
Ratings test     | 2,082,559                | 2,175,535                | 3,888,684                | 2,250,481
Users test       | 160,906                  | 169,541                  | 197,951                  | 173,482
Items test       | 17,261                   | 17,290                   | 16,506                   | 17,305
Time range test  | 2005-12-01 to 2005-12-31 | 2005-12-01 to 2005-12-31 | 2005-06-01 to 2005-06-30 | 2005-12-01 to 2005-12-31
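Each dataset in the table above is a time-based split: training ratings come from one date window, and test ratings come from the window that follows it. The sketch below shows such a split on (user, item, rating, date) tuples. `split_by_time` is hypothetical. Dropping test ratings whose user or item never appears in the training window is an assumption of this sketch, motivated by the fact that the autoencoder has no input slot for an unseen item.

```python
import datetime


def split_by_time(records, train_start, train_end, test_start, test_end):
    """Split (user, item, rating, date) tuples into inclusive train/test date windows.

    Assumption: test ratings for users or items absent from the training window are dropped.
    """
    train = [r for r in records if train_start <= r[3] <= train_end]
    seen_users = {r[0] for r in train}
    seen_items = {r[1] for r in train}
    test = [
        r for r in records
        if test_start <= r[3] <= test_end and r[0] in seen_users and r[1] in seen_items
    ]
    return train, test


d = datetime.date
records = [
    (1, 10, 4.0, d(2005, 9, 15)),
    (2, 10, 3.0, d(2005, 11, 30)),
    (1, 11, 5.0, d(2005, 12, 5)),   # item 11 never rated in the train window: dropped
    (2, 10, 2.0, d(2005, 12, 20)),
    (3, 10, 1.0, d(2005, 8, 1)),    # before the train window: ignored
]
# "Netflix 3 months" windows from the table above
train, test = split_by_time(records, d(2005, 9, 1), d(2005, 11, 30), d(2005, 12, 1), d(2005, 12, 31))
print(len(train), len(test))
```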

Train the model

In this example, the model is trained for 12 epochs. In the paper, we train for 102.

python run.py --gpu_ids 0 \
--path_to_train_data Netflix/NF_TRAIN \
--path_to_eval_data Netflix/NF_VALID \
--hidden_layers 512,512,1024 \
--non_linearity_type selu \
--batch_size 128 \
--logdir model_save \
--drop_prob 0.8 \
--optimizer momentum \
--lr 0.005 \
--weight_decay 0 \
--aug_step 1 \
--noise_prob 0 \
--num_epochs 12 \
--summary_frequency 1000
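The `--hidden_layers` and `--drop_prob` values above correspond to the architecture notation used in the Results table further down. The helper below renders the two flags in that notation. `architecture_string` is a hypothetical illustration, not part of `run.py`. It assumes the decoder mirrors the encoder, that dropout is applied to the innermost (code) layer, and that `n` stands for the number of items.

```python
def architecture_string(hidden_layers, drop_prob):
    """Render --hidden_layers / --drop_prob in the Results-table notation.

    Assumptions: the decoder mirrors the encoder, dropout sits on the
    innermost (code) layer, and "n" is the input/output size (number of items).
    """
    hidden = [int(s) for s in hidden_layers.split(",")]
    parts = ["n"] + [str(h) for h in hidden]             # encoder
    if drop_prob > 0:
        parts.append("dp({})".format(drop_prob))         # dropout on the code layer
    parts += [str(h) for h in reversed(hidden[:-1])]     # mirrored decoder
    parts.append("n")
    return ",".join(parts)


print(architecture_string("512,512,1024", 0.8))  # n,512,512,1024,dp(0.8),512,512,n
```

Under these assumptions the training command above describes the same network listed for "Netflix full" in the Results table.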

Note that you can run TensorBoard in parallel:

$ tensorboard --logdir=model_save

Run inference on the test set

python infer.py \
--path_to_train_data Netflix/NF_TRAIN \
--path_to_eval_data Netflix/NF_TEST \
--hidden_layers 512,512,1024 \
--non_linearity_type selu \
--save_path model_save/model.epoch_11 \
--drop_prob 0.8 \
--predictions_path preds.txt
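Conceptually, inference feeds each user's known ratings (from `--path_to_train_data`) through the autoencoder and reads the dense output at the items held out in the evaluation file (`--path_to_eval_data`). The sketch below shows only that lookup step. `predict_held_out` and the `reconstruct` callable, which stands in for a trained model, are hypothetical. The dictionary-based input layout is also an assumption.

```python
def predict_held_out(reconstruct, train_ratings, eval_items, n_items):
    """Return {item: predicted_rating} for one user's held-out items.

    train_ratings: {item_index: rating} known for the user (the model input).
    eval_items: item indices whose ratings should be predicted.
    reconstruct: hypothetical stand-in for a trained model (length-n list -> length-n list).
    """
    x = [0.0] * n_items  # unrated items are encoded as 0
    for item, rating in train_ratings.items():
        x[item] = rating
    y = reconstruct(x)   # dense output: a score for every item
    return {item: y[item] for item in eval_items}


def mean_model(x):
    """Toy model: predicts the user's mean known rating for every item."""
    known = [v for v in x if v != 0.0]
    mean = sum(known) / len(known) if known else 0.0
    return [mean] * len(x)


print(predict_held_out(mean_model, {0: 4.0, 2: 2.0}, [1, 3], n_items=4))  # {1: 3.0, 3: 3.0}
```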

Compute Test RMSE

python compute_RMSE.py --path_to_predictions=preds.txt

After 12 epochs you should get an RMSE of around 0.927. Train longer to get below 0.92.
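RMSE is the square root of the mean squared difference between predicted and actual ratings over all test ratings. An RMSE of 0.927 therefore means predictions are off by roughly 0.93 stars in the root-mean-square sense. Below is a minimal sketch of the metric, assuming the (prediction, actual) pairs have already been read from `preds.txt`; the file's column layout is not described here. Clipping predictions to the 1 to 5 star range is also an assumption of this sketch, not a documented behavior of `compute_RMSE.py`.

```python
import math


def rmse(pairs, low=1.0, high=5.0):
    """Root mean squared error over (prediction, actual) pairs.

    Predictions are clipped to [low, high] first (an assumption of this sketch).
    """
    total = 0.0
    for pred, actual in pairs:
        pred = min(max(pred, low), high)
        total += (pred - actual) ** 2
    return math.sqrt(total / len(pairs))


# Squared errors 0.25, 0.0 (5.6 clipped to 5.0), and 1.0 -> sqrt(1.25 / 3)
print(rmse([(3.5, 4.0), (5.6, 5.0), (2.0, 1.0)]))
```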

Results

It should be possible to achieve the following results. Iterative output re-feeding should be applied once during each iteration.

(exact numbers will vary due to randomization)

Dataset          | RMSE   | Model architecture
Netflix 3 months | 0.9373 | n,128,256,256,dp(0.65),256,128,n
Netflix 6 months | 0.9207 | n,256,256,512,dp(0.8),256,256,n
Netflix 1 year   | 0.9225 | n,256,256,512,dp(0.8),256,256,n
Netflix full     | 0.9099 | n,512,512,1024,dp(0.8),512,512,n
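The paper describes two training details behind these numbers. The first is a masked MSE that counts only observed (non-zero) ratings. The second is dense output re-feeding: the model's dense output f(x) is fed back as a new training example, with f(x) itself as the fully observed target. Judging by its name, `--aug_step 1` in the training command presumably sets the number of re-feeding steps per iteration, but that mapping is an assumption. The pure-Python sketch below computes only the losses. A real implementation would run a separate backward pass and weight update for each loss, treating f(x) as a constant input. `refeeding_losses` and `toy_model` are hypothetical names.

```python
def masked_mse(target, output):
    """Masked MSE: mean squared error over observed (non-zero) target entries only."""
    observed = [(t, o) for t, o in zip(target, output) if t != 0.0]
    if not observed:
        return 0.0
    return sum((t - o) ** 2 for t, o in observed) / len(observed)


def mse(target, output):
    """Plain MSE over every entry (used once the target is dense)."""
    return sum((t - o) ** 2 for t, o in zip(target, output)) / len(target)


def refeeding_losses(model, x, aug_steps=1):
    """Losses for one training iteration with dense output re-feeding.

    First pass: y = f(x), scored only on the observed ratings of the sparse x.
    Each re-feeding step: feed the dense y back in and score f(y) against y.
    """
    y = model(x)
    losses = [masked_mse(x, y)]
    for _ in range(aug_steps):
        y_next = model(y)
        losses.append(mse(y, y_next))
        y = y_next
    return losses


def toy_model(v):
    """Stand-in for a trained autoencoder (hypothetical)."""
    return [0.5 * a + 1.0 for a in v]


print(refeeding_losses(toy_model, [4.0, 0.0, 2.0]))
```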

