

2019-09-19 | |  55 |   0 |   0


Collaborative Filtering NN and CNN based recommender implemented with MXNet

The dataset is taken from ml-latest-small (MovieLens)

The trained models can be found in demo/models

Deep Learning Models

Collaborative Filtering Models

Content-based Filtering Models


The following code samples illustrate both training and prediction with the deep learning models in mxnet_recommender/library. The other deep learning models follow similar training and prediction patterns.

Train CF model

To train a CF model, say CollaborativeFilteringV1, run the following commands:

pip install -r requirements.txt
cd demo
python collaborative_filtering_v1.py

The training code in collaborative_filtering_v1.py is quite straightforward and illustrated below:

from sklearn.model_selection import train_test_split
import pandas as pd
from mxnet_recommender.library.cf import CollaborativeFilteringV1

data_dir_path = './data/ml-latest-small'  # refers to demo/data/ml-latest-small folder
trained_model_dir_path = './models'  # refers to demo/models folder

records = pd.read_csv(data_dir_path + '/ratings.csv')
print(records.describe())

ratings_train, ratings_test = train_test_split(records, test_size=0.2, random_state=0)

user_id_train = ratings_train.as_matrix(columns=['userId'])
item_id_train = ratings_train.as_matrix(columns=['movieId'])
rating_train = ratings_train.as_matrix(columns=['rating'])

user_id_test = ratings_test.as_matrix(['userId'])
item_id_test = ratings_test.as_matrix(['movieId'])
rating_test = ratings_test.as_matrix(['rating'])

max_user_id = records['userId'].max()
max_item_id = records['movieId'].max()

# default context for the recommender is mxnet.cpu() which uses CPU for the model context and data context
# change this line to cf = CollaborativeFilteringV1(model_ctx=mxnet.gpu(0)) if you want to use GPU instead
cf = CollaborativeFilteringV1()
cf.max_user_id = max_user_id
cf.max_item_id = max_item_id

# the item_id_* and rating_* arguments below are assumed from the variables
# prepared above; the listing this was taken from was cut off mid-call
history = cf.fit(user_id_train=user_id_train,
                 item_id_train=item_id_train,
                 rating_train=rating_train,
                 model_dir_path=trained_model_dir_path)

metrics = cf.evaluate_mae(user_id_test=user_id_test,
                          item_id_test=item_id_test,
                          rating_test=rating_test)

After training completes, the trained model files are saved in demo/models with the prefix cf-v1-.
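The training script relies on DataFrame.as_matrix, which was deprecated in pandas 0.23 and removed in pandas 1.0. If the script is run against a newer pandas, the following sketch shows one way to get the same (n, 1)-shaped arrays. The helper name column_as_matrix and the three sample rows are illustrative only and are not part of mxnet_recommender.

```python
import pandas as pd

# Tiny stand-in for ratings.csv: made-up rows with the MovieLens column names.
records = pd.DataFrame({
    'userId': [1, 1, 2],
    'movieId': [10, 20, 10],
    'rating': [4.0, 3.5, 5.0],
    'timestamp': [964982703, 964981247, 964982224],
})


def column_as_matrix(frame, column):
    """Return one column as an (n, 1) NumPy array.

    Selecting with a list of column names keeps the result two-dimensional,
    which matches what frame.as_matrix(columns=[column]) used to return.
    """
    return frame[[column]].to_numpy()


user_id = column_as_matrix(records, 'userId')
rating = column_as_matrix(records, 'rating')
print(user_id.shape, rating.dtype)
```

On pandas versions older than 0.24, which lack to_numpy, frame[[column]].values gives the same array.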

Predict Rating using CF trained model

To use the trained CF model to predict the rating of an item by a user, you can use the following code:

from mxnet_recommender.library.cf import CollaborativeFilteringV1
import pandas as pd

data_dir_path = './data/ml-latest-small'  # refers to demo/data/ml-latest-small folder
trained_model_dir_path = './models'  # refers to demo/models folder

records = pd.read_csv(data_dir_path + '/ratings.csv')
print(records.describe())

user_id_test = records['userId']
item_id_test = records['movieId']

cf = CollaborativeFilteringV1()
cf.load_model(trained_model_dir_path)

# batch prediction
predicted_ratings = cf.predict(user_id_test, item_id_test)
print(predicted_ratings)

# individual (user_id, item_id) prediction
for i in range(20):
    user_id = user_id_test[i]
    item_id = item_id_test[i]
    predicted_rating = cf.predict_single(user_id, item_id)
    print('predicted rating: ', predicted_rating)
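To judge predictions like these against the known ratings, one option is mean absolute error, the metric the training script reports through evaluate_mae. The sketch below computes it by hand with NumPy. It is not taken from the library, so evaluate_mae may differ in detail, and the sample numbers are made up.

```python
import numpy as np


def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all rating pairs."""
    # ravel() flattens (n, 1) arrays so both inputs line up element by element
    actual = np.asarray(actual, dtype=float).ravel()
    predicted = np.asarray(predicted, dtype=float).ravel()
    if actual.shape != predicted.shape:
        raise ValueError('actual and predicted must have the same number of ratings')
    return float(np.mean(np.abs(actual - predicted)))


# Made-up ratings standing in for the 'rating' column and for model output.
actual = [4.0, 3.0, 5.0, 2.0]
predicted = [3.5, 3.0, 4.0, 3.0]

mae = mean_absolute_error(actual, predicted)
print('MAE:', mae)
```

A lower value means the predicted ratings sit closer to the actual ones, measured in rating points.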

Train CF model with Temporal Information

To train a CF model that also takes the rating timestamp into account, say CollaborativeFilteringWithTemporalInformation, run the following commands:

pip install -r requirements.txt
cd demo
python collaborative_filtering_temporal.py

The training code in collaborative_filtering_temporal.py is illustrated below:

from sklearn.model_selection import train_test_split
import pandas as pd
from mxnet_recommender.library.cf import CollaborativeFilteringWithTemporalInformation


def main():
    data_dir_path = './data/ml-latest-small'
    output_dir_path = './models'

    records = pd.read_csv(data_dir_path + '/ratings.csv')

    ratings_train, ratings_test = train_test_split(records, test_size=0.2, random_state=0)

    timestamp_train = ratings_train.as_matrix(columns=['timestamp'])
    user_id_train = ratings_train.as_matrix(columns=['userId'])
    item_id_train = ratings_train.as_matrix(columns=['movieId'])
    rating_train = ratings_train.as_matrix(columns=['rating'])

    timestamp_test = ratings_test.as_matrix(columns=['timestamp'])
    user_id_test = ratings_test.as_matrix(columns=['userId'])
    item_id_test = ratings_test.as_matrix(columns=['movieId'])
    rating_test = ratings_test.as_matrix(columns=['rating'])

    max_user_id = records['userId'].max()
    max_item_id = records['movieId'].max()

    cf = CollaborativeFilteringWithTemporalInformation()
    cf.max_user_id = max_user_id
    cf.max_item_id = max_item_id
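    # the arguments after the first one in each call are assumed from the
    # variables prepared above; the listing this was taken from was cut off mid-call
    history = cf.fit(user_id_train=user_id_train,
                     item_id_train=item_id_train,
                     timestamp_train=timestamp_train,
                     rating_train=rating_train,
                     model_dir_path=output_dir_path)

    metrics = cf.evaluate_mae(user_id_test=user_id_test,
                              item_id_test=item_id_test,
                              timestamp_test=timestamp_test,
                              rating_test=rating_test)


if __name__ == '__main__':
    main()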

After training completes, the trained model files are saved in demo/models with the prefix temporal-cf-.
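The timestamp column in the MovieLens ratings.csv holds Unix time in seconds (UTC). To inspect what the temporal model is being given, the column can be converted to readable dates with pandas, as sketched below. The rated_at and rated_year column names are illustrative, the two rows are sample values, and how the model encodes the timestamp internally is not covered here.

```python
import pandas as pd

# Two sample rows with the same columns as ratings.csv.
records = pd.DataFrame({
    'userId': [1, 1],
    'movieId': [1, 3],
    'rating': [4.0, 4.0],
    'timestamp': [964982703, 964981247],
})

# unit='s' tells pandas the integers are seconds since 1970-01-01 UTC
records['rated_at'] = pd.to_datetime(records['timestamp'], unit='s', utc=True)
records['rated_year'] = records['rated_at'].dt.year

print(records[['timestamp', 'rated_at', 'rated_year']])
```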

Predict Rating with Temporal Information

To use the trained CF model to predict the rating of an item by a user at a particular time, you can use the following code:

import pandas as pd
from mxnet_recommender.library.cf import CollaborativeFilteringWithTemporalInformation


def main():
    data_dir_path = './data/ml-latest-small'
    trained_model_dir_path = './models'

    records = pd.read_csv(data_dir_path + '/ratings.csv')

    timestamp_test = records.as_matrix(columns=['timestamp'])
    user_id_test = records.as_matrix(columns=['userId'])
    item_id_test = records.as_matrix(columns=['movieId'])
    rating_test = records.as_matrix(columns=['rating'])

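    cf = CollaborativeFilteringWithTemporalInformation()
    # assumed: the trained model is loaded the same way as for CollaborativeFilteringV1
    cf.load_model(trained_model_dir_path)
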
    predicted_ratings = cf.predict(user_id_test, item_id_test, timestamp_test)

    for i in range(20):
        user_id = user_id_test[i]
        item_id = item_id_test[i]
        timestamp = timestamp_test[i]
        rating = rating_test[i]
        predicted_rating = cf.predict_single(user_id, item_id, timestamp)
        print('predicted: ', predicted_rating, ' actual: ', rating)


if __name__ == '__main__':
    main()

Train Content-based Filtering model with Temporal Information

To train a content-based filtering model, say TemporalContentBasedFiltering, run the following commands:

pip install -r requirements.txt
cd demo
python temporal_content_based_filtering.py

The training code in temporal_content_based_filtering.py is illustrated below:

import pandas as pd
from sklearn.model_selection import train_test_split
from mxnet_recommender.library.content_based_filtering import TemporalContentBasedFiltering


def main():
    data_dir_path = './data/ml-latest-small'
    output_dir_path = './models'

    records = pd.read_csv(data_dir_path + '/ratings.csv')

    ratings_train, ratings_test = train_test_split(records, test_size=0.2, random_state=0)

    timestamp_train = ratings_train.as_matrix(columns=['timestamp'])
    item_id_train = ratings_train.as_matrix(columns=['movieId'])
    rating_train = ratings_train.as_matrix(columns=['rating'])

    timestamp_test = ratings_test.as_matrix(columns=['timestamp'])
    item_id_test = ratings_test.as_matrix(columns=['movieId'])
    rating_test = ratings_test.as_matrix(columns=['rating'])

    max_item_id = records['movieId'].max()

    cf = TemporalContentBasedFiltering()
    cf.max_item_id = max_item_id
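    # the arguments after the first one in each call are assumed from the
    # variables prepared above; the listing this was taken from was cut off mid-call
    history = cf.fit(timestamp_train=timestamp_train,
                     item_id_train=item_id_train,
                     rating_train=rating_train,
                     model_dir_path=output_dir_path)

    metrics = cf.evaluate_mae(timestamp_test=timestamp_test,
                              item_id_test=item_id_test,
                              rating_test=rating_test)


if __name__ == '__main__':
    main()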

After training completes, the trained model files are saved in demo/models with the prefix temporal-cbf-.

Predict Item Rating with Temporal Information

To use the trained content-based filtering model to predict the rating of an item at a particular time, you can use the following code:

import pandas as pd
from mxnet_recommender.library.content_based_filtering import TemporalContentBasedFiltering


def main():
    data_dir_path = './data/ml-latest-small'
    trained_model_dir_path = './models'

    records = pd.read_csv(data_dir_path + '/ratings.csv')

    timestamp_test = records.as_matrix(columns=['timestamp'])
    item_id_test = records.as_matrix(columns=['movieId'])
    rating_test = records.as_matrix(columns=['rating'])

    max_item_id = records['movieId'].max()

    config = dict()
    config['max_item_id'] = max_item_id

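    cf = TemporalContentBasedFiltering()
    # assumed: the trained model is loaded the same way as for CollaborativeFilteringV1
    cf.load_model(trained_model_dir_path)
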
    predicted_ratings = cf.predict(item_id_test, timestamp_test)

    for i in range(20):
        date = timestamp_test[i]
        item_id = item_id_test[i]
        rating = rating_test[i]
        predicted_rating = cf.predict_single(item_id, date)
        print('predicted: ', predicted_rating, ' actual: ', rating)


if __name__ == '__main__':
    main()


Note that the default training scripts in the demo folder use the GPU for training, so you must configure your graphics card for this (or remove the "model_ctx=mxnet.gpu(0)" argument in the training scripts).

  • Step 1: Download and install CUDA Toolkit 9.0 (use this exact version)

  • Step 2: Download and unzip cuDNN 7.0.4 for CUDA® Toolkit 9.0, and add the bin folder of the unzipped directory to the PATH of your Windows environment
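Instead of editing the scripts by hand when no GPU is configured, the context could be chosen at run time. The sketch below assumes an MXNet release that provides mxnet.context.num_gpus() (older releases may not). The select_context helper is hypothetical and not part of mxnet_recommender. The model_ctx argument is the one named in the note above.

```python
def select_context(mx):
    """Return mx.gpu(0) when the given MXNet module reports a GPU, else mx.cpu().

    The MXNet module is passed in as an argument so that the helper itself
    has no import-time dependency on MXNet.
    """
    if mx.context.num_gpus() > 0:
        return mx.gpu(0)
    return mx.cpu()


# Intended use inside a training script (requires MXNet to be installed):
#
#   import mxnet
#   from mxnet_recommender.library.cf import CollaborativeFilteringV1
#
#   cf = CollaborativeFilteringV1(model_ctx=select_context(mxnet))
```

With this in place the same script runs on machines with or without a working CUDA setup, at the cost of slower training on CPU.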




