This code allows you to train the Visnet model. Visnet, trained on Flipkart's proprietary internal dataset, powers Visual Recommendations at Flipkart. On the publicly available Street2Shop dataset, Visnet achieves state-of-the-art results. The arXiv tech report describes the model in detail.
In this repo, we have open-sourced the following:
Training prototxts of Visnet
Triplet sampling code, to generate the training files
A fast CUDA-based k-nearest-neighbor search library
Other auxiliary scripts, such as code to process the Street2Shop dataset and to sample triplets
We plan to add the following soon:
Our modifications to Caffe: the image augmentation layer and the triplet accuracy layer, both of which aid the training of Visnet
Visnet Architecture
Visnet is a Convolutional Neural Network (CNN) trained using the triplet-based deep ranking paradigm. It contains a deep CNN modelled after the VGG-16 network, coupled with parallel shallow convolution layers, so that it captures both high-level and low-level image details simultaneously.
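To make the triplet-based ranking idea concrete, here is a minimal NumPy sketch of a hinge loss over triplets of embeddings (query, positive, negative). The squared Euclidean distance, the margin value, and the function name `triplet_hinge_loss` are assumptions chosen for illustration; the exact loss used to train Visnet is defined by the training prototxts and the tech report, not by this snippet.

```python
import numpy as np


def triplet_hinge_loss(q, p, n, margin=1.0):
    """Per-triplet hinge loss on squared Euclidean distances.

    q, p, n are arrays of shape (batch, dim) holding the embeddings of the
    query, positive and negative images. The margin is an assumed value.
    """
    q, p, n = (np.asarray(x, dtype=np.float64) for x in (q, p, n))
    d_qp = np.sum((q - p) ** 2, axis=1)  # distance query -> positive
    d_qn = np.sum((q - n) ** 2, axis=1)  # distance query -> negative
    # The loss is zero once the negative is farther than the positive by
    # at least the margin, and grows linearly otherwise.
    return np.maximum(0.0, margin + d_qp - d_qn)


# Two toy triplets: the first is already ranked correctly, the second is not.
q = np.array([[0.0, 0.0], [0.0, 0.0]])
p = np.array([[0.0, 1.0], [0.0, 2.0]])
n = np.array([[3.0, 0.0], [1.0, 0.0]])
losses = triplet_hinge_loss(q, p, n)
print(losses)
```

Only the second triplet contributes to the loss, because its negative lies closer to the query than its positive.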
Training
To train the model, you need a set of triplets <q,p,n>, where q is a query image, p is a positive image (similar to q), and n is a negative image (dissimilar to q). For compatibility with Caffe's ImageData layer, you need 3 triplet files (one each for q, p and n). The lines in those files must be aligned, i.e. line i in each file belongs to the i-th triplet.
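As a rough illustration of the file layout described above, the sketch below writes three aligned list files from an in-memory list of triplets. Caffe's ImageData layer reads one image path and one integer label per line; the file names (`q.txt`, `p.txt`, `n.txt`), the image paths, and the dummy label `0` are assumptions made for this example, and scripts/sampler.py may produce a different naming scheme.

```python
import os
import tempfile
from contextlib import ExitStack


def write_triplet_files(triplets, out_dir, label=0):
    """Write q/p/n list files so that line i of each file is triplet i."""
    # Hypothetical file names, one file per triplet slot.
    paths = [os.path.join(out_dir, name) for name in ("q.txt", "p.txt", "n.txt")]
    with ExitStack() as stack:
        files = [stack.enter_context(open(path, "w")) for path in paths]
        for triplet in triplets:
            if len(triplet) != 3:
                raise ValueError("each triplet must be (q, p, n)")
            for f, image_path in zip(files, triplet):
                # ImageData expects "<image path> <integer label>" per line;
                # the label here is a placeholder.
                f.write("{} {}\n".format(image_path, label))
    return paths


def count_lines(path):
    with open(path) as f:
        return sum(1 for _ in f)


triplets = [
    ("images/q1.jpg", "images/p1.jpg", "images/n1.jpg"),
    ("images/q2.jpg", "images/p2.jpg", "images/n2.jpg"),
]
out_dir = tempfile.mkdtemp()
paths = write_triplet_files(triplets, out_dir)

# Sanity check before training: all three files must have the same length.
line_counts = [count_lines(path) for path in paths]
assert len(set(line_counts)) == 1, line_counts
```

Writing all three files in a single loop keeps them aligned by construction; the line-count check is a cheap guard if the files are produced or edited separately.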
If you wish to train Visnet on the Street2Shop dataset, you need to:
Download the Street2Shop dataset (this contains only the image URLs)
Download the Street2Shop images (see scripts/image_downloader.py)
Format the data using scripts/create_structured_images.py and scripts/create_wtbi_crops.py
Use scripts/sampler.py to sample the triplet files
Edit visnet/train.prototxt so that it points to the location of your triplet files
Run training using Caffe
Feature extraction and NN Search
We provide PyCaffe code for feature extraction (scripts/feature_extractor.py), and a fast CUDA-based nearest-neighbor (NN) search tool (scripts/cuda_knn.py).
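For readers who want to see what the NN search step computes, here is a small CPU-only brute-force sketch in NumPy. It is not the code in scripts/cuda_knn.py and makes no claim about that script's interface; the function name `knn_search`, the squared Euclidean metric, and the toy feature vectors are assumptions for illustration.

```python
import numpy as np


def knn_search(queries, database, k):
    """Return indices and squared distances of the k nearest database rows."""
    queries = np.asarray(queries, dtype=np.float64)
    database = np.asarray(database, dtype=np.float64)
    # Expand ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2 to get all pairwise
    # distances with a single matrix multiplication.
    q_sq = np.sum(queries ** 2, axis=1)[:, None]
    d_sq = np.sum(database ** 2, axis=1)[None, :]
    dists = q_sq - 2.0 * (queries @ database.T) + d_sq
    dists = np.maximum(dists, 0.0)  # clip tiny negatives from rounding
    # Sort each row and keep the k closest entries.
    idx = np.argsort(dists, axis=1, kind="stable")[:, :k]
    return idx, np.take_along_axis(dists, idx, axis=1)


# Toy features standing in for extracted image embeddings.
database = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
queries = np.array([[0.9, 0.1]])
idx, dist = knn_search(queries, database, k=2)
print(idx)
```

The same pairwise-distance computation is what a GPU implementation parallelises; on large catalogues the full sort would typically be replaced by a partial selection of the top k.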