Conditional Similarity Networks (CSNs)

This repository contains a PyTorch implementation of the paper Conditional Similarity Networks presented at CVPR 2017.

The code is based on the PyTorch example for training ResNet on ImageNet and the Triplet Network example.

Introduction

What makes images similar? To measure the similarity between images, they are typically embedded in a feature-vector space in which their distances preserve the relative dissimilarity. However, when learning such similarity embeddings, the simplifying assumption is commonly made that images are only compared according to one unique measure of similarity.

Conditional Similarity Networks address this shortcoming by learning nonlinear embeddings that gracefully deal with multiple notions of similarity within a shared embedding space. Different aspects of similarity are incorporated by assigning responsibility weights to each embedding dimension with respect to each aspect of similarity.

Images are passed through a convolutional network and projected into a nonlinear embedding such that different dimensions encode features for specific notions of similarity. Subsequent masks indicate which dimensions of the embedding are responsible for separate aspects of similarity. We can then compare objects according to various notions of similarity by selecting an appropriate masked subspace.
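
The masking step described above can be illustrated with a small, framework-free sketch. It is not the code from this repository: the function name masked_distance, the 4-dimensional embeddings, and the two example notions ("color" and "shape") are all hypothetical and chosen only to show how the same pair of embeddings can be far apart under one mask and identical under another.

```python
import math

def masked_distance(x, y, mask):
    """Euclidean distance between two embeddings after element-wise masking."""
    return math.sqrt(sum((m * (a - b)) ** 2 for a, b, m in zip(x, y, mask)))

# Hypothetical 4-dimensional embeddings of two images.
emb_a = [1.0, 0.0, 2.0, 5.0]
emb_b = [1.0, 0.0, 0.0, 1.0]

# Hypothetical masks: the first two dimensions are assumed to encode one
# notion of similarity (say, color), the last two another (say, shape).
mask_color = [1.0, 1.0, 0.0, 0.0]
mask_shape = [0.0, 0.0, 1.0, 1.0]

# The two images coincide in the "color" subspace but differ in "shape".
dist_color = masked_distance(emb_a, emb_b, mask_color)
dist_shape = masked_distance(emb_a, emb_b, mask_shape)
print(dist_color, dist_shape)
```

Selecting a different mask changes which dimensions contribute to the distance, which is what allows one shared embedding to answer several similarity questions.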

Usage

The default setting for this repo is a CSN with fixed masks, an embedding dimension of 64 and four notions of similarity.
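
As a rough illustration of what fixed masks could look like for these defaults, the sketch below splits the 64 embedding dimensions into four equal, disjoint blocks, one per notion of similarity. The equal disjoint split and the helper name fixed_masks are assumptions made for this example; the repository's own mask construction may differ.

```python
def fixed_masks(embedding_dim=64, n_conditions=4):
    """Build one binary mask per notion of similarity.

    Assumption: each notion is assigned an equal, disjoint block of dimensions.
    """
    block = embedding_dim // n_conditions
    masks = []
    for c in range(n_conditions):
        mask = [0.0] * embedding_dim
        # Switch on only the dimensions belonging to notion c.
        for i in range(c * block, (c + 1) * block):
            mask[i] = 1.0
        masks.append(mask)
    return masks

masks = fixed_masks()
print(len(masks), sum(masks[0]))
```

Under this assumption each notion of similarity is compared in its own 16-dimensional subspace, and no dimension is shared between notions.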

You can download the Zappos dataset as well as the training, validation and test triplets used in the paper with:

$ python get_data.py

The network can be trained with python main.py alone, or with optional arguments to set different hyperparameters:

$ python main.py --name {your experiment name} --learned --num_traintriplets 200000

Training progress can be tracked with visdom by passing the --visdom flag. It records the learning rate, the loss, the training and validation accuracy (over all triplets and separately for each notion of similarity), the embedding norm, the mask norm, and the masks themselves.

By default, the training code keeps track of the model with the highest performance on the validation set. Thus, after the model has converged, it can be directly evaluated on the test set as follows:
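
To make the two accuracy views concrete, here is a minimal sketch of how triplet accuracy might be computed over all triplets and per notion of similarity. It assumes a triplet counts as correct when the anchor is closer to the similar image than to the dissimilar one; the function name triplet_accuracy and the toy distances are hypothetical and not taken from the training code.

```python
from collections import defaultdict

def triplet_accuracy(records):
    """Compute overall and per-notion triplet accuracy.

    records: iterable of (condition, dist_to_far, dist_to_close) tuples, where
    the distances are measured from the anchor in the masked subspace.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for condition, dist_far, dist_close in records:
        total[condition] += 1
        # Assumed criterion: the similar image must be strictly closer.
        if dist_close < dist_far:
            correct[condition] += 1
    overall = sum(correct.values()) / sum(total.values())
    per_condition = {c: correct[c] / total[c] for c in total}
    return overall, per_condition

# Toy data: two notions of similarity (0 and 1), two triplets each.
records = [(0, 2.0, 1.0), (0, 0.5, 1.5), (1, 3.0, 0.2), (1, 1.0, 0.9)]
overall, per_condition = triplet_accuracy(records)
print(overall, per_condition)
```

Reporting both numbers is useful because a good overall accuracy can hide one notion of similarity that the embedding handles poorly.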

$ python main.py --test --resume runs/{your experiment name}/model_best.pth.tar

Citing

If you find this repository helpful for your research, please consider citing:

@conference{Veit2017,
title = {Conditional Similarity Networks},
author = {Andreas Veit and Serge Belongie and Theofanis Karaletsos},
year = {2017},
booktitle = {Computer Vision and Pattern Recognition (CVPR)},
}