
standalone-center-loss

Evaluating the effectiveness of using standalone center loss.

NOTE: Some of the code follows KaiyangZhou's implementation of center loss.

Introduction

In Wen et al., A Discriminative Feature Learning Approach for Deep Face Recognition, ECCV 2016, the authors proposed training CNNs under the joint supervision of the softmax loss and the center loss, with a hyperparameter to balance the two supervision losses. The softmax loss forces the deep features of different classes to stay apart, while the center loss efficiently pulls the deep features of the same class toward their centers. With the joint supervision, not only are the inter-class feature differences enlarged, but the intra-class feature variations are also reduced.

Here, I'd like to explore alternatives to the softmax loss term. Intuitively, the overall loss function should pull intra-class samples together and push inter-class samples apart. Following this simple idea, the intra loss is defined the same as the center loss, whereas the inter loss is defined as the negative distance between two samples of different classes. This way, reducing the inter loss directly enlarges the distance across different classes. However, this inter loss term has no global minimum: it can decrease rapidly toward negative infinity and turn the weights of the model to NaN.

One way to address this problem is to truncate the inter loss by a margin. But the distances between the deeply learned features should reflect the relationships within the dataset, and introducing a manually selected margin may not preserve those relationships. Instead of truncating the inter loss by a margin, the problem is addressed here by applying a logarithmic function to the inter loss. Theoretically, the log inter loss can still go toward negative infinity, but it declines more and more slowly as the value decreases. Furthermore, combining it with the intra loss also helps to learn stable features: since increasing the distance across different classes may also increase the intra loss, reducing the intra loss helps prevent the distances across different classes from going toward infinity. A gradient analysis shows that applying the logarithmic function to the inter loss is in fact equivalent to assigning different weights to the inter loss terms, which forces the model to focus more on hard samples.

The loss function used in this repo is:

$$\mathcal{L} = \lambda_{intra} \mathcal{L}_{intra} + \lambda_{inter} \mathcal{L}_{inter}$$

$$\mathcal{L}_{intra} = \frac{1}{2} \sum_{i=1}^{m} \lVert x_i - c_{y_i} \rVert_2^2 , \qquad \mathcal{L}_{inter} = - \sum_{i=1}^{m} \sum_{j=1}^{m} \delta(y_i \neq y_j) \log \lVert x_i - x_j \rVert_2^2$$

where $\delta(y_i \neq y_j) = 1$ if $y_i \neq y_j$ and $0$ otherwise, and $\lambda_{intra}$, $\lambda_{inter}$ are hyperparameters that balance the two supervision losses.
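To make the formulation concrete, here is a minimal PyTorch sketch of such a loss, assuming learnable class centers as in KaiyangZhou's center loss implementation; the class name, the `1e-6` epsilon, and the batch averaging are illustrative choices, not necessarily the repo's exact code.

```python
import torch
import torch.nn as nn

class StandaloneCenterLoss(nn.Module):
    """Minimal sketch of the intra + log inter loss described above."""

    def __init__(self, num_classes, feat_dim, weight_intra=1.0, weight_inter=0.1):
        super().__init__()
        # Learnable class centers, as in the standard center loss.
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.weight_intra = weight_intra
        self.weight_inter = weight_inter

    def forward(self, features, labels):
        # Intra loss: squared distance of each feature to its class center
        # (the center loss), averaged over the batch here.
        centers_batch = self.centers[labels]                        # (B, D)
        intra = 0.5 * (features - centers_batch).pow(2).sum(dim=1).mean()

        # Inter loss: negative log of the squared pairwise distances between
        # samples of different classes; the log slows the decline toward -inf.
        dist2 = torch.cdist(features, features).pow(2)              # (B, B)
        diff_class = labels.unsqueeze(0) != labels.unsqueeze(1)     # (B, B)
        if diff_class.any():
            inter = -torch.log(dist2[diff_class] + 1e-6).mean()
        else:
            inter = features.new_zeros(())

        return self.weight_intra * intra + self.weight_inter * inter
```

In use, it replaces the softmax head entirely: the backbone outputs `feat_dim`-dimensional features, and this loss is the only supervision signal.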

Experiment

| loss | dataset | feat_dim | acc | lr | epochs | batch_size | weight_cent | weight_intra | weight_inter |
|------|---------|----------|-----|----|--------|------------|-------------|--------------|--------------|
| xent | mnist | 2 | 0.991 | 0.01 | 100 | 128 | / | / | / |
| cent | mnist | 2 | 0.990 | 0.01 | 100 | 128 | 1.0 | / | / |
| standalone | mnist | 2 | 0.994 | 0.01 | 100 | 128 | / | 1.0 | 0.1 |
| xent | mnist | 128 | 0.994 | 0.01 | 100 | 128 | / | / | / |
| cent | mnist | 128 | 0.996 | 0.01 | 100 | 128 | 1.0 | / | / |
| standalone | mnist | 128 | 0.996 | 0.01 | 100 | 128 | / | 1.0 | 0.1 |
| xent | fashion-mnist | 2 | 0.913 | 0.01 | 100 | 128 | / | / | / |
| cent | fashion-mnist | 2 | 0.913 | 0.01 | 100 | 128 | 1.0 | / | / |
| standalone | fashion-mnist | 2 | 0.921 | 0.01 | 100 | 128 | / | 1.0 | 0.1 |
| xent | fashion-mnist | 128 | 0.926 | 0.01 | 100 | 128 | / | / | / |
| cent | fashion-mnist | 128 | 0.932 | 0.01 | 100 | 128 | 1.0 | / | / |
| standalone | fashion-mnist | 128 | 0.922 | 0.01 | 100 | 128 | / | 1.0 | 0.1 |
| xent | cifar-10 | 2 | 0.815 | 0.01 | 100 | 128 | / | / | / |
| cent | cifar-10 | 2 | 0.775 | 0.01 | 100 | 128 | 1.0 | / | / |
| standalone | cifar-10 | 2 | 0.787 | 0.01 | 100 | 128 | / | 1.0 | 0.1 |
| xent | cifar-10 | 128 | 0.866 | 0.01 | 100 | 128 | / | / | / |
| cent | cifar-10 | 128 | 0.858 | 0.01 | 100 | 128 | 1.0 | / | / |
| standalone | cifar-10 | 128 | 0.806 | 0.01 | 100 | 128 | / | 1.0 | 0.1 |
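For reference, a hedged sketch of the training setup implied by the table (SGD with lr = 0.01, 100 epochs, batch size 128) might look like the following; the dataset transform, optimizer choice, and momentum value are assumptions rather than the repo's exact configuration.

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

def train(model, criterion, device="cpu", lr=0.01, epochs=100, batch_size=128):
    # MNIST used as the example dataset; the other runs swap in
    # FashionMNIST / CIFAR10 with the same hyperparameters.
    train_set = datasets.MNIST("data", train=True, download=True,
                               transform=transforms.ToTensor())
    loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)

    # The loss module's parameters (the class centers) are optimized
    # jointly with the backbone.
    params = list(model.parameters()) + list(criterion.parameters())
    optimizer = torch.optim.SGD(params, lr=lr, momentum=0.9)

    model.train()
    for epoch in range(epochs):
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            features = model(images)               # (B, feat_dim)
            loss = criterion(features, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```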

Visualization

Train/val feature visualizations (GIFs) are provided for each loss (xent, cent, standalone) on mnist, fashion-mnist, and cifar-10, at feat_dim = 2 and feat_dim = 128.
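For feat_dim = 2, such plots can be produced by scattering the learned features directly, one color per class. The sketch below is an assumed way to draw such a figure, not the repo's actual plotting code.

```python
import matplotlib.pyplot as plt
import torch

@torch.no_grad()
def plot_features(model, data_loader, device="cpu", path="features.png"):
    # Collect the 2-D features and labels for the whole split.
    feats, labels = [], []
    for images, targets in data_loader:
        feats.append(model(images.to(device)).cpu())
        labels.append(targets)
    feats, labels = torch.cat(feats), torch.cat(labels)

    # One scatter call per class so each class gets its own color.
    for c in labels.unique():
        mask = labels == c
        plt.scatter(feats[mask, 0], feats[mask, 1], s=2, label=str(c.item()))
    plt.legend(markerscale=4)
    plt.savefig(path)
    plt.close()
```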

Evaluation

Since no softmax loss is used in this implementation, taking the argmax over class logits to compute the accuracy is infeasible. Instead, the evaluation is done by assigning each sample to the nearest class center.
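A hedged sketch of this nearest-center evaluation, assuming the class centers are the (num_classes, feat_dim) parameter learned by the loss module (function and argument names are illustrative):

```python
import torch

@torch.no_grad()
def nearest_center_accuracy(model, centers, data_loader, device="cpu"):
    # `centers` is the (num_classes, feat_dim) tensor of learned class centers.
    correct, total = 0, 0
    for images, labels in data_loader:
        images, labels = images.to(device), labels.to(device)
        features = model(images)                 # (B, feat_dim)
        dists = torch.cdist(features, centers)   # (B, num_classes)
        preds = dists.argmin(dim=1)              # index of the nearest center
        correct += (preds == labels).sum().item()
        total += labels.numel()
    return correct / total
```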

Installation

  1. Install PyTorch.

  2. Run the following command:

pip3 install -r requirements.txt

Discussion

If you look at the training logs of both the standalone center loss and the center loss with softmax loss, you'll find that the test-set accuracy is much closer to the train-set accuracy under the supervision of the standalone center loss than under the center loss with softmax loss. Does this mean that the standalone center loss is less prone to overfitting, or is it just that the standalone center loss is inferior to the center loss with softmax loss?

As the GIFs above show, both loss functions do a good job on the mnist dataset; the reason could be that mnist is a fairly simple dataset. Therefore, next time you devise a new loss function, it would be a good choice to try it first on a dataset like fashion-mnist or cifar-10 rather than on mnist, and verify its effectiveness on mnist later.

Is it possible to build a much more powerful model that turns a complicated dataset into a simple one, so that the standalone center loss can learn good features for that dataset?

Misc

Alternatives I've tried are summarized as follows:

  1. Directly enlarge the distances between the class centers.

where the scale factor is used to scale the intra distance before applying the exponential function (required for numerical stability); the other symbols are the number of classes, an indicator that is 1 if its condition holds and 0 otherwise, and hyperparameters that balance the two supervision losses. This loss term has a similar effect to the one used in this repo on the mnist dataset, but it failed to learn a good set of representations on the fashion-mnist and cifar-10 datasets.

  2. Build an extra fully connected layer over the class centers and apply the softmax loss.

This way, at least the computation cost is reduced (batch_size vs. num_classes). The final result is worse than the original center loss with softmax loss, probably because the centers are already separable but are in fact close to each other.
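A rough sketch of this second alternative, under the assumption that the extra fully connected layer maps each center to class logits and the cross-entropy target for center i is class i (names are illustrative):

```python
import torch
import torch.nn as nn

class CenterClassifierLoss(nn.Module):
    """Softmax (cross-entropy) loss applied to the class centers themselves."""

    def __init__(self, num_classes, feat_dim):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.fc = nn.Linear(feat_dim, num_classes)
        self.ce = nn.CrossEntropyLoss()

    def forward(self):
        # Classify the centers instead of the batch samples, so the softmax
        # term costs num_classes forward rows rather than batch_size.
        logits = self.fc(self.centers)                              # (C, C)
        targets = torch.arange(self.centers.size(0),
                               device=self.centers.device)
        return self.ce(logits, targets)
```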

Rethinking Softmax Loss

Does the softmax loss only focus on learning separable features? What is the relationship between increasing the inter-class separability and increasing the inter-class distance? Will the softmax loss stop increasing the inter-class distance once the learned features are already well separated? Would it be possible for the features learned by the softmax loss to reflect the relationships across the dataset?

Acknowledgement

Thanks to Google for providing free access to their Colab service; without it I could not have finished all the experiments in one month.

