shufflenet_v2_x0.50, Top-1 Acc = 58.93%. This accuracy is 1.37% lower than the result reported in the official paper.
Training Details
All ImageNet images are resized so that the short edge is 256 pixels
(bicubic interpolation via PIL). Each image is then pickled by Python
and stored in an LMDB dataset.
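A minimal sketch of this preprocessing step, assuming a list of file paths (`image_paths`) and an output database name (`imagenet_train_lmdb`) that are both hypothetical; the key scheme and `map_size` are illustrative choices, not necessarily the exact ones used here:

```python
import pickle

import lmdb
from PIL import Image

def resize_short_edge(img, size=256):
    # Keep aspect ratio; scale so the shorter edge becomes `size` (bicubic).
    w, h = img.size
    if w < h:
        new_w, new_h = size, int(round(h * size / w))
    else:
        new_w, new_h = int(round(w * size / h)), size
    return img.resize((new_w, new_h), Image.BICUBIC)

def build_lmdb(image_paths, db_path='imagenet_train_lmdb'):
    # image_paths: hypothetical list of training image files
    env = lmdb.open(db_path, map_size=1 << 40)
    with env.begin(write=True) as txn:
        for i, path in enumerate(image_paths):
            img = resize_short_edge(Image.open(path).convert('RGB'))
            txn.put(str(i).encode(), pickle.dumps(img))
```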
Training is done with PyTorch 0.4.0.
Data augmentation: 224x224 random crop and random horizontal flip, as sketched below.
No image mean subtraction is used here; it is handled automatically by
the data/BN layers in the network.
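A minimal sketch covering the two points above with torchvision transforms (the exact composition is an assumption):

```python
import torchvision.transforms as transforms

# 224x224 random crop + random horizontal flip; ToTensor maps [0, 255]
# pixels to [0, 1] floats. Note: no Normalize/mean-subtraction step.
train_transform = transforms.Compose([
    transforms.RandomCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
```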
As in my code, networks are initialized with nn.init.kaiming_normal_(m.weight, mode='fan_out').
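A sketch of how that initializer is typically applied across a model's modules; the treatment of bias and BatchNorm parameters here is my assumption, not necessarily what this repo does:

```python
import torch.nn as nn

def init_weights(model):
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            # Kaiming (He) normal init with fan_out mode, as stated above
            nn.init.kaiming_normal_(m.weight, mode='fan_out')
            if m.bias is not None:
                nn.init.constant_(m.bias, 0)
        elif isinstance(m, nn.BatchNorm2d):
            # Assumption: standard BN init (scale 1, shift 0)
            nn.init.constant_(m.weight, 1)
            nn.init.constant_(m.bias, 0)
```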
SGD with Nesterov momentum (0.9) is used for optimization. The batch
size is 1024. Models are trained for 300,000 iterations, with the
learning rate decaying linearly from 0.5 to 0.
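A minimal sketch of this setup, assuming a per-iteration learning-rate update and that `model` is the network being trained (weight decay is omitted since it is not specified above):

```python
import torch.optim as optim

base_lr, max_iter = 0.5, 300000

def make_optimizer(model):
    # SGD with Nesterov momentum, as described above
    return optim.SGD(model.parameters(), lr=base_lr,
                     momentum=0.9, nesterov=True)

def adjust_lr(optimizer, it):
    # Linear decay from base_lr at iteration 0 down to 0 at max_iter
    lr = base_lr * (1.0 - float(it) / max_iter)
    for group in optimizer.param_groups:
        group['lr'] = lr
```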
Something you might have noticed
Models are trained in PyTorch and converted to Caffe. Thus, you
should use the scale parameter in Caffe's data layer to make sure all
input images are rescaled from [0, 255] to [0, 1].
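A sketch of the relevant data-layer snippet in prototxt; the layer name and LMDB source path are placeholders:

```
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  transform_param {
    scale: 0.00392157  # 1/255: rescales [0, 255] inputs to [0, 1]
  }
  data_param {
    source: "imagenet_lmdb"  # placeholder path
    batch_size: 32
    backend: LMDB
  }
}
```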
The RGB vs. BGR issue is not very crucial; you may just ignore the
difference if you use these models as pretrained models for other
tasks.
Others
Over the years, I have rarely matched or exceeded the results
reported in papers for various complex ImageNet models. If you
get better accuracy, please tell me.