
CapsNet-tf


A partial implementation of "Dynamic Routing Between Capsules".
Currently this repo contains only the experiments on MNIST/Multi-MNIST. My experiment on Multi-MNIST wasn't successful; the reconstructed images were too blurry.

The CapsuleNet has an amazingly simple architecture. In my opinion, it has at least two special features (both appear in the sketch after this list):

  1. Iterative attention mechanism (they call it "dynamic routing").

  2. Vector representation of concepts/objects, with the magnitude representing the probability of an object's existence.
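
To make both points concrete, here is a minimal NumPy sketch of the routing-by-agreement loop for a single example. This is my simplification, not the actual code in this repo; the names (u_hat, b, c, v) follow the paper's notation.

    import numpy as np

    def squash(s, axis=-1, eps=1e-9):
        # Squash non-linearity: keeps a vector's direction but maps its
        # length into [0, 1), so the magnitude can act as a probability.
        sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
        return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

    def dynamic_routing(u_hat, num_iters=3):
        # u_hat: (num_in, num_out, dim_out) prediction ("vote") vectors
        # from the lower-level capsules.
        num_in, num_out, _ = u_hat.shape
        b = np.zeros((num_in, num_out))                  # routing logits
        for _ in range(num_iters):
            c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # softmax over output capsules
            s = np.einsum('ij,ijd->jd', c, u_hat)        # weighted sum of votes
            v = squash(s)                                # output capsule vectors
            b = b + np.einsum('ijd,jd->ij', u_hat, v)    # reward agreeing votes
        return v

    # Toy usage: 1152 primary capsules voting for 10 digit capsules of dim 16.
    v = dynamic_routing(np.random.randn(1152, 10, 16) * 0.01)
    print(np.linalg.norm(v, axis=-1))  # per-digit "existence" magnitudes

Feature 1 is the iterative update of the coupling coefficients c; feature 2 is the final norm of v.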

Dependencies

Python 3.5
Tensorflow 1.4

Usage

Run python capsule.py and use tensorboard --logdir [logdir/train/datetime-Capsule] to see the results.

To run the Multi-MNIST test, first build the dataset with
python build_multimnist.py
and then run the experiment with
python train_multimnist.py

Results

Their figure showing the result of dimension perturbation.


Each column is a dimension of the Digit Capsule.
Each row is a perturbation in [-0.25, 0.25] with a 0.05 increment.
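
The sweep itself is simple to reproduce. A sketch, assuming the trained reconstruction net is available as a callable decode (a hypothetical stand-in, not a function in this repo):

    import numpy as np

    def perturbation_grid(v, decode, low=-0.25, high=0.25, step=0.05):
        # v: the 16-D activity vector of the predicted digit capsule.
        # decode: hypothetical callable mapping a capsule vector to an image.
        offsets = np.arange(low, high + step / 2, step)   # -0.25, -0.20, ..., 0.25
        grid = []
        for off in offsets:                               # one row per offset
            row = []
            for d in range(v.shape[0]):                   # one column per dimension
                v_tweaked = v.copy()
                v_tweaked[d] += off                       # nudge a single dimension
                row.append(decode(v_tweaked))
            grid.append(row)
        return grid   # grid[k][d]: dimension d shifted by offsets[k]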


Indeed, there are always some dimensions that represent orientation and thickness.

The image, reconstruction, and latent representation for an input of digit 8.

The 8th row is highly activated, so the prediction is naturally 8 (note that the competing digit is 3, but its magnitude is clearly lower). In addition, the highly activated dimensions (7 & 8) represent thickness and orientation. (Use TensorBoard to view the results above.)
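
A sketch of how that prediction falls out of the capsule magnitudes, assuming digit_caps holds the ten 16-D digit capsule vectors for this input (the random array is only a stand-in):

    import numpy as np

    digit_caps = np.random.randn(10, 16) * 0.3        # stand-in for the network output

    magnitudes = np.linalg.norm(digit_caps, axis=-1)  # one "existence" score per digit
    prediction = int(np.argmax(magnitudes))           # would be 8 in the example above

    # The two most activated dimensions of the winning capsule
    # (7 & 8 in the example above: thickness and orientation).
    top_dims = np.argsort(np.abs(digit_caps[prediction]))[-2:]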

Differences

  1. I rescaled the input to [-1, 1] and used tanh as the output non-linearity of the reconstruction net.

  2. Not all hyper-parameters were specified in their paper.

  3. (Note that my Multi-MNIST experiment failed.)

  4. In my Multi-MNIST implementation, I merge two images at runtime instead of precomputing a 60M-image dataset, which differs from the paper's method (see the merging sketch after this list).

  5. How "accuracy" should be defined for Multi-MNIST is unclear to me. Should it be an "unweighted recall" rate, or an "exact match" rate? I ended up adding both to the summary (both computed in the metrics sketch below).

  6. In my Multi-MNIST experiment, the network seemed to lose its ability to infer from images with a single digit, let alone reconstruct them. This wasn't something I expected.
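
For points 4 and 1, here is roughly how the runtime merging works, under my reading of the paper's setup (each 28x28 digit shifted by up to 4 px inside a 36x36 canvas; the exact shifts and the pixel-wise overlay rule here are assumptions). The last line is the [-1, 1] rescaling from point 1:

    import numpy as np

    def merge_pair(img_a, img_b, canvas=36, max_shift=4, rng=np.random):
        # Overlay two 28x28 MNIST digits (pixel values in [0, 1]) into one
        # Multi-MNIST sample at runtime instead of precomputing the dataset.
        def place(img):
            frame = np.zeros((canvas, canvas), dtype=np.float32)
            margin = (canvas - 28) // 2
            dy, dx = rng.randint(-max_shift, max_shift + 1, size=2)
            frame[margin + dy: margin + dy + 28,
                  margin + dx: margin + dx + 28] = img
            return frame
        merged = np.maximum(place(img_a), place(img_b))  # pixel-wise max overlay
        return merged * 2.0 - 1.0                        # rescale [0, 1] -> [-1, 1]

And for point 5, the two candidate metrics can be computed from the two true labels and the two highest-magnitude predictions per sample:

    def multimnist_metrics(pred_pairs, true_pairs):
        # pred_pairs, true_pairs: lists of 2-element sets of digit labels.
        n = len(true_pairs)
        hits = sum(len(p & t) for p, t in zip(pred_pairs, true_pairs))
        recall = hits / (2 * n)                              # "unweighted recall"
        exact = sum(p == t for p, t in zip(pred_pairs, true_pairs)) / n  # "exact match"
        return recall, exact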

Discussion

  1. Their experimental setting for Multi-MNIST isn't 100% reasonable, in my opinion. They assumed a strongly supervised condition in which the two original images are known, which is less practical in real applications. Besides, this condition may allow us to do some other tricks during training...


