Tiramisu combines DenseNet and U-Net
for high-performance semantic segmentation. In this repository, we
attempt to replicate the authors' results on the CamVid dataset.
Tiramisu adopts the U-Net design with downsampling, bottleneck, and
upsampling paths and skip connections. It replaces convolution and max
pooling layers with Dense blocks from the DenseNet architecture. Dense
blocks contain shortcut connections like those in ResNet, except that they
concatenate, rather than sum, prior feature maps.
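To make the concatenation point concrete, here is a minimal sketch of how the channel count grows inside a dense block. The function name and the numbers used (48 input channels, growth rate 16, 4 layers) are illustrative assumptions, not values read from this repository's code.

```python
def dense_block_channels(in_channels, growth_rate, n_layers):
    """Track channel counts through a dense block that concatenates feature maps.

    Returns the number of input channels seen by each layer and the
    number of channels leaving the block.
    """
    layer_inputs = []
    channels = in_channels
    for _ in range(n_layers):
        # Each layer sees the block input plus every earlier layer's output.
        layer_inputs.append(channels)
        # Concatenation appends growth_rate new feature maps.
        # (A ResNet-style sum would leave the channel count unchanged.)
        channels += growth_rate
    return layer_inputs, channels


# Illustrative values only (assumed, not taken from the repository).
inputs, out = dense_block_channels(in_channels=48, growth_rate=16, n_layers=4)
print(inputs)  # [48, 64, 80, 96]
print(out)     # 112
```

Because every layer's output is kept, the block's width grows linearly with depth, which is why a small growth rate is usually enough.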
Authors' Results
Our Results
FCDenseNet67
We trained for 670 epochs (224x224 crops) followed by 100 epochs of
fine-tuning (full-size images). The authors mention "global accuracy" of
90.8% for FC-DenseNet67 on CamVid, compared to our 86.8%. If we exclude the
'background' class, our accuracy increases to ~89%. We think the authors did
this, but haven't confirmed.
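The effect of excluding a class can be sketched as follows. This is a toy, pure-Python illustration of global pixel accuracy with an optional ignored label. The function name, the label id used for 'background' (11), and the sample predictions are all assumptions made for the example, not the evaluation code or data used here.

```python
def pixel_accuracy(pred, target, ignore_index=None):
    """Fraction of pixels whose predicted label matches the target.

    Pixels whose target label equals ignore_index are skipped entirely.
    """
    correct = 0
    counted = 0
    for p, t in zip(pred, target):
        if ignore_index is not None and t == ignore_index:
            continue  # excluded pixels count neither for nor against
        counted += 1
        if p == t:
            correct += 1
    return correct / counted if counted else 0.0


BACKGROUND = 11  # hypothetical label id for the 'background' class

# Flattened toy label maps (made-up values).
target = [0, 1, 1, BACKGROUND, BACKGROUND, 2]
pred = [0, 1, 2, 0, BACKGROUND, 2]

print(pixel_accuracy(pred, target))                           # 4 of 6 pixels
print(pixel_accuracy(pred, target, ignore_index=BACKGROUND))  # 3 of 4 pixels
```

Whether the score goes up or down when a class is excluded depends on how well that class is predicted relative to the rest.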
| Dataset    | Loss  | Accuracy (%) |
|------------|-------|--------------|
| Validation | 0.209 | 92.5         |
| Test set   | 0.435 | 86.8         |
FCDenseNet103
We trained for 874 epochs followed by 50 epochs of fine-tuning.
Predictions
Training
Hyperparameters
WeightInitialization = HeUniform
Optimizer = RMSProp
LR = 0.001 with exponential decay of 0.995 after each epoch
Data Augmentation = Random Crops, Vertical Flips
ValidationSet with early stopping based on IoU or MeanAccuracy with patience of 100 (50 during fine-tuning)
WeightDecay = 0.0001
Fine-tune with full-size images, LR = 0.0001
Dropout = 0.2
Dropout = 0.2
BatchNorm "we use current batch stats at training, validation, and test time"
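As a rough illustration of two of the settings above, the sketch below computes the learning rate under per-epoch exponential decay and implements a simple patience-based early-stopping check. The names `lr_at_epoch` and `EarlyStopping` are hypothetical helpers written for this example, and the metric values in the demo are made up. It assumes the monitored metric (e.g. IoU) is better when higher and that decay is applied once at the end of each epoch.

```python
def lr_at_epoch(epoch, base_lr=0.001, decay=0.995):
    """Learning rate used during `epoch` (0-based) with per-epoch exponential decay."""
    return base_lr * decay ** epoch


class EarlyStopping:
    """Signal a stop after `patience` consecutive epochs without improvement."""

    def __init__(self, patience):
        self.patience = patience
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, metric):
        """Record this epoch's validation metric; return True if training should stop."""
        if metric > self.best:
            self.best = metric
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Toy run with a short patience so the stop is visible (values are made up).
stopper = EarlyStopping(patience=2)
for epoch, val_iou in enumerate([0.50, 0.60, 0.55, 0.58]):
    stop = stopper.step(val_iou)
    print(epoch, lr_at_epoch(epoch), stop)
    if stop:
        break
```

With the settings listed above, `patience` would be 100 for the main run and 50 during fine-tuning, and `base_lr` would be 0.0001 for the fine-tuning stage.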