Recurrent Saliency Transformation Network:
Incorporating Multi-Stage Visual Cues for Small Organ Segmentation
Abstract
We aim to segment small organs (e.g., the pancreas)
from abdominal CT scans. As the target often occupies
a relatively small region in the input image, deep neural
networks can be easily confused by the complex and variable background. To alleviate this, researchers proposed a
coarse-to-fine approach [46], which used prediction from
the first (coarse) stage to indicate a smaller input region
for the second (fine) stage. Despite its effectiveness, this
algorithm dealt with the two stages individually: it did not
optimize a global energy function, which limited its ability
to incorporate multi-stage visual cues. The missing contextual
information led to unsatisfying convergence over iterations,
and the fine stage sometimes produced even lower
segmentation accuracy than the coarse stage.
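The coarse-to-fine idea described above can be illustrated with a minimal sketch. It assumes a 2D slice, a hard threshold on the coarse probability map, and a fixed margin around the resulting bounding box; the function name `crop_from_coarse` and the parameters `threshold` and `margin` are hypothetical and are not taken from [46].

```python
import numpy as np

def crop_from_coarse(image, coarse_prob, threshold=0.5, margin=2):
    """Crop `image` to the bounding box of the thresholded coarse prediction.

    `threshold` and `margin` are illustrative assumptions, not values from the paper.
    """
    mask = coarse_prob >= threshold
    if not mask.any():
        # No confident foreground: fall back to the full image.
        return image, (0, image.shape[0], 0, image.shape[1])
    rows = np.where(mask.any(axis=1))[0]  # row indices containing foreground
    cols = np.where(mask.any(axis=0))[0]  # column indices containing foreground
    top = max(int(rows[0]) - margin, 0)
    bottom = min(int(rows[-1]) + 1 + margin, image.shape[0])
    left = max(int(cols[0]) - margin, 0)
    right = min(int(cols[-1]) + 1 + margin, image.shape[1])
    return image[top:bottom, left:right], (top, bottom, left, right)

# Toy example: a 10x10 "slice" with a small predicted organ region.
image = np.arange(100, dtype=float).reshape(10, 10)
coarse_prob = np.zeros((10, 10))
coarse_prob[4:6, 5:7] = 0.9
patch, box = crop_from_coarse(image, coarse_prob, margin=2)
```

In this sketch the fine stage would receive only `patch`, so any context outside the box is lost to it, which is the limitation noted above.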
This paper presents a Recurrent Saliency Transformation Network. The key innovation is a saliency transformation module, which repeatedly converts the segmentation probability map from the previous iteration into spatial
weights and applies these weights to the current iteration.
This brings two benefits. In training, it allows joint
optimization over the deep networks dealing with different
input scales. In testing, it propagates multi-stage visual
information throughout iterations to improve segmentation
accuracy. Experiments on the NIH pancreas segmentation
dataset demonstrate state-of-the-art accuracy,
outperforming the previous best by an average of over 2%.
Much higher accuracies are also reported on several small
organs in a larger dataset that we collected. In addition, our approach enjoys better convergence properties,
making it more efficient and reliable in practice.
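The recurrent use of a probability map as spatial weights can be sketched as follows. This is only a toy illustration under stated assumptions: `saliency_transform` is a fixed affine map standing in for the saliency transformation module, `toy_segmenter` is a per-pixel sigmoid standing in for a segmentation network, and the stopping tolerance `tol` is hypothetical. None of these reflect the actual architecture or settings.

```python
import numpy as np

def saliency_transform(prob, gain=1.0, bias=0.5):
    # Assumption: a fixed affine map from probabilities to spatial weights.
    # A real module would have parameters learned jointly with the segmenter.
    return gain * prob + bias

def toy_segmenter(weighted_image):
    # Stand-in for a segmentation network: per-pixel sigmoid on intensity.
    return 1.0 / (1.0 + np.exp(-(weighted_image - 1.0) * 4.0))

def recurrent_segmentation(image, num_iters=5, tol=1e-3):
    """Iterate: previous probability map -> weights -> weighted input -> new map."""
    prob = np.ones_like(image)  # start with uniform weights (no prior saliency)
    for t in range(num_iters):
        weights = saliency_transform(prob)
        new_prob = toy_segmenter(image * weights)  # apply weights to the input
        change = np.abs(new_prob - prob).max()
        prob = new_prob
        if change < tol:  # stop once the map stops changing
            break
    return prob, t + 1

# Toy example: a bright 3x3 "organ" on a dark background.
image = np.zeros((8, 8))
image[2:5, 3:6] = 2.0
prob, iters = recurrent_segmentation(image)
```

Because each iteration sees the whole input reweighted by the previous prediction, rather than a hard crop, information from earlier iterations carries forward into later ones.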