Abstract
This work is motivated by the mostly unsolved task of
parsing biological images with multiple overlapping articulated model organisms (such as worms or larvae). We
present a general approach that separates the two main
challenges associated with such data: estimating the shape of
individual objects and disentangling groups of objects. At the
core of the approach is a deep feed-forward singling-out
network (SON) that is trained to map each local patch to
a vectorial descriptor that is sensitive to the characteristics (e.g. shape) of a central object, while being invariant
to the variability of all other surrounding elements. Given
a SON, a local image patch can be matched to a gallery of
isolated elements using their SON-descriptors, thus producing a hypothesis about the shape of the central element in
that patch. An image-level optimization based on integer
programming can then select a subset of the hypotheses that
explains (parses) the whole image and disentangles groups of
organisms.
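The two stages described above, matching a patch descriptor against a gallery and then selecting a consistent subset of hypotheses, can be illustrated with a minimal sketch. It rests on assumptions that are ours, not the paper's. Descriptors are plain tuples standing in for SON outputs. Matching is Euclidean nearest-neighbour search. Each hypothesis is a set of foreground pixels, and its cost is its matching distance. The image-level step is written as a weighted set-cover integer program (minimize sum_h c_h x_h subject to every foreground pixel being covered by at least one selected hypothesis, with x_h in {0,1}). It is solved exactly by brute-force enumeration, which is only feasible for a handful of hypotheses; a practical system would pass the same program to an ILP solver. The function names `match_to_gallery` and `select_hypotheses` are hypothetical.

```python
import math
from itertools import combinations


def match_to_gallery(patch_desc, gallery, k=2):
    """Return the k gallery shapes whose (hypothetical) SON descriptors are
    closest to patch_desc, as (distance, shape_id) pairs sorted by distance."""
    scored = [(math.dist(patch_desc, desc), shape_id)
              for shape_id, desc in gallery.items()]
    scored.sort()
    return scored[:k]


def select_hypotheses(hypotheses, foreground):
    """Exact weighted set cover by enumeration (a stand-in for an ILP solver).

    hypotheses: list of (cost, frozenset of pixel ids).
    Returns (total_cost, indices) of the cheapest subset whose union covers
    every foreground pixel. Overlaps are allowed, since organisms may overlap.
    """
    best = (math.inf, ())
    n = len(hypotheses)
    for r in range(n + 1):
        for subset in combinations(range(n), r):
            covered = set()
            for i in subset:
                covered |= hypotheses[i][1]
            if foreground <= covered:  # coverage constraint satisfied
                cost = sum(hypotheses[i][0] for i in subset)
                if cost < best[0]:
                    best = (cost, subset)
    return best


# Toy gallery of isolated shapes with made-up 2-D descriptors.
gallery = {"curled": (0.0, 1.0), "straight": (1.0, 0.0), "bent": (0.7, 0.7)}
matches = match_to_gallery((0.9, 0.1), gallery, k=2)
print([shape for _, shape in matches])  # ['straight', 'bent']

# Toy hypotheses: (matching cost, pixels explained).
hyps = [
    (1.0, frozenset({1, 2, 3})),
    (1.0, frozenset({3, 4})),
    (2.5, frozenset({1, 2, 3, 4})),
    (0.5, frozenset({5})),
]
foreground = {1, 2, 3, 4, 5}
print(select_hypotheses(hyps, foreground))  # (2.5, (0, 1, 3))
```

In this toy example, two cheap overlapping hypotheses (sharing pixel 3) beat a single expensive one covering the same pixels. This mirrors, in a very simplified form, how a global selection step can prefer several overlapping organisms over one poorly matched shape.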