Abstract
From human crowds to cells in tissue, the detection and efficient tracking of multiple objects in dense configurations is an important and unsolved problem. In the past, limitations of image analysis have restricted studies of dense groups to tracking a single individual or a subset of marked individuals, or to coarse-grained group-level dynamics, all of which yield incomplete information. Here, we combine convolutional neural networks (CNNs) with the model environment of a honeybee hive to automatically recognize all individuals in a dense group from raw image data. We create new, adapted individual labeling and use the segmentation architecture U-Net with a loss function that depends on both object identity and orientation. We additionally exploit temporal regularities of the video recording in a recurrent manner and achieve near human-level performance while reducing the network size by 94% compared to the original U-Net architecture. Given our novel application of CNNs, we generate extensive problem-specific image data in which labeled examples are produced through a custom interface with Amazon Mechanical Turk. This dataset contains over 375,000 labeled bee instances across 720 video frames at 2 FPS, representing an extensive resource for the development and testing of tracking methods. We correctly detect 96% of individuals with a location error of ≈7% of a typical body dimension and an orientation error of 12°, approximating the variability of human raters. Our results provide an important step towards efficient image-based dense object tracking, allowing for the accurate determination of object location and orientation across time-series image data within a single network architecture.
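The loss function combining object identity and orientation might be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the weighting `w_orient`, the sigmoid/cross-entropy identity term, and the cosine form of the angular term are all assumptions made for clarity.

```python
import numpy as np

def combined_loss(class_logits, pred_angles, true_mask, true_angles, w_orient=1.0):
    """Per-pixel loss combining object identity (segmentation) and orientation.

    class_logits : (H, W) array of foreground logits.
    pred_angles  : (H, W) array of predicted orientations in radians.
    true_mask    : (H, W) binary array, 1 where a bee body is labeled.
    true_angles  : (H, W) array of ground-truth orientations in radians.
    """
    # Identity term: binary cross-entropy on the foreground probability map.
    p = 1.0 / (1.0 + np.exp(-class_logits))
    eps = 1e-7
    bce = -(true_mask * np.log(p + eps) + (1 - true_mask) * np.log(1 - p + eps))

    # Orientation term: 1 - cos(angle difference), evaluated only on labeled
    # pixels, so background pixels contribute no orientation penalty.
    ang = (1.0 - np.cos(pred_angles - true_angles)) * true_mask

    return float(np.mean(bce + w_orient * ang))
```

Masking the orientation term to labeled pixels keeps the background from dominating the angular error, while the identity term is evaluated everywhere.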