Abstract
People detection methods are highly sensitive to occlusions between pedestrians, which are extremely frequent
in many situations where cameras have to be mounted
at a limited height. Falling camera prices allow for the widespread deployment of static multi-camera set-ups.
Using joint visual information from multiple synchronized
cameras gives the opportunity to improve detection performance.
In this paper, we present a new large-scale and high-resolution dataset. It has been captured with seven static
cameras in a public open area, and shows unscripted dense groups
of pedestrians standing and walking. Together with the
camera frames, we provide an accurate joint (extrinsic and
intrinsic) calibration, as well as 7 series of 400 annotated
frames for detection at a rate of 2 frames per second. This
results in over 40 000 bounding boxes delimiting every person present in the area of interest, for a total of more than
300 individuals.
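
To illustrate how a joint (extrinsic and intrinsic) calibration relates a position in the area of interest to a pixel in one camera, here is a minimal sketch of an ideal pinhole projection. The matrices `K`, `R`, `t` and the function name `project_point` are hypothetical placeholders, not the dataset's actual calibration values or file format, and lens distortion is ignored.

```python
def project_point(K, R, t, X):
    """Project a 3D world point X into pixel coordinates (ideal pinhole model)."""
    # Extrinsic step: world coordinates -> camera coordinates, Xc = R * X + t
    Xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    if Xc[2] <= 0:
        return None  # the point lies behind the camera and is not visible
    # Intrinsic step: camera coordinates -> homogeneous pixel coordinates
    uvw = [sum(K[i][j] * Xc[j] for j in range(3)) for i in range(3)]
    # Perspective division gives the pixel position (u, v)
    return (uvw[0] / uvw[2], uvw[1] / uvw[2])


# Hypothetical calibration for one camera (illustrative values only).
K = [[1000.0, 0.0, 960.0],   # focal length and principal point, in pixels
     [0.0, 1000.0, 540.0],
     [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0],        # identity rotation: camera axes match world axes
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 5.0]          # world origin sits 5 units in front of the camera

pixel = project_point(K, R, t, [1.0, 0.5, 0.0])
print(pixel)
```

Applying the same function with each camera's own `K`, `R` and `t` maps one 3D position to a pixel in every view where it is visible, which is what makes multi-view reasoning about occluded pedestrians possible.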
We provide a series of benchmark results using baseline
algorithms published in recent months for multi-view
detection with deep neural networks, and trajectory estimation using a non-Markovian model.