- What is the problem?
Accurate vision-based people detection and tracking algorithms have been of interest for decades in applications such as sports game analysis and video surveillance (e.g. behavior analysis, automatic pedestrian counting). Isolated people in an uncluttered scene are successfully detected with a single static or moving camera using pattern recognition techniques. However, those algorithms fail to detect a group of people due to their mutual occlusions. For instance, in sports such as basketball, players can strongly occlude each other and change behavior abruptly. To handle the occlusion problem, the views of several cameras should be fused to correctly detect and track all the people present in the scene.
- What is our solution?
A novel approach is proposed to locate a dense set of people with a network of heterogeneous cameras. We propose to recast the problem as a linear inverse problem. The proposed framework is generic to any scene, scalable in the number of cameras and versatile with respect to their geometry, e.g. planar or omnidirectional. It relies on deducing an occupancy vector, i.e. the discretized occupancy of people on the ground, from the noisy binary silhouettes observed as foreground pixels in each camera.
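As a rough illustration of the quantities involved, the sketch below builds a toy occupancy vector and the binary silhouette it would produce in a single camera. The one-dimensional "image", the grid size, and the helper `make_dictionary` are hypothetical and chosen only for illustration; they are not taken from the publications.

```python
import numpy as np

def make_dictionary(n_pixels, n_cells, width):
    """Hypothetical toy dictionary: column j is the binary silhouette that a
    person standing on ground cell j would produce in a 1-D image."""
    D = np.zeros((n_pixels, n_cells))
    for j in range(n_cells):
        # Spread the silhouettes evenly across the image.
        start = j * (n_pixels - width) // max(n_cells - 1, 1)
        D[start:start + width, j] = 1.0
    return D

n_pixels, n_cells, width = 40, 10, 8
D = make_dictionary(n_pixels, n_cells, width)

# Occupancy vector: two people on neighbouring ground cells.
x = np.zeros(n_cells)
x[[3, 4]] = 1.0

linear = D @ x                           # linear model: overlapping pixels add up
observed = (linear > 0).astype(float)    # binary silhouette: overlap saturates at 1
occlusion_residual = linear - observed   # part a purely linear model cannot explain

print("pixels where the two silhouettes overlap:", int(occlusion_residual.sum()))
```

In this toy setting the two silhouettes overlap, so the linear prediction and the observed binary silhouette differ exactly on the occluded pixels.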
This inverse problem is regularized by imposing a sparse occupancy vector, i.e. one made of few non-zero elements, while a particular dictionary of silhouettes linearly maps these non-empty grid locations to the multiple silhouettes viewed by the camera network. This constitutes a linearization of the problem, where non-linearities, such as occlusions, are treated as additional noise on the observed silhouettes. Mathematically, we express the final inverse problem either as a Basis Pursuit DeNoise or a Lasso convex optimization program. The sparsity measure is reinforced by iteratively re-weighting the ℓ1-norm of the occupancy vector to better approximate its ℓ0 “norm”, and a new kind of “repulsive” sparsity is used to further adapt the Lasso procedure to the occupancy reconstruction.
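The iterative re-weighting idea can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses the Lasso form only, solves it with a plain proximal-gradient (ISTA) loop, adds a non-negativity constraint on the occupancy vector, and omits the "repulsive" sparsity term. The function names, `lam`, `eps`, and the random toy dictionary are all hypothetical.

```python
import numpy as np

def weighted_lasso_ista(D, y, weights, lam, n_iter=500):
    """Minimise 0.5*||y - D x||^2 + lam * sum(weights * x) subject to x >= 0
    with proximal gradient steps (ISTA). The constraint is an assumption."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2   # 1 / Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ x - y)
        # Gradient step, then the prox of the weighted l1 term on x >= 0.
        x = np.maximum(x - step * (grad + lam * weights), 0.0)
    return x

def reweighted_l1(D, y, lam=0.05, n_outer=5, eps=1e-2):
    """Re-weighted l1: entries that come out small get a large weight in the
    next round, which pushes the penalty closer to counting non-zeros."""
    weights = np.ones(D.shape[1])
    for _ in range(n_outer):
        x = weighted_lasso_ista(D, y, weights, lam)
        weights = 1.0 / (x + eps)
    return x

# Toy usage with a random (hypothetical) dictionary and two occupied cells.
rng = np.random.default_rng(0)
D_toy = rng.random((30, 12))
x_true = np.zeros(12)
x_true[[2, 7]] = 1.0
y_toy = D_toy @ x_true + 0.01 * rng.standard_normal(30)
x_hat = reweighted_l1(D_toy, y_toy)
print(np.round(x_hat, 2))
```

Note that `lam` has to stay small relative to the amplitude of the occupied entries; otherwise the re-weighting step can shrink true detections instead of sharpening them.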
Practically, an adaptive sampling process is proposed to reduce the computation cost and monitor a large occupancy area. Qualitative and quantitative results are presented on a basketball game.
- Why is our solution proposed?
The proposed algorithm successfully detects people occluding each other given severely degraded extracted features, while outperforming state-of-the-art people localization techniques.
Some examples on the PETS 2009 dataset.
Related publications:
M. Golbabaee, A. Alahi and P. Vandergheynst. SCOOP: A Real-Time Sparsity Driven People Localization Algorithm, submitted to *Journal of Computer Vision and Image Understanding*, 2012.
A. Alahi, Y. Boursier, L. Jacques, and P. Vandergheynst. Sparsity Driven People Localization with a Heterogeneous Network of Cameras, in *Journal of Mathematical Imaging and Vision*, 2011.
A. Alahi, Y. Boursier, L. Jacques and P. Vandergheynst, Sport Players Detection and Tracking With a Mixed Network of Planar and Omnidirectional Cameras, in *ACM/IEEE International Conference on Distributed Smart Cameras* (winner of the Challenge Prize), 2009.
A. Alahi, Y. Boursier, L. Jacques and P. Vandergheynst, A Sparsity Constrained Inverse Problem to Locate People in a Network of Cameras, in *International Conference on Digital Signal Processing*, 2009.