Video-based multiple people tracking has been an active research area for decades. Yet (partial) occlusions, violated motion model assumptions, and image ambiguities make it very hard to accurately track all persons.
For example, it is often assumed that filmed humans move with low velocity and nearly zero acceleration. These assumptions are frequently violated, e.g. in sports. Moreover, several effects make it difficult to correctly interpret image information: varying lighting conditions, a camera view of a person that changes over time, or people changing their clothes all violate the assumption of appearance constancy. Likewise, similarly dressed humans, e.g. people in workwear, render appearance information misleading or ambiguous.
To tackle these issues, we propose a new problem called Video Inertial Multiple People Tracking (VIMPT):
A scene is filmed by a video camera, and each person to be tracked wears an IMU (inertial measurement unit) attached to their back. The task is to simultaneously perform multiple people tracking and to assign each trajectory to the corresponding IMU device.
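To make the assignment sub-problem concrete, the following is a minimal sketch, assuming (hypothetically) that 2D image trajectories and time-synchronized IMU acceleration magnitudes are already given. It matches trajectories to IMUs by correlating image-derived acceleration profiles with the inertial measurements; a real VIMPT method would additionally have to handle camera projection, gravity compensation, and temporal alignment, which this toy example ignores.

import numpy as np
from scipy.optimize import linear_sum_assignment


def accel_magnitude(track: np.ndarray) -> np.ndarray:
    """Approximate per-frame acceleration magnitude of a (T, 2) image
    trajectory via second-order finite differences."""
    return np.linalg.norm(np.diff(track, n=2, axis=0), axis=1)


def assign_tracks_to_imus(tracks, imu_signals):
    """Assign each visual trajectory to the IMU whose acceleration profile
    correlates best, using the Hungarian algorithm on a correlation cost.

    tracks:      list of (T_i, 2) arrays of image positions per person
    imu_signals: list of 1D arrays of IMU acceleration magnitudes
    """
    cost = np.zeros((len(tracks), len(imu_signals)))
    for i, track in enumerate(tracks):
        a_img = accel_magnitude(track)
        for j, a_imu in enumerate(imu_signals):
            # Truncate to a common length; a real system would align
            # timestamps instead of naively cropping.
            n = min(len(a_img), len(a_imu))
            # Negative Pearson correlation: strong agreement -> low cost.
            cost[i, j] = -np.corrcoef(a_img[:n], a_imu[:n])[0, 1]
    rows, cols = linear_sum_assignment(cost)
    return dict(zip(rows, cols))  # track index -> IMU index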
Conceptual benefits of the VIMPT task are that inertial measurements are unaffected by occlusions, lighting changes, and appearance ambiguities; that each IMU device provides a physical identity, so trajectories can be disambiguated even when people look alike; and that measured accelerations complement simple visual motion models.
We provide the first VIMPT dataset, named VIMPT2019, which contains video and IMU data as well as ground-truth annotations.