Video-based Motion Capture


Motion capture is the process of analyzing the movements of objects or humans from video data. Potential application fields include animation for 3D movie production, sports science, and medical applications. Instead of relying on artificial markers attached to the body and expensive lab equipment, we are interested in tracking humans in video streams without any special preparation of the subject. This is even more challenging for outdoor scenes, clothed people, and interactions between people.

The main goal is to reconstruct the three-dimensional pose of a person from image data only. This task can be split into multiple subtasks, e.g. people detection/tracking, 3D reconstruction, human model building, and animation. Our research in this field focuses on individual subtasks or on their combination.

Reconstructing 3D human motion from a single camera is inherently ill-posed due to depth ambiguities and occluded body parts. Rather than ignoring these ambiguities by estimating only a single solution, one of our recent research efforts focuses on modeling reconstruction uncertainties. Safety-critical tasks, such as autonomous driving, could benefit from access to multiple plausible solutions along with their associated uncertainties.

Selected Publications

RepNet: Weakly Supervised Training of an Adversarial Reprojection Network for 3D Human Pose Estimation

Computer Vision and Pattern Recognition (CVPR)

Bastian Wandt and Bodo Rosenhahn

Abstract: This paper addresses the problem of 3D human pose estimation from single images. While for a long time human skeletons were parameterized and fitted to the observation by satisfying a reprojection error, nowadays researchers directly use neural networks to infer the 3D pose from the observations. However, most of these approaches ignore the fact that a reprojection constraint has to be satisfied and are sensitive to overfitting. We tackle the overfitting problem by ignoring 2D to 3D correspondences. This efficiently avoids a simple memorization of the training data and allows for a weakly supervised training. One part of the proposed reprojection network (RepNet) learns a mapping from a distribution of 2D poses to a distribution of 3D poses using an adversarial training approach. Another part of the network estimates the camera. This allows for the definition of a network layer that performs the reprojection of the estimated 3D pose back to 2D which results in a reprojection loss function. Our experiments show that RepNet generalizes well to unknown data and outperforms state-of-the-art methods when applied to unseen data. Moreover, our implementation runs in real-time on a standard desktop PC.
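The reprojection step described in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation: it assumes a weak-perspective camera given as a 2x3 matrix and measures the reprojection error as the Frobenius norm of the difference between observed and reprojected 2D joints. The function names `reproject` and `reprojection_loss` and the toy values are hypothetical and chosen only for illustration.

```python
import math


def reproject(pose_3d, camera):
    """Project 3D joints (x, y, z) to 2D with a 2x3 camera matrix (assumed weak-perspective)."""
    return [
        tuple(sum(row[k] * joint[k] for k in range(3)) for row in camera)
        for joint in pose_3d
    ]


def reprojection_loss(pose_2d, pose_3d, camera):
    """Frobenius norm of the difference between observed and reprojected 2D joints."""
    projected = reproject(pose_3d, camera)
    squared = sum(
        (u - pu) ** 2 + (v - pv) ** 2
        for (u, v), (pu, pv) in zip(pose_2d, projected)
    )
    return math.sqrt(squared)


# Toy data (hypothetical): three joints, camera looking along the z-axis with scale 2.
pose_3d = [(0.0, 0.0, 1.0), (1.0, 0.0, 2.0), (1.0, 1.0, 3.0)]
camera = [(2.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
observed_2d = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.5)]

# Only the last joint deviates from its reprojection, by 0.5 in the vertical direction.
loss = reprojection_loss(observed_2d, pose_3d, camera)
```

In a network, both the 3D pose and the camera matrix would be predicted from the 2D input, so a loss of this kind ties the 3D estimate to the observation without requiring 2D to 3D correspondences.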

Links:

paper: pdf BibTeX

3D Reconstruction of Human Motion from Monocular Image Sequences

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

Bastian Wandt, Hanno Ackermann, and Bodo Rosenhahn

Abstract: This article tackles the problem of estimating non-rigid human 3D shape and motion from image sequences taken by uncalibrated cameras. Similar to other state-of-the-art solutions we factorize 2D observations in camera parameters, base poses and mixing coefficients. Existing methods require sufficient camera motion during the sequence to achieve a correct 3D reconstruction. To obtain convincing 3D reconstructions from arbitrary camera motion, our method is based on a-priorly trained base poses. We show that strong periodic assumptions on the coefficients can be used to define an efficient and accurate algorithm for estimating periodic motion such as walking patterns. For the extension to non-periodic motion we propose a novel regularization term based on temporal bone length constancy. In contrast to other works, the proposed method does not use a predefined skeleton or anthropometric constraints and can handle arbitrary camera motion. We achieve convincing 3D reconstructions, even under the influence of noise and occlusions. Multiple experiments based on a 3D error metric demonstrate the stability of the proposed method. Compared to other state-of-the-art methods our algorithm shows a significant improvement.
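The idea of temporal bone length constancy can be sketched as a penalty that is zero when every bone keeps the same length across all frames. The abstract does not give the exact form of the regularization term, so the following is only one plausible formulation under stated assumptions: the penalty is taken to be the sum, over bones, of the variance of each bone's length over time. The names `bone_lengths` and `bone_length_constancy`, the bone index pairs, and the toy poses are hypothetical.

```python
import math
from statistics import pvariance


def bone_lengths(pose, bones):
    """Euclidean length of each bone, given as a pair of joint indices."""
    return [math.dist(pose[a], pose[b]) for a, b in bones]


def bone_length_constancy(sequence, bones):
    """Sum over bones of the variance of the bone length across frames (assumed form)."""
    per_frame = [bone_lengths(pose, bones) for pose in sequence]
    # zip(*per_frame) regroups the lengths so that each tuple holds one bone over time.
    return sum(pvariance(lengths) for lengths in zip(*per_frame))


# Toy data (hypothetical): three joints connected by two bones.
bones = [(0, 1), (1, 2)]
frame_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
frame_b = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 1.0, 1.0)]    # same lengths, new pose
stretched = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 1.0, 0.0)]  # first bone has length 3

rigid_penalty = bone_length_constancy([frame_a, frame_b], bones)
stretch_penalty = bone_length_constancy([frame_a, stretched], bones)
```

A term of this kind does not need a predefined skeleton with known bone lengths; it only requires that the lengths stay constant within a sequence.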

Links:

paper: pdf BibTeX

Metric Regression Forests for Human Pose Estimation

British Machine Vision Conference (BMVC)

Gerard Pons-Moll, Jonathan Taylor, Jamie Shotton, Aaron Hertzmann, and Andrew Fitzgibbon

Abstract: We present a new method for inferring dense data to model correspondences, focusing on the application of human pose estimation from depth images. Recent work proposed the use of regression forests to quickly predict correspondences between depth pixels and points on a 3D human mesh model. That work, however, used a proxy forest training objective based on the classification of depth pixels to body parts. In contrast, we introduce Metric Space Information Gain (MSIG), a new decision forest training objective designed to directly optimize the entropy of distributions in a metric space. When applied to a model surface, viewed as a metric space defined by geodesic distances, MSIG aims to minimize image-to-model correspondence uncertainty. A naïve implementation of MSIG would scale quadratically with the number of training examples. As this is intractable for large datasets, we propose a method to compute MSIG in linear time. Our method is a principled generalization of the proxy classification objective, and does not require an extrinsic isometric embedding of the model surface in Euclidean space. Our experiments demonstrate that this leads to correspondences that are considerably more accurate than state of the art, using far fewer training images.

Links:

paper: pdf BibTeX

Publications


Conference Contributions
- Tom Wehrbein, Marco Rudolph, Bodo Rosenhahn, Bastian Wandt
  Utilizing Uncertainty in 2D Pose Detectors for Probabilistic 3D Human Mesh Recovery
  IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), IEEE, February 2025
  (pdf, GitHub) BibTeX
- Tom Wehrbein, Bodo Rosenhahn, Iain Matthews, Carsten Stoll
  Personalized 3D Human Pose and Shape Refinement
  International Conference on Computer Vision Workshops (ICCVW), IEEE, Paris, France, October 2023
  (pdf, pdf, Link) BibTeX
- Tom Wehrbein, Marco Rudolph, Bodo Rosenhahn, Bastian Wandt
  Probabilistic Monocular 3D Human Pose Estimation with Normalizing Flows
  International Conference on Computer Vision (ICCV), IEEE, October 2021
  (pdf, arXiv.org, GitHub) BibTeX
- Bastian Wandt, Bodo Rosenhahn
  RepNet: Weakly Supervised Training of an Adversarial Reprojection Network for 3D Human Pose Estimation
  Computer Vision and Pattern Recognition (CVPR), IEEE, June 2019
  (pdf, GitHub, arXiv.org) BibTeX
- Bastian Wandt, Hanno Ackermann, Bodo Rosenhahn
  A Kinematic Chain Space for Monocular Motion Capture
  ECCV Workshops, September 2018
  (pdf) BibTeX
- Thiemo Alldieck, Marc Kassubeck, Bastian Wandt, Bodo Rosenhahn, Marcus Magnor
  Optical Flow-based 3D Human Motion Estimation from Monocular Video
  German Conference on Pattern Recognition (GCPR), September 2017
  (pdf, DOI) BibTeX
- Petrissa Zell, Bastian Wandt, Bodo Rosenhahn
  Joint 3D Human Motion Capture and Physical Analysis from Monocular Videos
  The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, July 2017
  (pdf, DOI) BibTeX
- Bastian Wandt, Hanno Ackermann, Bodo Rosenhahn
  3D Human Motion Capture from Monocular Image Sequences
  IEEE Conference on Computer Vision and Pattern Recognition Workshops, IEEE, June 2015
  (pdf, DOI) BibTeX
- Gerard Pons-Moll⁺, Jonathan Taylor⁺, Jamie Shotton, Aaron Hertzmann, Andrew Fitzgibbon
  Metric Regression Forests for Human Pose Estimation
  British Machine Vision Conference (BMVC) (⁺ denotes equal contribution)
  Best Science Paper Award, September 2013
  (pdf) BibTeX
- Gerard Pons-Moll, Andreas Baak, Juergen Gall, Laura Leal-Taixe, Meinard Mueller, Hans-Peter Seidel, Bodo Rosenhahn
  Outdoor Human Motion Capture using Inverse Kinematics and von Mises-Fisher Sampling
  IEEE International Conference on Computer Vision (ICCV), November 2011
  (pdf, pdf) BibTeX (paper page)
- Andreas Baak, Thomas Helten, Meinard Müller, Gerard Pons-Moll, Bodo Rosenhahn, Hans-Peter Seidel
  Analyzing and Evaluating Markerless Motion Tracking Using Inertial Sensors
  European Conference on Computer Vision (ECCV Workshops), September 2010
  (pdf, DOI) BibTeX
- Gerard Pons-Moll, Andreas Baak, Thomas Helten, Meinard Müller, Hans-Peter Seidel, Bodo Rosenhahn
  Multisensor-Fusion for 3D Full-Body Human Motion Capture
  IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2010
  (pdf, DOI) BibTeX (paper page)
- Nils Hasler, Thorsten Thormählen, Bodo Rosenhahn, Hans-Peter Seidel
  Learning Skeletons for Shape and Pose
  ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games, Washington, February 2010
  (pdf) BibTeX
- Gerard Pons-Moll, Bodo Rosenhahn
  Ball Joints for Marker-less Human Motion Capture
  IEEE Workshop on Applications of Computer Vision (WACV), Snow Bird, Utah, USA, December 2009
  (pdf) BibTeX
- Nils Hasler, Bodo Rosenhahn, Thorsten Thormählen, Michael Wand, Hans-Peter Seidel
  Markerless Motion Capture with Unsynchronized Moving Cameras
  IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Miami, USA, 2009
  (pdf) BibTeX
- Nils Hasler, Carsten Stoll, Bodo Rosenhahn, Thorsten Thormählen, H.-P. Seidel
  Estimating Body Shape of Dressed Humans
  Shape Modeling International, Beijing, 2009
  (pdf) BibTeX
- B. Rosenhahn, C. Schmaltz, T. Brox, J. Weickert, D. Cremers, H.-P. Seidel
  Markerless Motion Capture of Man-Machine Interaction
  IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, Alaska, 2008
  (pdf) BibTeX
- J. Gall, B. Rosenhahn, H.-P. Seidel
  Drift-free Tracking of Rigid and Articulated Objects
  IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, Alaska, 2008
  (pdf) BibTeX
- B. Rosenhahn, T. Brox, H.-P. Seidel
  Scaled Motion Dynamics for Markerless Motion Capture
  IEEE Conference on Computer Vision and Pattern Recognition, Minneapolis, Minnesota, USA, 2007
  (pdf) BibTeX
- T. Brox, B. Rosenhahn, D. Cremers, H.-P. Seidel
  Nonparametric Density Estimation with Adaptive Anisotropic Kernels for Human Motion Tracking
  2nd Workshop on Human Motion, Springer-Verlag, Berlin Heidelberg, pp. 152-165, 2007, edited by Elgammal, A.; Rosenhahn, B.; Klette, R.
  (pdf) BibTeX
- T. Brox, B. Rosenhahn, U. Kersting, D. Cremers
  Nonparametric Density Estimation for Human Tracking
  Pattern Recognition 2006, DAGM, Springer-Verlag, Berlin Heidelberg, pp. 546-555, Berlin, 2006, edited by Franke, K.; Mueller, R.; Nickolay, B.; Schaefer, R.
  (pdf) BibTeX
Journals
- Timo von Marcard, Gerard Pons-Moll, Bodo Rosenhahn
  Human Pose Estimation from Video and IMUs
  Transactions on Pattern Analysis and Machine Intelligence, IEEE, Vol. 38, No. 8, pp. 1533-1547, January 2016
  (pdf, DOI) BibTeX
- Bastian Wandt, Hanno Ackermann, Bodo Rosenhahn
  3D Reconstruction of Human Motion from Monocular Image Sequences
  Transactions on Pattern Analysis and Machine Intelligence, IEEE, Vol. 38, No. 8, pp. 1505-1516, 2016
  (pdf, DOI) BibTeX
- Nils Hasler, Carsten Stoll, Martin Sunkel, Bodo Rosenhahn, Hans-Peter Seidel
  A Statistical Model of Human Pose and Body Shape
  Computer Graphics Forum (Proc. Eurographics 2009), Munich, Germany, 2009
  (pdf) BibTeX
- B. Rosenhahn, U. Kersting, K. Powell, R. Klette, G. Klette, H.-P. Seidel
  A system for articulated tracking incorporating a clothing model
  Machine Vision and Applications, Springer Verlag, Berlin-Heidelberg, Vol. 18, No. 1, pp. 25-40, February 2007
  (pdf) BibTeX
Book Chapters
- B. Rosenhahn, Uwe G. Kersting, K. Powell, T. Brox, Hans-Peter Seidel
  Tracking Clothed People
  Human Motion - Understanding, Modelling, Capture and Animation, Springer Verlag, Dordrecht, The Netherlands, Vol. 36, pp. 295-317, 2007, edited by Rosenhahn B.; Klette R.; Metaxas D.
  (pdf) BibTeX