Download Video: HD (MP4, 66 MB)

Abstract

3D hand pose estimation from monocular videos is a long-standing and challenging problem which is now seeing a strong upturn. In this work, we address it for the first time using a single event camera, i.e., an asynchronous vision sensor that reacts to brightness changes. Our EventHands approach has characteristics previously not demonstrated with a single RGB or depth camera, such as high temporal resolution at low data throughput and real-time performance at 1000 Hz. Because event cameras produce a different data modality than classical cameras, existing methods cannot be directly applied to, or re-trained for, event streams. We thus design a new neural approach that accepts a novel event stream representation suitable for learning, is trained on newly generated synthetic event streams, and generalises to real data. Experiments show that EventHands outperforms recent monocular methods using a colour (or depth) camera in terms of accuracy and in its ability to capture hand motions of unprecedented speed. Our method, the event stream simulator and the dataset are publicly available.
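
The "event stream representation suitable for learning" mentioned above is the interface between the asynchronous sensor and the neural network. As an illustration only, here is a minimal Python sketch of one common way to turn a window of events into a fixed-size tensor a network can consume: a two-channel frame (one channel per event polarity) in which each pixel stores how recently it last fired. The sensor resolution, window length, and (x, y, timestamp, polarity) event layout are assumptions for the sketch, not the paper's exact specification.

import numpy as np

def events_to_frame(events, width=240, height=180, window_ms=100.0):
    """Accumulate a window of events into a two-channel image.

    `events`: iterable of (x, y, t_ms, polarity) with polarity in {0, 1},
    assumed sorted by timestamp. All defaults are illustrative assumptions.
    """
    frame = np.zeros((2, height, width), dtype=np.float32)
    if not events:
        return frame
    t_end = events[-1][2]  # timestamp of the newest event in the batch
    for x, y, t, p in events:
        age = t_end - t
        if age <= window_ms:
            # Newer events overwrite older ones; 1.0 means "just fired".
            frame[int(p), int(y), int(x)] = 1.0 - age / window_ms
    return frame

# Example: three synthetic events, newest at t = 100 ms
demo = [(10, 20, 5.0, 0), (10, 20, 60.0, 1), (11, 20, 100.0, 0)]
print(events_to_frame(demo)[:, 20, 10:12])

Representations of this kind keep fine temporal ordering within the window while remaining a dense tensor, which is what allows standard convolutional networks to run on event data at high rates.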

Downloads

Citation

BibTeX, 1 KB

@inproceedings{rudnev2021eventhands,
  title={EventHands: Real-Time Neural 3D Hand Pose Estimation from an Event Stream},
  author={Viktor Rudnev and Vladislav Golyanik and Jiayi Wang and Hans-Peter Seidel and Franziska Mueller and Mohamed Elgharib and Christian Theobalt},
  booktitle={International Conference on Computer Vision (ICCV)},
  year={2021}
}

Acknowledgments

This work was funded by the ERC Consolidator Grant 4DRepLy (770784). We thank Jalees Nehvi and Navami Kairanda for help with comparisons.

Contact

For questions or clarifications, please get in touch with:
Viktor Rudnev
vrudnev@mpi-inf.mpg.de
Vladislav Golyanik
golyanik@mpi-inf.mpg.de
Mohamed Elgharib
elgharib@mpi-inf.mpg.de
