3D hand pose estimation from monocular videos is a long-standing and challenging problem that is now seeing a strong upturn in interest. In this work, we address it for the first time using a single event camera, i.e., an asynchronous vision sensor that reacts to brightness changes. Our EventHands approach has characteristics not previously demonstrated with a single RGB or depth camera, such as high temporal resolution at low data throughput and real-time performance at 1000 Hz. Because the data modality of event cameras differs from that of classical cameras, existing methods cannot be directly applied to, or re-trained for, event streams. We thus design a new neural approach that accepts a new event stream representation suitable for learning; it is trained on newly generated synthetic event streams and can generalise to real data. Experiments show that EventHands outperforms recent monocular methods using a colour (or depth) camera in terms of both accuracy and the ability to capture hand motions of unprecedented speed. Our method, the event stream simulator and the dataset are publicly available.
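To make the idea of an event stream representation suitable for learning more concrete, the following is a minimal sketch of one common way to turn asynchronous events into a fixed-size image that a neural network can consume. It is an illustration under stated assumptions, not necessarily the representation used by EventHands: the event tuple layout `(t, x, y, polarity)`, the function name `events_to_frame`, and the 1 ms window in the test values are all hypothetical choices made for this example.

```python
import numpy as np


def events_to_frame(events, width, height, t_start, window):
    """Rasterise a time window of events into a 2-channel image.

    Assumptions (hypothetical, for illustration only):
      - `events` is an iterable of (t, x, y, polarity) tuples sorted by time t,
        with polarity > 0 for a brightness increase and <= 0 for a decrease.
      - Channel 0 stores negative events, channel 1 stores positive events.
      - Each pixel holds the normalised timestamp of the most recent event in
        the window, so newer events overwrite older ones at the same pixel.
    """
    frame = np.zeros((2, height, width), dtype=np.float32)
    for t, x, y, p in events:
        # Skip events that fall outside the half-open window [t_start, t_start + window).
        if not (t_start <= t < t_start + window):
            continue
        channel = 1 if p > 0 else 0
        # Normalised time in [0, 1); note that an event exactly at t_start
        # maps to 0 and is indistinguishable from "no event" in this sketch.
        frame[channel, y, x] = (t - t_start) / window
    return frame
```

A design note on this sketch: keeping only the latest timestamp per pixel and polarity yields a fixed-size tensor regardless of how many events arrive, which is what allows a standard convolutional network to process a sparse, asynchronous stream.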



BibTeX, 1 KB

      title={EventHands: Real-Time Neural 3D Hand Pose Estimation from an Event Stream}, 
      author={Viktor Rudnev and Vladislav Golyanik and Jiayi Wang and Hans-Peter Seidel and Franziska Mueller and Mohamed Elgharib and Christian Theobalt}, 
      booktitle={International Conference on Computer Vision (ICCV)}, 


This work was funded by the ERC Consolidator Grant 4DRepLy (770784). We thank Jalees Nehvi and Navami Kairanda for help with comparisons.


For questions or clarifications, please get in touch with:
Viktor Rudnev
Vladislav Golyanik
Mohamed Elgharib
