Our method excels at predicting 3D pose even in challenging scenarios, including fast motions and low-light conditions, where traditional RGB-based methods often struggle.

Abstract

Monocular egocentric 3D human motion capture is a challenging and actively researched problem. Existing methods rely on synchronously operating visual sensors (e.g., RGB cameras) and often fail under low lighting and fast motions, which can be restrictive for many applications involving head-mounted devices. To address these limitations, this paper 1) introduces a new problem, i.e., 3D human motion capture from an egocentric monocular event camera with a fisheye lens, and 2) proposes the first approach to it, called EventEgo3D (EE3D). Event streams have high temporal resolution and provide reliable cues for 3D human motion capture under high-speed human motions and rapidly changing illumination. The proposed EE3D framework is specifically tailored for learning with event streams in the LNES representation, enabling high 3D reconstruction accuracy. We also design a prototype of a mobile head-mounted device with an event camera and record a real dataset with event observations and ground-truth 3D human poses (in addition to a synthetic dataset). Our EE3D demonstrates robustness and superior 3D accuracy compared to existing solutions across various challenging experiments while supporting real-time 3D pose update rates of 140 Hz.
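
For readers unfamiliar with the LNES representation mentioned above, the minimal sketch below shows one way to rasterise a window of raw events into such a frame, assuming the Locally-Normalised Event Surfaces formulation introduced in EventHands (Rudnev et al., 2021): a two-channel image (one channel per event polarity) in which each pixel stores the normalised timestamp of the most recent event at that location. The function name and event format here are illustrative, not the authors' actual code.

import numpy as np

def events_to_lnes(events, t_start, t_end, height, width):
    # events: iterable of (x, y, t, p) tuples with pixel coordinates (x, y),
    # timestamp t in [t_start, t_end], and polarity p in {0, 1}, sorted by t.
    lnes = np.zeros((2, height, width), dtype=np.float32)
    for x, y, t, p in events:
        # Later events overwrite earlier ones at the same pixel, so each
        # pixel keeps the normalised timestamp of its most recent event.
        lnes[int(p), int(y), int(x)] = (t - t_start) / (t_end - t_start)
    return lnes

# Example: three synthetic events rasterised into a 640x480 LNES frame.
events = [(10, 20, 0.002, 1), (10, 20, 0.007, 1), (300, 150, 0.005, 0)]
frame = events_to_lnes(events, t_start=0.0, t_end=0.010, height=480, width=640)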

Main Video

Datasets

Method

Additional Results

BibTeX

@inproceedings{Millerdurai_EventEgo3D_2024,
    title     = {EventEgo3D: 3D Human Motion Capture from Egocentric Event Streams},
    author    = {Christen Millerdurai and Hiroyasu Akada and Jian Wang and Diogo Luvizon and Christian Theobalt and Vladislav Golyanik},
    booktitle = {Computer Vision and Pattern Recognition (CVPR)},
    year      = {2024}
}