Abstract: Existing novel view synthesis techniques predominantly utilize RGB cameras, inheriting their limitations such as the need for sufficient lighting, susceptibility to motion blur, and restricted dynamic range. In contrast, event cameras, which are impervious to these limitations, have seen limited exploration in this domain, particularly in large-scale settings. Current methodologies primarily focus on front-facing or object-oriented (360-degree view) scenarios. For the first time, we introduce 3D Gaussians for event-based novel view synthesis. Our method enables high-quality reconstruction of large and unbounded scenes. We contribute the first real and synthetic event datasets tailored for this setting. Our method demonstrates superior novel view synthesis and consistently outperforms the baseline EventNeRF by a margin of 11-25% in PSNR (dB) while being orders of magnitude faster in reconstruction and rendering.


Overview of the proposed E-3DGS method. We use 3D Gaussians [1] as the scene representation and assume that initial noisy camera poses are available. We initialize the scene randomly with our frustum-based initialization and then optimize the Gaussians and the camera poses jointly. To obtain a high-quality reconstruction of both low-frequency structure and high-frequency detail, we propose a strategy using a large event window from ts1 to t and a small one from ts2 to t. We then define the loss Lrecon between renderings from our model at the current time t (shown in green) and the previous times ts1 (shown in orange) and ts2 (shown in red), and the accumulated incoming events E(ts1, t) and E(ts2, t). We regularize the 3D Gaussians with the loss Liso.

[1] Kerbl et al. "3D Gaussian Splatting for Real-Time Radiance Field Rendering", ACM TOG 2023
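
As an illustration of how such event-based supervision can be set up, the sketch below assumes a linear event-generation model with a fixed contrast threshold, grayscale renderings, per-pixel event accumulation, and an L1 penalty. The function names, the threshold value, and the exact form of Liso shown here are illustrative assumptions rather than the paper's exact definitions.

```python
import torch

def accumulate_events(events, t_start, t_end, height, width):
    """Sum signed event polarities per pixel inside the window [t_start, t_end).

    `events` is assumed to be a float tensor of shape (N, 4) with columns
    (x, y, t, polarity), where polarity is +1 or -1.
    """
    mask = (events[:, 2] >= t_start) & (events[:, 2] < t_end)
    x = events[mask, 0].long()
    y = events[mask, 1].long()
    pol = events[mask, 3]
    acc = torch.zeros(height, width, device=events.device, dtype=pol.dtype)
    acc.index_put_((y, x), pol, accumulate=True)  # sum polarities per pixel
    return acc

def event_recon_loss(render_t, render_ts, acc_events, contrast_threshold=0.25):
    """L_recon: the predicted log-brightness change between the renderings at
    the two window boundaries should match the accumulated events scaled by
    the contrast threshold (assumed value, for illustration only)."""
    eps = 1e-6
    pred_diff = torch.log(render_t + eps) - torch.log(render_ts + eps)
    return torch.nn.functional.l1_loss(pred_diff, contrast_threshold * acc_events)

def isotropic_reg(scales):
    """L_iso: discourage highly anisotropic Gaussians by pulling the per-axis
    scales of each Gaussian towards their mean (one plausible formulation)."""
    return (scales - scales.mean(dim=1, keepdim=True)).abs().mean()
```

During training, render_t and render_ts would be splatted at the poses corresponding to the end and start of each window, and the total loss would combine Lrecon for both window sizes with Liso.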
In the original 3DGS, the Gaussians are initialized using a point cloud obtained by applying SfM to the input images. The authors also experimented with initializing the Gaussians at random locations within a cube. While this worked with only a slight performance drop, it requires an assumption about the extent of the scene. Since running SfM directly on event streams is not possible, we use randomly initialized Gaussians and extend this approach to unbounded scenes. To this end, we initialize a specified number of Gaussians in the frustum of each camera (see the sketch after this list). This offers two benefits:
  • All the initialized Gaussians are within the observable area.
  • We only need one loose assumption about the scene, which is the maximum depth zfar.
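
A minimal sketch of such a frustum-based initialization is given below. It assumes a pinhole camera with intrinsics fx, fy, cx, cy, a 4x4 camera-to-world matrix, and uniform sampling of pixels and depths; these are illustrative choices rather than the paper's exact sampling scheme.

```python
import numpy as np

def init_gaussians_in_frustum(cam_to_world, fx, fy, cx, cy, width, height,
                              num_points, z_near=0.1, z_far=50.0, rng=None):
    """Sample random 3D points inside the viewing frustum of one camera.

    Pixel coordinates are drawn uniformly on the image plane and depths
    uniformly in [z_near, z_far]; the points are then unprojected and
    transformed to world space with the 4x4 camera-to-world matrix.
    """
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(0.0, width, num_points)
    v = rng.uniform(0.0, height, num_points)
    z = rng.uniform(z_near, z_far, num_points)
    x = (u - cx) / fx * z                                     # unproject to camera space
    y = (v - cy) / fy * z
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=1)    # (N, 4) homogeneous
    pts_world = (cam_to_world @ pts_cam.T).T[:, :3]           # (N, 3) world space
    return pts_world

# Calling this once per training camera and concatenating the results keeps
# every initial Gaussian inside an observed frustum while requiring only a
# loose maximum-depth assumption z_far.
```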
Rudnev et al. [1] demonstrated that using a fixed event window duration leads to suboptimal reconstruction, as larger windows are essential for capturing low-frequency structure and smaller ones for fine high-frequency details. However, their approach of randomly sampling window durations overlooks camera speed and event rate, potentially leading to overly dense or sparse event windows. To address this, we sample the number of events directly, selecting a target count from a range [Nmin, Nmax] for each time step. For stability, we propose using two event windows with empirically chosen ranges: [Nmax/10, Nmax] for broader context and [Nmax/300, Nmax/30] for finer details.

[1] Rudnev et al. "EventNeRF: Neural Radiance Fields from a Single Colour Event Camera", CVPR 2023
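
The sketch below illustrates sampling by event count rather than by window duration; the helper name, the uniform sampling of the target count, and the index-based window lookup are assumptions for illustration.

```python
import numpy as np

def sample_event_window(event_times, end_idx, n_min, n_max, rng=None):
    """Select a window ending at event index `end_idx` that contains a random
    number of events drawn uniformly from [n_min, n_max].

    `event_times` must be sorted by timestamp. The returned start/end times
    can then be used to accumulate events and to look up the camera poses at
    both ends of the window.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_events = int(rng.integers(int(n_min), int(n_max) + 1))
    start_idx = max(0, end_idx - n_events)
    return event_times[start_idx], event_times[end_idx]

# Per optimization step, two windows can be drawn, e.g. a large one with a
# count in [N_max / 10, N_max] and a small one in [N_max / 300, N_max / 30],
# to supervise low-frequency structure and high-frequency detail respectively.
```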

Datasets overview. E-3DGS-Real: event stream, camera rig, and the recorded scene viewed from two external cameras. E-3DGS-Synthetic: Company, Science Lab, and Subway scenes, each providing an event stream for training and held-out test views.

Deblur-GS achieves decent results in certain areas, such as the chair, but produces overly smooth reconstructions elsewhere. E2VID + 3DGS provides good sharpness but suffers from numerous floaters, as the inconsistent color frames produced by E2VID confuse 3DGS. EventNeRF lacks detail overall. In contrast, our method preserves detail across most regions while maintaining consistent colors.

Compared methods: Deblur-GS, E2VID + 3DGS, EventNeRF, and Ours.

While E2VID + 3DGS captures the edges and the general structure well, it struggles with color representation, and the EventNeRF reconstructions are considerably noisier and blurrier. In contrast, our E-3DGS produces clear and sharp novel views with accurate colors. Some issues remain, but mostly in less supervised areas, e.g., on the roofs in the ScienceLab and Subway scenes.

Compared methods: E2VID + 3DGS, EventNeRF, Ours, and Ground Truth, shown on the Company, ScienceLab, and Subway scenes.

Training data: events and pose data at 50 Hz.
[1] Kerbl et al. "3D Gaussian Splatting for Real-Time Radiance Field Rendering", ACM TOG 2023
[2] Rebecq et al. "High Speed and High Dynamic Range Video with an Event Camera", PAMI 2019
[3] Rudnev et al. "EventNeRF: Neural Radiance Fields from a Single Colour Event Camera", CVPR 2023

Novel views from our method compared with the ground truth on the mocap-1d-trans and mocap-desk2 scenes.

To evaluate the effect of each individual contribution, we conduct extensive qualitative and quantitative ablation studies. We primarily train different variants of our method on the E-3DGS-Real and E-3DGS-Synthetic-Hard datasets, focusing on the effects of four key components: Liso, Lpose, Pose Refinement (PR), and the Adaptive Event Window (AW).

Ablation variants, shown on E-3DGS-Real and on the Company, Science Lab, and Subway scenes of E-3DGS-Synthetic-Hard: Ours, w/o AW, w/o AW & Liso, w/o Liso, w/o Lpose & Liso, and w/o PR.

@inproceedings{zahid2025e3dgs,
  title={E-3DGS: Event-based Novel View Rendering of Large-scale Scenes Using 3D Gaussian Splatting},
  author={Zahid, Sohaib and Rudnev, Viktor and Ilg, Eddy and Golyanik, Vladislav},
  booktitle={International Conference on 3D Vision (3DV)},
  year={2025}
}
For questions or clarifications, please get in touch with:
Sohaib Zahid
sohaib023@gmail.com
Vladislav Golyanik
golyanik@mpi-inf.mpg.de