Thin-Shell-SfT: Fine-Grained Monocular Non-rigid 3D Surface Tracking with Neural Deformation Fields

CVPR 2025

3D reconstruction of highly deformable surfaces (e.g. cloths) from monocular RGB videos is a challenging problem, and no solution provides a consistent and accurate recovery of fine-grained surface details. To account for the ill-posed nature of the setting, existing methods use deformation models with statistical, neural, or physical priors. They also predominantly rely on non-adaptive discrete surface representations (e.g. polygonal meshes), perform frame-by-frame optimisation leading to error propagation, and suffer from poor gradients of the mesh-based differentiable renderers. Consequently, fine surface details such as cloth wrinkles are often not recovered with the desired accuracy. In response to these limitations, we propose Thin-Shell-SfT, a new method for non-rigid 3D tracking that represents a surface as an implicit and continuous spatiotemporal neural field. We incorporate a continuous thin-shell physics prior based on the Kirchhoff-Love model for spatial regularisation, which contrasts starkly with the discretised alternatives of earlier works. Lastly, we leverage 3D Gaussian splatting to differentiably render the surface into image space and optimise the deformations based on analysis-by-synthesis principles. Our Thin-Shell-SfT outperforms prior works qualitatively and quantitatively thanks to our continuous surface formulation in conjunction with a specially tailored simulation prior and surface-induced 3D Gaussians.

Method

Our deformation model encodes the surface and its dynamics as neural fields. Given the template S₁, we first fit a neural reference field (NRF) that maps 2D parametric points ξ to the initial 3D positions x̅. In the main stage, we optimise the neural deformation field (NDF) u(ξ, t) by relating the estimated surface states S_t/G_t to the input monocular views. We tie the dynamically tracked Gaussians to the surface by (1) computing their positions x as the sum of the initial position x̅ and the NDF output u, (2) setting their rotations a̅_j to the template’s local coordinate system, and (3) fixing the normal scale ϵ while optimising the colour, opacity and tangential scales (s₁, s₂) using only the template texture. For physical plausibility, we impose continuous Kirchhoff-Love physics constraints.
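The three steps that attach Gaussians to the surface can be illustrated with a minimal NumPy sketch. This is not the released implementation: `reference_field` and `deformation_field` are hypothetical analytic stand-ins for the trained NRF and NDF networks, the value of `EPS_NORMAL` is chosen only for illustration, and colour, opacity, rendering and the Kirchhoff-Love constraints are omitted.

```python
import numpy as np

EPS_NORMAL = 1e-3  # fixed scale along the surface normal (illustrative value)


def reference_field(xi):
    """Hypothetical stand-in for the NRF: maps 2D points xi to a flat 3D template."""
    return np.stack([xi[:, 0], xi[:, 1], np.zeros(len(xi))], axis=1)


def deformation_field(xi, t):
    """Hypothetical stand-in for the NDF u(xi, t): an out-of-plane sine wave."""
    u = np.zeros((len(xi), 3))
    u[:, 2] = 0.1 * t * np.sin(2.0 * np.pi * xi[:, 0])
    return u


def template_frames(xi, h=1e-4):
    """Local coordinate system of the template at each xi, via central differences."""
    e1 = np.array([h, 0.0])
    e2 = np.array([0.0, h])
    d1 = (reference_field(xi + e1) - reference_field(xi - e1)) / (2.0 * h)
    d2 = (reference_field(xi + e2) - reference_field(xi - e2)) / (2.0 * h)
    t1 = d1 / np.linalg.norm(d1, axis=1, keepdims=True)
    n = np.cross(d1, d2)
    n /= np.linalg.norm(n, axis=1, keepdims=True)
    t2 = np.cross(n, t1)  # completes a right-handed orthonormal frame
    return np.stack([t1, t2, n], axis=2)  # columns: tangent, tangent, normal


def surface_gaussians(xi, t, tangential_scales):
    """Positions, rotations and scales of Gaussians tied to the surface."""
    positions = reference_field(xi) + deformation_field(xi, t)  # step (1)
    rotations = template_frames(xi)                             # step (2)
    normal_scale = np.full((len(xi), 1), EPS_NORMAL)            # step (3), fixed
    scales = np.hstack([tangential_scales, normal_scale])       # (s1, s2, eps)
    return positions, rotations, scales


# Sample a 4x4 grid in the parametric domain and evaluate at t = 0.5.
grid = np.linspace(0.0, 1.0, 4)
xi = np.stack(np.meshgrid(grid, grid), axis=-1).reshape(-1, 2)
s12 = np.full((len(xi), 2), 0.05)  # tangential scales; optimised in the method
positions, rotations, scales = surface_gaussians(xi, t=0.5, tangential_scales=s12)
```

In this sketch only the positions depend on time, while the frames come from the template and the normal scale stays fixed, mirroring the description above. In the actual method the tangential scales, colour and opacity would be optimised rather than set by hand.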

Citation

@inproceedings{kair2025thinshellsft,
	title={Thin-Shell-SfT: Fine-Grained Monocular Non-rigid 3D Surface Tracking with Neural Deformation Fields},
	author={Navami Kairanda and Marc Habermann and Shanthika Naik and Christian Theobalt and Vladislav Golyanik},
	booktitle={Computer Vision and Pattern Recognition (CVPR)},
	year={2025}
}

Contact

For questions or clarifications, please get in touch with:
Navami Kairanda
nkairand@mpi-inf.mpg.de
Vladislav Golyanik
golyanik@mpi-inf.mpg.de