ASSR-NeRF: Arbitrary-Scale Super-Resolution on Voxel Grid for High-Quality Radiance Fields Reconstruction

1National Taiwan University, 2NVIDIA Research Taiwan

Abstract

NeRF-based methods reconstruct 3D scenes by building a radiance field with implicit or explicit representations. While NeRF-based methods can perform novel view synthesis (NVS) at arbitrary scale, high-resolution novel view synthesis (HRNVS) with low-resolution (LR) optimization often results in oversmoothed outputs. On the other hand, single-image super-resolution (SR) aims to enhance LR images to their HR counterparts but lacks multi-view consistency. To address these challenges, we propose Arbitrary-Scale Super-Resolution NeRF (ASSR-NeRF), a novel framework for super-resolution novel view synthesis (SRNVS). We propose an attention-based VoxelGridSR model to directly perform 3D super-resolution on the optimized volume. Our model is trained on diverse scenes to ensure generalizability. For an unseen scene trained with LR views, we can then directly apply our VoxelGridSR model to further refine the volume and achieve multi-view consistent SR. We demonstrate quantitatively and qualitatively that the proposed method achieves significant performance gains in SRNVS.

Methods

Radiance Field Super-Resolution


Given a radiance field reconstructed from low-resolution (LR) training views, we perform radiance field super-resolution (SR), enhancing the volumetric representation of the scene with SR priors. The enhanced volumetric representation can then render high-resolution (HR) novel views with cleaner details.
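Arbitrary-scale rendering rests on querying the voxel grid at continuous 3D coordinates rather than at fixed grid resolutions. As a minimal sketch (not the released code; the function name and shapes are illustrative), trilinear querying of a dense feature volume might look like:

```python
# Hedged sketch: query a dense feature voxel grid at arbitrary continuous
# 3D points via trilinear interpolation. Illustrative only; names and
# shapes are assumptions, not ASSR-NeRF's actual implementation.
import torch
import torch.nn.functional as F

def query_voxel_grid(grid, points):
    """grid: (C, D, H, W) feature volume; points: (N, 3) in [-1, 1]."""
    # grid_sample expects a batched volume (B, C, D, H, W) and
    # sampling coordinates shaped (B, D, H, W, 3).
    coords = points.view(1, -1, 1, 1, 3)
    feats = F.grid_sample(
        grid.unsqueeze(0), coords,
        mode="bilinear",       # "bilinear" on 5-D input = trilinear
        align_corners=True,
    )                          # -> (1, C, N, 1, 1)
    return feats.view(grid.shape[0], -1).t()  # (N, C)

grid = torch.randn(8, 16, 16, 16)        # toy 8-channel feature volume
pts = torch.rand(100, 3) * 2 - 1         # 100 random points in [-1, 1]^3
feats = query_voxel_grid(grid, pts)
print(feats.shape)                        # torch.Size([100, 8])
```

Because the query points are continuous, the same grid can be sampled along rays at any target output resolution, which is what makes the scale arbitrary.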

Overview of ASSR-NeRF


For any distilled feature field reconstructed from LR training views (grey part), the VoxelGridSR module (orange part) performs self-attention on the volumetric representation and generates a refined appearance feature and density for every sampled point. The refined features and densities are then aggregated into pixels, yielding novel views with rich, clean details.
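A minimal sketch of this kind of attention-based refinement, assuming a simplified setting where each sampled point attends over features gathered from neighboring voxels (the module name, single attention head, and output layout are our own illustration, not the released VoxelGridSR architecture):

```python
# Hedged sketch: single-head attention that refines a sampled point's
# feature using K neighboring voxel features, then predicts a refined
# appearance feature plus a density scalar. Illustrative only.
import torch
import torch.nn as nn

class VoxelAttentionSR(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.out = nn.Linear(dim, dim + 1)  # refined feature + density

    def forward(self, query_feat, neighbor_feats):
        """query_feat: (N, C); neighbor_feats: (N, K, C)."""
        q = self.q(query_feat).unsqueeze(1)            # (N, 1, C)
        k = self.k(neighbor_feats)                     # (N, K, C)
        v = self.v(neighbor_feats)                     # (N, K, C)
        scale = k.shape[-1] ** 0.5
        attn = torch.softmax(q @ k.transpose(1, 2) / scale, dim=-1)
        refined = (attn @ v).squeeze(1)                # (N, C)
        out = self.out(refined)                        # (N, C + 1)
        return out[:, :-1], out[:, -1]                 # feature, density

model = VoxelAttentionSR(dim=8)
qf = torch.randn(5, 8)       # 5 sampled points
nf = torch.randn(5, 4, 8)    # 4 neighboring voxel features each
feat, density = model(qf, nf)
print(feat.shape, density.shape)  # torch.Size([5, 8]) torch.Size([5])
```

The refined per-point features and densities would then feed the standard volume-rendering aggregation along each ray.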

Distilled Feature Field


We distill features from a pre-trained feature extractor into the volumetric representation so that the VoxelGridSR module can utilize the queried SR priors to benefit self-attention. In a student-teacher setting, features extracted from the training views are distilled into a 3D student network. The student network is trained by minimizing the difference between rendered features and the features from the pre-trained image feature extractor, in addition to the difference between rendered colors and ground-truth pixel colors. FeatureNet turns voxel features into view-dependent distilled features, and a pre-trained decoder maps the view-dependent features to RGB colors.
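The student-teacher objective described above can be sketched as a two-term loss; the function name and the weight `lam` are assumptions for illustration, not the paper's reported hyperparameters:

```python
# Hedged sketch: distillation objective combining a photometric loss on
# rendered colors with a feature-matching loss against a frozen teacher
# extractor. Illustrative only; `lam` is an assumed weighting.
import torch
import torch.nn.functional as F

def distillation_loss(rendered_feat, teacher_feat,
                      rendered_rgb, gt_rgb, lam=0.1):
    # Photometric term: rendered pixel colors vs. ground-truth pixels.
    color_loss = F.mse_loss(rendered_rgb, gt_rgb)
    # Distillation term: rendered features vs. frozen teacher features.
    feat_loss = F.mse_loss(rendered_feat, teacher_feat)
    return color_loss + lam * feat_loss

rendered_feat = torch.randn(10, 64)
teacher_feat = torch.randn(10, 64)
rendered_rgb = torch.rand(10, 3)
gt_rgb = torch.rand(10, 3)
loss = distillation_loss(rendered_feat, teacher_feat,
                         rendered_rgb, gt_rgb)
print(loss.item())
```

The teacher extractor stays frozen; only the 3D student (the voxel grid and FeatureNet) receives gradients from both terms.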

Results

Qualitative Results

Qualitative results on the Synthetic-NeRF and BlendedMVS datasets: all other baselines are first reconstructed from LR training views and then perform HRNVS. For ASSR-NeRF, the pre-trained VoxelGridSR model is applied to achieve SRNVS. The results show that ASSR-NeRF generates cleaner edges as well as richer details than the other baselines.


Rich details in 4K

In this experiment, we compare ASSR-NeRF with other NeRF-based methods. While all methods reconstruct the scene from training views at 1K resolution, ASSR-NeRF renders novel views at 4K resolution with richer details.

Trained with views in 1K, ASSR-NeRF renders 4K novel views.

Left: SOTA method. Right: ASSR-NeRF.

Trained with views in 1K, ASSR-NeRF renders 4K novel views.

Left: SOTA method. Right: ASSR-NeRF.

Trained with views in 1K, ASSR-NeRF renders 4K novel views.

Left: SOTA method. Right: ASSR-NeRF.

BibTeX

@misc{huang2024assrnerfarbitraryscalesuperresolutionvoxel,
      title={ASSR-NeRF: Arbitrary-Scale Super-Resolution on Voxel Grid for High-Quality Radiance Fields Reconstruction}, 
      author={Ding-Jiun Huang and Zi-Ting Chou and Yu-Chiang Frank Wang and Cheng Sun},
      year={2024},
      eprint={2406.20066},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2406.20066}, 
}