Abstract
Reconstructing 3D scenes from unconstrained collections of in-the-wild photographs has long been a challenging problem. The main difficulty lies in the varying appearance conditions and transient occluders present in uncontrolled image captures. With the advancement of Neural Radiance Fields (NeRF), previous works have developed effective strategies to tackle this issue. However, limited by deep networks and volumetric rendering, these methods generally incur substantial time costs. Recently, the advent of 3D Gaussian Splatting (3DGS) has significantly accelerated training and rendering for 3D reconstruction tasks. Nevertheless, vanilla 3DGS struggles to distinguish the varying appearances of in-the-wild photo collections. To address these problems, we propose Appearance-Aware 3D Gaussian Splatting (AAGS), a novel extension of 3DGS to unconstrained photo collections. Specifically, we employ an appearance extractor to capture a global feature for each image, enabling the distinction of visual conditions, e.g., illumination and weather, across different observations. Furthermore, to mitigate the impact of transient occluders, we design a transient-removal module that adaptively learns a 2D visibility map to decompose the static target from complex real-world scenes. Extensive experiments validate the effectiveness and superiority of AAGS: compared with previous works, our method not only achieves better reconstruction and rendering quality but also significantly reduces both training and rendering overhead. Code will be released at https://github.com/Zhang-WenCong/AAGS.
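To make the two components named above concrete, the sketch below illustrates in PyTorch how an appearance-conditioned color head and a learned 2D visibility map could plug into a photometric loss. This is a minimal sketch under assumptions: the module names (AppearanceMLP, VisibilityHead, masked_l1), dimensions, and the tiny convolutional visibility network are all illustrative, not the authors' implementation; the full architecture is described in the paper itself.

```python
# Hypothetical sketch of appearance conditioning and transient masking.
import torch
import torch.nn as nn

class AppearanceMLP(nn.Module):
    """Predicts per-Gaussian color conditioned on a per-image appearance code."""
    def __init__(self, feat_dim=32, app_dim=48):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + app_dim, 64), nn.ReLU(),
            nn.Linear(64, 3), nn.Sigmoid(),  # RGB in [0, 1]
        )

    def forward(self, gauss_feat, app_code):
        # gauss_feat: (N, feat_dim) per-Gaussian features
        # app_code:   (app_dim,) global appearance code of the current image
        app = app_code.expand(gauss_feat.shape[0], -1)
        return self.mlp(torch.cat([gauss_feat, app], dim=-1))

class VisibilityHead(nn.Module):
    """Predicts a 2D visibility map (1 = static, 0 = transient) from an image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, image):   # image: (1, 3, H, W)
        return self.net(image)  # (1, 1, H, W)

def masked_l1(render, target, vis, reg=0.01):
    # Down-weight pixels the visibility map marks as transient, and
    # regularize the map toward 1 so it cannot mask everything.
    recon = (vis * (render - target).abs()).mean()
    return recon + reg * (1.0 - vis).mean()

if __name__ == "__main__":
    N, H, W = 1024, 64, 64
    color_head, vis_head = AppearanceMLP(), VisibilityHead()
    colors = color_head(torch.randn(N, 32), torch.randn(48))  # (N, 3), fed to the rasterizer
    image = torch.rand(1, 3, H, W)
    vis = vis_head(image)                                     # (1, 1, H, W)
    render = torch.rand(1, 3, H, W, requires_grad=True)       # stand-in for the splatted image
    masked_l1(render, image, vis).backward()
```

The regularizer on (1 − vis) mirrors a common trick in in-the-wild reconstruction: penalizing the mask keeps the model from explaining away every pixel as transient.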
Data Availability
All datasets used in this study are publicly available. The data were obtained from open-access repositories, and detailed information, including links to the original datasets, is provided in the references section. There are no restrictions on data access, allowing for replication and further analysis.
References
Kaviani, H.R., Shirani, S.: An adaptive patch-based reconstruction scheme for view synthesis by disparity estimation using optical flow. IEEE TCSVT 28(7), 1540–1552 (2017)
Liu, B., Peng, B., Zhang, Z., Huang, Q., Ling, N., Lei, J.: Unsupervised single-view synthesis network via style guidance and prior distillation. IEEE TCSVT (2023)
Mildenhall, B., Srinivasan, P., Tancik, M., Barron, J., Ramamoorthi, R., Ng, R.: NeRF: Representing scenes as neural radiance fields for view synthesis. In: ECCV (2020)
Kerbl, B., Kopanas, G., Leimkühler, T., Drettakis, G.: 3D Gaussian splatting for real-time radiance field rendering. ACM TOG 42(4), 1–14 (2023)
Barron, J.T., Mildenhall, B., Tancik, M., Hedman, P., Martin-Brualla, R., Srinivasan, P.P.: Mip-NeRF: A multiscale representation for anti-aliasing neural radiance fields. In: ICCV (2021)
Barron, J.T., Mildenhall, B., Verbin, D., Srinivasan, P.P., Hedman, P.: Mip-NeRF 360: Unbounded anti-aliased neural radiance fields. In: CVPR, pp. 5470–5479 (2022)
Hu, W., Wang, Y., Ma, L., Yang, B., Gao, L., Liu, X., Ma, Y.: Tri-MipRF: Tri-mip representation for efficient anti-aliasing neural radiance fields. In: ICCV, pp. 19774–19783 (2023)
Müller, T., Evans, A., Schied, C., Keller, A.: Instant neural graphics primitives with a multiresolution hash encoding. ACM TOG 41(4), 1–15 (2022)
Martin-Brualla, R., Radwan, N., Sajjadi, M.S., Barron, J.T., Dosovitskiy, A., Duckworth, D.: NeRF in the wild: Neural radiance fields for unconstrained photo collections. In: CVPR, pp. 7210–7219 (2021)
Chen, X., Zhang, Q., Li, X., Chen, Y., Feng, Y., Wang, X., Wang, J.: Hallucinated neural radiance fields in the wild. In: CVPR, pp. 12943–12952 (2022)
Yang, Y., Zhang, S., Huang, Z., Zhang, Y., Tan, M.: Cross-ray neural radiance fields for novel-view synthesis from unconstrained image collections. In: ICCV (2023)
Dahmani, H., Bennehar, M., Piasco, N., Roldao, L., Tsishkou, D.: SWAG: Splatting in the wild images with appearance-conditioned Gaussians. arXiv preprint arXiv:2403.10427 (2024)
Zhang, D., Wang, C., Wang, W., Li, P., Qin, M., Wang, H.: Gaussian in the wild: 3D Gaussian splatting for unconstrained image collections. In: ECCV, pp. 341–359 (2025). Springer
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Fridovich-Keil, S., Yu, A., Tancik, M., Chen, Q., Recht, B., Kanazawa, A.: Plenoxels: Radiance fields without neural networks. In: CVPR, pp. 5501–5510 (2022)
Guo, Y.-C., Kang, D., Bao, L., He, Y., Zhang, S.-H.: NeRFReN: Neural radiance fields with reflections. In: CVPR, pp. 18409–18418 (2022)
Pumarola, A., Corona, E., Pons-Moll, G., Moreno-Noguer, F.: D-NeRF: Neural radiance fields for dynamic scenes. In: CVPR, pp. 10318–10327 (2021)
Park, K., Sinha, U., Barron, J.T., Bouaziz, S., Goldman, D.B., Seitz, S.M., Martin-Brualla, R.: Nerfies: Deformable neural radiance fields. In: ICCV, pp. 5865–5874 (2021)
Li, Z., Niklaus, S., Snavely, N., Wang, O.: Neural scene flow fields for space-time view synthesis of dynamic scenes. In: CVPR, pp. 6498–6508 (2021)
Wang, Z., Wu, S., Xie, W., Chen, M., Prisacariu, V.A.: NeRF--: Neural radiance fields without known camera parameters. arXiv preprint arXiv:2102.07064 (2021)
Bian, W., Wang, Z., Li, K., Bian, J.-W., Prisacariu, V.A.: Nope-NeRF: Optimising neural radiance field with no pose prior. In: CVPR, pp. 4160–4169 (2023)
Chibane, J., Bansal, A., Lazova, V., Pons-Moll, G.: Stereo radiance fields (SRF): Learning view synthesis from sparse views of novel scenes. In: CVPR (2021). IEEE
Irshad, M.Z., Zakharov, S., Liu, K., Guizilini, V., Kollar, T., Gaidon, A., Kira, Z., Ambrus, R.: Neo 360: Neural fields for sparse view synthesis of outdoor scenes. In: ICCV, pp. 9187–9198 (2023)
Guo, S., Wang, Q., Gao, Y., Xie, R., Li, L., Zhu, F., Song, L.: IEEE TCSVT, 1–1 (2024). https://doi.org/10.1109/TCSVT.2024.3385360
Kim, I., Choi, M., Kim, H.J.: UP-NeRF: Unconstrained pose-prior-free neural radiance fields. In: NeurIPS (2023)
Wu, G., Yi, T., Fang, J., Xie, L., Zhang, X., Wei, W., Liu, W., Tian, Q., Wang, X.: 4D Gaussian splatting for real-time dynamic scene rendering. arXiv preprint arXiv:2310.08528 (2023)
Luiten, J., Kopanas, G., Leibe, B., Ramanan, D.: Dynamic 3D Gaussians: Tracking by persistent dynamic view synthesis. arXiv preprint arXiv:2308.09713 (2023)
Yang, Z., Gao, X., Zhou, W., Jiao, S., Zhang, Y., Jin, X.: Deformable 3D Gaussians for high-fidelity monocular dynamic scene reconstruction. arXiv preprint arXiv:2309.13101 (2023)
Yu, Z., Chen, A., Huang, B., Sattler, T., Geiger, A.: Mip-Splatting: Alias-free 3D Gaussian splatting. arXiv preprint arXiv:2311.16493 (2023)
Fan, Z., Wang, K., Wen, K., Zhu, Z., Xu, D., Wang, Z.: LightGaussian: Unbounded 3D Gaussian compression with 15x reduction and 200+ fps. arXiv preprint arXiv:2311.17245 (2023)
Navaneet, K., Meibodi, K.P., Koohpayegani, S.A., Pirsiavash, H.: Compact3D: Compressing Gaussian splat radiance field models with vector quantization. arXiv preprint arXiv:2311.18159 (2023)
Niedermayr, S., Stumpfegger, J., Westermann, R.: Compressed 3D Gaussian splatting for accelerated novel view synthesis (2023)
Liu, Y., Guan, H., Luo, C., Fan, L., Peng, J., Zhang, Z.: CityGaussian: Real-time high-quality large-scale scene rendering with Gaussians (2024)
Kerbl, B., Meuleman, A., Kopanas, G., Wimmer, M., Lanvin, A., Drettakis, G.: A hierarchical 3D Gaussian representation for real-time rendering of very large datasets. ACM TOG 43(4) (2024)
Yang, Z., Gao, X., Sun, Y., Huang, Y., Lyu, X., Zhou, W., Jiao, S., Qi, X., Jin, X.: Spec-Gaussian: Anisotropic view-dependent appearance for 3D Gaussian splatting. arXiv preprint arXiv:2402.15870 (2024)
Meng, J., Li, H., Wu, Y., Gao, Q., Yang, S., Zhang, J., Ma, S.: Mirror-3DGS: Incorporating mirror reflections into 3D Gaussian splatting. arXiv preprint arXiv:2404.01168 (2024)
Fu, Y., Liu, S., Kulkarni, A., Kautz, J., Efros, A.A., Wang, X.: COLMAP-free 3D Gaussian splatting. arXiv preprint arXiv:2312.07504 (2023)
Zwicker, M., Pfister, H., Van Baar, J., Gross, M.: EWA volume splatting. In: Proceedings Visualization, pp. 29–538 (2001). IEEE
Chen, G., Wang, W.: A survey on 3D Gaussian splatting. arXiv preprint arXiv:2401.03890 (2024)
Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. In: MICCAI, pp. 234–241 (2015). Springer
Jin, Y., Mishkin, D., Mishchuk, A., Matas, J., Fua, P., Yi, K.M., Trulls, E.: Image matching across wide baselines: From paper to practice. IJCV 129(2), 517–547 (2021)
Schonberger, J.L., Frahm, J.-M.: Structure-from-motion revisited. In: CVPR, pp. 4104–4113 (2016)
Rudnev, V., Elgharib, M., Smith, W., Liu, L., Golyanik, V., Theobalt, C.: NeRF for outdoor scene relighting. In: ECCV, pp. 615–631 (2022). Springer
Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE TIP 13(4), 600–612 (2004)
Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. In: CVPR, pp. 586–595 (2018)
Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., et al.: PyTorch: An imperative style, high-performance deep learning library. In: NeurIPS (2019)
Chandrasekar, A., Chakrabarty, G., Bardhan, J., Hebbalaguppe, R., AP, P.: ReMOVE: A reference-free metric for object erasure. In: CVPR, pp. 7901–7910 (2024)
Author information
Authors and Affiliations
Contributions
Wencong Zhang contributed to the methodology, coding, experiments, and manuscript writing. Zhiyang Guo provided guidance on writing and reviewed the manuscript. Wengang Zhou and Houqiang Li supervised the research and provided overall guidance for the paper.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Additional information
Communicated by Bing-kun Bao.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhang, W., Guo, Z., Zhou, W. et al. AAGS: Appearance-Aware 3D Gaussian Splatting with Unconstrained Photo Collections. Multimedia Systems 31, 173 (2025). https://doi.org/10.1007/s00530-025-01742-4