Möbius Transform for Mitigating Perspective Distortions in Representation Learning

Chhipa, Prakash Chandra; Chippa, Meenakshi Subhash; De, Kanjar; Saini, Rajkumar; Liwicki, Marcus; Shah, Mubarak

doi:10.1007/978-3-031-73464-9_21

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15131)

Included in the following conference series:

European Conference on Computer Vision

Abstract

Perspective distortion (PD) causes unprecedented changes in shape, size, orientation, angles, and other spatial relationships of visual concepts in images. Precisely estimating camera intrinsic and extrinsic parameters is a challenging task, which makes it hard to synthesize perspective distortion. The lack of dedicated training data poses a critical barrier to developing robust computer vision methods. Additionally, distortion correction methods turn other computer vision tasks into multi-step pipelines and underperform. In this work, we propose mitigating perspective distortion (MPD), which applies fine-grained parameter control to a specific family of Möbius transforms to model real-world distortion without estimating camera intrinsic and extrinsic parameters and without the need for actual distorted data. We also present a dedicated perspectively distorted benchmark dataset, ImageNet-PD, for evaluating the robustness of deep learning models. The proposed method achieves stronger results on the existing benchmarks ImageNet-E and ImageNet-X. Additionally, it significantly improves performance on ImageNet-PD while maintaining performance on the standard data distribution. Notably, our method shows improved performance on three PD-affected real-world applications (crowd counting, fisheye image recognition, and person re-identification) and on one challenging PD-affected CV task, object detection. The source code, dataset, and models are available on the project webpage at https://prakashchhipa.github.io/projects/mpd.
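
As a rough illustration of the idea in the abstract, a Möbius transform maps a point z on the complex plane to f(z) = (az + b) / (cz + d), with ad - bc ≠ 0. The sketch below treats pixel coordinates as complex numbers and warps an image by inverse mapping. It is a minimal sketch and not the authors' implementation: the function names, the [-1, 1] coordinate normalisation, nearest-neighbour sampling, and the example parameter values are all assumptions made here for illustration.

```python
import numpy as np

def mobius(z, a, b, c, d):
    """Apply f(z) = (a z + b) / (c z + d); requires a*d - b*c != 0."""
    if np.isclose(a * d - b * c, 0):
        raise ValueError("degenerate transform: a*d - b*c must be nonzero")
    return (a * z + b) / (c * z + d)

def mobius_inverse(w, a, b, c, d):
    """Inverse map f^{-1}(w) = (d w - b) / (-c w + a)."""
    return mobius(w, d, -b, -c, a)

def warp_image(img, a, b, c, d):
    """Warp an H x W (x C) array by inverse-mapping each output pixel.

    Assumes H, W >= 2. Pixels are placed on the complex plane with
    x, y in [-1, 1]; output pixels whose source falls outside the
    image are left at zero.
    """
    h, w = img.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    # Normalise pixel indices to [-1, 1] x [-1, 1].
    z_out = (2 * xs / (w - 1) - 1) + 1j * (2 * ys / (h - 1) - 1)
    # A pole of the inverse map may land on a pixel; silence the warning.
    with np.errstate(divide="ignore", invalid="ignore"):
        src = mobius_inverse(z_out, a, b, c, d)
    finite = np.isfinite(src.real) & np.isfinite(src.imag)
    src = np.where(finite, src, 0)
    # Back to pixel indices; clip so the integer cast cannot overflow.
    sx = np.clip(np.rint((src.real + 1) * (w - 1) / 2), -1, w).astype(int)
    sy = np.clip(np.rint((src.imag + 1) * (h - 1) / 2), -1, h).astype(int)
    valid = finite & (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros_like(img)
    out[valid] = img[sy[valid], sx[valid]]
    return out

# Hypothetical parameters: identity except for a small imaginary c,
# which makes the warp non-affine.
image = np.arange(64, dtype=np.uint8).reshape(8, 8)
distorted = warp_image(image, a=1, b=0, c=0.3j, d=1)
```

With c = 0 and a = d = 1 the map reduces to the identity, so the size of c acts as a distortion knob in this sketch. The parameter family actually used by MPD is defined in the paper and the released source code, not here.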

Acknowledgment

The authors thank Sumit Rakesh, Luleå University of Technology, for his support with the Lotty Bruzelius cluster. We also thank the National Supercomputer Centre at Linköping University for the Berzelius supercomputing, supported by the Knut and Alice Wallenberg Foundation.

Author information

Authors and Affiliations

Luleå Tekniska Universitet, Luleå, Sweden
Prakash Chandra Chhipa, Meenakshi Subhash Chippa, Rajkumar Saini & Marcus Liwicki
Fraunhofer Heinrich-Hertz-Institut, Berlin, Germany
Kanjar De
University of Central Florida, Orlando, USA
Mubarak Shah

Corresponding author

Correspondence to Prakash Chandra Chhipa.

Editor information

Editors and Affiliations

University of Birmingham, Birmingham, UK
Aleš Leonardis
University of Trento, Trento, Italy
Elisa Ricci
Technical University of Darmstadt, Darmstadt, Germany
Stefan Roth
Princeton University, Princeton, NJ, USA
Olga Russakovsky
Czech Technical University in Prague, Prague, Czech Republic
Torsten Sattler
École des Ponts ParisTech, Marne-la-Vallée, France
Gül Varol

Cite this paper

Chhipa, P.C., Chippa, M.S., De, K., Saini, R., Liwicki, M., Shah, M. (2025). Möbius Transform for Mitigating Perspective Distortions in Representation Learning. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15131. Springer, Cham. https://doi.org/10.1007/978-3-031-73464-9_21

DOI: https://doi.org/10.1007/978-3-031-73464-9_21
Published: 04 December 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-73463-2
Online ISBN: 978-3-031-73464-9
eBook Packages: Computer Science, Computer Science (R0)