Abstract
Social relationships (SRs) are the basis of human life. Hence, the ability to accurately recognize interpersonal relations in public spaces from visual observations can help policymakers improve mental health programs and address social challenges. The key to image-based computer-vision research on SR recognition (SRR) is a deep-learning mechanism that predicts SRs from the contents of scene images. Current methods explore logical constraints in relatively simple scenes with small groups of people. However, this is insufficient for forming relation graphs of multiple groups simultaneously in complex scenes. Generally, a complex scene contains a principal relationship that applies to the largest proportion of people and secondary relationships that apply to smaller proportions. To effectively explore relational situations in complex scenes, we propose a new distributed reasoning strategy that accounts for both principal and secondary SRs. First, our novel model enhances principal relation component reasoning; then, a new contrastive learning algorithm supplements the principal relationship with secondary types. A shifted-window transformer extracts interactive human relation features and local-global features to support more accurate and comprehensive relation prediction. Extensive experiments demonstrate that each part of the proposed model improves the accuracy of SRR and that the whole model outperforms state-of-the-art methods on public datasets.
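To make the notion of principal and secondary relationships concrete, the following minimal Python sketch splits the relation labels of one scene into the most frequent label and the remaining ones. It is only an illustration of the definition, not the proposed model: the function name `split_principal_secondary`, the input format (one label per person pair), the choice to measure proportions over pairs rather than over people, and the example labels are all assumptions introduced here.

```python
from collections import Counter


def split_principal_secondary(pair_labels):
    """Split one scene's relation labels into principal and secondary.

    pair_labels: list of relation labels, one per person pair
    (a hypothetical input format chosen for this illustration).
    Returns (principal_label, {secondary_label: count}).
    """
    if not pair_labels:
        raise ValueError("pair_labels must not be empty")
    counts = Counter(pair_labels)
    # The principal relation is the label covering the largest share of pairs.
    # On ties, most_common keeps the label that was seen first.
    principal, _ = counts.most_common(1)[0]
    # Every other label present in the scene is treated as secondary.
    secondary = {label: n for label, n in counts.items() if label != principal}
    return principal, secondary


# Illustrative scene with six labeled pairs (labels are made up for the example).
scene = ["friends", "friends", "family", "friends", "professional", "family"]
principal, secondary = split_principal_secondary(scene)
print(principal, secondary)
```

In this toy scene, "friends" covers three of the six pairs and is therefore taken as the principal relation, while "family" and "professional" are the secondary ones.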
Data availability and access
The People in Social Context (PISC) dataset is available on the website https://zenodo.org/record/1059155 and the People in Photo Album (PIPA) dataset is available on the website https://www.mpi-inf.mpg.de/social-relation.
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grant 61871278.
Author information
Contributions
Wang Tang: Conceptualization, Methodology, Software, Writing - original draft. Linbo Qing: Supervision, Conceptualization, Writing - review and editing, Methodology. Lindong Li: Supervision, Writing - review. Li Guo: Writing - review and editing. Yonghong Peng: Writing - review and editing.
Ethics declarations
Competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Ethics approval
All authors contributed to the conception and design of the study. All authors read and approved the final manuscript.
About this article
Cite this article
Tang, W., Qing, L., Li, L. et al. Principal relation component reasoning-enhanced social relation recognition. Appl Intell 53, 28099–28113 (2023). https://doi.org/10.1007/s10489-023-05003-7
DOI: https://doi.org/10.1007/s10489-023-05003-7