SRE-Net Model for Automatic Social Relation Extraction from Video

Zhou, Lili; Wu, Bin; Lv, Jinna

doi:10.1007/978-981-13-2922-7_30

Part of the book series: Communications in Computer and Information Science (CCIS, volume 945)

Included in the following conference series:

CCF Conference on Big Data


Abstract

Videos shared over the Internet contain a wealth of knowledge about human society, and this knowledge is revealed as the storyline of a video unfolds. Automatically constructing a social relation network from massive video data therefore facilitates deep semantic mining of big data. The task involves two components: face recognition and social relation recognition. For face recognition, previous studies focus on high-level facial features and multiple body cues. However, these methods are mostly based on supervised learning, and clustering-based approaches require the number of clusters k to be specified, so they cannot recognize characters when new video data is input and the individuals and their number are unknown. For social relation recognition, previous studies concentrate on images and videos. However, these methods consider only social relations within the same frame and cannot extract social relations between characters who do not appear in the same frame. In this paper, a model named SRE-Net is proposed for building a social relation network to address these challenges. First, the MoCNR algorithm is introduced, which clusters similar-appearing faces from different keyframes of a video. As far as we know, it is the first algorithm to identify character nodes using unsupervised double-clustering methods. Second, we propose a scene-based social relation recognition method to address the inability to recognize social relations between characters in different frames. Finally, comprehensive evaluations demonstrate that our model is effective for social relation network construction.
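
The abstract does not give the details of MoCNR or of the scene-based method, so the following is only a minimal sketch of the two ideas it names: grouping face embeddings without fixing k in advance, and linking characters that share a scene rather than a frame. Everything here is an assumption made for illustration, including the function names, the greedy threshold clustering, the two-pass "double" structure, the thresholds `tight` and `loose`, and the toy 2-D embeddings. It is not the authors' algorithm.

```python
from collections import defaultdict
from itertools import combinations
import math


def cosine_distance(a, b):
    """Cosine distance between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def threshold_cluster(vectors, max_dist):
    """Greedy clustering: join the nearest centroid within max_dist,
    otherwise open a new cluster. The number of clusters is not given."""
    centroids, members = [], []
    for idx, v in enumerate(vectors):
        best, best_d = None, max_dist
        for c, centroid in enumerate(centroids):
            d = cosine_distance(v, centroid)
            if d <= best_d:
                best, best_d = c, d
        if best is None:
            centroids.append(list(v))
            members.append([idx])
        else:
            members[best].append(idx)
            n = len(members[best])
            # Running mean keeps the centroid equal to the cluster average.
            centroids[best] = [(c * (n - 1) + x) / n
                               for c, x in zip(centroids[best], v)]
    return centroids, members


def double_cluster(vectors, tight=0.05, loose=0.2):
    """Hypothetical two-pass scheme: tight clusters first, then a looser
    pass that merges the centroids of those clusters."""
    centroids, members = threshold_cluster(vectors, tight)
    _, groups = threshold_cluster(centroids, loose)
    labels = [None] * len(vectors)
    for label, group in enumerate(groups):
        for c in group:
            for idx in members[c]:
                labels[idx] = label
    return labels


def scene_cooccurrence(labels, scene_ids):
    """Count, for each pair of characters, the scenes in which both appear.
    Faces may come from different keyframes of the same scene."""
    chars_in_scene = defaultdict(set)
    for label, scene in zip(labels, scene_ids):
        chars_in_scene[scene].add(label)
    edges = defaultdict(int)
    for chars in chars_in_scene.values():
        for a, b in combinations(sorted(chars), 2):
            edges[(a, b)] += 1
    return dict(edges)


# Toy data: five detected faces with made-up 2-D embeddings and scene ids.
faces = [(1.0, 0.0), (0.99, 0.05), (0.0, 1.0), (0.05, 0.99), (0.7, 0.7)]
scenes = [0, 0, 0, 1, 1]
labels = double_cluster(faces)
edges = scene_cooccurrence(labels, scenes)
```

With this toy input the faces fall into three character groups, and an edge is created only between characters that share at least one scene. A real pipeline would replace the toy vectors with embeddings from a face recognition network and would need a relation classifier on top of the co-occurrence counts.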



Acknowledgment

This research is supported by the National Social Science Foundation of China under Grant 16ZDA055. We are grateful to the anonymous reviewers for their careful reading and valuable suggestions.

Author information

Authors and Affiliations

Beijing University of Posts and Telecommunications, Beijing, 100876, China
Lili Zhou, Bin Wu & Jinna Lv
Beijing Key Laboratory of Intelligent Telecommunications Software and Multimedia, Beijing, China
Lili Zhou, Bin Wu & Jinna Lv

Corresponding author

Correspondence to Lili Zhou.

Editor information

Editors and Affiliations

School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an, China
Zongben Xu
Xidian University, Xi'an, China
Xinbo Gao
Xidian University, Xi'an, Shaanxi, China
Qiguang Miao
Chinese Academy of Sciences, Beijing, China
Yunquan Zhang
Zhejiang University, Hangzhou, China
Jiajun Bu

About this paper

Cite this paper

Zhou, L., Wu, B., Lv, J. (2018). SRE-Net Model for Automatic Social Relation Extraction from Video. In: Xu, Z., Gao, X., Miao, Q., Zhang, Y., Bu, J. (eds) Big Data. Big Data 2018. Communications in Computer and Information Science, vol 945. Springer, Singapore. https://doi.org/10.1007/978-981-13-2922-7_30

DOI: https://doi.org/10.1007/978-981-13-2922-7_30
Published: 11 October 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-2921-0
Online ISBN: 978-981-13-2922-7
eBook Packages: Computer Science, Computer Science (R0)
