Abstract
Videos spread over the Internet contain a wealth of knowledge about human society, and diverse knowledge is revealed as a video's storyline unfolds. Automatically constructing social relation networks from massive video data therefore facilitates deep semantic mining of big data; the task comprises face recognition and social relation recognition. For face recognition, previous studies focus on high-level facial features and multiple body cues. However, these methods are mostly based on supervised learning, and clustering-based methods require the number of clusters k to be specified, so they cannot recognize characters when new video data arrives and the individuals and their number are unknown. For social relation recognition, previous studies concentrate on images and videos. However, these methods consider only social relations within the same frame and cannot extract relations between characters who do not appear in the same frame. In this paper, we propose a model named SRE-Net for building social relation networks to address these challenges. First, we introduce the MoCNR algorithm, which clusters similar-appearing faces from different keyframes of a video. As far as we know, it is the first algorithm to identify character nodes using unsupervised double-clustering methods. Second, we propose a scene-based social relation recognition method to recognize social relations between characters who appear in different frames. Finally, comprehensive evaluations demonstrate that our model is effective for social relation network construction.
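To illustrate why working at the scene level matters, the following minimal sketch links characters who share a scene even when they never share a frame. It is not the SRE-Net method itself: it only counts scene-level co-occurrence and does not classify relation types. The input layout (scenes as lists of frames, frames as lists of character IDs), the function name `build_relation_graph`, and the sample names are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def build_relation_graph(scenes):
    """Count how often each pair of characters shares a scene.

    `scenes` is a list of scenes; each scene is a list of frames; each
    frame is a list of character IDs (hypothetical input format).
    Returns a Counter mapping sorted (a, b) pairs to co-occurrence counts.
    """
    edges = Counter()
    for frames in scenes:
        # Pool every character seen anywhere in the scene, so two people
        # who never appear in the same frame are still paired.
        cast = set()
        for frame in frames:
            cast.update(frame)
        for a, b in combinations(sorted(cast), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical data: alice and bob alternate frames and never share one.
scenes = [
    [["alice"], ["bob"], ["alice"]],
    [["alice", "carol"], ["carol"]],
    [["bob"], ["alice"]],
]
graph = build_relation_graph(scenes)
for pair, weight in sorted(graph.items()):
    print(pair, weight)
```

A frame-level counter would find no edge between alice and bob in this data, whereas pooling per scene yields one; the edge weights could then serve as a starting point for a relation classifier.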
Acknowledgment
This research is supported by the National Social Science Foundation of China under Grant 16ZDA055. We are grateful to the anonymous reviewers for their careful reading and valuable suggestions.
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Zhou, L., Wu, B., Lv, J. (2018). SRE-Net Model for Automatic Social Relation Extraction from Video. In: Xu, Z., Gao, X., Miao, Q., Zhang, Y., Bu, J. (eds) Big Data. Big Data 2018. Communications in Computer and Information Science, vol 945. Springer, Singapore. https://doi.org/10.1007/978-981-13-2922-7_30
DOI: https://doi.org/10.1007/978-981-13-2922-7_30
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-2921-0
Online ISBN: 978-981-13-2922-7
eBook Packages: Computer Science, Computer Science (R0)