Multi-stream Fusion Model for Social Relation Recognition from Videos

  • Conference paper
MultiMedia Modeling (MMM 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10704)

Abstract

Social relations are ubiquitous in people’s daily lives. In particular, the prevalence of video in social media and intelligent surveillance offers a new opportunity to discover social relations among people. Previous research has mostly focused on recognizing social relations from texts, blogs, or images. However, these methods cover only a limited set of social relations and cannot handle video data. In this paper, we address the challenges of social relation recognition by employing a multi-stream model that exploits the abundant multimodal information in videos. First, we build a video dataset named Social Relation In Videos (SRIV), which comprises 3,124 videos annotated with 16 categories of social relations drawn from psychology and sociology studies. To our knowledge, it is the first video dataset for social relation recognition. Second, we propose a multi-stream deep learning model as a benchmark for recognizing social relations, which learns high-level semantic information from the spatial, temporal, and audio streams of people’s social interactions in videos. Finally, we fuse the streams with logistic regression to achieve accurate recognition. Experimental results show that the multi-stream deep model is effective for social relation recognition on the proposed dataset.
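
The fusion step mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the implementation, so everything below is an assumption: the fusion is read as multinomial logistic regression trained on the concatenated class scores of three streams, the function names `fit_fusion` and `predict_fusion` are hypothetical, and the stream scores are synthetic stand-ins for the outputs of spatial, temporal, and audio networks.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def _stack(stream_scores):
    """Concatenate per-stream score matrices and append a bias column."""
    x = np.hstack(stream_scores)
    return np.hstack([x, np.ones((x.shape[0], 1))])


def fit_fusion(stream_scores, labels, num_classes, lr=0.5, epochs=300):
    """Fit multinomial logistic regression by batch gradient descent.

    stream_scores: list of (n_videos, num_classes) arrays, one per stream.
    labels: integer class index per video.
    Returns a weight matrix of shape (n_streams * num_classes + 1, num_classes).
    """
    x = _stack(stream_scores)
    y = np.eye(num_classes)[labels]            # one-hot targets
    w = np.zeros((x.shape[1], num_classes))
    for _ in range(epochs):
        p = softmax(x @ w)
        w -= lr * x.T @ (p - y) / x.shape[0]   # gradient of mean cross-entropy
    return w


def predict_fusion(w, stream_scores):
    """Return the fused class prediction for each video."""
    return softmax(_stack(stream_scores) @ w).argmax(axis=1)


if __name__ == "__main__":
    # Synthetic data only: 16 classes as in SRIV, three noisy "streams".
    rng = np.random.default_rng(0)
    labels = rng.integers(0, 16, size=320)
    streams = [np.eye(16)[labels] + 0.3 * rng.normal(size=(320, 16))
               for _ in range(3)]
    w = fit_fusion(streams, labels, num_classes=16)
    acc = (predict_fusion(w, streams) == labels).mean()
    print(f"training accuracy of fused prediction: {acc:.3f}")
```

Learning the fusion weights, rather than averaging the stream scores, lets the classifier weight each stream differently per class, which is one plausible reason to prefer a regression-based late fusion; the paper itself should be consulted for the actual design.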

Acknowledgment

This research is supported in part by the National High-tech R&D Program (No. 2015AA050204), the Special Fund for Beijing Common Construction Project, the National Natural Science Foundation of China (No. 61602049), and the Fundamental Research Funds for the Central Universities (No. 2016RCGD32).

Author information

Corresponding author

Correspondence to Bin Wu.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Lv, J., Liu, W., Zhou, L., Wu, B., Ma, H. (2018). Multi-stream Fusion Model for Social Relation Recognition from Videos. In: Schoeffmann, K., et al. MultiMedia Modeling. MMM 2018. Lecture Notes in Computer Science, vol 10704. Springer, Cham. https://doi.org/10.1007/978-3-319-73603-7_29

  • DOI: https://doi.org/10.1007/978-3-319-73603-7_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-73602-0

  • Online ISBN: 978-3-319-73603-7

  • eBook Packages: Computer Science, Computer Science (R0)
