Abstract
Attachment is a deep and enduring emotional bond that connects one person to another across time and space. Early attachment styles are established in infancy through interactions between infants and their caregivers. There are two attachment types: secure and insecure. The attachment experience shapes personality development, particularly the sense of security, and research shows it influences the ability to form stable relationships throughout life. It is also an important aspect of assessing parenting quality, so attachment has been widely studied in psychology. Attachment type is usually determined by Ainsworth's Strange Situation Assessment (SSA), which requires tedious manual observation. To the best of our knowledge, there is no computational method for predicting infant attachment type. We use video and audio from the Still-Face Paradigm (SFP) as input to predict attachment type with machine learning methods. In the present work, we recruited 64 infant-mother dyads, recorded SFP videos when the infants were 5–8 months old, and identified their attachment types (secure or insecure) by SSA when those infants were almost 2 years old. For the visual modality, we extract motion features and classify them with a recurrent neural network (RNN) with LSTM units. For the audio modality, speech enhancement is applied as pre-processing, and pitch frequency, short-time energy, and Mel-frequency cepstral coefficient (MFCC) feature sequences are extracted; an SVM is then deployed to explore the patterns in them. The experiments show that our method discriminates between the two classes of subjects with good accuracy.
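Two of the audio features named in the abstract, short-time energy and autocorrelation-based pitch, can be sketched in plain NumPy. This is a minimal illustration only: the 220 Hz test tone, the 25 ms frame length, and the 10 ms hop are assumptions for the example, not parameters from the paper, and a real pipeline would add windowing, voicing detection, and MFCC extraction via a DSP library.

```python
import numpy as np

def short_time_energy(signal, frame_len=400, hop=160):
    """Frame-wise energy: sum of squared samples in each analysis frame."""
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    return np.array([np.sum(signal[i * hop:i * hop + frame_len] ** 2)
                     for i in range(n_frames)])

def pitch_autocorr(frame, sr, fmin=80.0, fmax=400.0):
    """Estimate F0 as the autocorrelation peak within a plausible lag range."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)      # lag bounds from F0 bounds
    lag = lo + np.argmax(ac[lo:hi])
    return sr / lag

# Synthetic 1-second "voiced" signal at 220 Hz, 16 kHz sampling rate.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 220.0 * t)

ste = short_time_energy(tone)      # one energy value per 25 ms frame
f0 = pitch_autocorr(tone[:400], sr)  # close to the true 220 Hz
```

Per-frame feature sequences like these (stacked with MFCCs) are the kind of input an SVM classifier would consume after summary statistics or sequence kernels are applied.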
Acknowledgment
This work was supported by the National Key R&D Program of China (2017YFB1002503).
© 2020 Springer Nature Switzerland AG
Cite this paper
Li, H., Cui, J., Wang, L., Zha, H. (2020). Infant Attachment Prediction Using Vision and Audio Features in Mother-Infant Interaction. In: Palaiahnakote, S., Sanniti di Baja, G., Wang, L., Yan, W. (eds) Pattern Recognition. ACPR 2019. Lecture Notes in Computer Science, vol 12047. Springer, Cham. https://doi.org/10.1007/978-3-030-41299-9_38
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-41298-2
Online ISBN: 978-3-030-41299-9
eBook Packages: Computer Science (R0)