DOI: 10.1145/3664647.3689141

Temporal-Informative Adapters in VideoMAE V2 and Multi-Scale Feature Fusion for Micro-Expression Spotting-then-Recognize

Published: 28 October 2024

Abstract

Micro-expressions are subtle facial movements that reveal hidden emotions, but their fleeting and involuntary nature makes them difficult to detect. This paper introduces a novel approach to two critical tasks in micro-expression analysis: spotting and recognition. We integrate the VideoMAE V2 framework with a temporal-informative adapter and multi-scale feature fusion to improve micro-expression Spotting-then-Recognize performance. The temporal-informative adapter captures local temporal context within video frames, improving feature-extraction efficiency. In addition, we construct a multi-scale image pyramid to capture motion features ranging from broad movements to subtle details, and fusing these multi-scale features strengthens the model on both tasks. Our method also mitigates the effects of environmental variation, involuntary facial movements, and dataset imbalance, leading to improved accuracy in micro-expression Spotting-then-Recognize.
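The pipeline sketched in the abstract (add local temporal context to per-frame features, then measure motion at several spatial scales and fuse) can be illustrated with a toy NumPy example. This is a minimal sketch under stated assumptions: the smoothing kernel, pooling scheme, and function names below are illustrative stand-ins, not the paper's actual VideoMAE V2-based implementation.

```python
import numpy as np

def temporal_adapter(clip, kernel=np.array([0.25, 0.5, 0.25])):
    """Smooth each pixel's trajectory over time with a small 1-D
    convolution -- a toy stand-in for a temporal-informative adapter
    that injects local temporal context into frame features."""
    T = clip.shape[0]
    pad = len(kernel) // 2
    padded = np.pad(clip, ((pad, pad), (0, 0), (0, 0)), mode="edge")
    out = np.zeros_like(clip, dtype=float)
    for t in range(T):
        # weighted sum of the temporal window centered at frame t
        out[t] = np.tensordot(kernel, padded[t:t + len(kernel)], axes=1)
    return out

def multiscale_motion_features(clip, scales=(1, 2, 4)):
    """Build an image pyramid by average-pooling and measure mean
    frame-to-frame motion at each scale: coarse scales capture broad
    movements, fine scales the subtle ones."""
    feats = []
    for s in scales:
        T, H, W = clip.shape
        # average-pool HxW by factor s (crop so dimensions divide evenly)
        pooled = clip[:, :H - H % s, :W - W % s]
        pooled = pooled.reshape(T, H // s, s, W // s, s).mean(axis=(2, 4))
        # per-frame motion energy: mean absolute frame difference
        motion = np.abs(np.diff(pooled, axis=0)).mean(axis=(1, 2))
        feats.append(motion)
    return np.stack(feats)  # shape (len(scales), T-1)

# toy clip: 8 frames of 16x16, with a brief "micro-expression" bump
clip = np.zeros((8, 16, 16))
clip[3:5, 6:10, 6:10] = 1.0
fused = multiscale_motion_features(temporal_adapter(clip)).mean(axis=0)
onset = int(np.argmax(fused)) + 1  # frame index where motion peaks
```

On this toy clip the fused motion signal peaks at the transition into the bump (frame 3), which is the kind of cue a spotting head would threshold; the real model replaces the hand-crafted kernel and pooling with learned adapters and fused backbone features.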



    Published In

    MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
    October 2024
    11719 pages
    ISBN:9798400706868
    DOI:10.1145/3664647
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. micro-expression spotting-then-recognize
    2. multi-scale feature fusion
    3. videomae v2

    Qualifiers

    • Research-article

    Funding Sources

    • Beijing Municipal Science & Technology Commission, Administrative Commission of Zhongguancun Science Park
    • Dreams Foundation of Jianghuai Advance Technology Center
    • Natural Science Foundation of China
    • National Aviation Science Foundation

    Conference

    MM '24: The 32nd ACM International Conference on Multimedia
    October 28 - November 1, 2024
    Melbourne VIC, Australia

    Acceptance Rates

    MM '24 paper acceptance rate: 1,150 of 4,385 submissions (26%)
    Overall acceptance rate: 2,145 of 8,556 submissions (25%)

