Classroom Attention Estimation Method Based on Mining Facial Landmarks of Students

Chen, Liyan; Yang, Haoran; Liu, Kunhong

doi:10.1007/978-3-030-98355-0_22

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13142)

Included in the following conference series:

International Conference on Multimedia Modeling

Abstract

Classroom attention estimation aims to capture the multi-modal semantic information in a teaching situation and to analyze students' levels of concentration and participation in the classroom. However, mining information from different modalities in real, non-experimental teaching scenes to build a unified attention model remains a challenge. To advance this line of research, this paper proposes a new method for automatically estimating attention from facial feature points. The method uses face detection and face alignment algorithms to capture 68 landmarks on student faces in classroom videos, and introduces face reference information to constrain the landmarks and extract feature sets. The aim is to reduce the attention model's sensitivity to differences between individual faces. The automatic evaluation module uses machine learning algorithms to train a classifier that estimates each student's attention level. In extensive experiments on video data from multiple real classrooms, our three-level attention classifier achieves an accuracy of 82.5%, a better result than other studies in the field of student participation analysis. The results show that the method based on facial landmark mining can more accurately predict an individual student's classroom attention level and can serve as a non-intrusive automatic method for analyzing real classroom multimedia data.
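
The idea of constraining landmarks with a face reference, so that features depend less on where a face sits in the frame or how large it appears, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' feature set: the abstract does not say which reference is used, so the sketch takes the nose tip as origin and the inter-ocular distance as scale, and it assumes the common iBUG 68-point index layout. The function name `normalize_landmarks` is hypothetical.

```python
import math

# 0-based indices in the common iBUG 68-point layout. This layout is an
# assumption: the abstract does not say which 68-point scheme is used.
FIRST_EYE = range(36, 42)
SECOND_EYE = range(42, 48)
NOSE_TIP = 30


def _centroid(points):
    """Mean position of a list of (x, y) points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def normalize_landmarks(landmarks):
    """Map 68 (x, y) landmarks to 136 values independent of position and size.

    Hypothetical helper: the reference chosen here (nose-tip origin,
    inter-ocular scale) is an illustrative stand-in for the paper's
    "face reference information".
    """
    if len(landmarks) != 68:
        raise ValueError("expected 68 (x, y) landmarks")
    eye_a = _centroid([landmarks[i] for i in FIRST_EYE])
    eye_b = _centroid([landmarks[i] for i in SECOND_EYE])
    scale = math.dist(eye_a, eye_b)  # inter-ocular distance
    if scale == 0:
        raise ValueError("degenerate face: eye centers coincide")
    origin_x, origin_y = landmarks[NOSE_TIP]
    features = []
    for x, y in landmarks:
        features.append((x - origin_x) / scale)
        features.append((y - origin_y) / scale)
    return features


if __name__ == "__main__":
    # Synthetic points only, not real landmark data.
    face = [(float(i % 17) * 3.0, float(i // 17) * 5.0 + (i % 3)) for i in range(68)]
    # The same "face", twice as large and shifted in the frame.
    moved = [(2.0 * x + 100.0, 2.0 * y - 40.0) for x, y in face]
    a = normalize_landmarks(face)
    b = normalize_landmarks(moved)
    print(max(abs(p - q) for p, q in zip(a, b)))
```

Vectors produced this way could then be passed to any standard classifier to predict an attention level. Head rotation is not removed by this normalization and would need separate handling.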

L. Chen and H. Yang—These authors contributed equally to this work.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (No. 61772023), the National Key Research and Development Program of China (No. 2019QY1803), the Fujian Science and Technology Plan Industry-University-Research Cooperation Project (No. 2021H6015), the National College Student Innovation and Entrepreneurship Training Program of China (No. 202110384258), and the Social Science Program of Fujian Province (No. FJ2020B062).

Author information

Authors and Affiliations

School of Informatics, Xiamen University, Xiamen, 361001, China
Liyan Chen, Haoran Yang & Kunhong Liu
School of Film, Xiamen University, Xiamen, 361001, China
Liyan Chen & Kunhong Liu

Corresponding author

Correspondence to Kunhong Liu.

Editor information

Editors and Affiliations

IT University of Copenhagen, Copenhagen, Denmark
Björn Þór Jónsson
Dublin City University, Dublin, Ireland
Cathal Gurrin
University of Science, VNU-HCM, Ho Chi Minh City, Vietnam
Minh-Triet Tran
University of Bergen, Bergen, Norway
Duc-Tien Dang-Nguyen
National Tsing Hua University, Hsinchu, Taiwan
Anita Min-Chun Hu
Hanoi University of Science and Technology, Hanoi, Vietnam
Binh Huynh Thi Thanh
Median Technologies, Valbonne, France
Benoit Huet

About this paper

Cite this paper

Chen, L., Yang, H., Liu, K. (2022). Classroom Attention Estimation Method Based on Mining Facial Landmarks of Students. In: Þór Jónsson, B., et al. MultiMedia Modeling. MMM 2022. Lecture Notes in Computer Science, vol 13142. Springer, Cham. https://doi.org/10.1007/978-3-030-98355-0_22

DOI: https://doi.org/10.1007/978-3-030-98355-0_22
Published: 15 March 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-98354-3
Online ISBN: 978-3-030-98355-0
eBook Packages: Computer Science; Computer Science (R0)
