Abstract
Classroom attention estimation aims to capture the multi-modal semantic information present in a teaching situation and to analyze how well students concentrate and participate in class. However, mining information from different modalities in real, non-experimental teaching scenes in order to build a unified attention model remains a challenge. To advance this line of research, this paper proposes a new method that automatically estimates attention from facial feature points. The method uses face detection and face alignment algorithms to capture 68 landmarks on each student's face in classroom videos, and introduces face reference information to constrain the landmarks and extract feature sets. The purpose is to reduce the attention model's sensitivity to differences between individual faces. The automatic evaluation module uses machine learning algorithms to train a classifier that estimates each student's attention level. In extensive experiments on multiple real classroom video datasets, our three-level attention classifier achieves an accuracy of 82.5%, which is better than the results reported by other studies in the field of student participation analysis. The results show that the method based on facial landmark mining predicts an individual student's classroom attention level more accurately, and that it can serve as a non-intrusive automatic analysis method for real classroom multimedia data.
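The idea of constraining landmarks with face reference information can be illustrated with a minimal sketch. The abstract does not say which reference information the authors use, so the code below assumes a common stand-in: centering the 68 landmarks on their centroid and scaling by the distance between the outer eye corners (indices 36 and 45 in the widely used 0-indexed 68-point layout). The function names `normalize_landmarks` and `to_feature_vector` are hypothetical and not taken from the paper.

```python
import math

# Outer eye corners in the common 0-indexed 68-point landmark layout.
# ASSUMPTION: the paper's actual reference points are not given in the abstract.
RIGHT_EYE_OUTER = 36
LEFT_EYE_OUTER = 45


def normalize_landmarks(points):
    """Center 68 (x, y) landmarks on their centroid and scale by eye distance.

    This removes the effect of face position and face size in the frame,
    which is one simple way to reduce sensitivity to per-face differences.
    """
    if len(points) != 68:
        raise ValueError("expected 68 landmarks")
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    (x1, y1), (x2, y2) = points[RIGHT_EYE_OUTER], points[LEFT_EYE_OUTER]
    scale = math.hypot(x2 - x1, y2 - y1)
    if scale == 0:
        raise ValueError("degenerate face: eye corners coincide")
    return [((x - cx) / scale, (y - cy) / scale) for x, y in points]


def to_feature_vector(points):
    """Flatten the normalized landmarks into a 136-dimensional feature list."""
    return [coord for point in normalize_landmarks(points) for coord in point]
```

A vector produced this way could then be passed to any standard classifier trained on three attention levels. The normalization here handles translation and uniform scale only; it does not compensate for head rotation, which the authors' method may treat differently.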
L. Chen and H. Yang—These authors contributed equally to this work.
Acknowledgments
This work is supported by the National Natural Science Foundation of China (No. 61772023), the National Key Research and Development Program of China (No. 2019QY1803), the Fujian Science and Technology Plan Industry-University-Research Cooperation Project (No. 2021H6015), the National College Student Innovation and Entrepreneurship Training Program of China (No. 202110384258) and the Social Science Program of Fujian Province (No. FJ2020B062).
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Chen, L., Yang, H., Liu, K. (2022). Classroom Attention Estimation Method Based on Mining Facial Landmarks of Students. In: Þór Jónsson, B., et al. MultiMedia Modeling. MMM 2022. Lecture Notes in Computer Science, vol 13142. Springer, Cham. https://doi.org/10.1007/978-3-030-98355-0_22
DOI: https://doi.org/10.1007/978-3-030-98355-0_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-98354-3
Online ISBN: 978-3-030-98355-0
eBook Packages: Computer Science, Computer Science (R0)