The Challenge of Noisy Classrooms: Speaker Detection During Elementary Students’ Collaborative Dialogue

  • Conference paper
Artificial Intelligence in Education (AIED 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12748)


Abstract

Adaptive and intelligent collaborative learning support systems are effective for supporting learning and building strong collaborative skills. This potential has not yet been realized in noisy classroom environments, where automatic speech recognition (ASR) is very difficult. A key challenge is to distinguish each learner's speech from the background noise, which includes the teachers' speech as well as other groups' speech. In this paper, we explore a multimodal method that identifies speakers using visual and acoustic features extracted from ten video recordings of pairs of children collaborating in an elementary school classroom. The results indicate that the visual modality was better for identifying which student was speaking once in-group speech had been detected, while the acoustic modality was better for distinguishing in-group speech from background speech. Our analysis also revealed that recurrent neural network (RNN)-based models outperformed convolutional neural network (CNN)-based models, achieving higher speaker-detection F-1 scores. This work represents a critical step toward the classroom deployment of intelligent systems that support collaborative learning.
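The two modality findings suggest one way their outputs could be combined. The following is a minimal Python sketch, not the authors' pipeline: it assumes two hypothetical, already-trained per-window models (an acoustic one that outputs the probability of in-group speech, and a visual one that outputs the probability that the left-hand student is the speaker) and a three-way label set (`background`, `left`, `right`) chosen here purely for illustration. It also computes a one-vs-rest F-1 score per label, the metric used to compare models.

```python
from typing import List, Sequence

# Hypothetical label set (an assumption for this sketch, not taken from the paper).
BACKGROUND = "background"  # teacher, other groups, or no in-group speech
LEFT = "left"              # left-hand student of the pair is speaking
RIGHT = "right"            # right-hand student of the pair is speaking


def cascade_labels(p_in_group: Sequence[float],
                   p_left: Sequence[float],
                   in_group_threshold: float = 0.5,
                   left_threshold: float = 0.5) -> List[str]:
    """Assign one label per time window by chaining two hypothetical models.

    p_in_group: acoustic model's probability that the window holds in-group speech.
    p_left: visual model's probability that the left-hand student is the speaker.
    """
    if len(p_in_group) != len(p_left):
        raise ValueError("both modalities must cover the same windows")
    labels: List[str] = []
    for acoustic, visual in zip(p_in_group, p_left):
        if acoustic < in_group_threshold:
            # Stage 1 (acoustic): treat the window as background speech.
            labels.append(BACKGROUND)
        elif visual >= left_threshold:
            # Stage 2 (visual): in-group speech attributed to the left student.
            labels.append(LEFT)
        else:
            # Stage 2 (visual): in-group speech attributed to the right student.
            labels.append(RIGHT)
    return labels


def f1_per_class(y_true: Sequence[str], y_pred: Sequence[str], label: str) -> float:
    """One-vs-rest F-1 score for a single label over aligned window labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        # F-1 is 0 whenever tp is 0; returning early also avoids dividing by zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    pred = cascade_labels([0.9, 0.2, 0.8, 0.7], [0.8, 0.9, 0.3, 0.6])
    truth = [LEFT, BACKGROUND, RIGHT, RIGHT]
    print(pred)  # ['left', 'background', 'right', 'left']
    for label in (LEFT, RIGHT, BACKGROUND):
        print(label, round(f1_per_class(truth, pred, label), 3))
```

In this toy run the last window is misattributed to the left student, so the `left` and `right` labels each score an F-1 of 2/3 while `background` scores 1.0. Routing every window through the acoustic decision first mirrors the finding that acoustics separated in-group from background speech better; a single model that fuses both feature sets would be an equally plausible design.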

Notes

  1. https://github.com/yingbo-ma/The-Challenge-of-Noisy-Classrooms-AIED2021.

Acknowledgments

This research is supported by the National Science Foundation through grant DRL-1721160. Any opinions, findings, conclusions, or recommendations expressed in this report are those of the participants, and do not necessarily represent the official views, opinions, or policy of the National Science Foundation.

Author information

Corresponding authors

Correspondence to Yingbo Ma or Kristy Elizabeth Boyer.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Ma, Y., Wiggins, J.B., Celepkolu, M., Boyer, K.E., Lynch, C., Wiebe, E. (2021). The Challenge of Noisy Classrooms: Speaker Detection During Elementary Students' Collaborative Dialogue. In: Roll, I., McNamara, D., Sosnovsky, S., Luckin, R., Dimitrova, V. (eds) Artificial Intelligence in Education. AIED 2021. Lecture Notes in Computer Science, vol 12748. Springer, Cham. https://doi.org/10.1007/978-3-030-78292-4_22

  • DOI: https://doi.org/10.1007/978-3-030-78292-4_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-78291-7

  • Online ISBN: 978-3-030-78292-4

  • eBook Packages: Computer Science, Computer Science (R0)
