The Challenge of Noisy Classrooms: Speaker Detection During Elementary Students’ Collaborative Dialogue

Ma, Yingbo; Wiggins, Joseph B.; Celepkolu, Mehmet; Boyer, Kristy Elizabeth; Lynch, Collin; Wiebe, Eric

doi:10.1007/978-3-030-78292-4_22


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12748)

Included in the following conference series:

International Conference on Artificial Intelligence in Education


Abstract

Adaptive and intelligent collaborative learning support systems are effective for supporting learning and building strong collaborative skills. This potential has not yet been realized within noisy classroom environments, where automated speech recognition (ASR) is very difficult. A key challenge is to differentiate each learner’s speech from the background noise, which includes the teachers’ speech as well as other groups’ speech. In this paper, we explore a multimodal method to identify speakers by using visual and acoustic features from ten video recordings of pairs of children collaborating in an elementary school classroom. The results indicate that the visual modality was better for identifying the speaker when in-group speech was detected, while the acoustic modality was better for differentiating in-group speech from background speech. Our analysis also revealed that recurrent neural network (RNN)-based models outperformed convolutional neural network (CNN)-based models, achieving higher speaker detection F1 scores. This work represents a critical step toward the classroom deployment of intelligent systems that support collaborative learning.
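
For readers unfamiliar with the metric, the following is a minimal sketch of how a per-class and macro-averaged F1 score can be computed for a speaker detection task. It is not the authors' evaluation code: the label names (`child_a`, `child_b`, `background`), the example predictions, and the choice of macro averaging are assumptions made purely for illustration.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} for two equal-length sequences of labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one labeled segment is required")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class was wrong
            fn[truth] += 1   # true class was missed

    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        # F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical segment-level labels (not data from the paper).
y_true = ["child_a", "child_a", "child_b", "background", "child_b", "background"]
y_pred = ["child_a", "child_b", "child_b", "background", "child_b", "child_a"]

scores = per_class_f1(y_true, y_pred)
overall = macro_f1(y_true, y_pred)
print(scores)
print(round(overall, 3))
```

Reporting the per-class scores alongside the average makes it visible whether errors come from confusing the two in-group speakers or from confusing in-group speech with background speech, which is the distinction the abstract draws between the visual and acoustic modalities.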

Notes

1. https://github.com/yingbo-ma/The-Challenge-of-Noisy-Classrooms-AIED2021

Acknowledgments

This research is supported by the National Science Foundation through grant DRL-1721160. Any opinions, findings, conclusions, or recommendations expressed in this report are those of the participants, and do not necessarily represent the official views, opinions, or policy of the National Science Foundation.

Author information

Authors and Affiliations

University of Florida, Gainesville, FL, 32601, USA
Yingbo Ma, Joseph B. Wiggins, Mehmet Celepkolu & Kristy Elizabeth Boyer
North Carolina State University, Raleigh, NC, 27606, USA
Collin Lynch & Eric Wiebe

Corresponding authors

Correspondence to Yingbo Ma or Kristy Elizabeth Boyer.

Editor information

Editors and Affiliations

Technion – Israel Institute of Technology, Haifa, Israel
Ido Roll
Arizona State University, Tempe, AZ, USA
Danielle McNamara
Utrecht University, Utrecht, The Netherlands
Sergey Sosnovsky
London Knowledge Lab, London, UK
Rose Luckin
University of Leeds, Leeds, UK
Vania Dimitrova

About this paper

Cite this paper

Ma, Y., Wiggins, J.B., Celepkolu, M., Boyer, K.E., Lynch, C., Wiebe, E. (2021). The Challenge of Noisy Classrooms: Speaker Detection During Elementary Students’ Collaborative Dialogue. In: Roll, I., McNamara, D., Sosnovsky, S., Luckin, R., Dimitrova, V. (eds) Artificial Intelligence in Education. AIED 2021. Lecture Notes in Computer Science, vol 12748. Springer, Cham. https://doi.org/10.1007/978-3-030-78292-4_22

DOI: https://doi.org/10.1007/978-3-030-78292-4_22
Published: 11 June 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-78291-7
Online ISBN: 978-3-030-78292-4
eBook Packages: Computer Science, Computer Science (R0)
