Abstract
Online course reviews are an essential way for course providers to gain insight into students’ perceptions of course quality, especially in massive open online courses (MOOCs), where further interaction between the two parties is difficult. Analyzing online course reviews is thus an indispensable step for course providers in improving course quality and structuring future courses. However, reading through what are often thousands of comments and extracting key ideas by hand is inefficient and risks missing important ideas. In this work, we propose a key idea extractor based on fine-grained, aspect-level semantic units from comments, powered by several variants of state-of-the-art pre-trained language models (PLMs). Our approach differs from previous topic modeling and keyword extraction methods in two respects. First, we aim not only to eliminate the heavy reliance on human intervention and statistical characteristics that traditional topic models such as LDA are based on, but also to overcome the coarse granularity of state-of-the-art topic models such as top2vec. Second, unlike previous keyword extraction methods, we do not extract keywords to summarize each comment, which we argue is not necessarily helpful for human readers trying to grasp key ideas at the course level. Instead, we cluster the ideas and concerns expressed most often across the whole course, without relying on students’ verbatim wording. We show that this method provides high and stable coverage of students’ ideas.
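The overall shape of such a pipeline (split comments into small semantic units, embed each unit, cluster similar units, rank clusters by how often the idea recurs) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the segmentation rule, the encoder, or the clustering algorithm. Here a bag-of-words vector stands in for a PLM sentence embedding, a greedy similarity threshold stands in for the clustering step, and every function name and the `threshold` parameter are hypothetical.

```python
import math
import re
from collections import Counter


def split_into_units(comment):
    """Split a comment into rough clause-level units (assumed heuristic)."""
    parts = re.split(r"[.!?;]|\bbut\b|\band\b", comment.lower())
    return [p.strip() for p in parts if p.strip()]


def embed(unit):
    """Stand-in embedding: word counts. A PLM sentence encoder would go here."""
    return Counter(re.findall(r"[a-z']+", unit))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_units(units, threshold=0.5):
    """Greedy single-pass clustering against each cluster's first unit (its seed)."""
    clusters = []
    for unit in units:
        vec = embed(unit)
        for cluster in clusters:
            if cosine(vec, embed(cluster[0])) >= threshold:
                cluster.append(unit)
                break
        else:
            # No existing cluster is similar enough, so start a new one.
            clusters.append([unit])
    # Largest clusters first: the most frequently expressed ideas.
    return sorted(clusters, key=len, reverse=True)


def key_ideas(comments, threshold=0.5):
    """Pool units from all comments of a course, then cluster them."""
    units = [u for c in comments for u in split_into_units(c)]
    return cluster_units(units, threshold)


if __name__ == "__main__":
    reviews = [
        "The lectures were too fast. Great quizzes!",
        "Lectures were too fast for me",
        "Great quizzes",
    ]
    for cluster in key_ideas(reviews):
        print(len(cluster), cluster)
```

Because units are pooled across all comments before clustering, a cluster reflects an idea raised across the course rather than a summary of any single review. A word-count stand-in only groups units that share vocabulary, whereas a dense PLM embedding would be expected to group paraphrases as well.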
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Xiao, C., Shi, L., Cristea, A., Li, Z., Pan, Z. (2022). Fine-grained Main Ideas Extraction and Clustering of Online Course Reviews. In: Rodrigo, M.M., Matsuda, N., Cristea, A.I., Dimitrova, V. (eds) Artificial Intelligence in Education. AIED 2022. Lecture Notes in Computer Science, vol 13355. Springer, Cham. https://doi.org/10.1007/978-3-031-11644-5_24
DOI: https://doi.org/10.1007/978-3-031-11644-5_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-11643-8
Online ISBN: 978-3-031-11644-5
eBook Packages: Computer Science (R0)