Abstract
Massive Open Online Courses (MOOCs) deliver high-quality learning content to a global audience, but identifying the key concepts of a course for students with diverse backgrounds remains a challenging problem. Although much prior work addresses course concept extraction in MOOCs, these methods use external knowledge only to measure the relatedness between pairs of candidate concepts. Moreover, they assume multi-document input and rely heavily on seed sets, so they perform poorly when the input is a single document. To address these drawbacks, we extract concepts from a single document with LTWNN (Learning to Weight with Neural Network), a novel method for course concept extraction in MOOCs. LTWNN makes full use of external knowledge by measuring the relatedness between each candidate concept and the document through an embedding-based maximal marginal relevance (MMR) criterion, which explicitly increases diversity among the selected concepts. In addition, it combines internal statistical information with external knowledge, letting a neural network learn to weight the two automatically. Experiments on corpora from different courses show that our method outperforms alternative approaches.
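The embedding-based MMR criterion described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes candidate-concept and document embeddings are already computed, uses cosine similarity, and all function and variable names are ours.

```python
import numpy as np

def embedding_mmr(doc_emb, cand_embs, k=2, lam=0.7):
    """Select k candidate indices via embedding-based MMR.

    Each step picks the candidate that balances relevance to the
    document against redundancy with already-selected candidates:
        score(c) = lam * sim(c, doc) - (1 - lam) * max_{s in S} sim(c, s)
    """
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    relevance = [cos(c, doc_emb) for c in cand_embs]
    selected = [int(np.argmax(relevance))]  # seed with the most relevant candidate
    while len(selected) < min(k, len(cand_embs)):
        best, best_score = None, -np.inf
        for i in range(len(cand_embs)):
            if i in selected:
                continue
            # redundancy = highest similarity to any already-selected concept
            redundancy = max(cos(cand_embs[i], cand_embs[j]) for j in selected)
            score = lam * relevance[i] - (1 - lam) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return selected
```

Lowering `lam` trades relevance for diversity: a near-duplicate of an already-selected concept is penalized by its high redundancy term, so a less similar but still relevant concept wins instead.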
Notes
1. The source dataset is released on http://moocdata.cn/data/concept-extraction.
Acknowledgment
This work was supported by the National Natural Science Foundation of China (No. 62077015).
Copyright information
© 2022 Springer Nature Switzerland AG
Cite this paper
Wu, Z., Zhu, J., Xu, S., Yan, Z., Liang, W. (2022). LTWNN: A Novel Approach Using Sentence Embeddings for Extracting Diverse Concepts in MOOCs. In: Long, G., Yu, X., Wang, S. (eds) AI 2021: Advances in Artificial Intelligence. AI 2022. Lecture Notes in Computer Science(), vol 13151. Springer, Cham. https://doi.org/10.1007/978-3-030-97546-3_62
Print ISBN: 978-3-030-97545-6
Online ISBN: 978-3-030-97546-3