Abstract
Massive Open Online Courses (MOOCs) deliver high-quality learning content to a global audience, but identifying the key concepts of a course for students with diverse backgrounds remains a challenging problem. Although much prior work addresses course concept extraction in MOOCs, these methods use external knowledge only to measure the relatedness between pairs of candidate concepts. Moreover, they assume multi-document input and rely heavily on seed sets, so they perform poorly when the input is a single document. To address these drawbacks, we extract concepts from a single document with LTWNN (Learning to Weight with Neural Network), a novel method for course concept extraction in MOOCs. LTWNN makes full use of external knowledge by measuring the relatedness between each candidate concept and the document through an embedding-based maximal marginal relevance (MMR) criterion, which explicitly increases diversity among the selected concepts. In addition, it combines internal statistical information with external knowledge, letting a neural network learn to weight the two automatically. Experiments on corpora from different courses show that our method outperforms alternative approaches.
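The embedding-based MMR criterion described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes candidate-concept and document embeddings are already computed, uses cosine similarity, and all function and variable names are ours.

```python
import numpy as np

def embedding_mmr(doc_emb, cand_embs, k=2, lam=0.7):
    """Select k candidate indices via embedding-based MMR.

    Each step picks the candidate that balances relevance to the
    document against redundancy with already-selected candidates:
        score(c) = lam * sim(c, doc) - (1 - lam) * max_{s in S} sim(c, s)
    """
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    relevance = [cos(c, doc_emb) for c in cand_embs]
    selected = [int(np.argmax(relevance))]  # seed with the most relevant candidate
    while len(selected) < min(k, len(cand_embs)):
        best, best_score = None, -np.inf
        for i in range(len(cand_embs)):
            if i in selected:
                continue
            # redundancy = highest similarity to any already-selected concept
            redundancy = max(cos(cand_embs[i], cand_embs[j]) for j in selected)
            score = lam * relevance[i] - (1 - lam) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return selected
```

Lowering `lam` trades relevance for diversity: a near-duplicate of an already-selected concept is penalized by its high redundancy term, so a less similar but still relevant concept wins instead.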
Notes
1. The source dataset is released on http://moocdata.cn/data/concept-extraction.
Acknowledgment
This work was supported by the National Natural Science Foundation of China (No. 62077015).
Copyright information
© 2022 Springer Nature Switzerland AG
Cite this paper
Wu, Z., Zhu, J., Xu, S., Yan, Z., Liang, W. (2022). LTWNN: A Novel Approach Using Sentence Embeddings for Extracting Diverse Concepts in MOOCs. In: Long, G., Yu, X., Wang, S. (eds) AI 2021: Advances in Artificial Intelligence. AI 2022. Lecture Notes in Computer Science(), vol 13151. Springer, Cham. https://doi.org/10.1007/978-3-030-97546-3_62
Print ISBN: 978-3-030-97545-6
Online ISBN: 978-3-030-97546-3