Abstract
Supervised keyword extraction models tend to generalize poorly. To improve model robustness, this paper proposes a keyword extraction method inspired by pre-trained language models. After pre-training on a large corpus and fine-tuning on task-specific datasets, the proposed method is more robust on keyword extraction tasks. In addition, multi-task training is introduced in the fine-tuning stage to improve the model's accuracy. Extensive comparative experiments show that the proposed method significantly improves both the robustness and the accuracy of the model.
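The abstract does not say which auxiliary tasks or loss weights the authors use during multi-task fine-tuning. The sketch below therefore shows only a common generic formulation under stated assumptions. It frames keyword extraction as token-level sequence labeling with hypothetical BIO tags (0 = O, 1 = B-KW, 2 = I-KW). It combines that main loss with any auxiliary losses as a weighted sum, L = L_main + Σ λ_k · L_aux_k. The per-token probability vectors stand in for the softmax outputs of a fine-tuned pre-trained encoder. The function names and the example weight are illustrative and do not come from the paper.

```python
import math
from typing import List, Sequence


def cross_entropy(probs: Sequence[float], target: int) -> float:
    """Negative log-likelihood of the target class under a probability vector."""
    return -math.log(probs[target])


def tagging_loss(token_probs: List[Sequence[float]], tags: List[int]) -> float:
    """Mean token-level cross-entropy for keyword sequence labeling.

    Assumption (hypothetical tag scheme): 0 = O, 1 = B-KW, 2 = I-KW.
    """
    if not tags or len(token_probs) != len(tags):
        raise ValueError("need one probability vector per tag")
    losses = [cross_entropy(p, t) for p, t in zip(token_probs, tags)]
    return sum(losses) / len(losses)


def multitask_loss(main_loss: float,
                   aux_losses: List[float],
                   aux_weights: List[float]) -> float:
    """Weighted multi-task objective: L = L_main + sum_k lambda_k * L_aux_k."""
    if len(aux_losses) != len(aux_weights):
        raise ValueError("one weight per auxiliary loss")
    return main_loss + sum(w * l for w, l in zip(aux_weights, aux_losses))


if __name__ == "__main__":
    # Toy softmax outputs for a 3-token input tagged O, B-KW, I-KW.
    probs = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7]]
    main = tagging_loss(probs, [0, 1, 2])
    # Hypothetical auxiliary task: a binary document-level prediction.
    aux = cross_entropy([0.6, 0.4], 0)
    print(multitask_loss(main, [aux], [0.5]))
```

During fine-tuning, the combined loss would be backpropagated through the shared encoder so that the auxiliary signal shapes the same representations used for tagging. The weight λ (0.5 here) is an arbitrary placeholder that would normally be tuned on validation data.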
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Guo, L., Sun, H., Qi, Q., Wang, J. (2022). Keyword Extraction Algorithm Based on Pre-training and Multi-task Training. In: Yang, XS., Sherratt, S., Dey, N., Joshi, A. (eds) Proceedings of Sixth International Congress on Information and Communication Technology. Lecture Notes in Networks and Systems, vol 235. Springer, Singapore. https://doi.org/10.1007/978-981-16-2377-6_67
DOI: https://doi.org/10.1007/978-981-16-2377-6_67
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-2376-9
Online ISBN: 978-981-16-2377-6
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)