Abstract
The paper describes a strategy that applies heuristics to combine sets of terminological words and word combinations pre-extracted from a scientific text by several term recognition procedures. Each procedure is based on a collection of lexico-syntactic patterns representing specific linguistic information about terms in scientific texts. The strategy aims to improve the quality of automatic term extraction from a particular scientific text. Experiments have shown that it yields an 11–17 % increase in F-measure compared with commonly used term extraction methods.
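The abstract does not spell out the combination heuristics. The following is only a minimal sketch of the general setup it describes: several procedures each return a set of candidate terms, the sets are merged, and the result is scored by F-measure against a gold-standard term list. The merging rule used here (keep a candidate proposed by at least `min_votes` procedures) is a hypothetical stand-in, not the authors' method. The score is assumed to be the balanced F-measure (F1). All function names and example data are illustrative.

```python
from collections import Counter
from typing import Iterable, Set


def combine_by_vote(candidate_sets: Iterable[Set[str]], min_votes: int = 2) -> Set[str]:
    """Keep a candidate term if at least `min_votes` procedures proposed it.

    Hypothetical stand-in for the paper's heuristics, which the abstract
    does not describe.
    """
    votes = Counter()
    for candidates in candidate_sets:
        votes.update(set(candidates))  # each procedure votes at most once per term
    return {term for term, n in votes.items() if n >= min_votes}


def f_measure(extracted: Set[str], gold: Set[str]) -> float:
    """Balanced F-measure (F1) of an extracted term set against a gold set."""
    true_pos = len(extracted & gold)
    if true_pos == 0:
        return 0.0  # also covers an empty extracted set
    precision = true_pos / len(extracted)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical outputs of three pattern-based recognition procedures.
    procedure_outputs = [
        {"term extraction", "lexico-syntactic pattern", "scientific text"},
        {"term extraction", "lexico-syntactic pattern", "f-measure"},
        {"term extraction", "scientific text", "heuristic"},
    ]
    gold = {"term extraction", "lexico-syntactic pattern", "heuristic"}

    combined = combine_by_vote(procedure_outputs, min_votes=2)
    print(sorted(combined))  # ['lexico-syntactic pattern', 'scientific text', 'term extraction']
    print(round(f_measure(combined, gold), 3))  # 0.667
```

A voting threshold trades recall for precision. With `min_votes=1` the rule reduces to a plain union of all candidate sets. The heuristics in the paper are presumably more linguistically informed than a vote count.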
Acknowledgements
We would like to thank the anonymous reviewers of our paper for their helpful and constructive comments.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Bolshakova, E.I., Efremova, N.E. (2015). A Heuristic Strategy for Extracting Terms from Scientific Texts. In: Khachay, M., Konstantinova, N., Panchenko, A., Ignatov, D., Labunets, V. (eds) Analysis of Images, Social Networks and Texts. AIST 2015. Communications in Computer and Information Science, vol 542. Springer, Cham. https://doi.org/10.1007/978-3-319-26123-2_29
DOI: https://doi.org/10.1007/978-3-319-26123-2_29
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26122-5
Online ISBN: 978-3-319-26123-2
eBook Packages: Computer Science, Computer Science (R0)