A Heuristic Strategy for Extracting Terms from Scientific Texts

Bolshakova, Elena I.; Efremova, Natalia E.

doi:10.1007/978-3-319-26123-2_29

Part of the book series: Communications in Computer and Information Science (CCIS, volume 542)

Included in the following conference series:

International Conference on Analysis of Images, Social Networks and Texts

Abstract

The paper describes a strategy that applies heuristics to combine sets of terminological words and word combinations pre-extracted from a scientific text by several term recognition procedures. Each procedure is based on a collection of lexico-syntactic patterns representing specific linguistic information about terms in scientific texts. The strategy aims to improve the quality of automatic term extraction from an individual scientific text. Experiments have shown that the strategy yields an 11–17 % increase in F-measure compared with commonly used term extraction methods.
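The abstract reports its results as F-measure. The sketch below shows how one extracted term set could be scored against a gold-standard list. It makes two assumptions: extracted and gold terms are compared as exact, already normalized strings, and "F-measure" means the balanced F1 score, the harmonic mean of precision and recall. The function name `evaluate_terms` and the toy term sets are hypothetical illustrations and are not taken from the paper.

```python
def evaluate_terms(extracted: set[str], gold: set[str]) -> tuple[float, float, float]:
    """Return (precision, recall, F1) of an extracted term set against a gold standard.

    Hypothetical helper: assumes terms are compared as exact, already normalized
    strings (e.g. lemmatized and lowercased beforehand).
    """
    true_positives = len(extracted & gold)  # terms both extracted and in the gold list
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Balanced F-measure (F1): harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy data, not from the paper's experiments
gold = {"term extraction", "lexico-syntactic pattern", "scientific text", "f-measure"}
extracted = {"term extraction", "scientific text", "heuristic", "pattern"}
p, r, f = evaluate_terms(extracted, gold)
print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")  # prints "P=0.50 R=0.50 F1=0.50"
```

Real term-extraction evaluations often relax exact matching, for example by normalizing morphological variants before comparison. The set-intersection step is where such a matching rule would be applied.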

Acknowledgements

We would like to thank the anonymous reviewers of our paper for their helpful and constructive comments.

Author information

Authors and Affiliations

Lomonosov Moscow State University, National Research University Higher School of Economics, Moscow, Russia
Elena I. Bolshakova & Natalia E. Efremova

Corresponding author

Correspondence to Elena I. Bolshakova.

Editor information

Editors and Affiliations

Krasovsky Institute of Mathematics and Mechanics, Yekaterinburg, Russia
Mikhail Yu. Khachay
Wolverhampton, United Kingdom
Natalia Konstantinova
Technische Universität Darmstadt, Darmstadt, Germany
Alexander Panchenko
National Research University Higher School of Economics, Moscow, Russia
Dmitry Ignatov
Ural Federal University, Yekaterinburg, Russia
Valeri G. Labunets

About this paper

Cite this paper

Bolshakova, E.I., Efremova, N.E. (2015). A Heuristic Strategy for Extracting Terms from Scientific Texts. In: Khachay, M., Konstantinova, N., Panchenko, A., Ignatov, D., Labunets, V. (eds) Analysis of Images, Social Networks and Texts. AIST 2015. Communications in Computer and Information Science, vol 542. Springer, Cham. https://doi.org/10.1007/978-3-319-26123-2_29

DOI: https://doi.org/10.1007/978-3-319-26123-2_29
Published: 05 December 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26122-5
Online ISBN: 978-3-319-26123-2
eBook Packages: Computer Science, Computer Science (R0)