Terminology/Keyphrase Extraction for Creation of Book Indexes in Polish

Marciniak, Małgorzata; Mykowiecka, Agnieszka; Rychlik, Piotr

doi:10.1007/978-3-030-86324-1_5

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12866)

Included in the following conference series:

International Conference on Theory and Practice of Digital Libraries

Abstract

This paper addresses the automatic identification of phrases for inclusion in back-of-book indexes. We analyzed books in Polish and English that were published with subject indexes compiled by their authors, examining what kinds of phrases appear in those indexes and how often they actually occur in the corresponding books. In the experiments, we used existing terminology extraction and keyphrase extraction tools. For Polish, the terminology extraction tool performed better than the keyphrase extraction tool; for English texts, the results were inconclusive.

Author information

Authors and Affiliations

Institute of Computer Science, Polish Academy of Sciences, Jana Kazimierza 5, Warsaw, Poland
Małgorzata Marciniak, Agnieszka Mykowiecka & Piotr Rychlik

Corresponding author

Correspondence to Piotr Rychlik.

Editor information

Editors and Affiliations

OsloMet – Oslo Metropolitan University, Oslo, Norway
Gerd Berget
The Open University, Milton Keynes, UK
Mark Michael Hall
Martin Luther University Halle-Wittenberg, Halle, Germany
Daniel Brenn
Tampere University, Tampere, Finland
Sanna Kumpulainen

About this paper

Cite this paper

Marciniak, M., Mykowiecka, A., Rychlik, P. (2021). Terminology/Keyphrase Extraction for Creation of Book Indexes in Polish. In: Berget, G., Hall, M.M., Brenn, D., Kumpulainen, S. (eds) Linking Theory and Practice of Digital Libraries. TPDL 2021. Lecture Notes in Computer Science, vol 12866. Springer, Cham. https://doi.org/10.1007/978-3-030-86324-1_5

DOI: https://doi.org/10.1007/978-3-030-86324-1_5
Published: 07 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86323-4
Online ISBN: 978-3-030-86324-1
eBook Packages: Computer Science, Computer Science (R0)
