Abstract
This paper addresses the automatic identification of phrases to include in back-of-book indexes. We analyzed books in Polish and English that were published with subject indexes compiled by their authors, checking what kinds of phrases appear in those indexes and how often they actually occur in the corresponding books. In the experiments, we used existing terminology extraction and keyphrase extraction tools. For Polish, the terminology extraction tool performed better than the keyphrase extraction tool; for English texts, the results were inconclusive.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Marciniak, M., Mykowiecka, A., Rychlik, P. (2021). Terminology/Keyphrase Extraction for Creation of Book Indexes in Polish. In: Berget, G., Hall, M.M., Brenn, D., Kumpulainen, S. (eds) Linking Theory and Practice of Digital Libraries. TPDL 2021. Lecture Notes in Computer Science, vol 12866. Springer, Cham. https://doi.org/10.1007/978-3-030-86324-1_5
DOI: https://doi.org/10.1007/978-3-030-86324-1_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86323-4
Online ISBN: 978-3-030-86324-1
eBook Packages: Computer Science (R0)