Terminology/Keyphrase Extraction for Creation of Book Indexes in Polish

  • Conference paper
Linking Theory and Practice of Digital Libraries (TPDL 2021)

Abstract

The paper addresses the problem of automatically identifying phrases to be included in back-of-book indexes. We analyzed books in Polish and English that were published with subject indexes compiled by their authors. We checked what kinds of phrases are placed in those indexes and how often they actually occur in the corresponding books. In the experiments, we used existing terminology extraction and keyphrase extraction tools. For Polish, the terminology extraction tool performed better than the keyphrase extraction tool, but for English texts the results are inconclusive.
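
The two measurements mentioned in the abstract (how often index phrases actually occur in the book, and how well an extractor's output matches the author's index) could be sketched roughly as follows. This is an illustrative sketch only, not the authors' procedure: the function names, the exact-match criterion, and the toy Polish sample are all assumptions. Because Polish is highly inflected, verbatim matching will miss inflected forms, so a realistic setup would presumably compare lemmatized text instead.

```python
import re


def normalize(text: str) -> str:
    """Lowercase the text and reduce it to word tokens joined by single spaces."""
    return " ".join(re.findall(r"\w+", text.lower()))


def index_phrase_counts(book_text: str, index_phrases: list[str]) -> dict[str, int]:
    """Count verbatim (whole-word) occurrences of each index phrase in the book.

    Assumption: exact surface-form matching; no lemmatization is applied.
    """
    normalized_book = normalize(book_text)
    counts = {}
    for phrase in index_phrases:
        target = normalize(phrase)
        if not target:
            counts[phrase] = 0
            continue
        # Lookarounds keep the match on word boundaries without consuming spaces,
        # so adjacent repetitions of a phrase are all counted.
        pattern = rf"(?<!\w){re.escape(target)}(?!\w)"
        counts[phrase] = len(re.findall(pattern, normalized_book))
    return counts


def precision_recall(extracted: list[str], index_phrases: list[str]) -> tuple[float, float]:
    """Compare extracted phrases with the author's index by exact normalized match."""
    extracted_set = {normalize(p) for p in extracted}
    gold_set = {normalize(p) for p in index_phrases}
    hits = len(extracted_set & gold_set)
    precision = hits / len(extracted_set) if extracted_set else 0.0
    recall = hits / len(gold_set) if gold_set else 0.0
    return precision, recall


# Hypothetical toy data, not taken from the paper.
book = "Popyt i podaż. Krzywa popytu opada; krzywa popytu przesuwa się."
index = ["krzywa popytu", "podaż", "elastyczność cenowa"]
extracted = ["krzywa popytu", "popyt", "Podaż"]

counts = index_phrase_counts(book, index)
precision, recall = precision_recall(extracted, index)
```

In this toy sample, "elastyczność cenowa" is an index entry that never appears verbatim in the text, which is the kind of case the abstract's question about actual occurrence is meant to surface.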

Author information

Corresponding author

Correspondence to Piotr Rychlik.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Marciniak, M., Mykowiecka, A., Rychlik, P. (2021). Terminology/Keyphrase Extraction for Creation of Book Indexes in Polish. In: Berget, G., Hall, M.M., Brenn, D., Kumpulainen, S. (eds) Linking Theory and Practice of Digital Libraries. TPDL 2021. Lecture Notes in Computer Science, vol 12866. Springer, Cham. https://doi.org/10.1007/978-3-030-86324-1_5

  • DOI: https://doi.org/10.1007/978-3-030-86324-1_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86323-4

  • Online ISBN: 978-3-030-86324-1

  • eBook Packages: Computer Science, Computer Science (R0)
