Embedding Transcription and Transliteration Layers in the Digital Library of Polish and Poland-Related News Pamphlets

  • Conference paper
Towards Open and Trustworthy Digital Societies (ICADL 2021)

Abstract

The paper presents an experiment intended to overcome the problem of searching for different spelling variants in old Polish prints. For The Digital Library of Polish and Poland-related Ephemeral Prints from the 16th, 17th and 18th Centuries, two concurrent layers of text (transliteration and transcription) underlying selected digital library items are available in the related Electronic Corpus of the 17th and 18th Century Polish Texts (until 1772). Both variants are retrieved, and a sample item is prepared with two hidden text layers and made available for textual searching in a PDF containing its scanned image. The experiment can be generalized to other libraries dealing with multiple concurrent textual interpretations of graphical items.
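The idea of searching two concurrent text layers can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not touch PDF files. It only assumes that each page carries a transliteration string and a transcription string, and that a query counts as a hit when it occurs in either one. The `PageLayers` class, the `search_layers` function and the sample strings are hypothetical and invented for illustration; they are not taken from the corpus.

```python
from dataclasses import dataclass


@dataclass
class PageLayers:
    """Two concurrent readings of one scanned page (hypothetical structure)."""
    page: int
    transliteration: str  # grapheme-oriented reading, keeps historical spelling
    transcription: str    # phoneme-oriented reading


def search_layers(pages, query):
    """Return (page, layer_name) pairs for every layer that contains the query."""
    needle = query.casefold()
    hits = []
    for p in pages:
        for name in ("transliteration", "transcription"):
            if needle in getattr(p, name).casefold():
                hits.append((p.page, name))
    return hits


# Invented sample text, for illustration only (not taken from the corpus).
sample = [
    PageLayers(1, "Woyſko krolewſkie", "Wojsko królewskie"),
    PageLayers(2, "Nowiny z Krakowa", "Nowiny z Krakowa"),
]

for q in ("wojsko", "woyſko", "nowiny"):
    print(q, search_layers(sample, q))
```

Under these assumptions a reader can find the same page with either the historical or the modernized spelling, which is the effect the two hidden layers are meant to achieve in the PDF.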

The work was financed by a research grant from the Polish Ministry of Science and Higher Education under the National Programme for the Development of Humanities for the years 2019–2023 (grant 11H 18 0413 86, grant funds received: 1,797,741 PLN).

Notes

  1. See https://cbdu.ijp.pan.pl/.

  2. See also https://korba.edu.pl/overview?lang=en.

  3. The intention of transliteration is accurate representation of the graphemes of a text, while transcription is concerned with representing its phonemes.

  4. See the Translation into Contemporary Polish section of print 1264 at http://cbdu.ijp.pan.pl/12640/.

  5. Improving Access to Texts international project, see also http://www.impact-project.eu.

  6. See https://korba.edu.pl/.

  7. The KORBA project is being continued until 2023 and several new texts from CBDU will be included in the corpus.

  8. See https://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/.

  9. See https://community.coherentpdf.com/.

  10. See https://cbdu.ijp.pan.pl/id/eprint/11790/.

  11. Tested with Chrome 91.0.4472.124, Firefox 90.0 and Edge 91.0.864.67.

Corresponding author

Correspondence to Maciej Ogrodniczuk.

Copyright information

© 2021 Springer Nature Switzerland AG

Cite this paper

Ogrodniczuk, M., Gruszczyński, W. (2021). Embedding Transcription and Transliteration Layers in the Digital Library of Polish and Poland-Related News Pamphlets. In: Ke, HR., Lee, C.S., Sugiyama, K. (eds) Towards Open and Trustworthy Digital Societies. ICADL 2021. Lecture Notes in Computer Science, vol 13133. Springer, Cham. https://doi.org/10.1007/978-3-030-91669-5_5

  • DOI: https://doi.org/10.1007/978-3-030-91669-5_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-91668-8

  • Online ISBN: 978-3-030-91669-5

  • eBook Packages: Computer Science, Computer Science (R0)
