Abstract
The paper presents an experiment intended to overcome the problem of searching for different spelling variants in old Polish prints. For The Digital Library of Polish and Poland-related Ephemeral Prints from the 16th, 17th and 18th Centuries, two concurrent layers of text (transliteration and transcription) underlying selected library items are available in the related Electronic Corpus of the 17th and 18th Century Polish Texts (until 1772). Both variants are retrieved, and a representation of a sample item with two hidden text layers is prepared and made available for textual searching in a PDF containing its scanned image. The experiment can be generalized to other libraries dealing with multiple concurrent textual interpretations of graphical items.
The work was financed by a research grant from the Polish Ministry of Science and Higher Education under the National Programme for the Development of Humanities for the years 2019–2023 (grant 11H 18 0413 86, grant funds received: 1,797,741 PLN).
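The idea of searching two concurrent text layers over one scanned page can be illustrated with a minimal sketch. It is not the authors' implementation and does not produce a PDF: the `Token` class, the `search` function, the sample words and the pixel coordinates are all assumptions introduced here for illustration. Each word on the page carries both its transliteration and its transcription, and a query matching either form returns the same position on the scan.

```python
from dataclasses import dataclass
from typing import List, Tuple

# (x0, y0, x1, y1) in page pixels; the coordinate system is an assumption.
Box = Tuple[int, int, int, int]


@dataclass(frozen=True)
class Token:
    """One word of a scanned page with two concurrent readings (hypothetical model)."""
    transliteration: str  # spelling that follows the printed graphemes
    transcription: str    # spelling that follows the phonemes
    box: Box              # where the word sits on the scanned image


def search(tokens: List[Token], query: str) -> List[Box]:
    """Return the boxes of all words whose reading in either layer equals the query."""
    q = query.casefold()
    return [
        t.box
        for t in tokens
        if q in (t.transliteration.casefold(), t.transcription.casefold())
    ]


# Invented sample data, not taken from the library or the corpus.
page = [
    Token("Krol", "Król", (10, 10, 60, 30)),
    Token("iest", "jest", (70, 10, 110, 30)),
    Token("w", "w", (120, 10, 130, 30)),
    Token("Krakowie", "Krakowie", (140, 10, 230, 30)),
]

hits_old_spelling = search(page, "iest")
hits_new_spelling = search(page, "jest")
```

In this sketch both queries resolve to the same box, which is the behaviour that two hidden text layers are meant to give a reader searching a scanned item.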
Notes
- 1.
- 2. See also https://korba.edu.pl/overview?lang=en.
- 3. The intention of transliteration is an accurate representation of the graphemes of a text, while transcription is concerned with representing its phonemes.
- 4. See the Translation into Contemporary Polish section of print 1264 at http://cbdu.ijp.pan.pl/12640/.
- 5. Improving Access to Texts international project; see also http://www.impact-project.eu.
- 6.
- 7. The KORBA project continues until 2023, and several new texts from CBDU will be included in the corpus.
- 8.
- 9.
- 10.
- 11. Tested with Chrome 91.0.4472.124, Firefox 90.0 and Edge 91.0.864.67.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Ogrodniczuk, M., Gruszczyński, W. (2021). Embedding Transcription and Transliteration Layers in the Digital Library of Polish and Poland-Related News Pamphlets. In: Ke, HR., Lee, C.S., Sugiyama, K. (eds) Towards Open and Trustworthy Digital Societies. ICADL 2021. Lecture Notes in Computer Science, vol 13133. Springer, Cham. https://doi.org/10.1007/978-3-030-91669-5_5
DOI: https://doi.org/10.1007/978-3-030-91669-5_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-91668-8
Online ISBN: 978-3-030-91669-5
eBook Packages: Computer Science, Computer Science (R0)