Abstract
This paper outlines the main features of Bibliša, a tool that offers various ways of enhancing queries submitted to large collections of aligned parallel texts residing in a bilingual digital library. Bibliša supports keyword queries as an intuitive way of specifying information needs. Keyword queries, initiated in Serbian or English, can be expanded semantically, morphologically, and into the other language, using various supporting monolingual and bilingual resources. These terminological and lexical resources are of various types, such as wordnets, electronic dictionaries, and SQL and NoSQL databases, and they are distributed across different servers and accessed in various ways. The web application has been tested on a collection of texts from 3 journals and 2 projects, comprising 299 documents generated from TMX and stored in a NoSQL database. The tool supports full-text and metadata search, and it extracts concordance sentence pairs to support translation and terminology work.
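The abstract describes the query pipeline only at a high level: a keyword is expanded semantically, morphologically and into the other language, and matching aligned sentence pairs are returned as concordances from documents generated from TMX. The Python sketch below illustrates that flow under clearly stated assumptions. The small in-memory tables are hypothetical stand-ins for the wordnets, electronic dictionaries and bilingual term bases, the TMX sample is invented, and the names `read_tmx`, `expand` and `concordance` are illustrative only, not Bibliša's actual API. Bibliša itself keeps its documents in a NoSQL database and consults distributed resources; here everything is held in memory.

```python
import re
import xml.etree.ElementTree as ET

# ElementTree exposes the reserved xml:lang attribute under this expanded name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical toy resources standing in for a Serbian wordnet, a Serbian
# morphological e-dictionary, a bilingual term base and English inflections.
SR_SYNONYMS = {"biblioteka": {"biblioteka", "knjižnica"}}
SR_FORMS = {
    "biblioteka": {"biblioteka", "biblioteke", "biblioteci", "biblioteku"},
    "knjižnica": {"knjižnica", "knjižnice", "knjižnici", "knjižnicu"},
}
SR_EN = {"biblioteka": {"library"}, "knjižnica": {"library"}}
EN_FORMS = {"library": {"library", "libraries"}}

# Invented two-unit TMX sample (TMX 1.4 style: one <tuv> per language).
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="demo" creationtoolversion="1" segtype="sentence"
          o-tmf="none" adminlang="en" srclang="sr" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="sr"><seg>Korisnici biblioteke pretražuju katalog.</seg></tuv>
      <tuv xml:lang="en"><seg>Library users search the catalogue.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="sr"><seg>Časopis izlazi dva puta godišnje.</seg></tuv>
      <tuv xml:lang="en"><seg>The journal appears twice a year.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def read_tmx(tmx_text):
    """Return one {language: segment text} dict per translation unit."""
    root = ET.fromstring(tmx_text)
    units = []
    for tu in root.iter("tu"):
        unit = {}
        for tuv in tu.findall("tuv"):
            # TMX 1.4 uses xml:lang; older files may use a plain lang attribute.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                unit[lang] = "".join(seg.itertext()).strip()
        units.append(unit)
    return units


def expand(keyword, src="sr", tgt="en"):
    """Expand a source-language keyword semantically, morphologically
    and into the target language, using the toy tables above."""
    key = keyword.lower()
    lemmas = SR_SYNONYMS.get(key, {key})  # semantic expansion
    src_terms, tgt_terms = set(), set()
    for lemma in lemmas:
        src_terms |= SR_FORMS.get(lemma, {lemma})  # morphological expansion
        for translation in SR_EN.get(lemma, set()):  # cross-lingual expansion
            tgt_terms |= EN_FORMS.get(translation, {translation})
    return {src: src_terms, tgt: tgt_terms}


def concordance(units, expanded):
    """Return aligned units in which any language side contains an expanded term."""
    hits = []
    for unit in units:
        for lang, terms in expanded.items():
            # \w is Unicode-aware for str patterns, so š, č, ć, ž, đ count as letters.
            words = set(re.findall(r"\w+", unit.get(lang, "").lower()))
            if words & terms:
                hits.append(unit)
                break
    return hits


if __name__ == "__main__":
    units = read_tmx(SAMPLE_TMX)
    for unit in concordance(units, expand("biblioteka")):
        print(unit["sr"], "|", unit["en"])
```

Run as a script, the sketch prints the one aligned pair whose Serbian side contains the inflected form "biblioteke", although the query used the base form "biblioteka". Matching against precomputed sets of inflected forms, rather than stemming, is an assumption chosen to echo the abstract's reliance on electronic dictionaries; a real deployment would query those resources remotely instead of reading in-memory tables.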
Notes
- 9. The corpus processing system Unitex: http://igm.univ-mlv.fr/~unitex.
Acknowledgements
Preprocessing of texts and correction of the alignment were done by Biljana Lazić, Jelena Andonovski and Jelena Andjelković, PhD students at the Faculty of Philology, and Danica Seničić, MSc student at KU Leuven (LLN). This research was supported by Keystone COST Action IC1302 and by the Serbian Ministry of Education and Science under grant #III 47003.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Stanković, R., Krstev, C., Vitas, D., Vulović, N., Kitanović, O. (2017). Keyword-Based Search on Bilingual Digital Libraries. In: Calì, A., Gorgan, D., Ugarte, M. (eds) Semantic Keyword-Based Search on Structured Data Sources. IKC 2016. Lecture Notes in Computer Science, vol 10151. Springer, Cham. https://doi.org/10.1007/978-3-319-53640-8_10
DOI: https://doi.org/10.1007/978-3-319-53640-8_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-53639-2
Online ISBN: 978-3-319-53640-8
eBook Packages: Computer Science; Computer Science (R0)