Keyword-Based Search on Bilingual Digital Libraries

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10151)

Abstract

This paper outlines the main features of Bibliša, a tool that offers several ways of enhancing queries submitted to large collections of aligned parallel text residing in a bilingual digital library. Bibliša supports keyword queries as an intuitive way of specifying information needs. Keyword queries, initiated in Serbian or English, can be expanded semantically, morphologically, and into the other language, using different supporting monolingual and bilingual resources. The terminological and lexical resources are of various types, such as wordnets, electronic dictionaries, and SQL and NoSQL databases, which are distributed across different servers and accessed in various ways. The web application has been tested on a collection of texts from 3 journals and 2 projects, comprising 299 documents generated from TMX and stored in a NoSQL database. The tool supports full-text and metadata search, with extraction of concordance sentence pairs to support translation and terminology work.
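
The expansion and concordance steps described in the abstract can be illustrated with a minimal sketch. It is not Bibliša's implementation: the three dictionaries are toy stand-ins for the wordnets, morphological dictionaries and bilingual resources, the sentence pairs are invented sample data, and the names `expand_query` and `concordance` are hypothetical.

```python
import re
from typing import Dict, List, Set, Tuple

# Toy stand-ins (assumptions) for the lexical resources named in the abstract.
SYNONYMS: Dict[str, Set[str]] = {"mining": {"excavation"}}  # wordnet-like
TRANSLATIONS: Dict[str, Set[str]] = {  # bilingual dictionary, English to Serbian
    "mining": {"rudarstvo"},
    "excavation": {"iskopavanje"},
}
INFLECTIONS: Dict[str, Set[str]] = {  # morphological dictionary
    "rudarstvo": {"rudarstva", "rudarstvu"},
}

# Invented aligned sentence pairs (English, Serbian) standing in for the corpus.
ALIGNED_PAIRS: List[Tuple[str, str]] = [
    ("Mining is an important industry.", "Rudarstvo je važna industrija."),
    ("The library holds three journals.", "Biblioteka sadrži tri časopisa."),
    ("Progress depends on developing the extractive sector.",
     "Napredak zavisi od razvoja rudarstva."),
]


def expand_query(keyword: str) -> Set[str]:
    """Expand one keyword semantically, into the other language, then morphologically."""
    terms = {keyword.lower()}
    for resource in (SYNONYMS, TRANSLATIONS, INFLECTIONS):
        # Iterate over a snapshot so the set can grow inside the loop.
        for term in list(terms):
            terms |= resource.get(term, set())
    return terms


def tokenize(sentence: str) -> Set[str]:
    """Lowercased word tokens; \\w matches Unicode letters in Python 3."""
    return set(re.findall(r"\w+", sentence.lower()))


def concordance(keyword: str, pairs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return the aligned pairs in which either side contains an expanded term."""
    terms = expand_query(keyword)
    return [(en, sr) for en, sr in pairs if terms & (tokenize(en) | tokenize(sr))]


if __name__ == "__main__":
    for en, sr in concordance("mining", ALIGNED_PAIRS):
        print(en, "|", sr)
```

In this sketch the third sample pair is retrieved only because the Serbian side contains the inflected form "rudarstva", which is the kind of match a plain keyword lookup would miss.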


Notes

  1. http://hlt.rgf.bg.ac.rs/Biblisha

  2. http://infoteka.bg.ac.rs/index.php/en/

  3. http://www.rgf.bg.ac.rs/publikacije/PodzemniRadovi

  4. http://www.iaus.ac.rs/code/navigate.aspx?Id=221

  5. http://www.baektel.eu/

  6. http://led.loria.fr/outils/ALIGN/align.html

  7. http://termi.rgf.bg.ac.rs

  8. http://www.marklogic.com

  9. The corpus processing system, http://igm.univ-mlv.fr/~unitex

Acknowledgements

Preprocessing of texts and correction of the alignment were done by Biljana Lazić, Jelena Andonovski and Jelena Andjelković, PhD students at the Faculty of Philology, and Danica Seničić, MSc student at KU Leuven (LLN). This research was supported by Keystone COST Action IC1302 and by the Serbian Ministry of Education and Science under grant #III 47003.

Author information

Corresponding author

Correspondence to Ranka Stanković.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Stanković, R., Krstev, C., Vitas, D., Vulović, N., Kitanović, O. (2017). Keyword-Based Search on Bilingual Digital Libraries. In: Calì, A., Gorgan, D., Ugarte, M. (eds) Semantic Keyword-Based Search on Structured Data Sources. IKC 2016. Lecture Notes in Computer Science, vol 10151. Springer, Cham. https://doi.org/10.1007/978-3-319-53640-8_10

  • DOI: https://doi.org/10.1007/978-3-319-53640-8_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-53639-2

  • Online ISBN: 978-3-319-53640-8

  • eBook Packages: Computer Science, Computer Science (R0)
