Combining and Extending Data Infrastructures with Linguistic Annotation Services

  • Conference paper
  • Worldwide Language Service Infrastructure (WLSI 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9442)

Abstract

This paper reports on a first prototype implementation that combines and extends a data infrastructure with linguistic processing services. The prototype brings language datasets and basic language processing services together in a unified platform, thereby boosting the organic growth of data and facilitating language technology research and development. The META-SHARE data infrastructure is enhanced with a language processing mechanism that annotates content using suitable NLP services, each documented with appropriate metadata. Atomic services are combined into workflows modeled as a directed acyclic graph, where each node corresponds to an NLP processing service (e.g. sentence splitting, part-of-speech tagging). Services run either locally or remotely. Currently, the language processing layer implements services and workflows for processing monolingual and bilingual content/resources in raw text, XCES and TMX formats. As regards the legal framework, a simple operational model is adopted whereby only openly licensed datasets can be processed, and only by openly licensed services and workflows.
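The abstract describes workflows as directed acyclic graphs of atomic services. The following Python sketch (Python and Django are the technologies the notes point to) shows one way such a workflow could be executed, with each node run only after all of its predecessors, using the standard-library `graphlib.TopologicalSorter` (Python 3.9+). The names `run_workflow`, `SERVICES` and `DEPS`, and the toy splitter, tokenizer and tagger, are hypothetical stand-ins, not the platform's actual services or API; a remote service could be modeled as a callable that wraps a network request.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Callable, Dict, Set

# Hypothetical service signature: reads earlier results, returns its own output.
Service = Callable[[Dict[str, object]], object]


def run_workflow(services: Dict[str, Service],
                 deps: Dict[str, Set[str]],
                 text: str) -> Dict[str, object]:
    """Execute each service after all of its predecessors in the DAG.

    `deps` maps a node to the set of nodes it depends on. TopologicalSorter
    raises CycleError if the graph is not acyclic, so a malformed workflow
    is rejected before any service runs.
    """
    results: Dict[str, object] = {"raw": text}  # source node: the input text
    for node in TopologicalSorter(deps).static_order():
        if node in services:  # "raw" has no service; its value is preset
            results[node] = services[node](results)
    return results


# Toy stand-ins for real NLP services (sentence splitter, tokenizer, tagger).
SERVICES: Dict[str, Service] = {
    "split": lambda r: [s.strip() for s in str(r["raw"]).split(".") if s.strip()],
    "tokenize": lambda r: [s.split() for s in r["split"]],
    "pos": lambda r: [[(t, "NUM" if t.isdigit() else "WORD") for t in sent]
                      for sent in r["tokenize"]],
}
DEPS: Dict[str, Set[str]] = {
    "split": {"raw"},
    "tokenize": {"split"},
    "pos": {"tokenize"},
}

result = run_workflow(SERVICES, DEPS, "META-SHARE hosts 2 corpora. Services run locally.")
print(result["pos"][0])
# [('META-SHARE', 'WORD'), ('hosts', 'WORD'), ('2', 'NUM'), ('corpora', 'WORD')]
```

`TopologicalSorter` also offers a `prepare()`/`get_ready()`/`done()` interface, so independent branches of a workflow (for example, two taggers fed by the same tokenizer) could be dispatched in parallel; the static order used here keeps the sketch sequential.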

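The operational model in the abstract amounts to a simple gate: a processing job is accepted only if the dataset, the workflow and every service involved are openly licensed. A minimal sketch, assuming a hypothetical `Resource` record and an illustrative `OPEN_LICENSES` set; which licenses the platform actually treats as open is defined by its own licensing policy, not by this code.

```python
from dataclasses import dataclass
from typing import FrozenSet, Sequence

# ASSUMPTION: illustrative set of licenses treated as "open"; the platform's
# actual list comes from its licensing policy, not from this sketch.
OPEN_LICENSES: FrozenSet[str] = frozenset({"CC-BY", "CC-BY-SA", "CC0", "ODbL"})


@dataclass(frozen=True)
class Resource:
    """Hypothetical minimal record: a dataset, service or workflow and its license."""
    name: str
    license_id: str


def is_open(resource: Resource) -> bool:
    return resource.license_id in OPEN_LICENSES


def may_process(dataset: Resource, workflow: Resource,
                services: Sequence[Resource]) -> bool:
    """Allow processing only if the dataset, the workflow and every service are open."""
    return is_open(dataset) and is_open(workflow) and all(map(is_open, services))


corpus = Resource("corpus", "CC-BY")
workflow = Resource("pos-workflow", "CC-BY-SA")
print(may_process(corpus, workflow, [Resource("tagger", "CC0")]))  # True
print(may_process(corpus, workflow, [Resource("mt", "proprietary")]))  # False
```

Checking the whole chain, rather than only the dataset, mirrors the abstract's requirement that both sides of the operation be openly licensed.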
Notes

  1. www.meta-share.eu/org/net.
  2. https://github.com/metashare/META-SHARE.
  3. www.djangoproject.com.
  4. http://www.python.org/about/.
  5. SQLite can also be used; it comes built in with Python 2.7. Since SQLite has a number of limitations, including limited transaction management and no access permission management, PostgreSQL is the preferred database.
  6. PostgreSQL 9.0.5.
  7. http://langrid.org/en/index.html.
  8. http://www.clarin.eu/content/virtual-language-observatory.
  9. import_xml.py.
  10. http://www.meta-net.eu/meta-share/licenses.
  11. http://www.qt21.eu/launchpad/.
  12. http://www.clarin.gr.
  13. http://qt21.metashare.ilsp.gr/.
  14. http://camel.apache.org/.
  15. In our tests, we used Jetty, a small, fast and embeddable server that powers many software projects (e.g. Solr).
  16. <tuv> tags.
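Note 16 refers to the `<tuv>` (translation unit variant) elements of TMX files, which hold the per-language versions of each `<tu>` translation unit. The following sketch is an assumption about how bilingual TMX content could be split into per-language segments before being passed to monolingual services; it is not the platform's own code. It uses the standard-library `xml.etree.ElementTree`, and the function name `read_tmx` is hypothetical.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Dict, List, Union

# ElementTree expands the predefined xml: prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tmx(source: Union[str, BinaryIO]) -> List[Dict[str, str]]:
    """Return one {language: segment text} dict per <tu> translation unit."""
    root = ET.parse(source).getroot()
    units: List[Dict[str, str]] = []
    for tu in root.iter("tu"):
        unit: Dict[str, str] = {}
        for tuv in tu.findall("tuv"):
            # TMX 1.4 uses xml:lang; some older files use a plain "lang" attribute.
            lang = tuv.get(XML_LANG) or tuv.get("lang")
            seg = tuv.find("seg")
            if lang and seg is not None:
                unit[lang] = "".join(seg.itertext()).strip()
        units.append(unit)
    return units


SAMPLE = """<tmx version="1.4"><header/><body>
<tu><tuv xml:lang="en"><seg>Open data</seg></tuv><tuv xml:lang="de"><seg>Offene Daten</seg></tuv></tu>
</body></tmx>"""
print(read_tmx(io.BytesIO(SAMPLE.encode("utf-8"))))
# [{'en': 'Open data', 'de': 'Offene Daten'}]
```

Inline markup inside `<seg>` (for example `<bpt>`/`<ept>` codes) is simply flattened by `itertext()`, which is adequate for a sketch but would need dedicated handling in practice.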

Acknowledgements

This paper presents work done in the framework of the projects T4ME (GA no. 249119) and QTLaunchPad (GA no. 296347), funded by DG INFSO of the European Commission through the FP7 and ICT-PSP Programmes. The infrastructure described in the paper is maintained and further extended in the framework of the Greek CLARIN Attiki project (MIS 441451), Support for ESFRI/2006 Research Infrastructures, of the Greek Government.

Author information

Correspondence to Stelios Piperidis.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Piperidis, S., Galanis, D., Bakagianni, J., Sofianopoulos, S. (2016). Combining and Extending Data Infrastructures with Linguistic Annotation Services. In: Murakami, Y., Lin, D. (eds) Worldwide Language Service Infrastructure. WLSI 2015. Lecture Notes in Computer Science, vol 9442. Springer, Cham. https://doi.org/10.1007/978-3-319-31468-6_1

  • DOI: https://doi.org/10.1007/978-3-319-31468-6_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-31467-9

  • Online ISBN: 978-3-319-31468-6

  • eBook Packages: Computer Science, Computer Science (R0)
