A Parallel Corpus Labeled Using Open and Restricted Domain Ontologies

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2009)

Abstract

The analysis and creation of annotated corpora are fundamental for implementing natural language processing solutions based on machine learning. In this paper we present a parallel corpus of 4500 questions in Spanish and English in the tourism domain, obtained from real users. With the aim of training a question answering system, the questions were labeled with the expected answer type according to two different ontologies. The first is an open domain ontology based on Sekine’s Extended Named Entity Hierarchy, while the second is a restricted domain ontology specific to the tourism field. Because the two ontologies have different characteristics, we had to resolve many problematic cases and adapt our annotation to the characteristics of each one. We present an analysis of the domain coverage of these ontologies and the inter-annotator agreement results. Finally, we use a question classification system to evaluate the labeling of the corpus.
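To make the notion of inter-annotator agreement concrete, the following is a minimal sketch of Fleiss' kappa, a common agreement statistic for several annotators assigning one category per item. Whether the authors computed agreement in exactly this way is an assumption; the function name `fleiss_kappa` and the example labels are hypothetical and are not taken from either ontology.

```python
from collections import Counter


def fleiss_kappa(labels_per_item):
    """Fleiss' kappa for items that were each labeled by the same number of raters.

    labels_per_item: list of lists; each inner list holds the labels that
    every annotator assigned to one question.
    """
    n_items = len(labels_per_item)
    n_raters = len(labels_per_item[0])
    if n_raters < 2 or any(len(labels) != n_raters for labels in labels_per_item):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    category_totals = Counter()
    observed_sum = 0.0
    for labels in labels_per_item:
        counts = Counter(labels)
        category_totals.update(counts)
        # Fraction of rater pairs that agree on this item.
        agreeing_pairs = sum(c * (c - 1) for c in counts.values())
        observed_sum += agreeing_pairs / (n_raters * (n_raters - 1))

    mean_observed = observed_sum / n_items
    total_ratings = n_items * n_raters
    # Agreement expected by chance, from the overall category proportions.
    expected = sum((c / total_ratings) ** 2 for c in category_totals.values())
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (mean_observed - expected) / (1.0 - expected)


# Hypothetical labels from three annotators on four questions.
example = [
    ["LOCATION", "LOCATION", "LOCATION"],
    ["DATE", "DATE", "TIME"],
    ["PRICE", "PRICE", "PRICE"],
    ["LOCATION", "FACILITY", "LOCATION"],
]
print(fleiss_kappa(example))
```

A value of 1 indicates perfect agreement, while values near 0 indicate agreement no better than chance.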

This research has been partially funded by the Spanish Government under project CICyT number TIC2003-07158-C04-01 and by the European Commission under FP6 project QALL-ME number 033860.

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Boldrini, E., Ferrández, S., Izquierdo, R., Tomás, D., Vicedo, J.L. (2009). A Parallel Corpus Labeled Using Open and Restricted Domain Ontologies. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2009. Lecture Notes in Computer Science, vol 5449. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00382-0_28

  • DOI: https://doi.org/10.1007/978-3-642-00382-0_28

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-00381-3

  • Online ISBN: 978-3-642-00382-0

  • eBook Packages: Computer Science, Computer Science (R0)
