Integrating Verbal Idioms into an NLP System

  • Conference paper
Computational Processing of the Portuguese Language (PROPOR 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8775)

Abstract

This paper describes the integration of verbal idioms into a Natural Language Processing (NLP) system, adopting a construction approach that builds on the prior parsing stage, so that these Multi-Word Expressions (MWE) can be taken into account in subsequent tasks, such as semantic role labeling or whole-part relation extraction. The paper focuses on body-part nouns, which occur in many verbal idioms, and uses a manually annotated corpus to evaluate its parsing strategy. Results showed a precision of 0.92, a recall of 0.83, an f-measure of 0.87 and an accuracy of 0.99.
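
As a quick consistency check on the reported figures, the f-measure can be recomputed from the precision and recall. The sketch below assumes the reported f-measure is the balanced F1 score (the harmonic mean of precision and recall); the abstract does not state which variant was used, and the function name `f_measure` is illustrative only.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        # Both scores are zero: define the f-measure as zero.
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


# Values taken from the abstract.
reported_precision = 0.92
reported_recall = 0.83

# Assumption: balanced F1 (beta = 1).
f1 = f_measure(reported_precision, reported_recall)
print(round(f1, 2))  # 0.87
```

Under that assumption, the recomputed value agrees with the 0.87 given in the abstract. The accuracy of 0.99 cannot be rechecked this way, since it also depends on the number of true negatives, which the abstract does not report.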

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Baptista, J., Mamede, N., Markov, I. (2014). Integrating Verbal Idioms into an NLP System. In: Baptista, J., Mamede, N., Candeias, S., Paraboni, I., Pardo, T.A.S., Volpe Nunes, M.d.G. (eds) Computational Processing of the Portuguese Language. PROPOR 2014. Lecture Notes in Computer Science, vol 8775. Springer, Cham. https://doi.org/10.1007/978-3-319-09761-9_28

  • DOI: https://doi.org/10.1007/978-3-319-09761-9_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-09760-2

  • Online ISBN: 978-3-319-09761-9

  • eBook Packages: Computer Science, Computer Science (R0)
