Abstract
This paper describes the integration of verbal idioms into a Natural Language Processing (NLP) system. It adopts a construction approach that builds on the prior parsing stage, so that these Multi-Word Expressions (MWEs) can be taken into account in subsequent tasks, such as semantic role labeling or whole-part relation extraction. The paper focuses on body-part nouns, which occur in many verbal idioms, and uses a manually annotated corpus to evaluate its parsing strategy. The evaluation yielded a precision of 0.92, a recall of 0.83, an F-measure of 0.87, and an accuracy of 0.99.
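The reported scores can be checked against one another. The short sketch below assumes that the F-measure in the abstract is the balanced F1 score (the harmonic mean of precision and recall), which the abstract does not state explicitly. The function name `f_measure` and its `beta` parameter are illustrative and not taken from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta=1 gives the harmonic mean of P and R."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when both scores are zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Values reported in the abstract.
reported_precision = 0.92
reported_recall = 0.83

# Assumption: the paper's F-measure is the balanced F1 score.
f1 = f_measure(reported_precision, reported_recall)
print(round(f1, 2))  # prints 0.87
```

Under that assumption, the computed value rounds to the 0.87 given in the abstract, so the three figures are mutually consistent.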
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Baptista, J., Mamede, N., Markov, I. (2014). Integrating Verbal Idioms into an NLP System. In: Baptista, J., Mamede, N., Candeias, S., Paraboni, I., Pardo, T.A.S., Volpe Nunes, M.d.G. (eds) Computational Processing of the Portuguese Language. PROPOR 2014. Lecture Notes in Computer Science, vol 8775. Springer, Cham. https://doi.org/10.1007/978-3-319-09761-9_28
DOI: https://doi.org/10.1007/978-3-319-09761-9_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09760-2
Online ISBN: 978-3-319-09761-9
eBook Packages: Computer Science (R0)