Abstract
Natural Language (NL) processing tools, such as tokenizers, part-of-speech taggers, or syntactic processors, obtain knowledge from a set of documents (e.g., tokens and syntactic patterns) and produce the different elements that take part in the discourse universe of an NL text (e.g., noun phrases, verbs, and sentences). In this paper, we show how NL software systems can be developed incrementally using a high-performance specification language such as Maude. First, a generic algebraic specification for NL is defined, including sorts and subsorts as well as equational properties, such as associativity and commutativity, for built-in lists and sets. Then, the full discourse universe available for NL processing is described in terms of this algebraic specification by providing a non-deterministic but terminating set of transformation rules. Finally, as a proof of concept, a set of documents for NL processing is given to Maude as an input term and successfully transformed into a proper document, exploring all the non-deterministic possibilities and resolving the ambiguity in language. The main advantages of implementing NL in this manner are generality, transparency, extensibility, reusability, and maintainability. To the best of our knowledge, this is the first attempt to represent and develop complex NL software systems with this formal notation and, based on the analysis conducted, this implementation constitutes the basis for the design and development of more specific NL processing applications, such as text summarization.
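The idea of a non-deterministic but terminating set of transformation rules whose outcomes are all explored can be illustrated with a minimal sketch. It is written in Python rather than Maude and is not the specification used in the paper: the lexicon, the tag names, and the helpers `tokenize`, `successors`, `normal_forms`, and `well_formed` are hypothetical and chosen only for illustration. Each rewrite step tags one untagged word, so the number of untagged words strictly decreases and the search is guaranteed to stop.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy lexicon (an assumption, not taken from the paper):
# each word maps to the tags it may receive.
LEXICON: Dict[str, Tuple[str, ...]] = {
    "the": ("Det",),
    "dog": ("Noun",),
    "saw": ("Noun", "Verb"),  # ambiguous on purpose
    "cat": ("Noun",),
}

# A term is a sequence of (word, tag) pairs; the empty tag means "untagged".
Term = Tuple[Tuple[str, str], ...]


def tokenize(text: str) -> Term:
    """Build the input term: every word starts untagged."""
    return tuple((word, "") for word in text.lower().split())


def successors(term: Term) -> List[Term]:
    """Apply one rewrite step in every possible way: tag one untagged word."""
    results = []
    for i, (word, tag) in enumerate(term):
        if tag == "":
            for candidate in LEXICON.get(word, ()):
                results.append(term[:i] + ((word, candidate),) + term[i + 1:])
    return results


def normal_forms(start: Term) -> Set[Term]:
    """Explore all rewrite sequences; collect terms to which no rule applies."""
    seen, stack, finals = {start}, [start], set()
    while stack:
        term = stack.pop()
        next_terms = successors(term)
        if not next_terms:
            finals.add(term)  # no rule applies: a normal form
        for nxt in next_terms:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return finals


def well_formed(term: Term) -> bool:
    """Toy disambiguation constraint: exactly one verb per sentence."""
    return sum(1 for _, tag in term if tag == "Verb") == 1


forms = normal_forms(tokenize("the dog saw the cat"))
readings = [form for form in forms if well_formed(form)]
```

In this sketch, `forms` holds every fully rewritten reading of the sentence, and `readings` keeps only those that satisfy the toy constraint, which is one simple way of picturing how exploring all possibilities and then filtering can resolve an ambiguous word such as "saw".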
Acknowledgements
E. Lloret and M. Palomar have been partially funded by the Spanish Government through the projects TEXT-MESS 2.0 (TIN2009-13391-C04) and Técnicas de Deconstrucción en las Tecnologías del Lenguaje Humano (TIN2012-31224), and by the Generalitat Valenciana through project PROMETEO (PROMETEO/2009/199). Moreover, S. Escobar has been partially supported by the EU (FEDER) and the Spanish MEC/MICINN under grant TIN2010-21062-C02-02, and by the Generalitat Valenciana under grant PROMETEO2011/052.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Lloret, E., Escobar, S., Palomar, M., Ramos, I. (2014). Incremental and Adaptive Software Systems Development of Natural Language Applications. In: José Escalona, M., Aragón, G., Linger, H., Lang, M., Barry, C., Schneider, C. (eds) Information System Development. Springer, Cham. https://doi.org/10.1007/978-3-319-07215-9_41
DOI: https://doi.org/10.1007/978-3-319-07215-9_41
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07214-2
Online ISBN: 978-3-319-07215-9
eBook Packages: Computer Science, Computer Science (R0)