Abstract
In this paper, we propose a top-down approach for converting business process information from corporate documents into controlled language. The proposal follows a multi-level methodology. We first characterize document structure by using rhetorical analysis to determine which sections are relevant for information extraction. Then, we perform a verb-centered event analysis to begin defining the typical patterns of business process information. Lastly, we carry out morpho-syntactic and dependency parsing to extract this information. This multi-level knowledge is used to define rules for converting the extracted sentences into a controlled language intended for software requirements elicitation.
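The final conversion step can be illustrated with a minimal Python sketch. It assumes the earlier levels have already produced (actor, verb, object) events, each tagged with the rhetorical section it came from. The section labels in `RELEVANT_SECTIONS`, the verb lexicon `PROCESS_VERBS`, and the output template are hypothetical stand-ins; they are not the authors' actual rules or the UN-Lencep grammar.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical rhetorical labels: only these sections are assumed to
# carry business process information (output of the rhetorical level).
RELEVANT_SECTIONS = {"procedure", "activities"}

# Hypothetical lexicon of process verbs (output of the verb-centered level).
PROCESS_VERBS = {"register", "approve", "send", "review"}


@dataclass
class Event:
    section: str  # rhetorical section the sentence was extracted from
    actor: str    # syntactic subject
    verb: str     # lemma of the main verb
    obj: str      # direct object


def to_controlled(event: Event) -> Optional[str]:
    """Apply one conversion rule: ACTOR VERB OBJECT -> 'The <actor> <verb>s the <obj>.'"""
    if event.section not in RELEVANT_SECTIONS:
        return None  # rhetorical filter: section not relevant for extraction
    if event.verb not in PROCESS_VERBS:
        return None  # verb-centered filter: not a business process event
    # Naive third-person "-s" inflection; adequate only for the sample verbs.
    return f"The {event.actor} {event.verb}s the {event.obj}."


if __name__ == "__main__":
    events = [
        Event("procedure", "clerk", "register", "invoice"),
        Event("introduction", "company", "value", "customer"),  # dropped by the section filter
        Event("activities", "manager", "approve", "request"),
    ]
    for e in events:
        sentence = to_controlled(e)
        if sentence is not None:
            print(sentence)
```

Keeping each level as a separate filter mirrors the top-down layering described in the abstract: a sentence reaches the conversion rule only if both the rhetorical and the verb-centered levels accept it.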
Notes
1. This work has been partly funded by the University of Medellin’s Research Vice-provost’s Office, Wake Forest University, and National University of Colombia, under the project “Defining a Specific-Domain Controlled Language: Linguistic and Transformational Bases from Corporate Documents in Natural Language”.
2. A controlled natural language is a sub-language of the corresponding natural language [2].
3. UN-Lencep is the Spanish acronym for ‘National University of Colombia: Controlled Language for the Specification of Pre-conceptual Models’.
4. We used FreeLing (http://nlp.lsi.upc.edu/freeling/) for dependency parsing.
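To show how dependency output can feed the extraction level, the following sketch reads a parse and pulls out the subject, root verb, and direct object of a sentence. It does not use FreeLing's API. The five-column tab-separated layout (ID, FORM, LEMMA, HEAD, DEPREL) and the Universal Dependencies-style labels `nsubj` and `obj` are assumptions; real parser output, including FreeLing's Spanish label set, may differ.

```python
from typing import List, NamedTuple, Optional, Tuple


class Token(NamedTuple):
    idx: int     # 1-based token position
    form: str    # surface form
    lemma: str   # lemma
    head: int    # index of the head token (0 = root)
    deprel: str  # dependency label (UD-style labels assumed)


def parse_rows(text: str) -> List[Token]:
    """Read tab-separated rows in the assumed layout ID FORM LEMMA HEAD DEPREL."""
    tokens = []
    for line in text.strip().splitlines():
        idx, form, lemma, head, deprel = line.split("\t")
        tokens.append(Token(int(idx), form, lemma, int(head), deprel))
    return tokens


def extract_triple(tokens: List[Token]) -> Optional[Tuple[str, str, str]]:
    """Return (subject, verb lemma, object) for the root verb, or None if an argument is missing."""
    root = next((t for t in tokens if t.head == 0), None)
    if root is None:
        return None
    # Direct dependents of the root, keyed by their dependency label.
    deps = {t.deprel: t for t in tokens if t.head == root.idx}
    if "nsubj" not in deps or "obj" not in deps:
        return None
    return deps["nsubj"].lemma, root.lemma, deps["obj"].lemma


if __name__ == "__main__":
    rows = (
        "1\tThe\tthe\t2\tdet\n"
        "2\tclerk\tclerk\t3\tnsubj\n"
        "3\tregisters\tregister\t0\troot\n"
        "4\tthe\tthe\t5\tdet\n"
        "5\tinvoice\tinvoice\t3\tobj"
    )
    print(extract_triple(parse_rows(rows)))  # ('clerk', 'register', 'invoice')
```

Returning lemmas rather than surface forms keeps the verb in a normalized form, which is what a verb-centered pattern lexicon would typically be matched against.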
References
[1] Manrique-Losada, B., Burgos, D.A., Zapata-Jaramillo, C.M.: Exploring MWEs for knowledge acquisition from corporate technical documents. In: 9th Workshop on Multiword Expressions (MWE 2013), NAACL 2013, Atlanta, July 2013
[2] Fuchs, N.E., Schwitter, R.: Specifying logic programs in controlled natural language. Technical report IFI 95.17, University of Zurich (1995)
[3] Cybulski, J.L., Reed, K.: Requirements classification and reuse: crossing domain boundaries. In: Frakes, W.B. (ed.) ICSR 2000. LNCS, vol. 1844, pp. 190–210. Springer, Heidelberg (2000)
[4] Cleland-Huang, J., Marrero, W., Berenbach, B.: Goal-centric traceability: using virtual plumblines to maintain critical systemic qualities. IEEE Trans. Softw. Eng. 34, 685–699 (2008)
[5] Bajwa, I.S., Lee, M., Bordbar, B.: SBVR business rules generation from natural language specification. In: AAAI Spring Symposium, pp. 2–8. AAAI, San Francisco (2011)
[6] Meth, H., Li, Y., Maedche, A., Mueller, B.: Advancing task elicitation systems: an experimental evaluation of design principles. In: Proceedings of the 33rd International Conference on Information Systems, pp. 54–68. AISEL, Florida (2012)
[7] Young, J.D., Antón, A.I.: A method for identifying software requirements based on policy commitments. In: 18th International Requirements Engineering Conference, pp. 47–56. IEEE, Sydney (2010)
[8] Wang, F.H.: On acquiring classification knowledge from noisy data based on rough set. Expert Syst. Appl. 29(1), 49–64 (2005)
[9] Dinesh, N., Joshi, A., Lee, I., Webber, B.: Extracting formal specifications from natural language regulatory documents. In: ICoS-5, Buxton (2006)
[10] Vegega, C., Amatriain, H., Pytel, P., Pollo, F., Britos, P., García, R.: Formalización de Dominios de Negocio basada en Técnicas de Ingeniería del Conocimiento para Proyectos de Explotación de Información. In: Proceedings of IX JIISIC, pp. 79–86. PUCP, Lima (2012)
[11] Aysolmaz, B., Demirors, O.: Modeling business processes to generate artifacts for software development: a methodology. In: Proceedings of the 6th International Workshop on Modeling in Software Engineering, pp. 7–12. ACM, New York (2014)
[12] Hao, J., Yan, Y., Gong, L., Wang, G., Lin, J.: Knowledge map-based method for domain knowledge browsing. Decis. Support Syst. 61, 106–114 (2014)
[13] Tavares, V., Santoro, F.M., Borges, M.R.S.: A context-based model for knowledge management embodied in work processes. Inf. Sci. 179, 2538–2554 (2009)
[14] Azaustre, A., Casas, J.: Manual de retórica española. Ariel, Barcelona (1997)
[15] Burdiles, G.A.: Descripción de la organización retórica del género caso clínico de la medicina a partir del corpus CCM-2009. Ph.D. thesis in Applied Linguistics, Pontificia Universidad Católica de Valparaíso, Chile (2011)
[16] Swales, J.M.: Research Genres: Explorations and Applications. Cambridge University Press, Cambridge (2004)
[17] Parodi, G.: Lingüística de corpus: una introducción al ámbito. Revista de Lingüística Teórica y Aplicada 46(1), 93–119 (2008)
[18] Manrique-Losada, B.: A formalization for mapping discourses from business-based technical documents into controlled language texts for requirements elicitation. Ph.D. thesis, Universidad Nacional de Colombia (2014)
[19] Pivovarova, L., Huttunen, S., Yangarber, R.: Event representation across genre. In: Proceedings of the 1st Workshop on EVENTS: Definition, Detection, Coreference, and Representation, pp. 29–37 (2013)
[20] Do, Q.X., Chan, Y.S., Roth, D.: Minimally supervised event causality identification. In: Proceedings of EMNLP 2011 (2011)
[21] Bejan, C.A., Harabagiu, S.: Unsupervised event coreference resolution. Computational Linguistics 40(2) (2013)
[22] Vossen, P. (ed.): EuroWordNet General Document. Version 3. University of Amsterdam, Amsterdam (2002)
[23] Chaowicharat, E., Naruedomkul, K.: Co-occurrence-based error correction approach to word segmentation. In: Boonthum-Denecke, C., McCarthy, P.M., Lamkin, T. (eds.) Cross-Disciplinary Advances in Applied Natural Language Processing: Issues and Approaches, pp. 354–364. Information Sciences Reference Publishers (2012)
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Manrique-Losada, B., Zapata-Jaramillo, C.M., Burgos, D.A. (2016). Re-expressing Business Processes Information from Corporate Documents into Controlled Language. In: Métais, E., Meziane, F., Saraee, M., Sugumaran, V., Vadera, S. (eds) Natural Language Processing and Information Systems. NLDB 2016. Lecture Notes in Computer Science, vol 9612. Springer, Cham. https://doi.org/10.1007/978-3-319-41754-7_37
DOI: https://doi.org/10.1007/978-3-319-41754-7_37
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-41753-0
Online ISBN: 978-3-319-41754-7
eBook Packages: Computer Science, Computer Science (R0)