ABSTRACT
The conceptual design of Extract-Transform-Load (ETL) processes is a crucial, burdensome, and challenging procedure that takes place in the early phases of a Data Warehouse project. Several models have been proposed for the conceptual design and representation of ETL processes, but all share two drawbacks: designers must invest intensive effort to create them, and business users need technical knowledge to understand them. In previous work, we alleviated the former difficulty by automating the conceptual design with Semantic Web technology. In this paper, we build upon those results and tackle the second issue by investigating the application of natural language generation techniques to the ETL environment. In particular, we provide a method for representing a conceptual ETL design as a narrative, which is the most natural means of communication and does not require knowledge of any specific model. We discuss how linguistic techniques can be used to establish a common application vocabulary. Finally, we present a flexible and customizable template-based mechanism for generating natural language representations of ETL process requirements and operations.
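The idea of a template-based mechanism backed by a common vocabulary can be illustrated with a minimal sketch. It assumes that each ETL operation is described by a type and a few named parameters. It first maps source-specific labels onto shared business terms, then fills one sentence template per operation type to produce a numbered narrative. The operation kinds, template wording, `VOCABULARY` entries, and the helper names `normalize`, `realize`, and `narrate` are hypothetical illustrations, not the templates or method described in the paper.

```python
from dataclasses import dataclass, field
from string import Template

# Hypothetical sentence templates keyed by ETL operation type.
# The wording is an illustrative assumption, not the paper's template set.
TEMPLATES = {
    "filter": Template("Keep only the $entity records whose $attribute is $condition."),
    "convert": Template("Convert the $attribute of each $entity from $source_unit to $target_unit."),
    "aggregate": Template("Compute the $function of $attribute for each $group_by."),
    "load": Template("Load the resulting $entity records into $target."),
}

# Hypothetical common application vocabulary: maps source-specific labels
# to one shared business term so every sentence uses the same wording.
VOCABULARY = {
    "cust": "customer",
    "client": "customer",
    "amt": "amount",
    "qty": "quantity",
}


@dataclass
class Operation:
    kind: str                                   # key into TEMPLATES
    params: dict = field(default_factory=dict)  # template slot -> value


def normalize(term: str) -> str:
    """Return the common-vocabulary term for a label, or the label unchanged."""
    return VOCABULARY.get(term.lower(), term)


def realize(op: Operation) -> str:
    """Fill the template for one operation with normalized slot values."""
    values = {slot: normalize(value) for slot, value in op.params.items()}
    # substitute() raises KeyError if a template slot has no value.
    return TEMPLATES[op.kind].substitute(values)


def narrate(ops: list) -> str:
    """Join the per-operation sentences into a numbered narrative."""
    return "\n".join(f"{i}. {realize(op)}" for i, op in enumerate(ops, start=1))


if __name__ == "__main__":
    flow = [
        Operation("filter", {"entity": "cust", "attribute": "country", "condition": "Greece"}),
        Operation("convert", {"entity": "order", "attribute": "amt",
                              "source_unit": "dollars", "target_unit": "euros"}),
        Operation("load", {"entity": "order", "target": "the SALES fact table"}),
    ]
    print(narrate(flow))
```

Keeping templates and vocabulary as plain data is one way to make such a generator customizable: a designer could reword a sentence or add a synonym without touching the generation code. A real system would also need to handle aggregation of related sentences, agreement, and ordering, which this sketch ignores.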
REFERENCES
- Bontcheva, K.: Generating Tailored Textual Summaries from Ontologies. In ESWC, 2005.
- Bontcheva, K., Wilks, Y.: Automatic Report Generation from Ontologies: The MIAKT Approach. In NLDB, 2004.
- Dalianis, H., Hovy, E.H.: Aggregation in Natural Language Generation. In EWNLG, 1993.
- van Deemter, K., Theune, M., Krahmer, E.: Real versus Template-Based Natural Language Generation: A False Opposition? Computational Linguistics 31(1), 2005.
- IBM: IBM WebSphere DataStage. URL: http://www-306.ibm.com/software/data/integration/datastage/
- Informatica: PowerCenter. URL: http://www.informatica.com/powercenter/
- Kedad, Z., Métais, E.: Ontology-Based Data Cleaning. In NLDB, 2002.
- Kimball, R., Caserta, J.: The Data Warehouse ETL Toolkit (chapter 11). Wiley Publishing, Inc., 2004.
- Kimball, R., et al.: The Data Warehouse Lifecycle Toolkit. John Wiley & Sons, 1998.
- Kiyavitskaya, N., Zeni, N., Mich, L., Mylopoulos, J.: Experimenting with Linguistic Tools for Conceptual Modelling: Quality of the Models and Critical Features. In NLDB, 2004.
- Kof, L.: Natural Language Processing: Mature Enough for Requirements Documents Analysis? In NLDB, 2005.
- Luján-Mora, S., Vassiliadis, P., Trujillo, J.: Data Mapping Diagrams for Data Warehouse Design with UML. In ER, 2004.
- Mazon, J.-N., Trujillo, J., Serrano, M., Piattini, M.: Applying MDA to the Development of Data Warehouses. In DOLAP, 2005.
- Métais, E., Meunier, J., Levreau, G.: Database Schema Design: A Perspective from Natural Language Techniques to Validation and View Integration. In ER, 1993.
- Microsoft: Data Transformation Services. URL: http://www.microsoft.com/sql/prodinfo/features/
- Oracle: Oracle Warehouse Builder Product Page. URL: http://otn.oracle.com/products/warehouse/content.html
- Rahm, E., Bernstein, P. A.: A Survey of Approaches to Automatic Schema Matching. VLDB Journal 10(4), 2001.
- Reape, M., Mellish, C.: Just What is Aggregation Anyway? In ENLG, 1999.
- Reiter, E., Mellish, C., Levine, J.: Automatic Generation of Technical Documentation. Applied Artificial Intelligence 9(3), 1995.
- Rolland, C., Proix, C.: A Natural Language Approach for Requirements Engineering. In CAiSE, 1992.
- Romero, O., Abelló, A.: Automating Multidimensional Design from Ontologies. In DOLAP, 2007.
- Simitsis, A.: Mapping Conceptual to Logical Models for ETL Processes. In DOLAP, 2005.
- Simitsis, A., Koutrika, G., Alexandrakis, Y., Ioannidis, Y.: Synthesizing Structured Text from Logical Database Subsets. In EDBT, 2008.
- Skoutas, D., Simitsis, A.: Designing ETL Processes Using Semantic Web Technologies. In DOLAP, 2006.
- Skoutas, D., Simitsis, A.: Flexible and Customizable NL Representation of Requirements for ETL Processes. In NLDB, 2007.
- Smith, M. K., Welty, C., McGuinness, D. L.: OWL Web Ontology Language Guide. W3C Recommendation, 2004. URL: http://www.w3.org/TR/owl-guide
- Storey, V. C., Goldstein, R. C., Ullrich, H.: Naive Semantics to Support Automated Database Design. IEEE TKDE 14(1), 2002.
- Min Tjoa, A., Berger, L.: Transformation of Requirement Specifications Expressed in Natural Language into an EER Model. In ER, 1993.
- Trujillo, J., Luján-Mora, S.: A UML Based Approach for Modeling ETL Processes in Data Warehouses. In ER, 2003.
- Vassiliadis, P., Simitsis, A., Skiadopoulos, S.: Conceptual Modeling for ETL Processes. In DOLAP, 2002.
- Wilcock, G.: Talking OWLs: Towards an Ontology Verbalizer. In ISWC, 2003.
- Wilcock, G., Jokinen, K.: Generating Responses and Explanations from RDF/XML and DAML+OIL. In IJCAI, 2003.
- Wu, W., Reinwald, B., Sismanis, Y., Manjrekar, R.: Discovering Topical Structures of Databases. In SIGMOD, 2008.
Index Terms
- Natural language reporting for ETL processes