Abstract
Many modern software systems are designed to be highly configurable. Configurability is the ability to build consistent systems from a common architecture by selecting and synthesizing provided design elements. It offers high customizability and an efficient reuse strategy. However, configurability has not enjoyed the same popularity in data warehouse (DW) design as in other types of software. We are currently witnessing an explosion of new DW applications driven by high-performance computing and emerging hardware. This continuously evolving context reveals a high degree of variability that needs to be managed and exploited. In this paper, we propose a configurability-aware approach for DW design that allows designers to specify requirements defining suitable design options in order to generate a customized DW. To achieve this objective, we perform three tasks: (i) gaining a deep understanding of the DW design life-cycle by reviewing its evolution, (ii) formalizing each design phase, and (iii) identifying the interactions between phases. This analysis underpins our approach, which comprises the configuration model, which tailors the DW system to meet designers’ requirements, and the configuration process, which produces the corresponding DW configuration. The approach is specified with the Base, Variability, Resolution (BVR) models, which are defined using the Common Variability Language (CVL) proposed by the Object Management Group (OMG) for variability modeling, and it is implemented with the BVR Tool. A case study producing two DW configurations is presented to show the effectiveness of our approach.
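As an illustration of how a Base/Variability/Resolution split can drive configuration, the following Python sketch may help. It is not BVR or CVL notation, nor the authors' configuration model: the phase names, design options, and the single cross-phase constraint are hypothetical placeholders, and the base model is reduced to the list of design phases. A resolution picks one option per phase, and the hypothetical `resolve` function rejects it if an option is unknown, a phase is left unresolved, or a phase interaction is violated.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical variability model: one variation point per design phase,
# each listing the design options a designer may choose from.
VARIABILITY: Dict[str, List[str]] = {
    "conceptual_model": ["ER", "UML", "ontology"],
    "logical_model": ["relational", "multidimensional", "semantic"],
    "deployment": ["centralized", "parallel", "cloud"],
}

# A constraint inspects a resolution and returns an error message, or None if satisfied.
Constraint = Callable[[Dict[str, str]], Optional[str]]


def semantic_needs_ontology(resolution: Dict[str, str]) -> Optional[str]:
    """Hypothetical cross-phase interaction: a semantic logical model
    is only allowed on top of an ontology-based conceptual model."""
    if (resolution.get("logical_model") == "semantic"
            and resolution.get("conceptual_model") != "ontology"):
        return "a semantic logical model requires an ontology-based conceptual model"
    return None


CONSTRAINTS: List[Constraint] = [semantic_needs_ontology]


def resolve(resolution: Dict[str, str]) -> Dict[str, str]:
    """Check a resolution against the variability model and the constraints,
    and return the resulting configuration (one option per variation point)."""
    errors: List[str] = []
    for point, options in VARIABILITY.items():
        choice = resolution.get(point)
        if choice is None:
            errors.append(f"unresolved variation point: {point}")
        elif choice not in options:
            errors.append(f"invalid option {choice!r} for {point}")
    # Reject choices for variation points the model does not define.
    for extra in sorted(resolution.keys() - VARIABILITY.keys()):
        errors.append(f"unknown variation point: {extra}")
    # Cross-phase interactions are checked on the whole resolution.
    for check in CONSTRAINTS:
        message = check(resolution)
        if message is not None:
            errors.append(message)
    if errors:
        raise ValueError("; ".join(errors))
    return {point: resolution[point] for point in VARIABILITY}


if __name__ == "__main__":
    # Two illustrative configurations derived from the same variability model.
    classic = resolve({"conceptual_model": "UML",
                       "logical_model": "multidimensional",
                       "deployment": "centralized"})
    semantic = resolve({"conceptual_model": "ontology",
                        "logical_model": "semantic",
                        "deployment": "cloud"})
    print(classic)
    print(semantic)
```

Checking constraints over the whole resolution, rather than validating each phase in isolation, reflects point (iii) above: a choice that is valid for one phase can still be inconsistent with the choice made for another.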
Notes
We use the terms design cycle and design life-cycle interchangeably in the rest of the paper.
References
Abelló A, Samos J, Saltor F (2006) YAM2: a multidimensional conceptual model extending UML. Inf Syst 31(6):541–567
Agarwal S, Agrawal R, Deshpande P, Gupta A, Naughton JF, Ramakrishnan R, Sarawagi S (1996) On the computation of multidimensional aggregates. In: Proceedings of the 22nd international conference on very large data bases, VLDB ’96. Morgan Kaufmann, San Francisco, CA, pp 506–521
Agrawal D, Das S, El Abbadi A (2010) Big data and cloud computing: new wine or just new bottles? Proc VLDB Endow 3(1–2):1647–1648
Anderlik S, Neumayr B, Schrefl M (2012) Using domain ontologies as semantic dimensions in data warehouses. In: Conceptual modeling. Lecture notes in computer science, vol 7532. Springer, Berlin, pp 88–101
Asikainen T, Mannisto T, Soininen T (2004) Using a configurator for modelling and configuring software product lines based on feature models. In: Software variability management for product derivation: towards tool support, international workshop at SPLC
Asikainen T, Soininen T, Männistö T (2004) A Koala-based approach for modelling and deploying configurable software product families. Springer, Berlin
Baader F, Calvanese D, McGuinness D, Nardi D, Patel-Schneider P (eds) (2003) The description logic handbook: theory, implementation, and applications. Cambridge University Press, Cambridge
Badia A, Lemire D (2011) A call to arms: revisiting database design. SIGMOD Rec 40(3):61–69
Batory D, Barnett J, Garza JF, Smith KP, Tsukuda K, Twichell B, Wise T (1988) Genesis: an extensible database management system. IEEE Trans Softw Eng 14(11):1711–1730
Bellatreche L, Giacometti A, Marcel P, Mouloudi H, Laurent D (2005) A personalization framework for OLAP queries. In: Proceedings of DOLAP ’05, pp 9–18
Bellatreche L, Khouri S, Berkani N (2013) Semantic data warehouse design: from ETL to deployment à la carte. In: 18th international conference on database systems for advanced applications (DASFAA), pp 64–83
Berkani N, Bellatreche L, Khouri S (2013) Towards a conceptualization of ETL and physical storage of semantic data warehouses as a service. Clust Comput 16(4):915–931
Bonifati A, Cattaneo F, Ceri S, Fuggetta A, Paraboschi S (2001) Designing data marts for data warehouses. ACM Trans Softw Eng Methodol 10(4):452–483
Bosch J (2002) Maturity and evolution in software product lines: approaches, artefacts and organization. Springer, Berlin
Brockmans S (2009) Formal and conceptual comparison of ontology mapping languages. In: Stuckenschmidt H, Parent C, Spaccapietra S (eds) Modular ontologies. Springer, Berlin, pp 267–291
Brown PG, Haas PJ (2003) BHUNT: automatic discovery of fuzzy algebraic constraints in relational data. In: Proceedings of the 29th international conference on very large data bases, vol 29, VLDB ’03. VLDB Endowment, pp 668–679
BVR—the language. http://bvr.modelbased.net/docs/VARIES_D4.2_v01_PP_FINAL.pdf
Cabibbo L, Torlone R (1998) A logical approach to multidimensional databases. In: Advances in database technology—EDBT’98, pp 183–197
Calvanese D, De Giacomo G, Lenzerini M (2002) A framework for ontology integration. In: The emerging semantic web: selected papers from the first semantic web working symposium, pp 201–214
Calvanese D, Lenzerini M, Nardi D (1998) Description logics for conceptual data modeling. In: Chomicki J, Saake G (eds) Logics for databases and information systems. Springer, Boston, MA, pp 229–263
Chen PP-S (1976) The entity-relationship model: toward a unified view of data. ACM Trans Database Syst 1(1):9–36
Cirilo E, Kulesza U, Garcia A, Cowan DD, Alencar PSC, de Lucena CJP (2013) Configurable software product lines—supporting heterogeneous configuration knowledge. In: 13th international conference on software reuse (ICSR), pp 176–191
El Akkaoui Z, Zimányi E, Mazón J-N, Trujillo J (2011) A model-driven framework for ETL process development. In: Proceedings of the ACM 14th international workshop on data warehousing and OLAP. ACM, pp 45–52
Elmasri R (2008) Fundamentals of database systems. Pearson Education India, Bengaluru
Fankam C (2009) OntoDB2: un système flexible et efficient de Base de Données à Base Ontologique pour le Web sémantique et les données techniques. PhD thesis, ISAE-ENSMA Ecole Nationale Supérieure de Mécanique et d’Aérotechnique-Poitiers
Gam I, Salinesi C et al (2006) A requirement-driven approach for designing data warehouses. In: Requirements engineering: foundation for software quality (REFSQ)
Geppert A, Scherrer S, Dittrich KR (1997) KIDS: construction of database management systems based on reuse. University of Zurich
Golfarelli M (2010) From user requirements to conceptual design in data warehouse design: a survey. In: Bellatreche L (ed) Data warehousing design and advanced engineering applications: methods for complex construction. IGI Global, Hershey, pp 1–16
Golfarelli M, Maio D, Rizzi S (1998) The dimensional fact model: a conceptual model for data warehouses. Int J Cooper Inf Syst 7(02n03):215–247
Golfarelli M, Rizzi S, Biondi P (2011) MyOLAP: an approach to express and evaluate OLAP preferences. IEEE Trans Knowl Data Eng 23(7):1050–1064
Grabova O, Darmont J, Chauchat J-H, Zolotaryova I (2010) Business intelligence for small and middle-sized entreprises. SIGMOD Rec 39(2):39–50
Gruber T (1993) A translation approach to portable ontology specifications. Knowl Acquis 5(2):199–220
Haase P, Motik B (2005) A mapping system for the integration of OWL-DL ontologies. In: Proceedings of the first international workshop on interoperability of heterogeneous information systems. ACM, pp 9–16
Haugen Ø, Øgård O (2014) BVR: better variability results. In: Amyot D, Fonseca i Casas P, Mussbacher G (eds) System analysis and modeling: models and reusability. Springer, Berlin, pp 1–15
Inmon WH (2002) Building the data warehouse. Wiley, London
Jean S, Sadoune IA, Bellatreche L, Boukhari I (2014) On using requirements throughout the life cycle of data repository. In: Proceedings of 25th international conference on database and expert systems applications (DEXA)
Jensen M, Holmgren T, Pedersen T (2004) Discovering multidimensional structure in relational data. In: Data warehousing and knowledge discovery, pp 138–148
Khedri N, Khosravi R (2013) Handling database schema variability in software product lines. In: 2013 20th Asia-Pacific software engineering conference (APSEC), vol 1. IEEE, pp 331–338
Khouri S (2013) Cycle de vie sémantique de conception de systèmes de stockage et de manipulation de données. PhD thesis, ENSMA & ESI, Oct 2013
Khouri S, Bellatreche L (2014) Towards a configurable database design: a case of semantic data warehouses. In: On the move to meaningful internet systems: OTM 2014 conferences. Springer, pp 760–767
Khouri S, Bellatreche L, Boukhari I, Bouarar S (2012) More investment in conceptual designers: think about it! In: CSE’12, pp 88–93
Khouri S, Boukhari I, Bellatreche L, Jean S, Sardet E, Baron M (2012) Ontology-based structured web data warehouses for sustainable interoperability: requirement modeling, design methodology and tool. Comput Ind 63(8):799–812
Khouri S, Semassel K, Bellatreche L (2015) Managing data warehouse traceability: a life-cycle driven approach. In: Advanced information systems engineering—27th international conference, CAiSE 2015, Proceedings, Stockholm, Sweden, 8–12 June 2015, pp 199–213
Kimball R (1996) The data warehouse toolkit: practical techniques for building dimensional data warehouses. Wiley, New York
Kimball R, Reeves L, Ross M, Thornthwaite W (1998) The data warehouse lifecycle toolkit: expert methods for designing, developing and deploying data warehouses, 1st edn. Wiley, New York
Kimball R, Ross M et al (2002) The data warehouse toolkit: the complete guide to dimensional modelling. Wiley, New York
Kimura H, Huo G, Rasin A, Madden S, Zdonik SB (2010) CORADD: correlation aware database designer for materialized views and indexes. Proc VLDB Endow 3(1–2):1103–1113
Knackstedt R, Klose K (2005) Configurative reference model-based development of data warehouse systems. In: Proceedings of the 16th information resources management association conference (IRMA), San Diego, pp 32–39
Labio WJ, Wiener JL, Garcia-Molina H, Gorelik V (2000) Efficient resumption of interrupted warehouse loads. SIGMOD Rec 29(2):46–57
Lenzerini M (2002) Data integration: a theoretical perspective. In: PODS, pp 233–246
List B, Schiefer J, Tjoa A (2000) Process-oriented requirement analysis supporting the data warehouse design process: a use case driven approach. In: Database and expert systems applications. Springer, pp 593–603
Lu J, Ma L, Zhang L, Brunner J-S, Wang C, Pan Y, Yu Y (2007) SOR: a practical system for ontology storage, reasoning and search. In: Proceedings of the 33rd international conference on very large data bases, VLDB ’07. VLDB Endowment, pp 1402–1405
Lubars M, Potts C, Richter C (1993) A review of the state of the practice in requirements modeling. In: Proceedings of IEEE international symposium on requirements engineering. IEEE, pp 2–14
Luján-Mora S, Trujillo J (2004) Physical modeling of data warehouses using UML. In: Proceedings of the 7th ACM international workshop on data warehousing and OLAP, DOLAP ’04. ACM, New York, NY, USA, pp 48–57
Luján-Mora S, Trujillo J, Song I-Y (2006) A UML profile for multidimensional modeling in data warehouses. Data Knowl Eng 59(3):725–769
Männistö T, Soininen T, Sulonen R (2000) Configurable software product families. In: ECAI 2000 configuration workshop, pp 56–58
Mazón J-N, Trujillo J (2008) An MDA approach for the development of data warehouses. Decis Support Syst 45(1):41–58
Mazón J-N, Trujillo J, Serrano M, Piattini M (2005) Applying MDA to the development of data warehouses. In: Proceedings of the 8th ACM international workshop on data warehousing and OLAP. ACM, pp 57–66
Nebot V, Berlanga R (2012) Building data warehouses with semantic web data. Decis Support Syst 52(4):853–868
Object Management Group (2012) Common Variability Language (CVL), revised submission. http://www.omgwiki.org/variability/lib/exe/fetch.php?media=cvl-revised-submission.pdf
Ognjanovic I, Mohabbati B, Gasevic D, Bagheri E, Boskovic M (2012) A metaheuristic approach for the configuration of business process families. In: IEEE ninth international conference on services computing, pp 25–32
Peltonen H, Mannisto T, Soininen T, Tiihonen J, Martio A, Sulonen R (1998) Concepts for modelling configurable products. Helsinki University of Technology, Espoo
Pucheral P, Bouganim L, Valduriez P, Bobineau C (2001) PicoDBMS: scaling down database techniques for the smartcard. VLDB J 10(2–3):120–132
Raatikainen M, Soininen T, Mannisto T, Mattila A (2005) Characterizing configurable software product families and their derivation. Softw Process Improv Pract 10(1):41–60
Romero O, Abelló A (2007) Automating multidimensional design from ontologies. In: Proceedings of the ACM tenth international workshop on Data warehousing and OLAP. ACM, pp 1–8
Romero O, Abelló A (2010) Automatic validation of requirements to support multidimensional design. Data Knowl Eng 69(9):917–942
Romero O, Simitsis A, Abelló A (2011) GEM: requirement-driven generation of ETL and multidimensional conceptual designs. In: Data warehousing and knowledge discovery, pp 80–95
Rosenmüller M, Apel S, Leich T, Saake G (2009) Tailor-made data management for embedded systems: a case study on Berkeley DB. Data Knowl Eng 68(12):1493–1512
Rosenmüller M, Kästner C, Siegmund N, Sunkle S, Apel S, Leich T, Saake G (2009) SQL à la carte: toward tailor-made data management. In: BTW, pp 117–136
Rosenmüller M, Siegmund N, Schirmeier H, Sincero J, Apel S, Leich T, Spinczyk O, Saake G (2008) FAME-DBMS: tailor-made data management solutions for embedded systems. In: Proceedings of the 2008 EDBT workshop on software engineering for tailor-made data management. ACM, pp 1–6
Royer J-C, Arboleda H (2013) Model-driven and software product line engineering. Wiley, London
Schaefer I, Rabiser R, Clarke D, Bettini L, Benavides D, Botterweck G, Pathak A, Trujillo S, Villela K (2012) Software diversity: state of the art and perspectives. Int J Softw Tools Technol Transf 14(5):477–495
Schütz C, Schrefl M (2014) Customization of domain-specific reference models for data warehouses. In: 2014 IEEE 18th international enterprise distributed object computing conference (EDOC). IEEE, pp 61–70
Seltzer M (2008) Beyond relational databases. Commun ACM 51(7):52–58
Shaw M, DeLine R, Klein DV, Ross TL, Young DM, Zelesnik G (1995) Abstractions for software architecture and tools to support them. IEEE Trans Softw Eng 21(4):314–335
Simitsis A, Vassiliadis P (2003) A methodology for the conceptual modeling of ETL processes. In: CAiSE workshops
Skoutas D, Simitsis A (2006) Designing ETL processes using semantic web technologies. In: Data warehousing and OLAP: proceedings of the 9th ACM international workshop on data warehousing and OLAP, vol 10, pp 67–74
Skoutas D, Simitsis A (2007) Ontology-based conceptual design of ETL processes for both structured and semi-structured data. Int J Semant Web Inf Syst 3(4):1–24
Soininen T, Tiihonen J, Männistö T, Sulonen R (1998) Towards a general ontology of configuration. AI EDAM 12(04):357–372
Stöhr T, Märtens H, Rahm E (2000) Multi-dimensional database allocation for parallel data warehouses. In: Proceedings of the 26th international conference on very large data bases, VLDB ’00. Morgan Kaufmann, pp 273–284
Stöhr T, Müller R, Rahm E (1999) An integrative and uniform model for metadata management in data warehousing environments. In: Proceedings of the international workshop on design and management of data warehouses, vol 189, Heidelberg, Germany
Stonebraker M, Hellerstein JM (2005) What goes around comes around. In: Readings in database systems, 4th edn. The MIT Press, Cambridge, London
Svahnberg M, Van Gurp J, Bosch J (2005) A taxonomy of variability realization techniques. Softw Pract Exp 35(8):705–731
The Common Variability Language Wiki (2012). http://www.omgwiki.org/variability/doku.php
Thiel S, Hein A (2002) Systematic integration of variability into product line architecture design. Springer, Berlin
Trujillo J, Luján-Mora S (2003) A UML based approach for modeling ETL processes in data warehouses. In: Conceptual modeling—ER 2003, pp 307–320
Tziovara V, Vassiliadis P, Simitsis A (2007) Deciding the physical implementation of ETL workflows. In: Proceedings of DOLAP ’07. ACM, New York, NY, USA, pp 49–56
Vaisman A, Zimányi E (2014) Data warehouse systems. Springer, Berlin
van der Hoek A, Heimbigner D, Wolf AL (1999) Capturing architectural configurability: variants, options, and evolution. Technical report CU-CS-895-99, Department of Computer Science, University of Colorado at Boulder
Vassiliadis P, Bouzeghoub M, Quix C (2000) Towards quality-oriented data warehouse usage and evolution. Inf Syst 25(2):89–115
Voigt H, Hanisch A, Lehner W (2015) Flexs—a logical model for physical data layout. In: New trends in database and information systems II. Springer, pp 85–95
Wehrle P, Miquel M, Tchounikine A (2005) A model for distributing and querying a data warehouse on a computing grid. In: 11th international conference on parallel and distributed systems, 2005. Proceedings, vol 1. IEEE, pp 203–209
Winter R, Strauch B (2003) A method for demand-driven information requirements analysis in data warehousing projects. In: Proceedings of the 36th annual Hawaii international conference on system sciences, 2003. IEEE, pp 9–19
Wu Z, Eadon G, Das S, Chong E, Kolovski V, Annamalai M, Srinivasan J (2008) Implementing an inference engine for RDFS/OWL constructs and user-defined rules in Oracle. In: ICDE, pp 1239–1248
Additional information
This work is an extension of our ODBASE paper [40].
About this article
Cite this article
Khouri, S., Bellatreche, L. Design Life-Cycle-Driven Approach for Data Warehouse Systems Configurability. J Data Semant 6, 83–111 (2017). https://doi.org/10.1007/s13740-017-0077-8