Design Life-Cycle-Driven Approach for Data Warehouse Systems Configurability

  • Original Article
Journal on Data Semantics

Abstract

Many modern software systems are designed to be highly configurable. Configurability is the ability to build consistent systems from a common architecture by selecting and synthesizing provided design elements; it offers high customizability and an efficient reuse strategy. Yet configurability has not enjoyed the same popularity in data warehouse (DW) design as in other types of software. We are currently witnessing an explosion of new DW applications, driven by high-performance computing and emerging hardware. This context of continuous evolution reveals a high degree of variability that needs to be managed and exploited. In this paper, we propose a configurability-aware approach for DW design that allows designers to specify requirements defining suitable design options in order to generate a customized DW. To meet this objective, we perform three tasks: (i) developing a deep understanding of the DW design life-cycle by reviewing its evolution, (ii) formalizing each design phase and (iii) identifying the interactions between phases. This analysis contributes to defining our approach, which consists of a configuration model that tailors the DW system to designers’ requirements and a configuration process that produces the corresponding DW configuration. The approach is specified with the Base, Variability, Resolution (BVR) models, which are defined using the Common Variability Language proposed by the Object Management Group for variability modeling, and is implemented using BVR Tool. A case study providing two DW configurations shows the effectiveness of our approach.
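
As a rough illustration of the idea of a configuration model and a configuration process, the following Python sketch resolves one option per design phase and checks a cross-phase dependency. It is not the BVR model or BVR Tool used in the article; the phase names, options, the `IMPLIES` constraint and the `resolve` function are all hypothetical and chosen only for illustration.

```python
# Hypothetical variation points: one per design phase, each with its allowed options.
VARIATION_POINTS = {
    "conceptual": {"multidimensional", "ontology_based"},
    "logical": {"relational", "rdf_triples"},
    "physical": {"row_store", "column_store"},
}

# Hypothetical interaction between phases: if the first (phase, option) pair is
# chosen, the second (phase, option) pair must be chosen as well.
IMPLIES = [
    (("conceptual", "ontology_based"), ("logical", "rdf_triples")),
]


def resolve(choices):
    """Validate a designer's choices and return them as a configuration."""
    # Every choice must target a known phase and one of its allowed options.
    for phase, option in choices.items():
        if phase not in VARIATION_POINTS:
            raise ValueError(f"unknown phase: {phase}")
        if option not in VARIATION_POINTS[phase]:
            raise ValueError(f"invalid option {option!r} for phase {phase!r}")
    # Every phase must be resolved.
    missing = VARIATION_POINTS.keys() - choices.keys()
    if missing:
        raise ValueError(f"unresolved phases: {sorted(missing)}")
    # Cross-phase constraints must hold.
    for (phase, option), (dep_phase, dep_option) in IMPLIES:
        if choices[phase] == option and choices[dep_phase] != dep_option:
            raise ValueError(
                f"{phase}={option} requires {dep_phase}={dep_option}"
            )
    return dict(choices)


if __name__ == "__main__":
    config = resolve(
        {
            "conceptual": "multidimensional",
            "logical": "relational",
            "physical": "column_store",
        }
    )
    print(config)
```

In this sketch the dictionary of variation points plays the role of a variability model, the designer's choices play the role of a resolution, and the returned dictionary stands in for a DW configuration; a real variability language expresses far richer constraints.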

Notes

  1. We use the terms design cycle and design life-cycle interchangeably in the rest of the paper.

  2. http://www.omgwiki.org/variability/doku.php?id=start.

  3. http://modelbased.net/tools/bvr-tool/.

  4. http://swat.cse.lehigh.edu/projects/lubm/.

  5. http://bvr.modelbased.net/update/site.xml.

  6. http://www.w3.org/2007/OWL/wiki/OracleOwlPrime.

  7. http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl.

References

  1. Abelló A, Samos J, Saltor F (2006) YAM2: a multidimensional conceptual model extending UML. Inf Syst 31(6):541–567

  2. Agarwal S, Agrawal R, Deshpande P, Gupta A, Naughton JF, Ramakrishnan R, Sarawagi S (1996) On the computation of multidimensional aggregates. Proceedings of the 22th international conference on very large data bases, VLDB ’96. USA. Morgan Kaufmann, San Francisco, CA, pp 506–521

  3. Agrawal D, Das S, El Abbadi A (2010) Big data and cloud computing: new wine or just new bottles? Proc VLDB Endow 3(1–2):1647–1648

  4. Anderlik S, Neumayr B, Schrefl M (2012) Using domain ontologies as semantic dimensions in data warehouses. In: Conceptual modeling. Lecture notes in computer science, vol 7532. Springer, Berlin, pp 88–101

  5. Asikainen T, Mannisto T, Soininen T (2004) Using a configurator for modelling and configuring software product lines based on feature models. In: Software variability management for product derivation-towards tool support at international workshop of SPLC

  6. Asikainen T, Soininen T, Männistö T (2004) A Koala-based approach for modelling and deploying configurable software product families. Springer, Berlin

  7. Baader F, Calvanese D, McGuinness D, Nardi D, Patel-Schneider P (eds) (2003) The description logic handbook: theory, implementation, and applications. Cambridge University Press, Cambridge

  8. Badia A, Lemire D (2011) A call to arms: revisiting database design. SIGMOD Rec 40(3):61–69

  9. Batory D, Barnett J, Garza JF, Smith KP, Tsukuda K, Twichell B, Wise T (1988) Genesis: an extensible database management system. IEEE Trans Softw Eng 14(11):1711–1730

  10. Bellatreche L, Giacometti A, Marcel A, Mouloudi H, Laurent D (2005) A personalization framework for OLAP queries. In: Proceedings of DOLAP05, pp 9–18

  11. Bellatreche L, Khouri S, Berkani N (2013) Semantic data warehouse design: from ETL to deployment a la carte. In: 18th international conference on database systems for advanced applications (DASFAA), pp 64–83

  12. Berkani N, Bellatreche L, Khouri S (2013) Towards a conceptualization of ETL and physical storage of semantic data warehouses as a service. Clust Comput 16(4):915–931

  13. Bonifati A, Cattaneo F, Ceri S, Fuggetta A, Paraboschi S (2001) Designing data marts for data warehouses. ACM Trans Softw Eng Methodol 10(4):452–483

  14. Bosch J (2002) Maturity and evolution in software product lines: approaches, artefacts and organization. Springer, Berlin

  15. Brockmans S (2009) Formal and conceptual comparison of ontology mapping languages. In: Stuckenschmidt H, Parent C, Spaccapietra S (eds) Modular ontologies. Springer, Berlin, pp 267–291

  16. Brown PG, Haas PJ (2003) BHUNT: automatic discovery of fuzzy algebraic constraints in relational data. In: Proceedings of the 29th international conference on very large data bases, vol 29, VLDB ’03. VLDB Endowment, pp 668–679

  17. BVR—the language. http://bvr.modelbased.net/docs/VARIES_D4.2_v01_PP_FINAL.pdf

  18. Cabibbo L, Torlone R (1998) A logical approach to multidimensional databases. In: Advances in database technology—EDBT’98, pp 183–197

  19. Calvanese D, De Giacomo G, Lenzerini M (2002) A framework for ontology integration. In: The emerging semantic web-selected papers from the first semantic web working symposium, pp 201–214

  20. Calvanese D, Lenzerini M, Nardi D (1998) Description logics for conceptual data modeling. In: Chomicki J, Saake G (eds) Logics for databases and information systems. Springer, Boston, MA, pp 229–263

  21. Chen PP-S (1976) The entity-relationship model toward a unified view of data. ACM Trans Database Syst 1(1):9–36

  22. Cirilo E, Kulesza U, Garcia A, Cowan DD, Alencar PSC, de Lucena CJP (2013) Configurable software product lines—supporting heterogeneous configuration knowledge. In: 13th international conference on software reuse (ICSR), pp 176–191

  23. El Akkaoui Z, Zimányi E, Mazón J-N, Trujillo J (2011) A model-driven framework for ETL process development. In: Proceedings of the ACM 14th international workshop on data warehousing and OLAP. ACM, pp 45–52

  24. Elmasri R (2008) Fundamentals of database systems. Pearson Education India, Bengaluru

  25. Fankam C (2009) OntoDB2: un système flexible et efficient de Base de Données à Base Ontologique pour le Web sémantique et les données techniques. PhD thesis, ISAE-ENSMA Ecole Nationale Supérieure de Mécanique et d’Aérotechnique-Poitiers

  26. Gam I, Salinesi C et al (2006) A requirement-driven approach for designing data warehouses. In: Requirements engineering: foundation for software quality (REFSQ)

  27. Geppert A, Scherrer S, Dittrich KR (1997) KIDS: construction of database management systems based on reuse. University of Zurich

  28. Golfarelli M (2010) From user requirements to conceptual design in data warehouse design a survey. In: Bellatreche L (ed) Data warehousing design and advanced engineering applications methods for complex construction. IGI Global, Hershey, pp 1–16

  29. Golfarelli M, Maio D, Rizzi S (1998) The dimensional fact model: a conceptual model for data warehouses. Int J Cooper Inf Syst 7(02n03):215–247

  30. Golfarelli M, Rizzi S, Biondi P (2011) MyOLAP: an approach to express and evaluate olap preferences. IEEE Trans Knowl Data Eng 23(7):1050–1064

  31. Grabova O, Darmont J, Chauchat J-H, Zolotaryova I (2010) Business intelligence for small and middle-sized entreprises. SIGMOD Rec 39(2):39–50

  32. Gruber T (1993) A translation approach to portable ontology specifications. Knowl Acquis 5(2):199–220

  33. Haase P, Motik B (2005) A mapping system for the integration of OWL-DL ontologies. In: Proceedings of the first international workshop on interoperability of heterogeneous information systems. ACM, pp 9–16

  34. Haugen Ø, Øgård O (2014) BVR-better variability results. In: Amyot D, Fonseca i Casas P, Mussbacher G (eds) System analysis and modeling: models and reusability. Springer, Berlin, pp 1–15

  35. Inmon WH (2002) Building the data warehouse. Wiley, London

  36. Jean S, Sadoune IA, Bellatreche L, Boukhari I (2014) On using requirements throughout the life cycle of data repository. In: Proceedings of 25th international conference on database and expert systems applications (DEXA)

  37. Jensen M, Holmgren T, Pedersen T (2004) Discovering multidimensional structure in relational data. In: Data warehousing and knowledge discovery, pp 138–148

  38. Khedri N, Khosravi R (2013) Handling database schema variability in software product lines. In: 2013 20th Asia-Pacific software engineering conference (APSEC), vol 1. IEEE, pp 331–338

  39. Khouri S (2013) Cycle de vie sémantique de conception de systèmes de stockage et de manipulation de données. PhD thesis, ENSMA & ESI, Oct 2013

  40. Khouri S, Bellatreche L (2014) Towards a configurable database design: a case of semantic data warehouses. In: On the move to meaningful internet systems: OTM 2014 conferences. Springer, pp 760–767

  41. Khouri S, Bellatreche L, Boukhari I, Bouarar S (2012) More investment in conceptual designers: think about it! In: CSE’12, pp 88–93

  42. Khouri S, Boukhari I, Bellatreche L, Jean S, Sardet E, Baron M (2012) Ontology-based structured web data warehouses for sustainable interoperability: requirement modeling, design methodology and tool. Comput Ind 63(8):799–812

  43. Khouri S, Semassel K, Bellatreche L (2015) Managing data warehouse traceability: a life-cycle driven approach. In: Advanced information systems engineering—27th international conference, CAiSE 2015, Proceedings, Stockholm, Sweden, 8–12 June 2015, pp 199–213

  44. Kimball R (1996) The data warehouse toolkit: practical techniques for building dimensional data warehouses. Wiley, New York

  45. Kimball R, Reeves L, Thornthwaite W, Ross M, Thornwaite W (1998) The data warehouse lifecycle toolkit: expert methods for designing, developing and deploying data warehouses, 1st edn. Wiley, New York

  46. Kimball R, Ross M et al (2002) The data warehouse toolkit: the complete guide to dimensional modelling. Wiley, New York

  47. Kimura H, Huo G, Rasin A, Madden S, Zdonik SB (2010) CORADD: correlation aware database designer for materialized views and indexes. Proc VLDB Endow 3(1–2):1103–1113

  48. Knackstedt R, Klose K (2005) Configurative reference model-based development of data warehouse systems. In: Proceedings of the 16th information resources management association conference (IRMA), San Diego, pp 32–39. Citeseer

  49. Labio WJ, Wiener JL, Garcia-Molina H, Gorelik V (2000) Efficient resumption of interrupted warehouse loads. SIGMOD Rec 29(2):46–57

  50. Lenzerini M (2002) Data integration: a theoretical perspective. In: PODS, pp 233–246

  51. List B, Schiefer J, Tjoa A (2000) Process-oriented requirement analysis supporting the data warehouse design process a use case driven approach. In: Database and expert systems applications. Springer, pp 593–603

  52. Lu J, Ma L, Zhang L, Brunner J-S, Wang C, Pan Y, Yu Y (2007) SOR: a practical system for ontology storage, reasoning and search. In: Proceedings of the 33rd international conference on very large data bases, VLDB ’07. VLDB Endowment, pp 1402–1405

  53. Lubars M, Potts C, Richter C (1993) A review of the state of the practice in requirements modeling. In: Proceedings of IEEE international symposium on requirements engineering. IEEE, pp 2–14

  54. Luján-Mora S, Trujillo J (2004) Physical modeling of data warehouses using UML. Proceedings of the 7th ACM international workshop on data warehousing and OLAP, DOLAP ’04. New York, NY, USA. ACM, pp 48–57

  55. Luján-Mora S, Trujillo J, Song I-Y (2006) A UML profile for multidimensional modeling in data warehouses. Data Knowl Eng 59(3):725–769

  56. Männistö T, Soininen T, Sulonen R (2000) Configurable software product families. In: ECAI 2000 configuration workshop, pp 56–58

  57. Mazón J-N, Trujillo J (2008) An MDA approach for the development of data warehouses. Decis Support Syst 45(1):41–58

  58. Mazon J-N, Trujillo J, Serrano M, Piattini M (2005) Applying MDA to the development of data warehouses. In: Proceedings of the 8th ACM international workshop on data warehousing and OLAP. ACM, pp 57–66

  59. Nebot V, Berlanga R (2012) Building data warehouses with semantic web data. Decis Support Syst 52(4):853–868

  60. Object Management Group (2012) Common Variability Language (CVL), revised submission. http://www.omgwiki.org/variability/lib/exe/fetch.php?media=cvl-revised-submission.pdf

  61. Ognjanovic I, Mohabbati B, Gasevic D, Bagheri E, Boskovic M (2012) A metaheuristic approach for the configuration of business process families. In: IEEE ninth international conference on services computing, pp 25–32

  62. Peltonen H, Mannisto T, Soininen T, Tiihonen J, Martio A, Sulonen R (1998) Concepts for modelling configurable products. Helsinki University of Technology, Espoo

  63. Pucheral P, Bouganim L, Valduriez P, Bobineau C (2001) PicoDBMS: scaling down database techniques for the smartcard. VLDB J 10(2–3):120–132

  64. Raatikainen M, Soininen T, Mannisto T, Mattila A (2005) Characterizing configurable software product families and their derivation. Softw Process Improv Pract 10(1):41–60

  65. Romero O, Abelló A (2007) Automating multidimensional design from ontologies. In: Proceedings of the ACM tenth international workshop on Data warehousing and OLAP. ACM, pp 1–8

  66. Romero O, Abelló A (2010) Automatic validation of requirements to support multidimensional design. Data Knowl Eng 69(9):917–942

  67. Romero O, Simitsis A, Abelló A (2011) GEM: requirement-driven generation of ETL and multidimensional conceptual designs. In: Data warehousing and knowledge discovery, pp 80–95

  68. Rosenmüller M, Apel S, Leich T, Saake G (2009) Tailor-made data management for embedded systems: a case study on Berkeley DB. Data Knowl Eng 68(12):1493–1512

  69. Rosenmüller M, Kästner C, Siegmund N, Sunkle S, Apel S, Leich T, Saake G (2009) SQL á la carte-toward tailor-made data management. In: BTW, pp 117–136. Citeseer

  70. Rosenmüller M, Siegmund N, Schirmeier H, Sincero J, Apel S, Leich T, Spinczyk O, Saake G (2008) FAME-DBMS: tailor-made data management solutions for embedded systems. In: Proceedings of the 2008 EDBT workshop on software engineering for tailor-made data management. ACM, pp 1–6

  71. Royer J-C, Arboleda H (2013) Model-driven and software product line engineering. Wiley, London

  72. Schaefer I, Rabiser R, Clarke D, Bettini L, Benavides D, Botterweck G, Pathak A, Trujillo S, Villela K (2012) Software diversity: state of the art and perspectives. Int J Softw Tools Technol Transf 14(5):477–495

  73. Schütz C, Schrefl M (2014) Customization of domain-specific reference models for data warehouses. In: 2014 IEEE 18th international enterprise distributed object computing conference (EDOC). IEEE, pp 61–70

  74. Seltzer M (2008) Beyond relational databases. Commun ACM 51(7):52–58

  75. Shaw M, DeLine R, Klein DV, Ross TL, Young DM, Zelesnik G (1995) Abstractions for software architecture and tools to support them. IEEE Trans Softw Eng 21(4):314–335

  76. Simitsis A, Vassiliadis P (2003) A methodology for the conceptual modeling of ETL processes. In: CAiSE workshops

  77. Skoutas D, Simitsis A (2006) Designing ETL processes using semantic web technologies. In: Data Warehousing and OLAP: proceedings of the 9th ACM international workshop on data warehousing and OLAP, vol 10, pp 67–74

  78. Skoutas D, Simitsis A (2007) Ontology-based conceptual design of ETL processes for both structured and semi-structured data. Int J Semant Web Inf Syst 3(4):1–24

  79. Soininen T, Tiihonen J, Männistö T, Sulonen R (1998) Towards a general ontology of configuration. AI EDAM 12(04):357–372

  80. Stöhr T, Märtens H, Rahm E (2000) Multi-dimensional database allocation for parallel data warehouses. In: Proceedings of the 26th international conference on very large data bases, VLDB ’00. Morgan Kaufmann, pp 273–284

  81. Stöhr T, Müller R, Rahm E (1999) An integrative and uniform model for metadata management in data warehousing environments. In: Proceedings of the international workshop on design and management of data warehouses, vol 189, Heidelberg, Germany

  82. Stonebraker M, Hellerstein JM (2005) What goes around comes around. In: Readings in database systems, 4th edn. The MIT Press, Cambridge, London

  83. Svahnberg M, Van Gurp J, Bosch J (2005) A taxonomy of variability realization techniques. Softw Pract Exp 35(8):705–754

  84. The Common Variability Language Wiki (2012). http://www.omgwiki.org/variability/doku.php

  85. Thiel S, Hein A (2002) Systematic integration of variability into product line architecture design. Springer, Berlin

  86. Trujillo J, Luján-Mora S (2003) A UML based approach for modeling ETL processes in data warehouses. In: Conceptual modeling—ER 2003, pp 307–320

  87. Tziovara V, Vassiliadis P, Simitsis A (2007) Deciding the physical implementation of ETL workflows. DOLAP, New York, NY, USA. ACM, pp 49–56

  88. Vaisman A, Zimányi E (2014) Data warehouse systems. Springer, Berlin

  89. van der Hoek A, Heimbigner D, Wolf AL (1999) Capturing architectural configurability: variants, options, and evolution (No. CU-CS-895-99). Colorado Univ at Boulder, Dept of Computer Science

  90. Vassiliadis P, Bouzeghoub M, Quix C (2000) Towards quality-oriented data warehouse usage and evolution. Inf Syst 25(2):89–115

  91. Voigt H, Hanisch A, Lehner W (2015) Flexs—a logical model for physical data layout. In: New trends in database and information systems II. Springer, pp 85–95

  92. Wehrle P, Miquel M, Tchounikine A (2005) A model for distributing and querying a data warehouse on a computing grid. In: 11th international conference on parallel and distributed systems, 2005. Proceedings, vol 1. IEEE, pp 203–209

  93. Winter R, Strauch B (2003) A method for demand-driven information requirements analysis in data warehousing projects. In: Proceedings of the 36th annual Hawaii international conference on system sciences, 2003. IEEE, pp 9–19

  94. Wu Z, Eadon G, Das S, Chong E, Kolovski V, Annamalai M, Srinivasan J (2008) Implementing an inference engine for RDFS/OWL constructs and user-defined rules in Oracle. In: ICDE, pp 1239–1248

Author information

Corresponding author

Correspondence to Selma Khouri.

Additional information

This work is an extension of our ODBASE paper [40].

About this article

Cite this article

Khouri, S., Bellatreche, L. Design Life-Cycle-Driven Approach for Data Warehouse Systems Configurability. J Data Semant 6, 83–111 (2017). https://doi.org/10.1007/s13740-017-0077-8

  • DOI: https://doi.org/10.1007/s13740-017-0077-8
