Abstract
Data warehouses are the core of modern decision-making systems. They store integrated information extracted from various heterogeneous data sources and make it available in multidimensional form for analyses aimed at improving users’ knowledge of their business. Although the term was first used in the 1980s, data warehousing emerged as a research area in its own right only in the late 1990s, while remaining closely related to several other research topics such as database integration, view materialization, and data visualization. This paper surveys more than 20 years of research on data warehouse systems, from their early relational implementations (still widely adopted in corporate environments), through the new architectures prompted by Business Intelligence 2.0 scenarios during the last decade, up to the exciting challenges now posed by integration with big data settings. The timeline of research is organized into three interrelated tracks: techniques, architectures, and methodologies.
© 2018 Springer International Publishing AG
Golfarelli, M., Rizzi, S. (2018). From Star Schemas to Big Data: 20+ Years of Data Warehouse Research. In: Flesca, S., Greco, S., Masciari, E., Saccà, D. (eds) A Comprehensive Guide Through the Italian Database Research Over the Last 25 Years. Studies in Big Data, vol 31. Springer, Cham. https://doi.org/10.1007/978-3-319-61893-7_6
DOI: https://doi.org/10.1007/978-3-319-61893-7_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-61892-0
Online ISBN: 978-3-319-61893-7
eBook Packages: Engineering, Engineering (R0)