Effectively and efficiently supporting roll-up and drill-down OLAP operations over continuous dimensions via hierarchical clustering

Ceci, Michelangelo; Cuzzocrea, Alfredo; Malerba, Donato

doi:10.1007/s10844-013-0268-1

Published: 02 August 2013

Journal of Intelligent Information Systems, volume 44, pages 309–333 (2015)

Michelangelo Ceci¹,
Alfredo Cuzzocrea² &
Donato Malerba¹

680 Accesses
32 Citations

Abstract

In traditional OLAP systems, roll-up and drill-down operations over data cubes exploit fixed hierarchies defined on discrete attributes, which play the role of dimensions, and operate along them. Emerging application scenarios, such as sensor networks, have stimulated research on OLAP systems in which continuous attributes are also considered as dimensions of analysis and hierarchies are defined over continuous domains. The goal is to avoid having to define an ad hoc discretization hierarchy in advance along each OLAP dimension. Following this research trend, in this paper we propose a novel method, founded on a density-based hierarchical clustering algorithm, to support roll-up and drill-down operations over OLAP data cubes with continuous dimensions. The method hierarchically clusters dimension instances while also taking fact-table measures into account, so that the resulting clusters are better suited to the intended analysis. Experiments on two well-known multidimensional datasets show the advantages of the proposed solution.
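To make the idea of roll-up and drill-down over a continuous dimension concrete, the following is a minimal sketch and not the authors' method. It swaps in a much simpler technique: one-dimensional gap-based grouping applied at increasing thresholds, which yields nested intervals that can serve as hierarchy levels. Unlike the method described in the abstract, it ignores the measures while clustering and only aggregates them afterwards. The function names, the thresholds, and the sample sensor readings are all hypothetical.

```python
def gap_clusters(values, max_gap):
    """Group sorted values into runs whose consecutive gaps are <= max_gap."""
    ordered = sorted(values)
    if not ordered:
        return []
    clusters = [[ordered[0]]]
    for v in ordered[1:]:
        if v - clusters[-1][-1] <= max_gap:
            clusters[-1].append(v)   # close enough: extend the current run
        else:
            clusters.append([v])     # gap too wide: start a new run
    return clusters


def build_levels(facts, gaps):
    """facts: (dimension_value, measure) pairs; gaps: thresholds, finest first.

    Returns one list of (low, high, summed_measure) members per level.
    """
    values = [d for d, _ in facts]
    levels = []
    for gap in gaps:
        level = []
        for run in gap_clusters(values, gap):
            low, high = run[0], run[-1]
            total = sum(m for d, m in facts if low <= d <= high)
            level.append((low, high, total))
        levels.append(level)
    return levels


def drill_down(levels, level, member):
    """Return the members one level finer that lie inside `member`."""
    if level == 0:
        return []                    # already at the finest level
    low, high = member[0], member[1]
    return [c for c in levels[level - 1] if low <= c[0] and c[1] <= high]


# Hypothetical sensor readings: (temperature, event count).
facts = [(1.0, 10), (1.2, 20), (2.0, 5), (8.0, 7), (8.3, 3), (20.0, 1)]
levels = build_levels(facts, gaps=[0.5, 2.0])
coarse = levels[1]                           # rolled-up view
finer = drill_down(levels, 1, coarse[0])     # drill into the first member
```

Because the thresholds increase from one level to the next, every coarse interval is a union of finer ones, so rolling up simply sums the measures of the child members.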


Notes

In our implementation, the clustering algorithm used in the third phase is the well-known DBSCAN algorithm (Ester et al. 1996), which performs density-based clustering.
http://sourceforge.net/projects/mondrian/files/mondrian/
http://people.sc.fsu.edu/~jburkardt/datasets/spaeth/spaeth.html
http://www.tpc.org/tpch/
http://www.google.com/squared
http://jpivot.sourceforge.net
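The first note names DBSCAN as the density-based algorithm used in the implementation. As a rough illustration of that algorithm only (not of the paper's multi-phase method), here is a minimal pure-Python sketch of DBSCAN. Applying it to (dimension value, measure) pairs, the `eps` and `min_pts` settings, and the sample points are assumptions made for this example; real data would likely need the two axes rescaled first.

```python
from math import dist

NOISE = -1


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)

    def neighbours(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue                      # already assigned or marked noise
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE             # not a core point (may be claimed later)
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster       # border point: join, do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)   # j is a core point: keep growing
        cluster += 1
    return labels


# Hypothetical (dimension value, measure) pairs.
points = [(1.0, 10.0), (1.2, 10.5), (1.1, 9.8),
          (8.0, 50.0), (8.2, 50.5), (8.1, 49.7),
          (20.0, 3.0)]
labels = dbscan(points, eps=1.0, min_pts=3)
```

With these settings the two dense groups receive distinct cluster ids and the isolated last point is labelled as noise. The neighbour search here is quadratic; an index structure would be needed for large datasets.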

References

Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules in large databases. In J.B. Bocca, M. Jarke, C. Zaniolo (Eds.), VLDB’94, Proceedings of 20th international conference on very large data bases, 12–15 Sept 1994, Santiago de Chile, Chile (pp. 487–499). Morgan Kaufmann.
Agrawal, R., Gehrke, J., Gunopulos, D., Raghavan, P. (2005). Automatic subspace clustering of high dimensional data. Data Mining and Knowledge Discovery, 11(1), 5–33.
Armbrust, M., Fox, A., Griffith, R., Joseph, A.D., Katz, R.H., Konwinski, A., Lee, G., Patterson, D.A., Rabkin, A., Stoica, I., Zaharia, M. (2010). A view of cloud computing. Communications of the ACM, 53(4), 50–58.
Broder, A.Z. (2002). A taxonomy of web search. SIGIR Forum, 36(2), 3–10.
Cattell, R. (2010). Scalable SQL and NoSQL data stores. SIGMOD Record, 39(4), 12–27.
Chaudhuri, S., & Dayal, U. (1997). An overview of data warehousing and OLAP technology. SIGMOD Record, 26(1), 65–74.
Chen, Q., Dayal, U., Hsu, M. (2000). An OLAP-based scalable web access analysis engine. In Y. Kambayashi, M.K. Mohania, A.M. Tjoa (Eds.), DaWaK, Lecture notes in computer science (Vol. 1874, pp. 210–223). Springer.
Cuzzocrea, A. (2006). Improving range-sum query evaluation on data cubes via polynomial approximation. Data and Knowledge Engineering, 56(2), 85–121.
Cuzzocrea, A., & Serafino, P. (2011). Clustcube: An OLAP-based framework for clustering and mining complex database objects. In SAC.
Cuzzocrea, A., & Wang, W. (2007). Approximate range-sum query answering on data cubes with probabilistic guarantees. Journal of Intelligent Information Systems, 28(2), 161–197.
Cuzzocrea, A., Saccà, D., Serafino, P. (2007). Semantics-aware advanced OLAP visualization of multidimensional data cubes. International Journal of Data Warehousing and Mining, 3(4), 1–30.
Cuzzocrea, A., Furfaro, F., Saccà, D. (2009). Enabling OLAP in mobile environments via intelligent data cube compression techniques. Journal of Intelligent Information Systems, 33(2), 95–143.
Delis, A., Faloutsos, C., Ghandeharizadeh, S. (Eds.) (1999). SIGMOD 1999, proceedings ACM SIGMOD international conference on management of data, 1–3 June 1999. Philadelphia, PA: ACM Press.
Dong, G., Han, J., Lam, J.M.W., Pei, J., Wang, K. (2001). Mining multi-dimensional constrained gradients in data cubes. In P.M.G. Apers, P. Atzeni, S. Ceri, S. Paraboschi, K. Ramamohanarao, R.T. Snodgrass (Eds.), VLDB (pp. 321–330). Morgan Kaufmann.
Ester, M., Kriegel, H.-P., Sander, J., Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD (pp. 226–231).
Ester, M., Kriegel, H.-P., Sander, J., Wimmer, M., Xu, X. (1998). Incremental clustering for mining in a data warehousing environment. In A. Gupta, O. Shmueli, J. Widom (Eds.), VLDB (pp. 323–333). Morgan Kaufmann.
Gao, B., Liu, T.-Y., Ma, W.-Y. (2006). Star-structured high-order heterogeneous data co-clustering based on consistent information theory. In Proceedings of the 6th International Conference on Data Mining, ICDM ’06 (pp. 880–884). Washington, DC: IEEE Computer Society.
Goil, S., & Choudhary, A.N. (2001). Parsimony: an infrastructure for parallel multidimensional analysis and data mining. Journal of Parallel and Distributed Computing, 61(3), 285–321.
Gray, J., Chaudhuri, S., Bosworth, A., Layman, A., Reichart, D., Venkatrao, M., Pellow, F., Pirahesh, H. (1997). Data cube: a relational aggregation operator generalizing group-by, cross-tab, and sub totals. Data Mining and Knowledge Discovery, 1(1), 29–53.
Guha, S., Rastogi, R., Shim, K. (2001). Cure: an efficient clustering algorithm for large databases. Information Systems, 26(1), 35–58.
Gunopulos, D., Kollios, G., Tsotras, V.J., Domeniconi, C. (2005). Selectivity estimators for multidimensional range queries over real attributes. VLDB Journal, 14(2), 137–154.
Han, J. (1998). Towards on-line analytical mining in large databases. SIGMOD Record, 27(1), 97–107.
Han, J., Chee, S.H.S., Chiang, J.Y. (1998). Issues for on-line analytical mining of data warehouses (extended abstract). In SIGMOD’98 workshop on research issues on Data Mining and Knowledge Discovery (DMKD’98).
Hinneburg, A., & Keim, D.A. (1999). Clustering methods for large databases: From the past to the future. In A. Delis, C. Faloutsos, S. Ghandeharizadeh (Eds.), SIGMOD 1999, Proceedings ACM SIGMOD international conference on management of data, 1–3 June 1999, Philadelphia, PA, USA (p. 509). ACM Press.
Ienco, D., Robardet, C., Pensa, R., Meo, R. (2012). Parameter-less co-clustering for star-structured heterogeneous data. Data Mining and Knowledge Discovery, 26(2), 1–38.
Imieliński, T., Khachiyan, L., Abdulghani, A. (2002). Cubegrades: generalizing association rules. Data Mining and Knowledge Discovery, 6(3), 219–257.
Kotidis, Y., & Roussopoulos, N. (1999). Dynamat: A dynamic view management system for data warehouses. In A. Delis, C. Faloutsos, S. Ghandeharizadeh (Eds.), SIGMOD 1999, proceedings ACM SIGMOD international conference on management of data, 1–3 June 1999, Philadelphia, PA, USA (pp. 371–382). ACM Press.
Kriegel, H.-P., Kröger, P., Zimek, A. (2009). Clustering high-dimensional data: a survey on subspace clustering, pattern-based clustering, and correlation clustering. Transactions on Knowledge Discovery from Data, 3(1), Article 1.
Messaoud, R.B., Rabaséda, S.L., Boussaid, O., Missaoui, R. (2006). Enhanced mining of association rules from data cubes. In I.-Y. Song, P. Vassiliadis (Eds.), DOLAP (pp. 11–18). ACM.
Ng, R.T., & Han, J. (2002). Clarans: a method for clustering objects for spatial data mining. IEEE Transactions on Knowledge and Data Engineering, 14(5), 1003–1016.
Parsaye, K. (1997). OLAP and data mining: bridging the gap. Database Programming and Design, 10, 30–37.
Pio, G., Ceci, M., Loglisci, C., D’Elia, D., Malerba, D. (2012). Hierarchical and overlapping co-clustering of mRNA:miRNA interactions. In L.D. Raedt, C. Bessière, D. Dubois, P. Doherty, P. Frasconi, F. Heintz, P.J.F. Lucas (Eds.), ECAI, frontiers in artificial intelligence and applications (Vol. 242, pp. 654–659). IOS Press.
Pio, G., Ceci, M., D’Elia, D., Loglisci, C., Malerba, D. (2013). A novel biclustering algorithm for the discovery of meaningful biological correlations between microRNAs and their target genes. BMC Bioinformatics, 14(Suppl 7), S8.
Sarawagi, S. (2001). idiff: Informative summarization of differences in multidimensional aggregates. Data Mining and Knowledge Discovery, 5(4), 255–276.
Sarawagi, S., Agrawal, R., Megiddo, N. (1998). Discovery-driven exploration of OLAP data cubes. In H.-J. Schek, F. Saltor, I. Ramos, G. Alonso (Eds.), EDBT, Lecture notes in computer science (Vol. 1377, pp. 168–182). Springer.
Shanmugasundaram, J., Fayyad, U.M., Bradley, P.S. (1999). Compressed data cubes for OLAP aggregate query approximation on continuous dimensions. In KDD (pp. 223–232).
Sheikholeslami, G., Chatterjee, S., Zhang, A. (2000). Wavecluster: a wavelet based clustering approach for spatial data in very large databases. VLDB Journal, 8(3–4), 289–304.
SPAETH (2013). Cluster Analysis Datasets. Available at: http://people.sc.fsu.edu/~jburkardt/datasets/spaeth/spaeth.html.
Stojanova, D., Ceci, M., Appice, A., Dzeroski, S. (2011). Network regression with predictive clustering trees. In D. Gunopulos, T. Hofmann, D. Malerba, M. Vazirgiannis (Eds.), ECML/PKDD (3), Lecture notes in computer science (Vol. 6913, pp. 333–348). Springer.
Stojanova, D., Ceci, M., Appice, A., Dzeroski, S. (2012). Network regression with predictive clustering trees. Data Mining and Knowledge Discovery, 25(2), 378–413.
Vens, C., Schietgat, L., Struyf, J., Blockeel, H., Kocev, D., Dzeroski, S. (2010). Predicting gene functions using predictive clustering trees. Springer.
Watson, H.J., & Wixom, B. (2007). The current state of business intelligence. IEEE Computer, 40(9), 96–99.
Yin, X., Han, J., Yu, P.S. (2007). Crossclus: user-guided multi-relational clustering. Data Mining and Knowledge Discovery, 15(3), 321–348.
Zhang, T., Ramakrishnan, R., Livny, M. (1996). Birch: An efficient data clustering method for very large databases. In H. V. Jagadish, I. S. Mumick (Eds.), SIGMOD conference (pp. 103–114). ACM Press.
Zhu, H. (1998). On-line analytical mining of association rules. M.Sc. thesis, Computing Science, Simon Fraser University.


Acknowledgements

The authors thank Lynn Rudd for reading through the paper. This work is in partial fulfillment of the requirements of the Italian project VINCENTE PON02_00563_3470993 “A Virtual collective INtelligenCe ENvironment to develop sustainable Technology Entrepreneurship ecosystems”.

Author information

Authors and Affiliations

University of Bari “Aldo Moro”, Via Orabona, 4, 70125, Bari, Italy
Michelangelo Ceci & Donato Malerba
ICAR-CNR and University of Calabria, Via P. Bucci, 41C, 87036, Rende, Cosenza, Italy
Alfredo Cuzzocrea


Corresponding author

Correspondence to Michelangelo Ceci.


About this article

Cite this article

Ceci, M., Cuzzocrea, A. & Malerba, D. Effectively and efficiently supporting roll-up and drill-down OLAP operations over continuous dimensions via hierarchical clustering. J Intell Inf Syst 44, 309–333 (2015). https://doi.org/10.1007/s10844-013-0268-1


Received: 15 October 2012
Revised: 12 July 2013
Accepted: 12 July 2013
Published: 02 August 2013
Issue Date: June 2015
DOI: https://doi.org/10.1007/s10844-013-0268-1
