Abstract
Large volumes of dynamic stream data pose great challenges to analysis. Besides its dynamic and transient behavior, stream data has another important characteristic: multi-dimensionality. Much stream data resides in a multi-dimensional space and at a rather low level of abstraction, whereas most analysts are interested in relatively high-level dynamic changes in some combination of dimensions. To discover high-level dynamic and evolving characteristics, one may need to perform multi-level, multi-dimensional on-line analytical processing (OLAP) of stream data. This need calls for the investigation of new architectures that facilitate on-line analytical processing of multi-dimensional stream data.
In this chapter, we introduce an interesting stream cube architecture that effectively performs on-line partial aggregation of multi-dimensional stream data, captures the essential dynamic and evolving characteristics of data streams, and facilitates fast OLAP on stream data. Three important techniques are proposed for the design and implementation of stream cubes. First, a tilted time frame model is proposed to register time-related data in a multi-resolution model: the more recent data are registered at a finer resolution, whereas the more distant data are registered at a coarser resolution. This design reduces the overall storage requirements of time-related data and adapts nicely to the data analysis tasks commonly encountered in practice. Second, instead of materializing cuboids at all levels, two critical layers, the observation layer and the minimal interesting layer, are maintained to support routine as well as flexible analysis with minimal computation cost. Third, an efficient stream data cubing algorithm is developed that computes only the layers (cuboids) along a popular path and leaves the other cuboids for on-line, query-driven computation. Based on this design methodology, a stream data cube can be constructed and maintained incrementally with reasonable memory space, computation cost, and query response time. This is verified by our substantial performance study.
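As an illustration of the tilted time frame idea, the following is a minimal Python sketch and not the chapter's implementation. The class name `TiltedTimeFrame`, the level names, the fan-in values, and the choice of a sum as the measure are all assumptions made for the example. Fine-grained slots are merged into one coarser slot once enough of them have accumulated, so recent data stay at fine resolution while older data are kept only in aggregated form.

```python
from collections import deque


class TiltedTimeFrame:
    """Hypothetical multi-resolution register: recent data fine, older data coarse."""

    def __init__(self, levels):
        # levels: finest first, e.g. [("quarter", 4), ("hour", 24), ("day", 31)].
        # Each pair is (name, number of slots that make up one slot of the
        # next coarser level). These values are assumptions for illustration.
        self.names = [name for name, _ in levels]
        self.fan_in = [n for _, n in levels]
        self.slots = [deque() for _ in levels]  # newest slot on the right

    def add(self, value, level=0):
        """Register one aggregated value (a sum) at the given level."""
        slots = self.slots[level]
        slots.append(value)
        is_coarsest = level == len(self.slots) - 1
        if is_coarsest:
            # The coarsest level keeps a bounded history; the oldest slot ages out.
            if len(slots) > self.fan_in[level]:
                slots.popleft()
        elif len(slots) == self.fan_in[level]:
            # Enough fine slots have accumulated: merge them into one coarser slot.
            merged = sum(slots)
            slots.clear()
            self.add(merged, level + 1)

    def snapshot(self):
        """Return the current slots per level, oldest first."""
        return {name: list(s) for name, s in zip(self.names, self.slots)}


frame = TiltedTimeFrame([("quarter", 4), ("hour", 24)])
for _ in range(10):          # ten quarter-hour counts of 1 each
    frame.add(1)
print(frame.snapshot())      # two complete hours plus two recent quarters
```

Under the assumed levels of 4 quarters, 24 hours, and 31 days, at most 4 + 24 + 31 = 59 slots are retained per cell, compared with 4 × 24 × 31 = 2976 slots if every quarter-hour of a month were stored separately.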
The stream cube architecture facilitates on-line analytical processing of stream data. It also forms a preliminary structure for on-line stream mining. The impact of the design and implementation of the stream cube in the context of stream mining is also discussed in the chapter.
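The two critical layers and the popular path described in the abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the chapter's cubing algorithm: the function names `roll_up` and `compute_popular_path`, the two-dimensional cell keys, the city-to-country hierarchy, and the sample counts are all hypothetical. Cells at the minimal interesting layer are aggregated step by step along a single path until the observation layer is reached, and only the cuboids on that path are materialized.

```python
from collections import defaultdict


def roll_up(cuboid, generalize):
    """Aggregate a cuboid (dict: cell key -> sum) into a coarser cuboid."""
    out = defaultdict(int)
    for key, measure in cuboid.items():
        out[generalize(key)] += measure
    return dict(out)


def compute_popular_path(m_layer, steps):
    """Materialize the cuboids from the m-layer up to the o-layer along one path."""
    cuboids = [m_layer]
    for generalize in steps:
        cuboids.append(roll_up(cuboids[-1], generalize))
    return cuboids


# Hypothetical concept hierarchy and minimal-interesting-layer cells.
CITY_TO_COUNTRY = {"Chicago": "USA", "Urbana": "USA", "Toronto": "Canada"}
m_layer = {
    ("Chicago", "A"): 3,
    ("Urbana", "A"): 2,
    ("Urbana", "B"): 1,
    ("Toronto", "B"): 4,
}

# One popular path: generalize city to country, then drop the user group.
steps = [
    lambda k: (CITY_TO_COUNTRY[k[0]], k[1]),
    lambda k: (k[0], "*"),
]

path = compute_popular_path(m_layer, steps)
o_layer = path[-1]
print(o_layer)
```

Cuboids that are not on the chosen path would be derived at query time from the nearest materialized cuboid below them; that step is omitted from this sketch.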
Copyright information
© 2007 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Han, J. et al. (2007). Multi-Dimensional Analysis of Data Streams Using Stream Cubes. In: Aggarwal, C.C. (eds) Data Streams. Advances in Database Systems, vol 31. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-47534-9_6
DOI: https://doi.org/10.1007/978-0-387-47534-9_6
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-28759-1
Online ISBN: 978-0-387-47534-9
eBook Packages: Computer Science, Computer Science (R0)