Skip to main content

Approximate OLAP on Sustained Data Streams

  • Conference paper
  • First Online:
Database Systems for Advanced Applications (DASFAA 2017)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 10178))

Included in the following conference series:

Abstract

Many organizations require detailed and real time analysis of their business data for effective decision making. OLAP is one of the commonly used methods for the analysis of static data and has been studied by many researchers. OLAP is also applicable to data streams, however the requirement to produce real time analysis on fast and evolving data streams is not possible unless the data to be analysed reside on memory. Keeping in view the limited size and the volatile nature of the memory, we propose a novel architecture AOLAP which in addition to storing raw data streams to the secondary storage, maintains data stream’s summaries in a compact memory-based data structure. This work proposes the use of piece-wise linear approximation (PLA) for storing such data summaries corresponding to each materialized node in the OLAP cube. Since the PLA is a compact data structure, it can store the long data streams’ summaries in comparatively smaller space and can give approximate answers to OLAP queries.

OLAP analysts query different nodes in the OLAP cube interactively. To support such analysis by the PLA-based data cube without the unnecessary amplification of querying errors, inherent in the PLA structure, many nodes should be materialized. However, even though each PLA structure is compact, it is impossible to materialize all the nodes in the OLAP cube. Thus, we need to select the best set of materialized nodes which can give query results with the minimum approximation errors within the given memory bound. This problem is NP-hard. Hence this work also proposes an optimization scheme to support this selection. Detailed experimental evaluation is performed to prove the effectiveness of the use of PLA structure and the optimization scheme.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    TPC-H. http://www.tpc.org/tpch/.

References

  1. Han, J., Chen, Y., et al.: Stream cube: an architecture for multi-dimensional analysis of data streams. Distrib. Parallel Databases 18(2), 173–197 (2005)

    Article  Google Scholar 

  2. Duan, Q., Wang, P., Wu, M.X., Wang, W., Huang, S.: Approximate query on historical stream data. In: Hameurlain, A., Liddle, S.W., Schewe, K.-D., Zhou, X. (eds.) DEXA 2011. LNCS, vol. 6861, pp. 128–135. Springer, Heidelberg (2011). doi:10.1007/978-3-642-23091-2_12

    Chapter  Google Scholar 

  3. Sadoghi, M., Bhattacherjee, S., Bhattacharjee, B., Canim, M.: L-store: a real-time OLTP and OLAP system. In: CoRR (2016)

    Google Scholar 

  4. Elmeleegy, H., Elmagarmid, A.K., Cecchet, E., Aref, W.G., Zwaenepoel, W.: Online piece-wise linear approximation of numerical streams with precision guarantees. Proc. VLDB Endow. 2(1), 145–156 (2009)

    Article  Google Scholar 

  5. Xie, Q., Zhu, J., et al.: Efficient buffer management for piecewise linear representation of multiple data streams. In: ACM CIKM, pp. 2114–2118 (2012)

    Google Scholar 

  6. Luo, G., Yi, K., Cheng, S.W., Li, Z., Fan, W., He, C., Mu, Y.: Piecewise linear approximation of streaming time series data with max-error guarantees. In: 2015 IEEE 31st ICDE, pp. 173–184 (2015)

    Google Scholar 

  7. Wei, Z., Luo, G., Yi, K., Du, X., Wen, J.-R.: Persistent data sketching. In: ACM SIGMOD 2015, pp. 795–810 (2015)

    Google Scholar 

  8. Harinarayan, V., Rajaraman, A., Ullman, J.D.: Implementing data cubes efficiently. In: ACM SIGMOD, pp. 205–216 (1996)

    Google Scholar 

  9. O’Rourke, J.: An on-line algorithm for fitting straight lines between data ranges. Commun. ACM 24(9), 574–578 (1981)

    Article  MATH  Google Scholar 

  10. O’Neil, P., O’Neil, E., Chen, X., Revilak, S.: The Star Schema Benchmark and Augmented Fact Table Indexing. In: Nambiar, R., Poess, M. (eds.) TPCTC 2009. LNCS, vol. 5895, pp. 237–252. Springer, Heidelberg (2009). doi:10.1007/978-3-642-10424-4_17

    Google Scholar 

  11. Garofalakis, M., Kumar, A.: Deterministic wavelet thresholding for maximum-error metrics. In: ACM PODS (2004)

    Google Scholar 

  12. Karras, P., Mamoulis, N.: One-pass wavelet synopses for maximum-error metrics. In: PVLDB, pp. 421–432 (2005)

    Google Scholar 

  13. De Rougemont, M., Cao, P.T.: Approximate answers to OLAP queries on streaming data warehouses. In: Proceedings of the Fifteenth International Workshop on Data Warehousing and OLAP, pp. 121–128 (2012)

    Google Scholar 

  14. Talebi, Z.A., Chirkova, R., Fathi, Y., Stallmann, M.: Exact and inexact methods for selecting views and indexes for OLAP performance improvement. In: EDBT (2008)

    Google Scholar 

  15. Zhang, R., Koudas, N., Ooi, B.C., Srivastava, D., Zhou, P.: Streaming multiple aggregations using phantoms. VLDB J. 19(4), 557–583 (2010)

    Article  Google Scholar 

  16. Koch, C., Ahmad, Y., Kennedy, O., Nikolic, M., Nötzli, A., Lupei, D., Shaikhha, A.: DBToaster: higher-order delta processing for dynamic, frequently fresh views. VLDB J. 23(2), 253–278 (2014)

    Article  Google Scholar 

  17. Lazaridis, I., Mehrotra, S.: Capturing sensor-generated time series with quality guarantees. In: ICDE, pp. 429–440 (2003)

    Google Scholar 

  18. Olston, C., Jiang, J., Widom, J.: Adaptive filters for continuous queries over distributed data streams. In: ACM SIGMOD, pp. 563–574 (2003)

    Google Scholar 

  19. Gray, J., Chaudhuri, S., et al.: Data cube: a relational aggregation operator generalizing group-by, cross-tab, and sub-totals. Data Min. Knowl. Discov. 1(1), 29–53 (1997)

    Article  Google Scholar 

Download references

Acknowledgment

This research was partly supported by the program “Research and Development on Real World Big Data Integration and Analysis” of the RIKEN, Japan.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Salman Ahmed Shaikh .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Shaikh, S.A., Kitagawa, H. (2017). Approximate OLAP on Sustained Data Streams. In: Candan, S., Chen, L., Pedersen, T., Chang, L., Hua, W. (eds) Database Systems for Advanced Applications. DASFAA 2017. Lecture Notes in Computer Science(), vol 10178. Springer, Cham. https://doi.org/10.1007/978-3-319-55699-4_7

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-55699-4_7

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-55698-7

  • Online ISBN: 978-3-319-55699-4

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics