
Abstract

Over the past decade, several new concepts have emerged for organizing and querying data in large Data Warehouse (DW) systems. All of them share the same primary objective: optimizing processing speed. More recently, with the rise of the Big Data concept, storage costs have dropped significantly and random-access performance has increased, particularly with modern SSDs. This paper introduces and tests a storage alternative that goes against current data normalization premises, on the premise that storage space is no longer a concern. By de-normalizing the entire data schema (transparently to the user), it proposes a new system concept, called SINGLE, in which query execution time should be entirely predictable regardless of query complexity. The proposed data model also allows easy partitioning and distributed processing, enabling parallel execution that boosts performance, as in MapReduce. The TPC-H benchmark is used to evaluate storage space and query performance. Results show predictable performance when compared with approaches based on a normalized relational schema and with MapReduce-oriented approaches.
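To make the idea concrete, the following is a minimal sketch under our own assumptions, not the SINGLE implementation from the paper. It joins a tiny, made-up TPC-H-like schema (`customer`, `orders`, `lineitem`) once at load time into a single wide table. It then hash-partitions that table and answers a query with one parallel scan per partition followed by a merge of partial aggregates, in the spirit of map and reduce. The helper names `denormalize`, `partition`, `scan_partition` and `parallel_query` are hypothetical, and the rows are illustrative rather than benchmark data.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Illustrative TPC-H-like rows (made up, not benchmark data).
customer = [
    {"c_custkey": 1, "c_name": "Alice", "c_mktsegment": "BUILDING"},
    {"c_custkey": 2, "c_name": "Bob", "c_mktsegment": "MACHINERY"},
]
orders = [
    {"o_orderkey": 10, "o_custkey": 1, "o_orderdate": "1995-03-01"},
    {"o_orderkey": 11, "o_custkey": 2, "o_orderdate": "1995-04-15"},
    {"o_orderkey": 12, "o_custkey": 1, "o_orderdate": "1996-01-20"},
]
lineitem = [
    {"l_orderkey": 10, "l_quantity": 5, "l_extendedprice": 100.0},
    {"l_orderkey": 10, "l_quantity": 2, "l_extendedprice": 40.0},
    {"l_orderkey": 11, "l_quantity": 7, "l_extendedprice": 210.0},
    {"l_orderkey": 12, "l_quantity": 1, "l_extendedprice": 30.0},
]


def denormalize(customer, orders, lineitem):
    """Hypothetical helper: join each lineitem with its order and customer
    once, at load time, yielding one wide row per lineitem. Customer and
    order attributes are duplicated, trading storage for join-free queries."""
    cust_by_key = {c["c_custkey"]: c for c in customer}
    order_by_key = {o["o_orderkey"]: o for o in orders}
    wide = []
    for li in lineitem:
        o = order_by_key[li["l_orderkey"]]
        c = cust_by_key[o["o_custkey"]]
        wide.append({**li, **o, **c})
    return wide


def partition(rows, n, key):
    """Hash-partition the wide table so partitions can be scanned independently."""
    parts = [[] for _ in range(n)]
    for r in rows:
        parts[hash(r[key]) % n].append(r)
    return parts


def scan_partition(part, predicate, group_key, value_key):
    """Map step: one sequential pass over a partition, with no joins at query time."""
    acc = defaultdict(float)
    for r in part:
        if predicate(r):
            acc[r[group_key]] += r[value_key]
    return acc


def parallel_query(parts, predicate, group_key, value_key):
    """Scan all partitions concurrently, then merge partial aggregates (reduce step)."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(
            lambda p: scan_partition(p, predicate, group_key, value_key), parts))
    result = defaultdict(float)
    for partial in partials:
        for k, v in partial.items():
            result[k] += v
    return dict(result)


wide = denormalize(customer, orders, lineitem)
parts = partition(wide, n=2, key="o_orderkey")
# Revenue per market segment for orders placed in 1995.
revenue = parallel_query(parts, lambda r: r["o_orderdate"].startswith("1995"),
                         "c_mktsegment", "l_extendedprice")
print(revenue)  # {'BUILDING': 140.0, 'MACHINERY': 210.0}
```

In this sketch, every query reads each row of the wide table exactly once, however many tables the original normalized query would have joined. That is the intuition behind the predictability claim: cost scales with the size of the de-normalized table rather than with join complexity.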



Acknowledgements

This work is financed by national funds through FCT - Fundação para a Ciência e Tecnologia, I.P., under the project UID/Multi/04016/2016. Furthermore, we would like to thank the Instituto Politécnico de Viseu and CI&DETS for their support.

Author information


Correspondence to Maryam Abbasi, Pedro Martins, José Cecílio, João Costa or Pedro Furtado.


Copyright information

© 2018 Springer Nature Switzerland AG

About this paper


Cite this paper

Abbasi, M., Martins, P., Cecílio, J., Costa, J., Furtado, P. (2018). SINGLE vs. MapReduce vs. Relational: Predicting Query Execution Time. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds) Beyond Databases, Architectures and Structures. Facing the Challenges of Data Proliferation and Growing Variety. BDAS 2018. Communications in Computer and Information Science, vol 928. Springer, Cham. https://doi.org/10.1007/978-3-319-99987-6_5


  • DOI: https://doi.org/10.1007/978-3-319-99987-6_5

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-99986-9

  • Online ISBN: 978-3-319-99987-6

  • eBook Packages: Computer Science, Computer Science (R0)
