Enhancing Big Data Warehousing for Efficient, Integrated and Advanced Analytics

Visionary Paper

  • Conference paper
Information Systems Engineering in Responsible Information Systems (CAiSE 2019)

Abstract

The existing capacity to collect, store, process and analyze huge amounts of rapidly generated data, i.e., Big Data, is characterized by fast technological developments and by a limited set of conceptual advances that guide researchers and practitioners in the implementation of Big Data systems. New data stores and processing tools appear frequently, proposing new (and usually more efficient) ways to store and query data (like SQL-on-Hadoop). Although these tools are very relevant, the lack of common methodological guidelines or practices has motivated the implementation of Big Data systems based on use-case driven approaches. This is also the case for one of the most valuable organizational data assets, the Data Warehouse, which needs to be rethought in the way it is designed, modeled, implemented, managed and monitored. This paper addresses some of the research challenges in Big Data Warehousing systems, proposing a vision that looks into: (i) the integration of new business processes and data sources; (ii) the proper way to achieve this integration; (iii) the management of these complex data systems and the enhancement of their performance; (iv) the automation of some of their analytical capabilities with Complex Event Processing and Machine Learning; and (v) the flexible and highly customizable visualization of their data, providing an advanced decision-making support environment.



Acknowledgements

This work has been supported by FCT – Fundação para a Ciência e Tecnologia, Projects Scope UID/CEC/00319/2019 and PDE/00040/2013, and the Doctoral scholarships PD/BDE/135100/2017 and PD/BDE/135101/2017. We also thank both the Spanish State Research Agency and the Generalitat Valenciana under the projects DataME TIN2016-80811-P, ACIF/2018/171, and PROMETEO/2018/176. This paper uses icons made by Freepik, from www.flaticon.com.

Author information

Corresponding author

Correspondence to Maribel Yasmina Santos.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Santos, M.Y., Costa, C., Galvão, J., Andrade, C., Pastor, O., Marcén, A.C. (2019). Enhancing Big Data Warehousing for Efficient, Integrated and Advanced Analytics. In: Cappiello, C., Ruiz, M. (eds) Information Systems Engineering in Responsible Information Systems. CAiSE 2019. Lecture Notes in Business Information Processing, vol 350. Springer, Cham. https://doi.org/10.1007/978-3-030-21297-1_19

  • DOI: https://doi.org/10.1007/978-3-030-21297-1_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-21296-4

  • Online ISBN: 978-3-030-21297-1

  • eBook Packages: Computer Science, Computer Science (R0)
