Abstract
The existing capacity to collect, store, process and analyze huge amounts of rapidly generated data, i.e., Big Data, is characterized by fast technological developments and by only a limited set of conceptual advances to guide researchers and practitioners in the implementation of Big Data systems. New data stores and processing tools appear frequently, proposing new (and usually more efficient) ways to store and query data, such as SQL-on-Hadoop. Although these technologies are very relevant, the lack of common methodological guidelines or practices has led to Big Data systems being implemented through use-case driven approaches. This is also the case for one of the most valuable organizational data assets, the Data Warehouse, which needs to be rethought in the way it is designed, modeled, implemented, managed and monitored. This paper addresses some of the research challenges in Big Data Warehousing systems, proposing a vision that looks into: (i) the integration of new business processes and data sources; (ii) the proper way to achieve this integration; (iii) the management of these complex data systems and the enhancement of their performance; (iv) the automation of some of their analytical capabilities with Complex Event Processing and Machine Learning; and (v) the flexible and highly customizable visualization of their data, providing an advanced decision-making support environment.
Acknowledgements
This work has been supported by FCT – Fundação para a Ciência e Tecnologia, Projects Scope UID/CEC/00319/2019 and PDE/00040/2013, and the Doctoral scholarships PD/BDE/135100/2017 and PD/BDE/135101/2017. We also thank both the Spanish State Research Agency and the Generalitat Valenciana under the projects DataME TIN2016-80811-P, ACIF/2018/171, and PROMETEO/2018/176. This paper uses icons made by Freepik, from www.flaticon.com.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Santos, M.Y., Costa, C., Galvão, J., Andrade, C., Pastor, O., Marcén, A.C. (2019). Enhancing Big Data Warehousing for Efficient, Integrated and Advanced Analytics. In: Cappiello, C., Ruiz, M. (eds) Information Systems Engineering in Responsible Information Systems. CAiSE 2019. Lecture Notes in Business Information Processing, vol 350. Springer, Cham. https://doi.org/10.1007/978-3-030-21297-1_19
DOI: https://doi.org/10.1007/978-3-030-21297-1_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-21296-4
Online ISBN: 978-3-030-21297-1
eBook Packages: Computer Science (R0)