Abstract
ETL tasks are complex and often form an intricate network of cooperating processes. Much of the difficulty in developing them comes from the number of information sources involved, the heterogeneity and dispersion of the data, and the complexity of the tasks that must be implemented to populate a data warehouse appropriately. As a result, undesirable situations caused by ETL system design errors, or by faulty or inefficient task implementations, arise easily. Many of these situations are detectable only at run time. In this paper, we discuss in particular the case of ETL bottlenecks, which we call ETL black points, that can occur during the execution of an ETL system, and we identify and characterize them using process mining. Based on an analysis of the process mining results, it is possible to develop alternative implementations for inefficient tasks and improve overall system performance.
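As a rough illustration of the idea of locating bottleneck tasks from run-time event data, the sketch below aggregates task durations from a small, invented ETL event log and flags tasks that take a large share of the mean run time. It is not the authors' procedure: the log layout, the task names, the `black_points` helper, and the 50% threshold are all assumptions made for the example, and a real process mining tool would work on a discovered process model rather than on plain averages.

```python
from collections import defaultdict
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Hypothetical ETL event log: (run id, task name, start time, end time).
EVENT_LOG = [
    ("run1", "extract_orders", "2016-08-01 00:00:00", "2016-08-01 00:02:00"),
    ("run1", "clean_orders",   "2016-08-01 00:02:00", "2016-08-01 00:12:00"),
    ("run1", "load_fact",      "2016-08-01 00:12:00", "2016-08-01 00:13:00"),
    ("run2", "extract_orders", "2016-08-02 00:00:00", "2016-08-02 00:03:00"),
    ("run2", "clean_orders",   "2016-08-02 00:03:00", "2016-08-02 00:17:00"),
    ("run2", "load_fact",      "2016-08-02 00:17:00", "2016-08-02 00:18:00"),
]


def mean_durations(log):
    """Return the mean duration in seconds of each task across all runs."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _run, task, start, end in log:
        elapsed = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
        totals[task] += elapsed.total_seconds()
        counts[task] += 1
    return {task: totals[task] / counts[task] for task in totals}


def black_points(log, share=0.5):
    """Flag tasks whose mean duration is at least `share` of the summed means.

    The threshold is an arbitrary assumption for this sketch.
    """
    means = mean_durations(log)
    total = sum(means.values())
    if total == 0:
        return []
    return sorted(task for task, d in means.items() if d / total >= share)


if __name__ == "__main__":
    print(mean_durations(EVENT_LOG))
    print(black_points(EVENT_LOG))
```

With this invented log, `clean_orders` is the only task flagged, which would make it the first candidate for an alternative implementation.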
Acknowledgments
This work has been supported by COMPETE: POCI-01-0145-FEDER-007043 and FCT - Fundação para a Ciência e Tecnologia within the Project Scope: UID/CEC/00319/2013.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Belo, O., Dias, N., Ferreira, C., Pinto, F. (2017). A Process Mining Approach for Discovering ETL Black Points. In: Rocha, Á., Correia, A., Adeli, H., Reis, L., Costanzo, S. (eds) Recent Advances in Information Systems and Technologies. WorldCIST 2017. Advances in Intelligent Systems and Computing, vol 570. Springer, Cham. https://doi.org/10.1007/978-3-319-56538-5_43
DOI: https://doi.org/10.1007/978-3-319-56538-5_43
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-56537-8
Online ISBN: 978-3-319-56538-5
eBook Packages: Engineering, Engineering (R0)