A Process Mining Approach for Discovering ETL Black Points

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 570)

Abstract

ETL tasks are quite complex, often leading to a very complex network of working processes. Many of the difficulties in developing them come from the number of information sources involved, the heterogeneity and dispersion of the data, and the complexity of the tasks that must be implemented to populate a data warehouse appropriately. Thus, undesirable situations caused by ETL system design errors or by the implementation of faulty or inefficient tasks can easily occur. Many of these situations are only detectable at run time. In this paper, we discuss in particular the case of ETL bottleneck situations (ETL black points), which can occur during the execution of an ETL system, and we identify and characterize them using process mining. Based on an analysis of the process mining results, it is possible to develop alternative implementations for inefficient tasks and improve overall system performance.
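As a rough illustration of the idea of locating bottleneck tasks from execution data, the following minimal sketch computes the mean duration of each task in an ETL event log and flags tasks that dominate the total. It is not the authors' method or tooling: the log layout, the task names, the sample timestamps, and the `share` threshold are all hypothetical assumptions made for this example.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical ETL event log: (run id, task name, start, end).
# The layout and the values are assumptions for illustration only.
EVENT_LOG = [
    ("run1", "extract_orders", "2016-08-01 00:00:00", "2016-08-01 00:02:00"),
    ("run1", "clean_orders",   "2016-08-01 00:02:00", "2016-08-01 00:12:00"),
    ("run1", "load_fact",      "2016-08-01 00:12:00", "2016-08-01 00:14:00"),
    ("run2", "extract_orders", "2016-08-02 00:00:00", "2016-08-02 00:02:00"),
    ("run2", "clean_orders",   "2016-08-02 00:02:00", "2016-08-02 00:16:00"),
    ("run2", "load_fact",      "2016-08-02 00:16:00", "2016-08-02 00:18:00"),
]

FMT = "%Y-%m-%d %H:%M:%S"


def mean_task_durations(log):
    """Return the mean duration in seconds of each task across all runs."""
    durations = defaultdict(list)
    for _run, task, start, end in log:
        delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
        durations[task].append(delta.total_seconds())
    return {task: sum(vals) / len(vals) for task, vals in durations.items()}


def black_points(log, share=0.5):
    """Flag tasks whose mean duration exceeds `share` of the summed means.

    The threshold rule is an assumed heuristic, not taken from the paper.
    """
    means = mean_task_durations(log)
    total = sum(means.values())
    return sorted(task for task, mean in means.items() if mean > share * total)


if __name__ == "__main__":
    print(mean_task_durations(EVENT_LOG))
    print(black_points(EVENT_LOG))
```

With the sample log above, only `clean_orders` is flagged. A process mining tool would additionally recover the control flow between tasks from the log, which this sketch does not attempt.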

Acknowledgments

This work has been supported by COMPETE: POCI-01-0145-FEDER-007043 and FCT - Fundação para a Ciência e Tecnologia within the Project Scope: UID/CEC/00319/2013.

Author information

Corresponding author

Correspondence to Orlando Belo.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Belo, O., Dias, N., Ferreira, C., Pinto, F. (2017). A Process Mining Approach for Discovering ETL Black Points. In: Rocha, Á., Correia, A., Adeli, H., Reis, L., Costanzo, S. (eds) Recent Advances in Information Systems and Technologies. WorldCIST 2017. Advances in Intelligent Systems and Computing, vol 570. Springer, Cham. https://doi.org/10.1007/978-3-319-56538-5_43

  • DOI: https://doi.org/10.1007/978-3-319-56538-5_43

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-56537-8

  • Online ISBN: 978-3-319-56538-5

  • eBook Packages: Engineering, Engineering (R0)