A Generic Data Warehouse Architecture for Analyzing Workflow Logs

  • Conference paper
Advances in Databases and Information Systems (ADBIS 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9282)

Abstract

This paper proposes an approach to representing and analyzing the content of workflow logs in a data warehouse. Analyzing workflow logs raises one major problem: the underlying workflow model typically contains loops (frequently interleaved), often implemented with goto-statements. These structures significantly increase the number of possible execution paths, in theory without bound. A naive Data Warehouse (DWH) implementation would represent all possible execution paths by means of a dimension, which would lead to a huge or even infinite number of elements in that dimension. In this paper, we present a novel approach for analyzing workflow logs that include loops and goto-statements.
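The path explosion described in the abstract can be illustrated with a small sketch. It is not the approach proposed in the paper; it only counts how many distinct execution paths a toy workflow admits when the number of transitions is capped. The workflow graph, the activity names, and the `count_paths` helper are all hypothetical and chosen purely for illustration.

```python
from functools import lru_cache

# Hypothetical workflow model (an assumption, not taken from the paper):
# activities are nodes, transitions are edges. The back edges C -> B and
# D -> C form two interleaving loops, as a goto-style jump would.
WORKFLOW = {
    "A": ("B",),
    "B": ("C",),
    "C": ("B", "D"),
    "D": ("C", "E"),
    "E": (),
}


def count_paths(graph, start, end, max_steps):
    """Count distinct execution paths from start to end that use
    at most max_steps transitions."""

    @lru_cache(maxsize=None)
    def walks(node, steps_left):
        # A path that has reached the end activity counts once.
        total = 1 if node == end else 0
        if steps_left == 0:
            return total
        # Extend the path along every outgoing transition.
        return total + sum(walks(nxt, steps_left - 1) for nxt in graph[node])

    return walks(start, max_steps)


if __name__ == "__main__":
    for bound in (4, 6, 8, 10):
        print(bound, count_paths(WORKFLOW, "A", "E", bound))
```

For this toy graph the count roughly doubles each time the bound grows by two transitions (1, 3, 7, 15 for bounds 4, 6, 8, 10). A dimension with one element per execution path would therefore never be complete without an artificial cap on path length.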

Notes

  1. PHP PEG has been developed by Hamish Friedlander. Available at: https://github.com/hafriedlander/php-peg.

Author information

Corresponding author

Correspondence to Christian Koncilia.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Koncilia, C., Pichler, H., Wrembel, R. (2015). A Generic Data Warehouse Architecture for Analyzing Workflow Logs. In: Morzy, T., Valduriez, P., Bellatreche, L. (eds) Advances in Databases and Information Systems. ADBIS 2015. Lecture Notes in Computer Science, vol 9282. Springer, Cham. https://doi.org/10.1007/978-3-319-23135-8_8

  • DOI: https://doi.org/10.1007/978-3-319-23135-8_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-23134-1

  • Online ISBN: 978-3-319-23135-8

  • eBook Packages: Computer Science, Computer Science (R0)
