A Generic Data Warehouse Architecture for Analyzing Workflow Logs

  • Conference paper
Advances in Databases and Information Systems (ADBIS 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9282)

Abstract

This paper proposes an approach to representing and analyzing the content of workflow logs in a data warehouse. Analyzing workflow logs raises one major problem: the underlying workflow model typically contains loops (frequently interleaving), often implemented with goto-statements. These structures significantly increase the number of possible execution paths, in theory without bound. A naive Data Warehouse (DWH) implementation would represent all possible execution paths by means of a dimension, which would lead to a huge or even infinite number of elements in that dimension. In this paper, we present a novel approach for analyzing workflow logs that include loops and goto-statements.
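
To make the path explosion concrete, here is a minimal Python sketch that counts the distinct execution paths through a small workflow containing two interleaving loops. The workflow graph, its node names, and the `count_paths` helper are hypothetical illustrations and are not taken from the paper. The sketch only shows why a naive "one dimension element per execution path" design grows out of control; it is not the approach the paper proposes.

```python
# Hypothetical workflow: B can jump back to A, and C can jump back to B
# (two interleaving loops, as might be implemented with goto-statements).
workflow = {
    "start": ["A"],
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "end"],
    "end": [],
}


def count_paths(graph, start, end, max_steps):
    """Count distinct paths from start to end that use at most max_steps edges."""
    total = 0
    # frontier maps node -> number of distinct paths of the current length reaching it
    frontier = {start: 1}
    for _ in range(max_steps):
        next_frontier = {}
        for node, n_paths in frontier.items():
            for successor in graph[node]:
                next_frontier[successor] = next_frontier.get(successor, 0) + n_paths
        # Paths that reached the end node are complete and are not extended further.
        total += next_frontier.pop(end, 0)
        frontier = next_frontier
    return total


if __name__ == "__main__":
    for limit in (4, 6, 8, 10):
        print(limit, count_paths(workflow, "start", "end", limit))
    # prints: 4 1, 6 3, 8 7, 10 15 (one pair per line)
```

In this toy graph the path count roughly doubles each time two more steps are allowed, and no finite bound exists once the step limit is removed. A dimension that enumerates every path would therefore have no finite size.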

Notes

  1. PHP PEG has been developed by Hamish Friedlander. Available at: https://github.com/hafriedlander/php-peg.

Author information

Corresponding author

Correspondence to Christian Koncilia.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Koncilia, C., Pichler, H., Wrembel, R. (2015). A Generic Data Warehouse Architecture for Analyzing Workflow Logs. In: Tadeusz, M., Valduriez, P., Bellatreche, L. (eds) Advances in Databases and Information Systems. ADBIS 2015. Lecture Notes in Computer Science, vol 9282. Springer, Cham. https://doi.org/10.1007/978-3-319-23135-8_8

  • DOI: https://doi.org/10.1007/978-3-319-23135-8_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-23134-1

  • Online ISBN: 978-3-319-23135-8

  • eBook Packages: Computer Science (R0)
