Abstract
Process mining is, by now, a well-established discipline focussing on process-oriented data analysis. As with other forms of data analysis, the quality and reliability of insights derived through analysis is directly related to the quality of the input (garbage in - garbage out). In the case of process mining, the input is an event log comprised of event data captured (in information systems) during the execution of the process. It is crucial then that the event log be treated as a first-class citizen. While data quality is an easily understood concept little effort has been directed towards systematically detecting data quality issues in event logs. Analysts still spend a large proportion of any project in ‘data cleaning’, often involving manual and ad hoc tasks, and requiring more than one tool. While there are existing tools and languages that query event logs, the problem of different approaches for different log imperfections remains. In this paper we take the first steps to developing QUELI (Querying Event Log for Imperfections) a log query language that provides direct support for detecting log imperfections. We develop an approach that identifies capabilities required of QUELI and illustrate the approach by applying it to 5 of the 11 event log imperfection patterns described in [29]. We view this as a first step towards operationalising systematic, automated support for log cleaning.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
ISO/IEC 25010:2011: Systems and software engineering - Systems and software product Quality Requirements and Evaluation (SQuaRE) - System and software quality models (2011)
van der Aalst, W., et al.: Process mining manifesto. In: Daniel, F., Barkaoui, K., Dustdar, S. (eds.) BPM 2011. LNBIP, vol. 99, pp. 169–194. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-28108-2_19
van der Aalst, W.: Process Mining: Discovery Conformance and Enhancement of Business Processes. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-19345-3
Batini, C., Palmonari, M., Viscusi, G.: Opening the closed world: a survey of information quality research in the wild. In: Floridi, L., Illari, P. (eds.) The Philosophy of Information Quality. SL, vol. 358, pp. 43–73. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-07121-3_4
Batini, C., Scannapieco, M.: Data Quality: Concepts, Methodologies and Techniques. Springer, Heidelberg (2006). https://doi.org/10.1007/3-540-33173-5
Beheshti, S.-M.-R., Benatallah, B., Motahari-Nezhad, H.R.: Scalable graph-based OLAP analytics over process execution data. Distrib. Parallel Datab. 34(3), 379–423 (2016)
Beheshti, S.-M.-R., Benatallah, B., Motahari-Nezhad, H.R., Sakr, S.: A query language for analyzing business processes execution. In: Rinderle-Ma, S., Toumani, F., Wolf, K. (eds.) BPM 2011. LNCS, vol. 6896, pp. 281–297. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-23059-2_22
Jagadeesh Chandra Bose, R.P., van der Aalst, W.M.P.: Abstractions in process mining: a taxonomy of patterns. In: Dayal, U., Eder, J., Koehler, J., Reijers, H.A. (eds.) BPM 2009. LNCS, vol. 5701, pp. 159–175. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-03848-8_12
Jagadeesh Chandra Bose, R.P., Mans, R.S., van der Aalst, W.M.: Wanna improve process mining results? CIDM 2013, 127–134 (2013)
Christen, P.: Data Matching: Concepts and Techniques for Record Linkage, Entity Resolution, and Duplicate Detection. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-31164-2
CrowdFlower: 2017 Data Scientist Report (2017). https://visit.crowdflower.com. Accessed 25 July 2018
Dijkman, R., Gao, J., Grefen, P., ter Hofstede, A.: Relational algebra for in-database process mining. arXiv preprint arXiv:1706.08259 (2017)
Dixit, P.M., et al.: Detection and interactive repair of event ordering imperfection in process logs. In: Krogstie, J., Reijers, H.A. (eds.) CAiSE 2018. LNCS, vol. 10816, pp. 274–290. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-91563-0_17
Durand, J., Cho, H., Moberg, D., Woo, J.: XTemp: event-driven testing and monitoring of business processes. In: Proceedings of Balisage, The Markup Conference 2011, vol. 7. Balisage Series on Markup Technologies (2011)
Günther, C.W., Rozinat, A.: Disco: discover your processes. BPM (Demos) 940, 40–44 (2012)
Laranjeiro, N., Soydemir, S.N., Bernardino, J.: A survey on data quality: classifying poor data. In: PRDC 2015, pp. 179–188. IEEE (2015)
Leemans, M., van der Aalst, W.M.P.: Discovery of frequent episodes in event logs. In: Ceravolo, P., Russo, B., Accorsi, R. (eds.) SIMPDA 2014. LNBIP, vol. 237, pp. 1–31. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-27243-6_1
Lohr, S.: For big-data scientists, ‘janitor work’ is key hurdle to insights. New York Times, 17 August 2014
Lu, X., et al.: Semi-supervised log pattern detection and exploration using event concurrence and contextual information. In: Panetto, H., et al. (eds.) OTM On the Move to Meaningful Internet Systems, pp. 154–174. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-69462-7_11
Mannhardt, F., de Leoni, M., Reijers, H.A., van der Aalst, W.M.P., Toussaint, P.J.: From low-level events to activities - a pattern-based approach. In: La Rosa, M., Loos, P., Pastor, O. (eds.) BPM 2016. LNCS, vol. 9850, pp. 125–141. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-45348-4_8
Mans, R.S., van der Aalst, W.M., Vanwersch, R., Moleman, A.: Process Support and Knowledge Representation in Health Care. LNCS, vol. 7738, pp. 140–153. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-36438-9
González López de Murillas, E., Reijers, H.A., van der Aalst, W.M.P.: Everything you always wanted to know about your process, but did not know how to ask. In: Dumas, M., Fantinato, M. (eds.) BPM 2016. LNBIP, vol. 281, pp. 296–309. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-58457-7_22
Perez-Alvarez, J.M., Gomez-Lopez, M.T., Parody, L., Gasca, R.M.: Process instance query language to include process performance indicators in DMN. In: EDOCW 2016, pp. 1–8. IEEE (2016)
Prud‘hommeaux, E., Seaborne, A.: SPARQL query language for RDF. W3C recommendation, January 2008 (2008)
Schönig, S., Rogge-Solti, A., Cabanillas, C., Jablonski, S., Mendling, J.: Efficient and customisable declarative process mining with SQL. In: Nurcan, S., Soffer, P., Bajec, M., Eder, J. (eds.) CAiSE 2016. LNCS, vol. 9694, pp. 290–305. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-39696-5_18
Shabani, S., et al.: Relational XES: data management for process mining. In: CAiSE 2015. CEUR-WS. org (2015)
Shvaiko, P., Euzenat, J.: Ontology matching: state of the art and future challenges. IEEE Trans. Knowl. Data Eng. 25(1), 158–176 (2013)
Strong, D.M., Lee, Y.W., Wang, R.Y.: Data quality in context. Commun. ACM 40(5), 103–110 (1997)
Suriadi, S., Andrews, R., ter Hofstede, A., Wynn, M.: Event log imperfection patterns for process mining: towards a systematic approach to cleaning event logs. Inf. Syst. 64, 132–150 (2017)
Suriadi, S., Wynn, M.T., Ouyang, C., ter Hofstede, A.H.M., van Dijk, N.J.: Understanding process behaviours in a large insurance company in australia: a case study. In: Salinesi, C., Norrie, M.C., Pastor, Ó. (eds.) CAiSE 2013. LNCS, vol. 7908, pp. 449–464. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-38709-8_29
Vázquez-Barreiros, B., Mucientes, M., Lama, M.: Mining duplicate tasks from discovered processes. In: ATAED@ Petri Nets/ACSD, pp. 78–82 (2015)
Verhulst, R.: Evaluating quality of event data within event logs: an extensible framework. Ph.D. thesis, Technische Universiteit Eindhoven (2016)
Wand, Y., Wang, R.Y.: Anchoring data quality dimensions in ontological foundations. Commun. ACM 39(11), 86–95 (1996)
Wang, R.Y., Storey, V., Firth, C.: A framework for analysis of data quality research. IEEE Trans. Knowl. Data Eng. 7(4), 623–640 (1995)
Acknowledgement
The contributions to this paper of Robert Andrews and Chun Ouyang were supported through ARC Discovery Grant DP150103356.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Andrews, R., Suriadi, S., Ouyang, C., Poppe, E. (2018). Towards Event Log Querying for Data Quality. In: Panetto, H., Debruyne, C., Proper, H., Ardagna, C., Roman, D., Meersman, R. (eds) On the Move to Meaningful Internet Systems. OTM 2018 Conferences. OTM 2018. Lecture Notes in Computer Science(), vol 11229. Springer, Cham. https://doi.org/10.1007/978-3-030-02610-3_7
Download citation
DOI: https://doi.org/10.1007/978-3-030-02610-3_7
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-02609-7
Online ISBN: 978-3-030-02610-3
eBook Packages: Computer ScienceComputer Science (R0)