Abstract
Timestamp information recorded in event logs plays a crucial role in uncovering meaningful insights into business process performance and behaviour via Process Mining techniques. Inaccurate or incomplete timestamps may cause activities in a business process to be ordered incorrectly, leading to unrepresentative process models and incorrect process performance analysis results. Thus, the quality of timestamps in an event log should be evaluated thoroughly before the log is used as input for any Process Mining activity. To the best of our knowledge, research on the (automated) quality assessment of event logs remains scarce. Our work presents an automated approach for detecting and quantifying timestamp-related issues (timestamp imperfections) in an event log. We define 15 metrics related to timestamp quality across two axes: four levels of abstraction (event, activity, trace, log) and four quality dimensions (accuracy, completeness, consistency, uniqueness). We adopted the design science research paradigm and drew on existing knowledge of data quality and event log quality. The approach has been implemented as a prototype within the open-source Process Mining framework ProM and evaluated using three real-life event logs, with the involvement of experts from practice. This approach paves the way for the systematic and interactive enhancement of timestamp quality during the data pre-processing phase of Process Mining projects.
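As a rough illustration of how timestamp quality can be scored along the four dimensions named above, the following sketch computes one log-level score per dimension for a toy event log. The metric definitions are simplified assumptions made for this example and are not the 15 metrics defined in the paper: accuracy is approximated by treating a timestamp at exactly midnight as a sign of date-only precision, and consistency is read as agreement between the recorded event order and timestamp order. The log structure, function names, and sample data are hypothetical. Each score is the share of items with no detected imperfection, so 1.0 means the check found nothing wrong.

```python
from collections import Counter
from datetime import datetime
from typing import Optional, TypedDict


class Event(TypedDict):
    activity: str
    timestamp: Optional[datetime]  # None models a missing timestamp


Trace = list[Event]  # events in the order they were recorded


def _share(hits: int, total: int) -> float:
    """Return hits / total, treating an empty population as fully clean."""
    return 1.0 if total == 0 else hits / total


def completeness(log: list[Trace]) -> float:
    """Share of events that carry a timestamp at all."""
    events = [e for trace in log for e in trace]
    return _share(sum(e["timestamp"] is not None for e in events), len(events))


def uniqueness(log: list[Trace]) -> float:
    """Share of timestamped events whose timestamp is unique within their trace.

    Events that share a timestamp in one trace cannot be ordered by time alone.
    """
    hits = total = 0
    for trace in log:
        counts = Counter(e["timestamp"] for e in trace if e["timestamp"] is not None)
        total += sum(counts.values())
        hits += sum(1 for c in counts.values() if c == 1)
    return _share(hits, total)


def accuracy(log: list[Trace]) -> float:
    """Heuristic (assumed) precision check: share of timestamps not at exactly midnight.

    A midnight value often indicates that only the date was recorded.
    """
    stamps = [e["timestamp"] for trace in log for e in trace if e["timestamp"] is not None]
    midnight = sum(t.hour == t.minute == t.second == t.microsecond == 0 for t in stamps)
    return _share(len(stamps) - midnight, len(stamps))


def consistency(log: list[Trace]) -> float:
    """Share of traces whose recorded event order agrees with their timestamps.

    Missing timestamps are skipped; equal timestamps count as consistent.
    """
    ok = 0
    for trace in log:
        stamps = [e["timestamp"] for e in trace if e["timestamp"] is not None]
        if all(a <= b for a, b in zip(stamps, stamps[1:])):
            ok += 1
    return _share(ok, len(log))


def report(log: list[Trace]) -> dict[str, float]:
    """Collect all four (assumed) dimension scores, rounded for display."""
    metrics = {
        "completeness": completeness,
        "uniqueness": uniqueness,
        "accuracy": accuracy,
        "consistency": consistency,
    }
    return {name: round(fn(log), 2) for name, fn in metrics.items()}


# Hypothetical two-trace log exhibiting each kind of imperfection once.
EXAMPLE_LOG: list[Trace] = [
    [
        {"activity": "Register claim", "timestamp": datetime(2020, 3, 2, 9, 15)},
        {"activity": "Check documents", "timestamp": datetime(2020, 3, 2, 9, 15)},
        {"activity": "Decide", "timestamp": datetime(2020, 3, 3)},  # date only
    ],
    [
        {"activity": "Register claim", "timestamp": datetime(2020, 3, 4, 11, 0)},
        {"activity": "Check documents", "timestamp": None},  # missing
        {"activity": "Decide", "timestamp": datetime(2020, 3, 4, 10, 30)},  # out of order
    ],
]

print(report(EXAMPLE_LOG))
# {'completeness': 0.83, 'uniqueness': 0.6, 'accuracy': 0.8, 'consistency': 0.5}
```

The same per-dimension checks could be aggregated at other levels of abstraction (for instance per activity rather than per log) by grouping events before computing the shares; that grouping is likewise an assumption of this sketch.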
Acknowledgements
We would like to thank Queensland’s Motor Accident Insurance Commission and the Queensland University of Technology for allowing us access to their datasets.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Fischer, D.A., Goel, K., Andrews, R., van Dun, C.G.J., Wynn, M.T., Röglinger, M. (2020). Enhancing Event Log Quality: Detecting and Quantifying Timestamp Imperfections. In: Fahland, D., Ghidini, C., Becker, J., Dumas, M. (eds) Business Process Management. BPM 2020. Lecture Notes in Computer Science, vol 12168. Springer, Cham. https://doi.org/10.1007/978-3-030-58666-9_18
DOI: https://doi.org/10.1007/978-3-030-58666-9_18
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58665-2
Online ISBN: 978-3-030-58666-9
eBook Packages: Computer Science, Computer Science (R0)