Abstract
Privacy regulations for data can be regarded as a major driver for data sovereignty measures. A specific example for this is the case of event data that is recorded by information systems during the processing of entities in domains such as e-commerce or health care. Since such data, typically available in the form of event log files, contains personalized information on the specific processed entities, it can expose sensitive information that may be traced back to individuals. In recent years, a plethora of methods have been developed to analyse event logs under the umbrella of process mining. However, the impact of privacy regulations on the technical design as well as the organizational application of process mining has been largely neglected. This paper set out to develop a protection model for event data privacy which applies the well-established notion of differential privacy. Starting from common assumptions about the event logs used in process mining, this paper presents potential privacy leakages and means to protect against them. The paper also shows at which stages of privacy leakages a protection model for event logs should be used. Relying on this understanding, the notion of differential privacy for process discovery methods is instantiated, i.e., algorithms that aim at the construction of a process model from an event log. The general feasibility of our approach is demonstrated by its application to two publicly available real-life events logs.









Similar content being viewed by others
Notes
Here, \(Range({\mathcal {K}})\) denotes the set of possible outputs of \({\mathcal {K}}\) and \(\Pr\) denotes probability.
In the case of a combination of multiple attributes signalling the activity, we can always create a single attribute by concatenation of the multiple attributes.
The source code of the privacy engine based on PINQ is available as C# application at: https://github.com/fmannhardt/pddp/.
The keys for the partitioning operation need to be user-defined since we do not want to leak information on which keys are present in the unprotected event log.
We chose the Inductive Miner since it is the only process discovery algorithm available in the open-source framework ProM 6.8 that allows to use both directly-follows relations and trace variants as input.
The Petri net models are visualized using the compact Inductive Visual Miner notation as described in Leemans et al. (2014).
References
Accorsi R, Stocker T, Müller G (2013) On the exploitation of process mining for security audits: the process discovery case. In: Shin Sung Y, Maldonado JC (eds) Proceedings of the 28th annual ACM symposium on applied computing, SAC ’13, Coimbra, Portugal, March 18–22. ACM, pp 1462–1468
Adam K, Netz L, Varga S, Michael J, Rumpe B, Heuser P, Letmathe P (2018) Model-based generation of enterprise information systems. In: Fellmann M, Sandkuhl K (eds) Enterprise modeling and information systems architectures (EMISA’18), volume 2097 of CEUR workshop proceedings, pp 75–79. CEUR-WS.org
Agrawal R, Srikant R (2000) Privacy-preserving data mining. In: Proceedings of the 2000 ACM SIGMOD international conference on management of data, SIGMOD ’00. ACM, New York, NY, pp 439–450
Aldeen YAAS, Salleh M, Razzaque MA (2015) A comprehensive review on privacy preserving data mining. SpringerPlus 4(1):694
Arasu A, Babcock B, Babu S, Cieslewicz J, Datar M, Ito K, Motwani R, Srivastava U, Widom J (2016) STREAM: the Stanford data stream management system. In: Garofalakis MN, Gehrke J, Rastogi R (eds) Data stream management: processing high-speed data streams, data-centric systems and applications. Springer, Berlin, pp 317–336
Augusto A, Conforti R, Dumas M, La Rosa M, Maggi FM, Marrella A, Mecella M, Soo A (2017) Automated discovery of process models from event logs: review and benchmark. IEEE Trans Knowl Data Eng (accepted)
Bergeron E (2000) The difference between security and privacy
Bertino E, Lin D, Jiang W (2008) A survey of quantification of privacy preserving data mining algorithms. Springer, Boston, MA, pp 183–205
Bhowmick SS, Gruenwald L, Iwaihara M, Chatvichienchai S (2006) PRIVATE-IYE: a framework for privacy preserving data integration. In: 22nd international conference on data engineering workshops (ICDEW’06), pp 91–91
Blum A, Dwork C, McSherry F, Nissim K (2005) Practical privacy: the SuLQ framework. In: Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems. ACM, pp 128–138
Bonomi L, Xiong L (2013) A two-phase algorithm for mining sequential patterns with differential privacy. In: Proceedings of the 22nd ACM international conference on conference on information & knowledge management-CIKM ’13. ACM Press, New York
Colombo P, Ferrari E (2015) Privacy aware access control for big data: a research roadmap. Big Data Res 2:145–154
D’Acquisto G, Domingo-Ferrer J, Kikiras P, Torra V, de Montjoye Y-A, Bourka A (2015a) Privacy by design in big data: an overview of privacy enhancing technologies in the era of big data analytics. CoRR arXiv:abs/1512.06000
D’Acquisto G, Domingo-Ferrer J, Kikiras P, Torra V, de Montjoye Y-A, Bourka A (2015b) Privacy by design in big data: an overview of privacy enhancing technologies in the era of big data analytics
Dankar FK, El Emam K (2013) Practicing differential privacy in health care: a review. Trans Data Priv 6(1):35–67
de Leoni M, Mannhardt F (2015) Road traffic fine management process. Eindhoven University of Technology, Eindhoven (Dataset)
Dwork C (2008) Differential privacy: a survey of results. In: International conference on theory and applications of models of computation, Springer, Berlin, pp 1–19
Dwork C, Naor M, Pitassi T, Rothblum GN (2010) Differential privacy under continual observation. In: Proceedings of the 42nd ACM symposium on theory of computing-STOC ’10. ACM Press, New York
Dwork C, Roth A et al (2014) The algorithmic foundations of differential privacy. Found Trends® Theor Comput Sci 9(3–4):211–407
Eibl G, Ferner C, Hildebrandt T, Stertz F, Burkhart S, Rinderle-Ma S, Engel D (2017) Exploration of the potential of process mining for intrusion detection in smart metering. In: ICISSP
ElSalamouny E, Gambs S (2016) Differential privacy models for location-based services. Trans Data Priv 9(1):15–48
Fazzinga B, Flesca S, Furfaro F, Pontieri L (2018) Online and offline classification of traces of event logs on the basis of security risks. J Intell Inf Syst 50(1):195–230
Hoepman J-H (2014) Privacy design strategies. In: Cuppens-Boulahia N, Cuppens F, Jajodia S, Kalam AAE, Sans T (eds) ICT systems security and privacy protection. Springer, Berlin, pp 446–459
Hoepman J-H (2018) Making privacy by design concrete. In: European cyber security perspectives 2018. Radboud Repository, pp 26–28
Hsu J, Gaboardi M, Haeberlen A, Khanna S, Narayan A, Pierce BC, Roth A (2014) Differential privacy: an economic method for choosing epsilon. In: Proceedings of the 2014 IEEE 27th computer security foundations symposium, CSF ’14. IEEE Computer Society, Washington, DC, pp 398–410
ISO/IEC 27000 (2018) Information technology-security techniques-information security management systems-overview and vocabulary, fifth edn. Standard, International Organization for Standardization
Kim JJ, Kim JJ, Winkler WE, Winkler WE (2003) Multiplicative noise for masking continuous data. Technical report, Statistical Research Division, US Bureau of the Census, Washington, DC
Leemans SJJ, Fahland D, vander Aalst WMP (2013) Discovering block-structured process models from event logs containing infrequent behaviour. In: BPM 2013 workshops, volume 171 of LNBIP. Springer, pp 66–78
Leemans SJJ, Fahland D, van der Aalst WMP (2014) Process and deviation exploration with inductive visual miner. In: BPM 2014 demos, volume 1295 of CEUR workshop proceedings, p 46. CEUR-WS.org
Leemans SJJ, Fahland D, van der Aalst WMP (2018) Scalable process discovery and conformance checking. Softw Syst Model 17(2):599–631
Macedo R, Paulo J, Pontes R, Portela B, Oliveira T, Matos M, Oliveira R (2017) A practical framework for privacy-preserving NoSQL databases. In: SRDS. IEEE Computer Society, pp 11–20
Mannhardt F (2016) Sepsis cases-event log. Eindhoven University of Technology, Eindhoven (Dataset)
Mannhardt F, Blinde D (2017) Analyzing the trajectories of patients with sepsis using process mining. In: RADAR+EMISA 2017, volume 1859 of CEUR workshop proceedings, pp 72–80. CEUR-WS.org
Mannhardt F, Petersen S, de Oliveira MFD (2018) Privacy challenges for process mining in human-centered industrial environments. In: 14th international conference on intelligent environments (IE). IEEE Xplore, pp 64–71
Mans RS, van der Aalst WMP, Vanwersch RJB, Moleman AJ (2013) Process mining in healthcare: data challenges when answering frequently posed questions. In: Lenz R, Miksch S, Peleg M, Reichert M, Riaño D, ten Teije A (eds) Process support and knowledge representation in health care. Springer, Berlin, pp 140–153
McSherry F (2010) Privacy integrated queries. Commun ACM 53(9):89
McSherry F, Mahajan R (2011) Differentially-private network trace analysis. ACM SIGCOMM Comput Commun Rev 41(4):123–134
Mendes R, Vilela JP (2017) Privacy-preserving data mining: methods, metrics, and applications. IEEE Access 5:10562–10582
Mettler M (2016) Blockchain technology in healthcare: the revolution starts here. In: 2016 IEEE 18th international conference on e-health networking, applications and services (Healthcom), pp 1–3
Michael J, Steinberger C (2017) Context modeling for active assistance. In: Cabanillas C, España S, Farshidi S (eds) Proceedings of the ER forum 2017 and the ER 2017 demo track co-located with the 36th international conference on conceptual modelling (ER 2017), pp 221–234
Michael J, Koschmider A, Mannhardt F, Baracaldo N, Rumpe B (2019) User-centered and privacy-driven process mining system design for IoT. In: information systems engineering in responsible information systems-CAiSE forum 2019, Rome, Proceedings, pp 194–206
Myers D, Radke K, Suriadi S, Foo E (2017) Process discovery for industrial control system cyber attack detection. In: De Capitani di Vimercati S, Martinelli F (eds) ICT systems security and privacy protection. Springer, Cham, pp 61–75
Peterson ZNJ, Gondree M, Beverly R (2011) A position paper on data sovereignty: the importance of geolocating data in the cloud. In: Proceedings of the 3rd USENIX conference on hot topics in cloud computing, HotCloud’11. USENIX Association, Berkeley, CA, pp 9–9
Rozinat A, van der Aalst WMP (2006) Decision mining in ProM. In: Lecture notes in computer science. Springer, Berlin, pp 420–425
Sacco O, Breslin JG, Decker S (2013) Fine-grained trust assertions for privacy management in the social semantic web. In: 2013 12th IEEE international conference on trust, security and privacy in computing and communications, pp 218–225
Sicari S, Rizzardi A, Grieco LA, Coen-Porisini A (2015) Security, privacy and trust in Internet of Things: the road ahead. Comput Netw 76:146–164
Stocker T, Accorsi R (2014) SecSy: a security-oriented tool for synthesizing process event logs. In: Limonad L, Weber B (eds) Proceedings of the BPM demo sessions 2014 co-located with the 12th international conference on business process management (BPM 2014), Eindhoven, The Netherlands, September 10, 2014, volume 1295 of CEUR workshop proceedings, p 71. CEUR-WS.org
van der Aalst WMP (2016) Process mining: data science in action, 2nd edn. Springer, Berlin
van der Aalst W, Adriansyah A, van Dongen B (2012) Replaying history on process models for conformance checking and performance analysis. Wiley Interdiscip Rev Data Min Knowl Discov 2(2):182–192
van Eck ML, Lu X, Leemans SJJ, van der Aalst WMP (2015) \(\text{PM}^{2}\): a process mining project methodology. In: Advanced information systems engineering. Springer, pp 297–313
Verykios VS, Bertino E, Fovino IN, Provenza LP, Saygin Y, Theodoridis Y (2004) State-of-the-art in privacy preserving data mining. SIGMOD Rec 33(1):50–57
Yu WE (2014) Data privacy and big data-compliance issues and considerations. ISACA J 3:27–31
Yu X, Wen Q (2010) A view about cloud data security from data life cycle. In: 2010 international conference on computational intelligence and software engineering, pp 1–4
Zhang Z, Qin Z, Zhu L, Weng J, Ren K (2017) Cost-friendly differential privacy for smart meters: exploiting the dual roles of the noise. IEEE Trans Smart Grid 8(2):619–626
Zhiqiang G, Longjun Z (2018) Privacy preserving data mining on big data computing platform: trends and future. In: Barolli L, Woungang I, Hussain OK (eds) Advances in intelligent networking and collaborative systems. Springer, Cham, pp 491–502
Author information
Authors and Affiliations
Corresponding author
Additional information
Accepted after two revisions by the editors of the special edition.
Rights and permissions
About this article
Cite this article
Mannhardt, F., Koschmider, A., Baracaldo, N. et al. Privacy-Preserving Process Mining. Bus Inf Syst Eng 61, 595–614 (2019). https://doi.org/10.1007/s12599-019-00613-3
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s12599-019-00613-3