
Privacy-Preserving Process Mining

Differential Privacy for Event Logs

  • Research Paper
  • Published in: Business & Information Systems Engineering

Abstract

Privacy regulations for data can be regarded as a major driver for data sovereignty measures. A specific example of this is event data that is recorded by information systems during the processing of entities in domains such as e-commerce or health care. Since such data, typically available in the form of event log files, contains personalized information on the specific processed entities, it can expose sensitive information that may be traced back to individuals. In recent years, a plethora of methods have been developed to analyse event logs under the umbrella of process mining. However, the impact of privacy regulations on the technical design as well as the organizational application of process mining has been largely neglected. This paper sets out to develop a protection model for event data privacy which applies the well-established notion of differential privacy. Starting from common assumptions about the event logs used in process mining, this paper presents potential privacy leakages and means to protect against them. The paper also shows at which stages of privacy leakage a protection model for event logs should be applied. Building on this understanding, the paper instantiates the notion of differential privacy for process discovery methods, i.e., algorithms that aim to construct a process model from an event log. The general feasibility of the approach is demonstrated by its application to two publicly available real-life event logs.


Notes

  1. Here, \(Range({\mathcal {K}})\) denotes the set of possible outputs of \({\mathcal {K}}\) and \(\Pr\) denotes probability.

  2. In the case of a combination of multiple attributes signalling the activity, we can always create a single attribute by concatenation of the multiple attributes.

  3. The source code of the privacy engine based on PINQ is available as C# application at: https://github.com/fmannhardt/pddp/.

  4. The keys for the partitioning operation need to be user-defined since we do not want to leak information on which keys are present in the unprotected event log.

  5. We chose the Inductive Miner since it is the only process discovery algorithm available in the open-source framework ProM 6.8 that accepts both directly-follows relations and trace variants as input.

  6. https://data.4tu.nl/repository/collection:event_logs_real.

  7. The Petri net models are visualized using the compact Inductive Visual Miner notation as described in Leemans et al. (2014).
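
As a rough illustration of Notes 1, 4 and 5, the following minimal Python sketch releases noisy trace-variant counts with the Laplace mechanism. It is not the authors' PINQ-based engine from Note 3; the function name `noisy_variant_counts`, its parameters, and the activity names in the usage example are hypothetical. The sketch assumes that each case contributes exactly one trace, so adding or removing a case changes a single count by one (sensitivity 1), and that the candidate variants are supplied by the analyst rather than read from the log, in line with Note 4.

```python
import random
from collections import Counter


def noisy_variant_counts(traces, candidate_variants, epsilon, rng=None):
    """Return Laplace-noised counts for analyst-supplied trace variants.

    Assumption of this sketch: every case appears as exactly one trace,
    so the vector of variant counts has L1 sensitivity 1.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()

    # True counts are computed internally and never returned directly.
    true_counts = Counter(tuple(trace) for trace in traces)

    # Laplace(0, b) with b = sensitivity / epsilon = 1 / epsilon.
    # The difference of two i.i.d. Exp(rate) samples is Laplace(0, 1/rate),
    # and here rate = 1 / b = epsilon.
    rate = epsilon

    noisy = {}
    # Iterate over the user-defined keys only, so the set of output keys
    # does not reveal which variants are present in the log.
    for variant in candidate_variants:
        key = tuple(variant)
        noise = rng.expovariate(rate) - rng.expovariate(rate)
        noisy[key] = true_counts.get(key, 0) + noise
    return noisy


if __name__ == "__main__":
    # Hypothetical toy log: one list of activity names per case.
    log = [
        ["register", "triage", "release"],
        ["register", "triage", "release"],
        ["register", "release"],
    ]
    candidates = [
        ("register", "triage", "release"),
        ("register", "release"),
        ("register", "triage", "admit", "release"),  # absent from the log
    ]
    for variant, count in noisy_variant_counts(log, candidates, epsilon=1.0).items():
        print(variant, round(count, 2))
```

Traces whose variant is not among the candidates are simply ignored, and candidates that never occur still receive a noisy count around zero. A smaller `epsilon` gives stronger protection at the price of noisier counts, which a downstream discovery algorithm would then have to tolerate.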

References

  • Accorsi R, Stocker T, Müller G (2013) On the exploitation of process mining for security audits: the process discovery case. In: Shin Sung Y, Maldonado JC (eds) Proceedings of the 28th annual ACM symposium on applied computing, SAC ’13, Coimbra, Portugal, March 18–22. ACM, pp 1462–1468

  • Adam K, Netz L, Varga S, Michael J, Rumpe B, Heuser P, Letmathe P (2018) Model-based generation of enterprise information systems. In: Fellmann M, Sandkuhl K (eds) Enterprise modeling and information systems architectures (EMISA’18), volume 2097 of CEUR workshop proceedings, pp 75–79. CEUR-WS.org

  • Agrawal R, Srikant R (2000) Privacy-preserving data mining. In: Proceedings of the 2000 ACM SIGMOD international conference on management of data, SIGMOD ’00. ACM, New York, NY, pp 439–450

  • Aldeen YAAS, Salleh M, Razzaque MA (2015) A comprehensive review on privacy preserving data mining. SpringerPlus 4(1):694


  • Arasu A, Babcock B, Babu S, Cieslewicz J, Datar M, Ito K, Motwani R, Srivastava U, Widom J (2016) STREAM: the Stanford data stream management system. In: Garofalakis MN, Gehrke J, Rastogi R (eds) Data stream management: processing high-speed data streams, data-centric systems and applications. Springer, Berlin, pp 317–336


  • Augusto A, Conforti R, Dumas M, La Rosa M, Maggi FM, Marrella A, Mecella M, Soo A (2017) Automated discovery of process models from event logs: review and benchmark. IEEE Trans Knowl Data Eng (accepted)

  • Bergeron E (2000) The difference between security and privacy

  • Bertino E, Lin D, Jiang W (2008) A survey of quantification of privacy preserving data mining algorithms. Springer, Boston, MA, pp 183–205


  • Bhowmick SS, Gruenwald L, Iwaihara M, Chatvichienchai S (2006) PRIVATE-IYE: a framework for privacy preserving data integration. In: 22nd international conference on data engineering workshops (ICDEW’06), pp 91–91

  • Blum A, Dwork C, McSherry F, Nissim K (2005) Practical privacy: the SuLQ framework. In: Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems. ACM, pp 128–138

  • Bonomi L, Xiong L (2013) A two-phase algorithm for mining sequential patterns with differential privacy. In: Proceedings of the 22nd ACM international conference on conference on information & knowledge management-CIKM ’13. ACM Press, New York

  • Colombo P, Ferrari E (2015) Privacy aware access control for big data: a research roadmap. Big Data Res 2:145–154


  • D’Acquisto G, Domingo-Ferrer J, Kikiras P, Torra V, de Montjoye Y-A, Bourka A (2015a) Privacy by design in big data: an overview of privacy enhancing technologies in the era of big data analytics. CoRR arXiv:abs/1512.06000

  • D’Acquisto G, Domingo-Ferrer J, Kikiras P, Torra V, de Montjoye Y-A, Bourka A (2015b) Privacy by design in big data: an overview of privacy enhancing technologies in the era of big data analytics

  • Dankar FK, El Emam K (2013) Practicing differential privacy in health care: a review. Trans Data Priv 6(1):35–67


  • de Leoni M, Mannhardt F (2015) Road traffic fine management process. Eindhoven University of Technology, Eindhoven (Dataset)


  • Dwork C (2008) Differential privacy: a survey of results. In: International conference on theory and applications of models of computation, Springer, Berlin, pp 1–19

  • Dwork C, Naor M, Pitassi T, Rothblum GN (2010) Differential privacy under continual observation. In: Proceedings of the 42nd ACM symposium on theory of computing-STOC ’10. ACM Press, New York

  • Dwork C, Roth A et al (2014) The algorithmic foundations of differential privacy. Found Trends® Theor Comput Sci 9(3–4):211–407


  • Eibl G, Ferner C, Hildebrandt T, Stertz F, Burkhart S, Rinderle-Ma S, Engel D (2017) Exploration of the potential of process mining for intrusion detection in smart metering. In: ICISSP

  • ElSalamouny E, Gambs S (2016) Differential privacy models for location-based services. Trans Data Priv 9(1):15–48


  • Fazzinga B, Flesca S, Furfaro F, Pontieri L (2018) Online and offline classification of traces of event logs on the basis of security risks. J Intell Inf Syst 50(1):195–230


  • Hoepman J-H (2014) Privacy design strategies. In: Cuppens-Boulahia N, Cuppens F, Jajodia S, Kalam AAE, Sans T (eds) ICT systems security and privacy protection. Springer, Berlin, pp 446–459


  • Hoepman J-H (2018) Making privacy by design concrete. In: European cyber security perspectives 2018. Radboud Repository, pp 26–28

  • Hsu J, Gaboardi M, Haeberlen A, Khanna S, Narayan A, Pierce BC, Roth A (2014) Differential privacy: an economic method for choosing epsilon. In: Proceedings of the 2014 IEEE 27th computer security foundations symposium, CSF ’14. IEEE Computer Society, Washington, DC, pp 398–410

  • ISO/IEC 27000 (2018) Information technology-security techniques-information security management systems-overview and vocabulary, fifth edn. Standard, International Organization for Standardization

  • Kim JJ, Winkler WE (2003) Multiplicative noise for masking continuous data. Technical report, Statistical Research Division, US Bureau of the Census, Washington, DC

  • Leemans SJJ, Fahland D, van der Aalst WMP (2013) Discovering block-structured process models from event logs containing infrequent behaviour. In: BPM 2013 workshops, volume 171 of LNBIP. Springer, pp 66–78

  • Leemans SJJ, Fahland D, van der Aalst WMP (2014) Process and deviation exploration with inductive visual miner. In: BPM 2014 demos, volume 1295 of CEUR workshop proceedings, p 46. CEUR-WS.org

  • Leemans SJJ, Fahland D, van der Aalst WMP (2018) Scalable process discovery and conformance checking. Softw Syst Model 17(2):599–631


  • Macedo R, Paulo J, Pontes R, Portela B, Oliveira T, Matos M, Oliveira R (2017) A practical framework for privacy-preserving NoSQL databases. In: SRDS. IEEE Computer Society, pp 11–20

  • Mannhardt F (2016) Sepsis cases-event log. Eindhoven University of Technology, Eindhoven (Dataset)


  • Mannhardt F, Blinde D (2017) Analyzing the trajectories of patients with sepsis using process mining. In: RADAR+EMISA 2017, volume 1859 of CEUR workshop proceedings, pp 72–80. CEUR-WS.org

  • Mannhardt F, Petersen S, de Oliveira MFD (2018) Privacy challenges for process mining in human-centered industrial environments. In: 14th international conference on intelligent environments (IE). IEEE Xplore, pp 64–71

  • Mans RS, van der Aalst WMP, Vanwersch RJB, Moleman AJ (2013) Process mining in healthcare: data challenges when answering frequently posed questions. In: Lenz R, Miksch S, Peleg M, Reichert M, Riaño D, ten Teije A (eds) Process support and knowledge representation in health care. Springer, Berlin, pp 140–153


  • McSherry F (2010) Privacy integrated queries. Commun ACM 53(9):89


  • McSherry F, Mahajan R (2011) Differentially-private network trace analysis. ACM SIGCOMM Comput Commun Rev 41(4):123–134


  • Mendes R, Vilela JP (2017) Privacy-preserving data mining: methods, metrics, and applications. IEEE Access 5:10562–10582


  • Mettler M (2016) Blockchain technology in healthcare: the revolution starts here. In: 2016 IEEE 18th international conference on e-health networking, applications and services (Healthcom), pp 1–3

  • Michael J, Steinberger C (2017) Context modeling for active assistance. In: Cabanillas C, España S, Farshidi S (eds) Proceedings of the ER forum 2017 and the ER 2017 demo track co-located with the 36th international conference on conceptual modelling (ER 2017), pp 221–234

  • Michael J, Koschmider A, Mannhardt F, Baracaldo N, Rumpe B (2019) User-centered and privacy-driven process mining system design for IoT. In: Information systems engineering in responsible information systems-CAiSE forum 2019, Rome, Proceedings, pp 194–206

  • Myers D, Radke K, Suriadi S, Foo E (2017) Process discovery for industrial control system cyber attack detection. In: De Capitani di Vimercati S, Martinelli F (eds) ICT systems security and privacy protection. Springer, Cham, pp 61–75


  • Peterson ZNJ, Gondree M, Beverly R (2011) A position paper on data sovereignty: the importance of geolocating data in the cloud. In: Proceedings of the 3rd USENIX conference on hot topics in cloud computing, HotCloud’11. USENIX Association, Berkeley, CA, pp 9–9

  • Rozinat A, van der Aalst WMP (2006) Decision mining in ProM. In: Lecture notes in computer science. Springer, Berlin, pp 420–425

  • Sacco O, Breslin JG, Decker S (2013) Fine-grained trust assertions for privacy management in the social semantic web. In: 2013 12th IEEE international conference on trust, security and privacy in computing and communications, pp 218–225

  • Sicari S, Rizzardi A, Grieco LA, Coen-Porisini A (2015) Security, privacy and trust in Internet of Things: the road ahead. Comput Netw 76:146–164


  • Stocker T, Accorsi R (2014) SecSy: a security-oriented tool for synthesizing process event logs. In: Limonad L, Weber B (eds) Proceedings of the BPM demo sessions 2014 co-located with the 12th international conference on business process management (BPM 2014), Eindhoven, The Netherlands, September 10, 2014, volume 1295 of CEUR workshop proceedings, p 71. CEUR-WS.org

  • van der Aalst WMP (2016) Process mining: data science in action, 2nd edn. Springer, Berlin


  • van der Aalst W, Adriansyah A, van Dongen B (2012) Replaying history on process models for conformance checking and performance analysis. Wiley Interdiscip Rev Data Min Knowl Discov 2(2):182–192


  • van Eck ML, Lu X, Leemans SJJ, van der Aalst WMP (2015) \(\text{PM}^{2}\): a process mining project methodology. In: Advanced information systems engineering. Springer, pp 297–313

  • Verykios VS, Bertino E, Fovino IN, Provenza LP, Saygin Y, Theodoridis Y (2004) State-of-the-art in privacy preserving data mining. SIGMOD Rec 33(1):50–57


  • Yu WE (2014) Data privacy and big data-compliance issues and considerations. ISACA J 3:27–31


  • Yu X, Wen Q (2010) A view about cloud data security from data life cycle. In: 2010 international conference on computational intelligence and software engineering, pp 1–4

  • Zhang Z, Qin Z, Zhu L, Weng J, Ren K (2017) Cost-friendly differential privacy for smart meters: exploiting the dual roles of the noise. IEEE Trans Smart Grid 8(2):619–626


  • Zhiqiang G, Longjun Z (2018) Privacy preserving data mining on big data computing platform: trends and future. In: Barolli L, Woungang I, Hussain OK (eds) Advances in intelligent networking and collaborative systems. Springer, Cham, pp 491–502


Author information

Corresponding author

Correspondence to Agnes Koschmider.

Additional information

Accepted after two revisions by the editors of the special edition.

About this article

Cite this article

Mannhardt, F., Koschmider, A., Baracaldo, N. et al. Privacy-Preserving Process Mining. Bus Inf Syst Eng 61, 595–614 (2019). https://doi.org/10.1007/s12599-019-00613-3

  • DOI: https://doi.org/10.1007/s12599-019-00613-3
