Abstract
The omnipresence of event data and powerful process mining techniques make it possible to quickly learn process models describing what people and organizations really do. Recent breakthroughs in process mining have resulted in powerful techniques to discover the real processes, to detect deviations from normative process models, and to analyze bottlenecks and waste. Process mining and other data science techniques can be used to improve processes within any organization. However, there are also great concerns about the use of data for such purposes. Increasingly, customers, patients, and other stakeholders worry about “irresponsible” forms of data science. Automated, data-driven decisions may be unfair or non-transparent. Confidential data may be shared unintentionally or abused by third parties. Each step in the “data science pipeline” (from raw data to decisions) may create inaccuracies. For example, if the data used to learn a model reflect existing social biases, the algorithm is likely to incorporate these biases. These concerns could lead to resistance against the large-scale use of data and make it impossible to reap the benefits of process mining and other data science approaches. This paper discusses Responsible Process Mining (RPM) as a new challenge in the broader field of Responsible Data Science (RDS). Rather than avoiding the use of (event) data altogether, we strongly believe that techniques, infrastructures, and approaches can be made responsible by design. Not addressing the challenges related to RPM/RDS may lead to a society where (event) data are misused or analysis results are deeply mistrusted.
Acknowledgements
This work is partly based on discussions in the context of the Responsible Data Science (RDS) collaboration involving principal scientists from Eindhoven University of Technology, Leiden University, University of Amsterdam, Radboud University Nijmegen, Tilburg University, VU University, Amsterdam Medical Center, VU Medical Center, Leiden University Medical Center, Delft University of Technology, and CWI.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
van der Aalst, W.M.P. (2017). Responsible Data Science: Using Event Data in a “People Friendly” Manner. In: Hammoudi, S., Maciaszek, L., Missikoff, M., Camp, O., Cordeiro, J. (eds) Enterprise Information Systems. ICEIS 2016. Lecture Notes in Business Information Processing, vol 291. Springer, Cham. https://doi.org/10.1007/978-3-319-62386-3_1
DOI: https://doi.org/10.1007/978-3-319-62386-3_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-62385-6
Online ISBN: 978-3-319-62386-3
eBook Packages: Computer Science, Computer Science (R0)