Responsible Data Science: Using Event Data in a “People Friendly” Manner

  • Conference paper
Enterprise Information Systems (ICEIS 2016)

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 291)

Abstract

The omnipresence of event data and powerful process mining techniques make it possible to quickly learn process models describing what people and organizations really do. Recent breakthroughs in process mining resulted in powerful techniques to discover the real processes, to detect deviations from normative process models, and to analyze bottlenecks and waste. Process mining and other data science techniques can be used to improve processes within any organization. However, there are also great concerns about the use of data for such purposes. Increasingly, customers, patients, and other stakeholders worry about “irresponsible” forms of data science. Automated data decisions may be unfair or non-transparent. Confidential data may be shared unintentionally or abused by third parties. Each step in the “data science pipeline” (from raw data to decisions) may create inaccuracies. For example, if the data used to learn a model reflects existing social biases, the algorithm is likely to incorporate these biases. These concerns could lead to resistance against the large-scale use of data and make it impossible to reap the benefits of process mining and other data science approaches. This paper discusses Responsible Process Mining (RPM) as a new challenge in the broader field of Responsible Data Science (RDS). Rather than avoiding the use of (event) data altogether, we strongly believe that techniques, infrastructures, and approaches can be made responsible by design. Not addressing the challenges related to RPM/RDS may lead to a society where (event) data are misused or analysis results are deeply mistrusted.

Acknowledgements

This work is partly based on discussions in the context of the Responsible Data Science (RDS) collaboration involving principal scientists from Eindhoven University of Technology, Leiden University, University of Amsterdam, Radboud University Nijmegen, Tilburg University, VU University, Amsterdam Medical Center, VU Medical Center, Leiden University Medical Center, Delft University of Technology, and CWI.

Author information

Corresponding author

Correspondence to Wil M. P. van der Aalst.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

van der Aalst, W.M.P. (2017). Responsible Data Science: Using Event Data in a “People Friendly” Manner. In: Hammoudi, S., Maciaszek, L., Missikoff, M., Camp, O., Cordeiro, J. (eds) Enterprise Information Systems. ICEIS 2016. Lecture Notes in Business Information Processing, vol 291. Springer, Cham. https://doi.org/10.1007/978-3-319-62386-3_1

  • DOI: https://doi.org/10.1007/978-3-319-62386-3_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-62385-6

  • Online ISBN: 978-3-319-62386-3

  • eBook Packages: Computer Science, Computer Science (R0)
