Comparing Machine Learning Correlations to Domain Experts’ Causal Knowledge: Employee Turnover Use Case

  • Conference paper
Machine Learning and Knowledge Extraction (CD-MAKE 2022)

Abstract

This paper addresses two major phenomena, machine learning and causal knowledge discovery, in the context of human resources management. First, we examine previous work that analyses employee turnover predictions and the most important factors affecting these predictions using standard machine learning (ML) algorithms; we then interpret the results of developing and testing different classification models on the IBM Human Resources (HR) data. Second, we explore an alternative process of extracting causal knowledge from semi-structured interviews with HR experts to form an expert-derived causal graph (map). By comparing the results of the machine learning approaches with the findings from the interviews, we explore the benefits of adding domain experts’ causal knowledge to knowledge derived from data. Recommendations are provided on the best methods and techniques to consider for causal graph learning to improve decision making.

Author information

Corresponding author

Correspondence to Eya Meddeb.

Copyright information

© 2022 IFIP International Federation for Information Processing

About this paper

Cite this paper

Meddeb, E., Bowers, C., Nichol, L. (2022). Comparing Machine Learning Correlations to Domain Experts’ Causal Knowledge: Employee Turnover Use Case. In: Holzinger, A., Kieseberg, P., Tjoa, A.M., Weippl, E. (eds) Machine Learning and Knowledge Extraction. CD-MAKE 2022. Lecture Notes in Computer Science, vol 13480. Springer, Cham. https://doi.org/10.1007/978-3-031-14463-9_22

  • DOI: https://doi.org/10.1007/978-3-031-14463-9_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-14462-2

  • Online ISBN: 978-3-031-14463-9

  • eBook Packages: Computer Science, Computer Science (R0)
