Abstract
This paper addresses two major topics, machine learning and causal knowledge discovery, in the context of human resources management. First, we examine previous work that analyses employee turnover prediction with standard machine learning (ML) algorithms and identifies the factors most important to these predictions; we then interpret the results of developing and testing different classification models on the IBM Human Resources (HR) dataset. Second, we explore an alternative process: extracting causal knowledge from semi-structured interviews with HR experts to form an expert-derived causal graph (map). By comparing the results of the machine learning approaches with the findings of the interviews, we explore the benefits of adding domain experts’ causal knowledge to knowledge learned from data. We conclude with recommendations on the methods and techniques best suited to causal graph learning for improving decision making.
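One way to make this comparison concrete is to encode an expert-derived causal map as a directed graph and set it against the features a classifier ranks highest. The sketch below assumes plain Python 3.9+ and no external libraries. The variable names follow columns of the public IBM HR attrition dataset, but the edges in `EXPERT_GRAPH` and the ranked list `ML_TOP` are hypothetical placeholders. They are not the graph elicited from the interviews or the importance rankings reported in the paper. The helper names (`is_acyclic`, `ancestors`, `compare`) are likewise illustrative.

```python
from collections import deque

# HYPOTHETICAL expert-derived causal map: each key is a cause and each value
# lists its direct effects. Illustrative only, not the paper's elicited graph.
EXPERT_GRAPH: dict[str, list[str]] = {
    "OverTime": ["WorkLifeBalance", "Attrition"],
    "WorkLifeBalance": ["JobSatisfaction"],
    "JobSatisfaction": ["Attrition"],
    "MonthlyIncome": ["JobSatisfaction", "Attrition"],
    "YearsAtCompany": ["MonthlyIncome"],
}

# HYPOTHETICAL top-ranked features from a classifier's importance scores.
ML_TOP: list[str] = ["OverTime", "MonthlyIncome", "StockOptionLevel", "Age"]


def is_acyclic(graph: dict[str, list[str]]) -> bool:
    """Kahn's algorithm: return True if the graph is a DAG."""
    nodes = set(graph) | {c for children in graph.values() for c in children}
    indegree = {n: 0 for n in nodes}
    for children in graph.values():
        for c in children:
            indegree[c] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for c in graph.get(node, []):
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    # Some nodes were never reached with zero in-degree, so a cycle exists.
    return visited == len(nodes)


def ancestors(graph: dict[str, list[str]], target: str) -> set[str]:
    """Return every node with a directed path to target (direct and indirect causes)."""
    parents: dict[str, set[str]] = {}
    for cause, effects in graph.items():
        for effect in effects:
            parents.setdefault(effect, set()).add(cause)
    seen: set[str] = set()
    stack = [target]
    while stack:
        node = stack.pop()
        for p in parents.get(node, ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def compare(ml_features: list[str], graph: dict[str, list[str]],
            target: str = "Attrition") -> dict[str, list[str]]:
    """Split features into those supported by the expert graph and those that are not."""
    causes = ancestors(graph, target)
    ml = set(ml_features)
    return {
        "agreed": sorted(ml & causes),        # ranked by ML and causal per experts
        "ml_only": sorted(ml - causes),       # predictive but not an expert-identified cause
        "expert_only": sorted(causes - ml),   # expert-identified cause the model ranks lower
    }


if __name__ == "__main__":
    assert is_acyclic(EXPERT_GRAPH), "a causal map should be acyclic"
    print(compare(ML_TOP, EXPERT_GRAPH))
```

With these placeholder inputs, `ml_only` collects features that are predictive in the data but have no directed path to `Attrition` in the expert map. These are candidates for correlations the experts would not treat as causal. `expert_only` flags causes the model may be underweighting. A real analysis would replace both placeholders with the interview-derived graph and the classifier's actual importance scores.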
Copyright information
© 2022 IFIP International Federation for Information Processing
About this paper
Cite this paper
Meddeb, E., Bowers, C., Nichol, L. (2022). Comparing Machine Learning Correlations to Domain Experts’ Causal Knowledge: Employee Turnover Use Case. In: Holzinger, A., Kieseberg, P., Tjoa, A.M., Weippl, E. (eds) Machine Learning and Knowledge Extraction. CD-MAKE 2022. Lecture Notes in Computer Science, vol 13480. Springer, Cham. https://doi.org/10.1007/978-3-031-14463-9_22
DOI: https://doi.org/10.1007/978-3-031-14463-9_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-14462-2
Online ISBN: 978-3-031-14463-9
eBook Packages: Computer Science, Computer Science (R0)