Authors:
Ludmila Nascimento 1; Marcelo Balbino 1, 2; Maycoln Teodoro 3 and Cristiane Nobre 1
Affiliations:
1 Institute of Exact Sciences and Informatics, Pontifical Catholic University of Minas Gerais, Dom José Gaspar, Belo Horizonte, Brazil
2 Department of Computing and Civil Construction, Federal Center for Technological Education of Minas Gerais, Belo Horizonte, Brazil
3 Department of Psychology, Federal University of Minas Gerais, Belo Horizonte, Brazil
Keyword(s):
Children and Adolescents, Depression, Encoding, Interpretability.
Abstract:
Depression is a global public health challenge that affects approximately 300 million people. Artificial Intelligence and Machine Learning have revolutionized the healthcare sector, allowing the development of models to diagnose depression. Tabular data, common in healthcare, requires preprocessing, including encoding categorical attributes into numeric values, as many Machine Learning algorithms support only numeric data. This study investigates different encoding methods for nominal (non-ordinal) categorical attributes in a dataset related to depression in children and adolescents suffering from Major Depressive Disorder (MDD). The comparison showed that the XGBoost algorithm combined with the Hash Encoding, Customized One Hot, Frequency, and Dummy encoding techniques was more effective for the analyzed dataset. However, not all of these encodings are interpretable. These results highlight the importance of choosing appropriate encoding methods to improve both the accuracy and the interpretability of Machine Learning models in healthcare.
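
As a rough illustration of what encoding a nominal attribute involves, the sketch below applies three of the techniques named in the abstract (Frequency, Dummy, and Hash encoding) to a small made-up column. It is not the authors' implementation: the attribute `school_type`, the helper function names, the choice of MD5 as the hash, and the number of hash buckets are all assumptions made for this example, and the authors' Customized One Hot variant is not reproduced because the abstract does not describe it.

```python
import hashlib
from collections import Counter


def frequency_encode(values):
    """Replace each category with its relative frequency in the column."""
    counts = Counter(values)
    total = len(values)
    return [counts[v] / total for v in values]


def dummy_encode(values):
    """Create one binary column per category except the first (in sorted
    order), which serves as the baseline and is encoded as all zeros."""
    categories = sorted(set(values))
    kept = categories[1:]
    return [[1 if v == c else 0 for c in kept] for v in values]


def hash_encode(values, n_buckets=4):
    """Map each category to one of n_buckets binary columns via a stable hash.
    Different categories may collide in the same bucket."""
    rows = []
    for v in values:
        digest = hashlib.md5(v.encode("utf-8")).hexdigest()
        bucket = int(digest, 16) % n_buckets
        rows.append([1 if i == bucket else 0 for i in range(n_buckets)])
    return rows


# Hypothetical nominal attribute; NOT taken from the study's dataset.
school_type = ["public", "private", "public", "other", "public"]

freq = frequency_encode(school_type)    # one numeric column
dummy = dummy_encode(school_type)       # columns: "private", "public"
hashed = hash_encode(school_type, n_buckets=4)

print(freq)
print(dummy)
print(hashed)
```

In this sketch, a dummy column corresponds to a single named category, so a model's use of it can be read back directly. A hash bucket can be shared by several categories, which is one reason a hashed encoding may be harder to interpret.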