Abstract
Machine learning techniques have attracted significant attention for their potential to solve diverse real-world problems across many fields. This study applies machine learning algorithms to predict the stages of hepatitis C, a prevalent liver disease that affects a substantial portion of the global population. Using a dataset of 615 patients that includes multiple factors associated with hepatitis C, we compared the performance of six widely used machine learning algorithms: Categorical Boosting (CatBoost), Gaussian Naive Bayes (GNB), Random Forest (RF), Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LGBM), and ExtraTreeClassifier (ExtraT). To optimize these models, the hyperparameter optimization framework Optuna was used to search for the best-performing hyperparameters for each algorithm. All models were then evaluated on a test set comprising 20% of the patient data. XGBoost performed best, achieving an accuracy of 94.31%, an F1-score of 94.23%, a precision of 94.63%, and a recall of 94.31%. Building on these results, we deployed the XGBoost model in a user-friendly web application built with Streamlit, making the model easily accessible to the broader community.
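The abstract states that 20% of the patient data was held out for testing but not how that subset was drawn. The following is a minimal sketch, assuming a stratified split (each stage keeps roughly its share in both subsets) with a fixed random seed; `stratified_split`, its parameters, and the example labels are hypothetical names chosen for illustration, not the authors' code.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx), holding out about test_fraction of each class.

    Assumption: the study's 80/20 split may not have been stratified; this
    only illustrates one common way to draw it.
    """
    if not 0.0 <= test_fraction <= 1.0:
        raise ValueError("test_fraction must be between 0 and 1")

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)  # per-class share for the test set
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)


if __name__ == "__main__":
    # Hypothetical stage labels, not the study's data.
    labels = ["donor"] * 40 + ["hepatitis"] * 5 + ["fibrosis"] * 5 + ["cirrhosis"] * 10
    train_idx, test_idx = stratified_split(labels)
    print(len(train_idx), len(test_idx))  # prints: 48 12
```

Stratifying prevents a rare stage from being missing from the test set by chance. With scikit-learn, `train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)` does the same job.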
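The abstract reports accuracy, F1-score, precision, and recall but does not say how per-class scores were averaged across the hepatitis C stages. The reported recall (94.31%) equals the reported accuracy, which is exactly what support-weighted averaging always yields: weighted recall is Σ_c (n_c/N)(TP_c/n_c) = Σ_c TP_c/N, which is the accuracy. The sketch below therefore assumes weighted averaging (consistent with the numbers, but not confirmed by the abstract) and computes all four metrics in plain Python; `weighted_metrics` and the example labels are hypothetical.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1 for multiclass labels.

    Assumption: per-class scores are averaged with weights n_c / N (class support),
    which may differ from the averaging actually used in the study.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    n = len(y_true)
    support = Counter(y_true)    # number of true samples per class
    predicted = Counter(y_pred)  # number of predictions per class
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct hits per class

    precision = recall = f1 = 0.0
    for cls, n_cls in support.items():
        weight = n_cls / n
        p = tp[cls] / predicted[cls] if predicted[cls] else 0.0  # 0 if never predicted
        r = tp[cls] / n_cls
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    accuracy = sum(tp.values()) / n
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical predictions, not the study's results.
    y_true = ["donor", "donor", "donor", "hepatitis", "fibrosis", "cirrhosis"]
    y_pred = ["donor", "donor", "hepatitis", "hepatitis", "cirrhosis", "cirrhosis"]
    for name, value in weighted_metrics(y_true, y_pred).items():
        print(f"{name}: {value:.4f}")
```

In scikit-learn, `precision_recall_fscore_support(y_true, y_pred, average="weighted")` computes the same weighted precision, recall, and F1.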
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Yefou, U.N., Choudja, P.O.M., Sow, B., Adejumo, A. (2024). Optimized Machine Learning Models for Hepatitis C Prediction: Leveraging Optuna for Hyperparameter Tuning and Streamlit for Model Deployment. In: Debelee, T.G., Ibenthal, A., Schwenker, F., Megersa Ayano, Y. (eds) Pan-African Conference on Artificial Intelligence. PanAfriConAI 2023. Communications in Computer and Information Science, vol 2068. Springer, Cham. https://doi.org/10.1007/978-3-031-57624-9_5
DOI: https://doi.org/10.1007/978-3-031-57624-9_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-57623-2
Online ISBN: 978-3-031-57624-9
eBook Packages: Computer Science, Computer Science (R0)