Optimized Machine Learning Models for Hepatitis C Prediction: Leveraging Optuna for Hyperparameter Tuning and Streamlit for Model Deployment

Yefou, Uriel Nguefack; Choudja, Pauline Ornela Megne; Sow, Binta; Adejumo, Abduljaleel

doi:10.1007/978-3-031-57624-9_5

Part of the book series: Communications in Computer and Information Science (CCIS, volume 2068)

Included in the following conference series:

Pan African Conference on Artificial Intelligence

Abstract

Machine learning techniques have attracted significant attention for their potential to solve real-world problems across many fields. This study uses machine learning algorithms to predict the stages of hepatitis C, a prevalent liver disease that affects a substantial portion of the global population. Using a dataset of 615 patients that includes a range of factors associated with hepatitis C, we compared the performance of six machine learning algorithms: categorical boosting (CatBoost), Gaussian Naive Bayes (GNB), Random Forest (RF), Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LGBM), and ExtraTreeClassifier (ExtraT). The hyperparameters of each model were tuned with Optuna, a hyperparameter optimization framework. All models were then evaluated on the test dataset, which comprises 20% of the overall patient data. XGBoost emerged as the most effective approach, with an accuracy of 94.31% and an F1-score, precision, and recall of 94.23%, 94.63%, and 94.31%, respectively. Building on these results, we deployed the XGBoost model in a user-friendly web application built with Streamlit, which makes the model easy to access and use for the broader community.
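The reported figures can be sanity-checked with a little arithmetic. The sketch below is illustrative only and is not the authors' code. It assumes the 20% test split of 615 patients contains 123 patients, and it assumes the F1-score, precision, and recall are support-weighted averages over the classes; the abstract states neither. Under the first assumption, 116 correct predictions out of 123 would give the reported 94.31% accuracy. Under the second, recall always equals accuracy, which is consistent with both being reported as 94.31%. The function name `weighted_scores` and the toy labels are hypothetical.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Return (accuracy, precision, recall, f1), the last three support-weighted."""
    n = len(y_true)
    support = Counter(y_true)      # true instances per class
    predicted = Counter(y_pred)    # predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    accuracy = sum(correct.values()) / n
    precision = recall = f1 = 0.0
    for label, count in support.items():
        p = correct[label] / predicted[label] if predicted[label] else 0.0
        r = correct[label] / count
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = count / n         # class share of the test set
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return accuracy, precision, recall, f1


# Assumed test-set size: 20% of 615 patients.
test_size = 615 * 20 // 100
# 116 correct predictions is an inference from the reported accuracy, not a stated figure.
print(test_size, round(116 / test_size * 100, 2))  # prints "123 94.31"

# Hypothetical toy labels, only to show that weighted recall equals accuracy.
y_true = [0, 0, 0, 0, 1, 1, 2, 2, 2, 3]
y_pred = [0, 0, 0, 1, 1, 1, 2, 2, 3, 3]
acc, prec, rec, f1 = weighted_scores(y_true, y_pred)
print(abs(acc - rec) < 1e-12)  # prints "True"
```

Weighted recall reduces to the total number of correct predictions divided by the test-set size, which is the definition of accuracy, so the match between the two reported values does not by itself say anything about per-class performance.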

Author information

Authors and Affiliations

African Institute for Mathematical Sciences Cameroon, Crystal Gardens Limbe 608 South West, Limbe, Cameroon
Uriel Nguefack Yefou
African Institute for Mathematical Sciences Senegal, Km2 route de Joal (IRD Center), Mbour 1418 Thies, Mbour, Senegal
Pauline Ornela Megne Choudja, Binta Sow & Abduljaleel Adejumo

Corresponding author

Correspondence to Uriel Nguefack Yefou.

Editor information

Editors and Affiliations

Ethiopian Artificial Intelligence Instit, Addis Ababa, Ethiopia
Taye Girma Debelee
HAWK University of Applied Sciences and Arts, Göttingen, Germany
Achim Ibenthal
Universität Ulm, Ulm, Germany
Friedhelm Schwenker
Ethiopian Artificial Intelligence Instit, Addis Ababa, Ethiopia
Yehualashet Megersa Ayano

About this paper

Cite this paper

Yefou, U.N., Choudja, P.O.M., Sow, B., Adejumo, A. (2024). Optimized Machine Learning Models for Hepatitis C Prediction: Leveraging Optuna for Hyperparameter Tuning and Streamlit for Model Deployment. In: Debelee, T.G., Ibenthal, A., Schwenker, F., Megersa Ayano, Y. (eds) Pan-African Conference on Artificial Intelligence. PanAfriConAI 2023. Communications in Computer and Information Science, vol 2068. Springer, Cham. https://doi.org/10.1007/978-3-031-57624-9_5

DOI: https://doi.org/10.1007/978-3-031-57624-9_5
Published: 07 April 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-57623-2
Online ISBN: 978-3-031-57624-9
eBook Packages: Computer Science, Computer Science (R0)