DOI: 10.1145/3328833.3328849
research-article

The application of data mining techniques and feature selection methods in the risk classification of Egyptian liver cancer patients using clinical and genetic data

Published: 09 April 2019

Abstract

Data mining techniques have shown great potential in the biomedical and health care fields. The objective of this paper is to apply feature selection methods and data mining techniques to data from Egyptian liver cancer patients in order to predict their prognosis and to extract the features that most affect patient survivability. Genetic and clinical data from 1541 patients were analyzed. Three feature selection methods and seven data mining techniques were studied and compared. The Wrapper Subset method and Random Forest proved to be the best-performing feature selection method and data mining technique, respectively. Moreover, important genetic features, such as mutations in exons 6 and 9 of the p53 gene, proved to have a significant impact on patients' overall prognosis.
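
As an illustration of the idea behind wrapper-style subset selection, the sketch below scores candidate feature subsets by the accuracy of a classifier trained on them and greedily keeps the feature that helps most. It is a minimal sketch under stated assumptions, not the paper's pipeline: the abstract does not describe the search strategy or tooling, so the greedy forward search, the 1-nearest-neighbour scorer (used here in place of Random Forest to stay dependency-free), and the toy data are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Row = Sequence[float]


def loo_accuracy_1nn(X: List[Row], y: List[int], features: List[int]) -> float:
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier
    restricted to the given feature indices (assumed stand-in scorer)."""
    if not features:
        return 0.0
    correct = 0
    for i in range(len(X)):
        best_dist, best_label = float("inf"), None
        for j in range(len(X)):
            if i == j:
                continue  # hold out the sample being classified
            dist = sum((X[i][f] - X[j][f]) ** 2 for f in features)
            if dist < best_dist:
                best_dist, best_label = dist, y[j]
        correct += int(best_label == y[i])
    return correct / len(X)


def wrapper_forward_selection(
    X: List[Row],
    y: List[int],
    score: Callable[[List[Row], List[int], List[int]], float],
) -> Tuple[List[int], float]:
    """Greedy forward search: add the feature that most improves the
    classifier's score; stop when no candidate strictly improves it."""
    n_features = len(X[0])
    selected: List[int] = []
    best_score = 0.0
    while True:
        candidates = [f for f in range(n_features) if f not in selected]
        if not candidates:
            break
        scored = [(score(X, y, selected + [f]), f) for f in candidates]
        # Ties are broken in favour of the lower feature index.
        top_score, top_feature = max(scored, key=lambda t: (t[0], -t[1]))
        if top_score <= best_score:
            break
        selected.append(top_feature)
        best_score = top_score
    return selected, best_score


# Hypothetical toy data: column 0 separates the classes, columns 1-2 are noise.
X = [
    [0.0, 5.0, 1.0], [0.1, 1.0, 9.0], [0.2, 8.0, 4.0], [0.3, 3.0, 7.0],
    [1.0, 4.9, 1.2], [1.1, 1.2, 8.8], [1.2, 8.1, 4.1], [1.3, 2.9, 7.2],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

selected, best = wrapper_forward_selection(X, y, loo_accuracy_1nn)
print(selected, best)
```

The point of the wrapper approach is that the subset is judged by the downstream classifier itself rather than by a classifier-independent statistic, which is why it tends to be more expensive than filter-style methods.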


Cited By

  • (2024) A Machine Learning-Based Framework for Accurate and Early Diagnosis of Liver Diseases: A Comprehensive Study on Feature Selection, Data Imbalance, and Algorithmic Performance. International Journal of Intelligent Systems 2024:1. DOI: 10.1155/2024/6111312. Online publication date: 28-Jun-2024


      Published In

      ICSIE '19: Proceedings of the 8th International Conference on Software and Information Engineering
      April 2019
      276 pages
      ISBN:9781450361057
      DOI:10.1145/3328833

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. Data mining
      2. Medical informatics
      3. cancer prognosis
      4. feature selection
      5. liver cancer
      6. machine learning

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      ICSIE '19
