Abstract
Early diagnosis of cancer remains a major challenge because it requires the involvement of several different fields. Deoxyribonucleic acid (DNA) microarray analysis is one of the modern cancer diagnosis techniques that scientists use to measure changes in gene expression levels in gene expression data. From a computing perspective, algorithms can be developed to identify the more difficult cases, and numerous cancer studies have combined different machine learning techniques for cancer diagnosis. This study aims to improve one such technique, directed random walk (DRW), at the level of its framework. An improved directed random walk framework, named significant directed walk (SDW), is proposed with newly introduced sub-algorithms, a larger directed graph and a different classifier. Six gene expression datasets are used to study the effectiveness of the sub-algorithms, directed graph and classifier in SDW in terms of cancer prediction and cancer classification. The sub-algorithms of SDW comprise a data pre-processing phase, specific tuning parameter selection, the use of weight as an additional variable, and the exclusion of the unwanted adjacency matrix. In addition, SDW is evaluated with four directed graphs to study their usability, and the best of the four is chosen to be part of the structure of SDW. The experimental results show that the combination of SDW with the walker network and linear regression performs best among all combinations. SDW achieves an average accuracy of 95.03% across all cancer datasets, which is higher by 8.97% than conventional DRW. This study provides a foundation for further research on early diagnosis of cancer with machine learning techniques, and its findings are expected to improve early diagnosis methods for cancer classification.
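To make the idea of a walk on a directed graph concrete, the following is a minimal sketch of a generic random walk with restart, the family of methods that DRW belongs to. It is not the SDW framework or its sub-algorithms. The function names, the toy three-node graph, the uniform initial weights, the edge-direction convention and the restart probability of 0.7 are all assumptions chosen for illustration.

```python
def row_normalize(adj):
    """Turn an adjacency matrix into row-wise transition probabilities."""
    out = []
    for row in adj:
        total = sum(row)
        # A node with no outgoing edges keeps an all-zero row;
        # its weight is only refilled by the restart term.
        out.append([v / total if total else 0.0 for v in row])
    return out


def random_walk_with_restart(adj, w0, restart=0.7, tol=1e-10, max_iter=1000):
    """Iterate w <- (1 - r) * M^T w + r * w0 until the weights stop changing.

    adj[i][j] = 1 means a directed edge i -> j (assumed convention).
    """
    n = len(adj)
    m = row_normalize(adj)
    total = sum(w0)
    w0 = [v / total for v in w0]  # normalise initial weights to sum to 1
    w = list(w0)
    for _ in range(max_iter):
        # (M^T w)[j]: weight flowing into node j along edges i -> j
        flowed = [sum(m[i][j] * w[i] for i in range(n)) for j in range(n)]
        nxt = [(1 - restart) * flowed[j] + restart * w0[j] for j in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, w)) < tol:
            return nxt
        w = nxt
    return w


# Hypothetical toy graph (not from the study): 0 -> 1, 0 -> 2, 1 -> 2
adjacency = [
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]
initial = [1.0, 1.0, 1.0]  # placeholder weights; a real study would derive these from data
weights = random_walk_with_restart(adjacency, initial)
```

In this toy graph, node 2 receives weight from both other nodes and so ends with the largest value, while node 0 has no incoming edges and keeps only its restart share. In a gene expression setting the nodes would stand for genes and the converged weights would serve as a topological importance score, but how SDW builds its graph and uses its weights is not specified here.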
Change history
20 June 2022
This article has been retracted. Please see the Retraction Notice for more detail: https://doi.org/10.1007/s12652-022-04187-z
Acknowledgements
This project is funded by the Centre for Graduate Studies, Universiti Tun Hussein Onn Malaysia, and supported by the Ministry of Higher Education Malaysia under MYBRAIN15.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Seah, C.S., Kasim, S., Md. Fudzee, M.F. et al. RETRACTED ARTICLE: Significant directed walk framework to increase the accuracy of cancer classification using gene expression data. J Ambient Intell Human Comput 12, 7281–7298 (2021). https://doi.org/10.1007/s12652-020-02404-1
DOI: https://doi.org/10.1007/s12652-020-02404-1