RETRACTED ARTICLE: Significant directed walk framework to increase the accuracy of cancer classification using gene expression data

  • Original Research
Journal of Ambient Intelligence and Humanized Computing

This article was retracted on 20 June 2022

This article has been updated

Abstract

Early cancer diagnosis remains challenging because it requires expertise from several fields. Deoxyribonucleic acid (DNA) microarray analysis is one of the modern cancer diagnosis techniques that scientists use to measure changes in gene expression levels. From a computational perspective, algorithms can be developed to identify difficult cases, and numerous cancer studies have combined different machine learning techniques for diagnosis. This study aims to improve one such technique, directed random walk (DRW), at the framework level. The proposed framework, named significant directed walk (SDW), extends DRW with newly introduced sub-algorithms, a larger directed graph, and a different classifier. Six gene expression datasets are used to study how the sub-algorithms, the directed graph, and the classifier in SDW affect cancer prediction and cancer classification. The sub-algorithms of SDW comprise a data pre-processing phase, specific tuning parameter selection, weight as an additional variable, and exclusion of the unwanted adjacency matrix. SDW was also evaluated with four directed graphs, and the best of the four was chosen as part of the SDW structure. The experimental results showed that SDW combined with the walker network and linear regression performed best among all combinations tested. SDW achieved an average accuracy of 95.03% across all cancer datasets, 8.97% higher than conventional DRW. This study provides a foundation for further research on early cancer diagnosis with machine learning techniques. These findings are expected to improve early diagnosis methods for cancer classification.
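
The abstract does not state the update rule that DRW or SDW uses. As a rough illustration only, the sketch below runs a generic random walk with restart on a small directed graph, which is the general kind of procedure that directed random walk methods are usually built on. The update rule, the restart probability of 0.7, the toy four-node graph, and the function name `random_walk_with_restart` are all assumptions made for this example; none of them is taken from the article, and the sketch is not the SDW framework.

```python
import numpy as np


def random_walk_with_restart(adj, w0, restart=0.7, tol=1e-10, max_iter=1000):
    """Iterate w <- (1 - restart) * M^T w + restart * w0 until it stabilises.

    adj     : square 0/1 matrix, adj[i, j] = 1 for a directed edge i -> j
    w0      : non-negative initial node weights (normalised to sum to 1 here)
    restart : probability of jumping back to the initial distribution
              (hypothetical value, chosen only for the example)
    """
    adj = np.asarray(adj, dtype=float)
    w0 = np.asarray(w0, dtype=float)
    w0 = w0 / w0.sum()

    # Row-normalise the adjacency matrix into transition probabilities.
    # Nodes without outgoing edges keep an all-zero row.
    out_deg = adj.sum(axis=1, keepdims=True)
    trans = np.divide(adj, out_deg, out=np.zeros_like(adj), where=out_deg > 0)

    w = w0.copy()
    for _ in range(max_iter):
        w_next = (1.0 - restart) * (trans.T @ w) + restart * w0
        if np.abs(w_next - w).sum() < tol:
            return w_next
        w = w_next
    return w


# Toy directed graph (assumed): 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3.
toy_adj = [
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
weights = random_walk_with_restart(toy_adj, w0=[1, 1, 1, 1])
print(weights)
```

In this toy graph, node 3 receives flow from both intermediate nodes and therefore ends with the largest weight, while node 0, which has no incoming edges, keeps only its restart share. Topology-derived weights of this kind are what a random-walk framework would pass on to a downstream classifier.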

Acknowledgements

This project is funded by the Centre for Graduate Studies, Universiti Tun Hussein Onn Malaysia, and by the Ministry of Higher Education Malaysia, which supported this research under MYBRAIN15.

Author information

Corresponding author

Correspondence to Choon Sen Seah.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article has been retracted. Please see the retraction notice for more detail: https://doi.org/10.1007/s12652-022-04187-z

About this article

Cite this article

Seah, C.S., Kasim, S., Md. Fudzee, M.F. et al. RETRACTED ARTICLE: Significant directed walk framework to increase the accuracy of cancer classification using gene expression data. J Ambient Intell Human Comput 12, 7281–7298 (2021). https://doi.org/10.1007/s12652-020-02404-1

  • DOI: https://doi.org/10.1007/s12652-020-02404-1
