Multiple Gene Sets for Cancer Classification Using Gene Range Selection Based on Random Forest

Moorthy, Kohbalan; Bin Mohamad, Mohd Saberi; Deris, Safaai

doi:10.1007/978-3-642-36546-1_40

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7802)

Included in the following conference series:

Asian Conference on Intelligent Information and Database Systems

Abstract

Advances in microarray technology make it possible to obtain genetic information from cancer patients as computational data, so that cancer classification can be performed with computational software. Through gene selection, we can identify a number of informative genes in the initial data and group them into a smaller set, or subset, of genes for the purpose of classification. In most available methods, the number of genes in the selected subset depends on the gene selection technique used and cannot be fine-tuned to meet a requirement for a particular number of genes. We therefore propose a technique, gene range selection based on the random forest method, which allows the subset to be chosen selectively for better classification of cancer datasets. Our results indicate that having multiple gene sets helps to increase the overall classification accuracy on cancer-related datasets, because the number of genes can be scrutinized further to create the best subset of genes. Moreover, the technique can support gene filtering for further analysis of microarray data in gene network analysis, gene-gene interaction analysis, and many other related fields.
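
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of producing gene sets over a chosen size range from a forest-style importance ranking. It is not the authors' implementation. As stated assumptions: it swaps a full random forest for a simplified forest of one-split trees (decision stumps) grown on bootstrap samples with random gene subsets, it uses synthetic two-class data, and the names make_toy_data, rank_genes and gene_sets_in_range are hypothetical.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_stump(X, y, rows, genes):
    """Return (gene, impurity decrease) of the best median split among `genes`."""
    parent = gini([y[i] for i in rows])
    best_gene, best_gain = None, 0.0
    for g in genes:
        values = sorted(X[i][g] for i in rows)
        threshold = values[len(values) // 2]
        left = [y[i] for i in rows if X[i][g] < threshold]
        right = [y[i] for i in rows if X[i][g] >= threshold]
        child = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        if parent - child > best_gain:
            best_gene, best_gain = g, parent - child
    return best_gene, best_gain


def rank_genes(X, y, n_trees=200, seed=0):
    """Rank genes by Gini decrease accumulated over bootstrap stumps."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    mtry = max(1, int(p ** 0.5))  # genes tried per stump (assumed default)
    importance = [0.0] * p
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        genes = rng.sample(range(p), mtry)           # random gene subset
        g, gain = best_stump(X, y, rows, genes)
        if g is not None:
            importance[g] += gain
    return sorted(range(p), key=lambda g: importance[g], reverse=True)


def gene_sets_in_range(ranking, min_genes, max_genes):
    """Build one candidate gene set per size in [min_genes, max_genes]."""
    return {k: ranking[:k] for k in range(min_genes, max_genes + 1)}


def make_toy_data(n_samples=40, n_genes=30, seed=1):
    """Synthetic expression matrix; only genes 0 and 1 carry class signal."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n_samples)]
    X = [[rng.gauss(0.0, 1.0) + (3.0 * y[i] if g < 2 else 0.0)
          for g in range(n_genes)] for i in range(n_samples)]
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    ranking = rank_genes(X, y)
    for size, genes in gene_sets_in_range(ranking, 2, 5).items():
        print(size, genes)
```

In a setup like this, each candidate set would then be scored with a classifier, using a resampling estimate of error computed on samples that were not used for ranking, and the size with the best estimate would be kept.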

Author information

Authors and Affiliations

Artificial Intelligence & Bioinformatics Research Group, Faculty of Computer Science and Information Systems, Universiti Teknologi Malaysia, 81310, Skudai, Johor, Malaysia
Kohbalan Moorthy, Mohd Saberi Bin Mohamad & Safaai Deris

Editor information

Editors and Affiliations

Faculty of Computer Science and Information Systems, Department of Software Engineering, Universiti Teknologi Malaysia, 81310, Johor Bahru, Johor, Malaysia
Ali Selamat & Habibollah Haron
Institute of Informatics, Division of Knowledge Managements Systems, Wrocław University of Technology, Str. Wybrzeże Wyspiańskiego 27, 50-370, Wrocław, Poland
Ngoc Thanh Nguyen

Cite this paper

Moorthy, K., Bin Mohamad, M.S., Deris, S. (2013). Multiple Gene Sets for Cancer Classification Using Gene Range Selection Based on Random Forest. In: Selamat, A., Nguyen, N.T., Haron, H. (eds) Intelligent Information and Database Systems. ACIIDS 2013. Lecture Notes in Computer Science, vol 7802. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36546-1_40

DOI: https://doi.org/10.1007/978-3-642-36546-1_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36545-4
Online ISBN: 978-3-642-36546-1
eBook Packages: Computer Science, Computer Science (R0)
