Abstract
Computational diagnosis of cancer is a classification problem, and it places two special requirements on a learning algorithm: perfect accuracy and a small number of features used in the classifier. This paper presents our results on an ovarian cancer data set. The data set is described by 15154 features and consists of 253 samples, each taken from a woman who either has ovarian cancer or does not. The raw data were generated by mass spectrometry, which measures the intensities of 15154 protein or peptide features in a blood sample from every woman. The purpose is to identify a small subset of the features that can be used as biomarkers to separate the two classes of samples with high accuracy, so that the identified features can potentially replace labour-intensive and expensive conventional diagnosis methods in routine clinical practice. Our new tree-based method achieves perfect 100% accuracy in 10-fold cross validation on this data set, and it directly outputs a small set of biomarkers. We then explain why support vector machines, naive Bayes, and k-nearest neighbour cannot fulfill this purpose. This study also aims to elucidate the interplay between contemporary cancer research and data mining techniques.
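The abstract does not detail the tree-based method itself, but the evaluation pipeline it describes — score each candidate feature with a simple tree-style split and estimate its accuracy by 10-fold cross validation — can be sketched in plain Python. Everything below is illustrative: the data are synthetic (an injected signal at feature index 3 stands in for a real biomarker), and the one-node "stump" classifier is a stand-in for the authors' actual tree method, not a reconstruction of it.

```python
import random

random.seed(0)

# Synthetic stand-in for the proteomics data: n_samples x n_features
# intensity values, with one informative "biomarker" at index 3.
# All sizes and indices here are illustrative assumptions.
n_samples, n_features = 100, 20
y = [i % 2 for i in range(n_samples)]                  # class labels 0/1
X = [[random.gauss(0.0, 1.0) for _ in range(n_features)]
     for _ in range(n_samples)]
for i, label in enumerate(y):
    X[i][3] += 6.0 * label                             # inject class signal

def stump_fit(xs, ys):
    """One-node 'tree': threshold one feature at the midpoint of the
    two class means, and remember which side is class 1."""
    m0 = sum(x for x, t in zip(xs, ys) if t == 0) / ys.count(0)
    m1 = sum(x for x, t in zip(xs, ys) if t == 1) / ys.count(1)
    return (m0 + m1) / 2.0, (1 if m1 > m0 else 0)

def stump_predict(x, thresh, hi_label):
    return hi_label if x > thresh else 1 - hi_label

def cv10_accuracy(feature):
    """10-fold cross-validated accuracy of a stump on one feature."""
    folds = [list(range(k, n_samples, 10)) for k in range(10)]
    correct = 0
    for fold in folds:
        train = [i for i in range(n_samples) if i not in fold]
        thresh, hi = stump_fit([X[i][feature] for i in train],
                               [y[i] for i in train])
        correct += sum(stump_predict(X[i][feature], thresh, hi) == y[i]
                       for i in fold)
    return correct / n_samples

# Rank candidate features by cross-validated accuracy; the injected
# biomarker should come out on top, the noise features near 50%.
best = max(range(n_features), key=cv10_accuracy)
print("selected biomarker index:", best)
```

A real analysis would use the full mass-spectrometry matrix and a proper decision-tree learner (e.g. C4.5-style induction, as cited in the paper's references) rather than a single stump, but the fold structure and the accuracy bookkeeping carry over unchanged.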
© 2004 Springer-Verlag Berlin Heidelberg
Li, J., Ramamohanarao, K. (2004). A Tree-Based Approach to the Discovery of Diagnostic Biomarkers for Ovarian Cancer. In: Dai, H., Srikant, R., Zhang, C. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2004. Lecture Notes in Computer Science(), vol 3056. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24775-3_80
Print ISBN: 978-3-540-22064-0
Online ISBN: 978-3-540-24775-3