Abstract
When applied to supervised classification problems, dataset complexity determines how difficult a given dataset is to classify. Since complexity is a nontrivial issue, it is typically characterised by several measures. In this paper, we explore the complexity of three gene expression datasets used for two-class cancer classification. We demonstrate that estimating dataset complexity before performing the actual classification may provide a hint as to whether to apply a single best nearest neighbour classifier or an ensemble of nearest neighbour classifiers.
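As a rough illustration of what estimating complexity before classification can look like, the sketch below computes one widely used complexity measure, the maximum Fisher's discriminant ratio over features, alongside the leave-one-out error of a 1-nearest-neighbour classifier. The abstract does not say which measures the paper uses, so the choice of measure, the function names (`fisher_ratio`, `max_fisher_ratio`, `loo_1nn_error`) and the toy data are all assumptions made for illustration only.

```python
def fisher_ratio(values, labels):
    """Fisher's discriminant ratio of one feature for two classes (0 and 1)."""
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    var_a = sum((v - mean_a) ** 2 for v in a) / len(a)
    var_b = sum((v - mean_b) ** 2 for v in b) / len(b)
    denom = var_a + var_b
    if denom == 0:
        # Degenerate case: no within-class spread at all.
        return float("inf") if mean_a != mean_b else 0.0
    return (mean_a - mean_b) ** 2 / denom


def max_fisher_ratio(samples, labels):
    """Largest per-feature Fisher ratio; higher suggests an easier problem."""
    n_features = len(samples[0])
    return max(
        fisher_ratio([s[j] for s in samples], labels) for j in range(n_features)
    )


def loo_1nn_error(samples, labels):
    """Leave-one-out error rate of a 1-NN classifier (squared Euclidean distance)."""
    errors = 0
    for i, x in enumerate(samples):
        best_dist, best_label = None, None
        for k, z in enumerate(samples):
            if k == i:
                continue  # the held-out sample cannot be its own neighbour
            dist = sum((p - q) ** 2 for p, q in zip(x, z))
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, labels[k]
        errors += best_label != labels[i]
    return errors / len(samples)


if __name__ == "__main__":
    # Hypothetical toy data: 4 samples, 2 features, two classes.
    X = [[0.0, 3.0], [0.2, 1.0], [5.0, 2.0], [5.2, 2.5]]
    y = [0, 0, 1, 1]
    print("max Fisher ratio:", max_fisher_ratio(X, y))
    print("LOO 1-NN error:", loo_1nn_error(X, y))
```

A low maximum Fisher ratio together with a high leave-one-out error would indicate a harder dataset. Whether that should favour an ensemble over a single classifier is the question the paper investigates, and this sketch does not answer it.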
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Okun, O., Priisalu, H. (2007). Dataset Complexity and Gene Expression Based Cancer Classification. In: Masulli, F., Mitra, S., Pasi, G. (eds) Applications of Fuzzy Sets Theory. WILF 2007. Lecture Notes in Computer Science, vol 4578. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73400-0_61
DOI: https://doi.org/10.1007/978-3-540-73400-0_61
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73399-7
Online ISBN: 978-3-540-73400-0
eBook Packages: Computer Science, Computer Science (R0)