Dataset Complexity and Gene Expression Based Cancer Classification

  • Conference paper
Applications of Fuzzy Sets Theory (WILF 2007)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4578)

Abstract

In supervised classification, dataset complexity determines how difficult a given dataset is to classify. Since complexity is a nontrivial issue, it is typically characterised by a number of measures rather than a single one. In this paper, we explore the complexity of three gene expression datasets used for two-class cancer classification. We demonstrate that estimating dataset complexity before performing the actual classification may provide a hint as to whether to apply a single best nearest neighbour classifier or an ensemble of nearest neighbour classifiers.
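The abstract does not say which complexity measures the paper uses. As an illustration only, the sketch below computes one widely used measure from the Ho and Basu family, the maximum Fisher discriminant ratio, for a two-class dataset. The function name, the toy data, and the choice of this particular measure are assumptions made for the example, not the authors' procedure.

```python
from statistics import mean, pvariance


def fisher_discriminant_ratio(samples, labels):
    """Maximum over features of (mean_a - mean_b)^2 / (var_a + var_b).

    samples: list of equal-length feature vectors (e.g. expression levels)
    labels:  list of class labels with exactly two distinct values
    Larger values suggest that at least one feature separates the classes well.
    """
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("expected exactly two classes")

    # Split the samples by class.
    group_a = [x for x, y in zip(samples, labels) if y == classes[0]]
    group_b = [x for x, y in zip(samples, labels) if y == classes[1]]

    best = 0.0
    for j in range(len(samples[0])):
        col_a = [x[j] for x in group_a]
        col_b = [x[j] for x in group_b]
        gap = (mean(col_a) - mean(col_b)) ** 2
        spread = pvariance(col_a) + pvariance(col_b)
        if spread == 0:
            if gap > 0:
                return float("inf")  # zero within-class spread, distinct means
            continue  # feature is constant across both classes
        best = max(best, gap / spread)
    return best


# Toy data (hypothetical): feature 0 separates the classes, feature 1 does not.
toy_samples = [[1.0, 5.0], [2.0, 5.5], [8.0, 5.2], [9.0, 5.1]]
toy_labels = [0, 0, 1, 1]
print(fisher_discriminant_ratio(toy_samples, toy_labels))  # 98.0
```

A low ratio on every feature would point to heavily overlapping classes along the individual features. How such a value should steer the choice between a single classifier and an ensemble is the subject of the paper itself and is not reproduced here.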

Editor information

Francesco Masulli, Sushmita Mitra, Gabriella Pasi

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Okun, O., Priisalu, H. (2007). Dataset Complexity and Gene Expression Based Cancer Classification. In: Masulli, F., Mitra, S., Pasi, G. (eds) Applications of Fuzzy Sets Theory. WILF 2007. Lecture Notes in Computer Science, vol 4578. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73400-0_61

  • DOI: https://doi.org/10.1007/978-3-540-73400-0_61

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-73399-7

  • Online ISBN: 978-3-540-73400-0

  • eBook Packages: Computer Science, Computer Science (R0)
