
Classification of multi class dataset using wavelet power spectrum

Data Mining and Knowledge Discovery

Abstract

Data mining techniques are widely used in many fields. One application of data mining in bioinformatics is the classification of tissue samples. In the present work, a wavelet power spectrum based approach is presented for feature selection and for the classification of multi-class datasets. The proposed method was applied to the SRBCT and breast cancer datasets, both of which are multi-class cancer datasets. The selected features largely coincide with those selected in previous work. The method produced nearly 100% accurate classification results. It is simple and robust to noise, and no extensive preprocessing is required. Classification was performed with far fewer features than were used in the original works. Unlike other methods, which usually prune the data at the outset using a threshold, this method loses no information to such initial pruning. The method exploits the inherent nature of the data in performing its various tasks, so it can be applied to a wide range of data.
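The abstract does not spell out the algorithm, so the following is only a minimal sketch of what wavelet power spectrum features can look like, under assumptions that are ours rather than the authors': each sample's expression profile is treated as a 1-D signal whose length is a power of two, the Haar wavelet is used, the "power spectrum" is taken to be the energy (sum of squared coefficients) in each wavelet band, and a nearest-centroid rule stands in for the classifier. The function names `haar_dwt`, `wavelet_power_spectrum` and `nearest_centroid` are hypothetical. The abstract describes the power spectrum as the basis for feature selection; here, for brevity, the band energies are used directly as features.

```python
import math
from typing import Dict, List, Sequence


def haar_dwt(signal: Sequence[float]) -> List[List[float]]:
    """Orthonormal Haar decomposition (assumed wavelet, not from the paper).

    Returns the detail coefficients of each level (finest scale first),
    followed by the final one-element approximation band.
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("profile length must be a power of two")
    approx = [float(v) for v in signal]
    bands: List[List[float]] = []
    s = math.sqrt(2.0)
    while len(approx) > 1:
        evens, odds = approx[0::2], approx[1::2]
        bands.append([(a - b) / s for a, b in zip(evens, odds)])  # detail band
        approx = [(a + b) / s for a, b in zip(evens, odds)]       # smoothed half
    bands.append(approx)
    return bands


def wavelet_power_spectrum(signal: Sequence[float]) -> List[float]:
    """Energy in each wavelet band (our assumed definition of the spectrum).

    The Haar transform is orthonormal, so these energies sum to the
    total energy of the input profile (Parseval's relation).
    """
    return [sum(c * c for c in band) for band in haar_dwt(signal)]


def nearest_centroid(train: Dict[str, List[Sequence[float]]],
                     sample: Sequence[float]) -> str:
    """Hypothetical stand-in classifier: pick the class whose mean power
    spectrum is closest to the sample's (squared Euclidean distance)."""
    query = wavelet_power_spectrum(sample)
    best_label, best_dist = "", math.inf
    for label, profiles in train.items():
        spectra = [wavelet_power_spectrum(p) for p in profiles]
        centroid = [sum(col) / len(col) for col in zip(*spectra)]
        dist = sum((q - c) ** 2 for q, c in zip(query, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


# Toy profiles invented for illustration: class "B" oscillates strongly.
train = {
    "A": [[0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 1]],
    "B": [[5, -5, 5, -5, 5, -5, 5, -5], [4, -4, 4, -4, 4, -4, 4, -4]],
}
print(nearest_centroid(train, [6, -6, 6, -6, 6, -6, 6, -6]))  # prints B
```

Because the band energies sum to the profile's total energy, no energy is thrown away by a threshold at this stage; the per-band energies do, however, discard where within a band each coefficient sits, which is one reason a real method would also need a feature-selection step.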


References

  • Abramovich F, Bailey T, Sapatinas T (2000) Wavelet analysis and its statistical applications. J R Stat Soc Ser D 48:1–30

  • Aldroubi A, Unser M (eds) (1996) Wavelets in medicine and biology. CRC Press, Boca Raton

  • Antonescu C, Peterson C, Meltzer P (2001) Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nat Med 7:673–679

  • Azuaje F (2001) A computational neural approach to support the discovery of gene function and classes of cancer. IEEE Trans Biomed Eng 48:332–339

  • Azuaje F (2002) In silico approaches to microarray-based disease classification and gene function discovery. Ann Med 34(4):299–305

  • Bittner M, Meltzer P, Trent J (1999) Data analysis and integration: of steps and arrows. Nat Genet 22:213–215

  • Chui CK (1992) An introduction to wavelets. Academic Press, Boston

  • Daubechies I (1992) Ten lectures on wavelets. Capital City Press, Montpelier, Vermont

  • Dietterich TG (2000) An experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting, and randomization. Mach Learn 40:139–157

  • Dougherty ER (2001) Small sample issues for microarray-based classification. Comp Funct Genom 2(1):28–34

  • Eisen MB, Spellman PT, Brown PO, Botstein D (1998) Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA 95:14863–14868

  • Golub TR et al (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286:531–537

  • Han J, Kamber M (2001) Data mining: concepts and techniques. Morgan Kaufmann Publishers, San Francisco, USA, 121

  • Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann, San Francisco

  • Kaplan I (2001) Spectral analysis and filtering with the wavelet transform. (http://www.bearcave.com/misl//misl_tech/wavelets/freq/index.html)

  • Khan J, Wei J, Ringner M, Saal L, Ladanyi M, Westermann F, Berthold F, Schwab M, Antonescu C, Peterson C, Meltzer P (2001) Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nat Med 7:673–679

  • Lio P (2003) Wavelets in bioinformatics and computational biology: state of art and perspectives. Bioinformatics 19:2–9

  • Li T et al (2002) A survey on wavelet applications in data mining. SIGKDD Explor 4(2):49–68

  • Lobenhofer EK et al (2001) Progress in the application of DNA microarrays. Environ Health Perspect 109(9):881–889

  • Mallat S (1998) A wavelet tour of signal processing. Academic Press, San Diego

  • Perou CM et al (2000) Molecular portraits of human breast tumors. Nature 406(6797):747–752

  • Rifkin R, Mukherjee S, Tamayo P, Ramaswamy S, Yeang CH, Angelo M, Reich M, Poggio T, Lander ES, Golub TR, Mesirov JP (2003) An analytical method for multi-class molecular cancer classification. SIAM Rev 45:706–723

  • Southern E (1975) Detection of specific sequences among DNA fragments separated by gel electrophoresis. J Mol Biol 98:503–517

  • Strang G (1989) Wavelets and dilation equations: a brief introduction. SIAM Rev 31(4):614–627

  • Tan AC, Gilbert D (2003) Ensemble machine learning on gene expression data for cancer classification. Appl Bioinform 2:S75–S83

  • Tamayo P, Slonim D, Mesirov J, Zhu Q, Kitareewan S, Dmitrovsky E, Lander ES, Golub TR (1999) Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. Proc Natl Acad Sci USA 96(6):2907–2912

  • Thomas JG, Olson JM, Tapscott S, Zhao LP (2001) An efficient and robust statistical modeling approach to discover differentially expressed genes using genomic expression profiles. Genome Res 11:1227–1236

  • Zhou X, Wang X, Dougherty ER (2004a) A Bayesian approach to nonlinear probit gene selection and classification. J Franklin Inst 341:137–156

  • Zhou X, Wang X, Dougherty ER, Russ D, Suh E (2004b) Gene clustering based on mutual information. J Comput Biol 11(1):147–161


Author information


Correspondence to S. Prabakaran.

Additional information

Communicated by Pierre Baldi.


About this article

Cite this article

Prabakaran, S., Sahu, R. & Verma, S. Classification of multi class dataset using wavelet power spectrum. Data Min Knowl Disc 15, 297–319 (2007). https://doi.org/10.1007/s10618-007-0068-8


  • DOI: https://doi.org/10.1007/s10618-007-0068-8
