Abstract
Data mining techniques are widely used in many fields. One application of data mining in bioinformatics is the classification of tissue samples. This work presents a wavelet power spectrum based approach to feature selection and classification of multi-class datasets. The proposed method was applied to the SRBCT and breast cancer datasets, both of which are multi-class cancer datasets. The selected features largely coincide with those selected in previous works, and the method produced almost 100% accurate classification results. The method is simple and robust to noise, and it requires no extensive preprocessing. Classification was performed with far fewer features than were used in the original works. No information is lost to the initial threshold-based pruning that other methods usually apply to the data. Because the method exploits the inherent nature of the data in performing its various tasks, it can be used for a wide range of data.
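To make the term "wavelet power spectrum" concrete, the following is a minimal sketch of how such a spectrum can be computed for a single expression profile. It is an illustration only and is not taken from the paper: the choice of the Haar wavelet, the function name `haar_power_spectrum`, and the sample values are all assumptions, since the abstract does not state which wavelet or which decomposition the authors used.

```python
import math


def haar_power_spectrum(signal):
    """Per-level power of an orthonormal Haar decomposition (illustrative only).

    Returns (powers, final_approx): powers[0] is the summed squared detail
    coefficients at the finest scale, powers[-1] at the coarsest scale, and
    final_approx is the single remaining approximation coefficient.
    The Haar wavelet is an assumption; the abstract does not name a wavelet.
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")

    approx = [float(x) for x in signal]
    powers = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        # Detail coefficients capture the local differences at this scale.
        detail = [(a - b) / math.sqrt(2) for a, b in pairs]
        # Approximation coefficients carry the smoothed signal to the next scale.
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
        powers.append(sum(d * d for d in detail))
    return powers, approx[0]


# Hypothetical expression profile of one gene across eight samples.
profile = [4.0, 2.0, 5.0, 5.0, 1.0, 3.0, 0.0, 2.0]
powers, final_approx = haar_power_spectrum(profile)

# With an orthonormal transform the energy is preserved (Parseval), so the
# per-level powers plus the squared final approximation equal the signal energy.
energy = sum(x * x for x in profile)
assert math.isclose(sum(powers) + final_approx ** 2, energy)
print(powers)
```

Because the transform is orthonormal, the per-level powers partition the energy of the profile across scales, which is one plausible reason such a spectrum can serve as a compact feature. How the authors rank or select features from the spectrum is not described in the abstract and is not modeled here.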
Communicated by Pierre Baldi.
Cite this article
Prabakaran, S., Sahu, R. & Verma, S. Classification of multi class dataset using wavelet power spectrum. Data Min Knowl Disc 15, 297–319 (2007). https://doi.org/10.1007/s10618-007-0068-8
DOI: https://doi.org/10.1007/s10618-007-0068-8