Abstract
This paper further studies the classification of two binary bioinformatics datasets, leukemia and colon tumor, using the recently developed neural network-based finite impulse response extreme learning machine (FIR-ELM). A time series analysis of the microarray samples is first performed to determine the filtering properties that the hidden layer of the FIR-ELM neural classifier needs for feature identification. The linear separability of the data patterns in the microarray datasets is then studied. To improve the robustness of the neural classifier against noise and errors, a frequency domain gene feature selection algorithm is also proposed. Simulation results show that the FIR-ELM algorithm achieves excellent performance in classifying bioinformatics data in comparison with many existing classification algorithms.
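As a rough illustration of the idea summarized above, the following minimal sketch designs hidden-layer input weights as low-pass FIR filters and solves for the output weights by regularized least squares. It is not the authors' implementation. The function names (`lowpass_fir`, `train_fir_elm`, `predict`), the windowed-sinc design, the linear hidden nodes, the choice of one hidden node per cutoff frequency, and the synthetic data are all assumptions made for this example; the leukemia and colon tumor datasets are not used here.

```python
import numpy as np


def lowpass_fir(num_taps, cutoff):
    """Windowed-sinc low-pass taps; `cutoff` is a fraction of the Nyquist rate."""
    n = np.arange(num_taps) - (num_taps - 1) / 2.0
    taps = cutoff * np.sinc(cutoff * n)   # ideal low-pass impulse response
    taps *= np.hamming(num_taps)          # Hamming taper to reduce ripple
    return taps / taps.sum()              # normalise to unit gain at DC


def train_fir_elm(X, y, cutoffs, reg=1e-2):
    """Fit output weights by ridge regression on FIR-filtered inputs."""
    # Assumption: one linear hidden node per cutoff frequency.
    W = np.stack([lowpass_fir(X.shape[1], c) for c in cutoffs], axis=1)
    H = X @ W                             # hidden-layer outputs
    A = H.T @ H + reg * np.eye(H.shape[1])
    beta = np.linalg.solve(A, H.T @ y)    # regularised least-squares solution
    return W, beta


def predict(X, W, beta):
    """Return class labels in {+1, -1} from the sign of the network output."""
    return np.where(X @ W @ beta >= 0.0, 1, -1)


# Synthetic stand-in for a two-class expression matrix (40 samples, 64 "genes"):
# each class has a constant offset buried in unit-variance noise.
rng = np.random.default_rng(0)
y = np.repeat([1.0, -1.0], 20)
X = y[:, None] + rng.normal(size=(40, 64))

W, beta = train_fir_elm(X, y, cutoffs=np.linspace(0.05, 0.4, 8))
train_acc = np.mean(predict(X, W, beta) == y)
print(f"training accuracy: {train_acc:.2f}")
```

In this sketch the filters suppress high-frequency noise across each sample before the output layer is fitted, which is the intuition behind using a hidden layer with filtering properties; the regularization term `reg` is a hypothetical setting, not a value taken from the paper.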
Cite this article
Lee, K., Man, Z., Wang, D. et al. Classification of bioinformatics dataset using finite impulse response extreme learning machine for cancer diagnosis. Neural Comput & Applic 22, 457–468 (2013). https://doi.org/10.1007/s00521-012-0847-z
DOI: https://doi.org/10.1007/s00521-012-0847-z