Abstract
Based on advances in statistical learning theory, the Support Vector Machine (SVM) has demonstrated distinctive properties and state-of-the-art performance in many real-world classification problems. However, the conventional SVM uses a sign function to assign test samples to classes, and this decision rule has limitations that can hinder its performance. This paper explores the feasibility of incorporating information theory-based approaches into the SVM decision-making process and demonstrates their application to the classification of imbalanced biological datasets. The results indicate that incorporating an information theory-based technique yields a significant improvement (p < 0.005), especially when classifying imbalanced datasets. The proposed methodology can not only improve overall prediction performance but also make SVM classification less sensitive to the choice of input parameters.
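The abstract does not spell out how information theory enters the decision step. As a purely illustrative sketch, and not necessarily the authors' method, one information-theoretic alternative to the fixed sign rule is to learn a cut-off on the SVM decision values f(x) by maximising information gain (entropy reduction), in the way C4.5 splits a continuous attribute. The helper names (`info_gain_threshold`, `threshold_decision`) are hypothetical, the decision values are assumed to come from an already trained SVM, and the toy scores are made up.

```python
import math
from typing import List, Sequence


def entropy(pos: int, neg: int) -> float:
    """Binary Shannon entropy (in bits) of a partition with pos/neg counts."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def sign_decision(score: float) -> int:
    """Conventional SVM rule: +1 if f(x) >= 0, otherwise -1."""
    return 1 if score >= 0 else -1


def info_gain_threshold(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Hypothetical helper: pick the cut on decision values with maximal information gain.

    Candidate cuts are midpoints between consecutive distinct sorted scores,
    as C4.5 does for a continuous attribute. Labels are +1 / -1.
    Falls back to 0.0 (i.e. the sign rule) if no valid cut exists.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    total_pos = sum(1 for _, y in pairs if y == 1)
    total_neg = n - total_pos
    parent = entropy(total_pos, total_neg)

    best_gain, best_t = -1.0, 0.0
    left_pos = left_neg = 0
    for i in range(n - 1):
        # Sample i joins the left partition (scores below the candidate cut).
        if pairs[i][1] == 1:
            left_pos += 1
        else:
            left_neg += 1
        if pairs[i][0] == pairs[i + 1][0]:
            continue  # cannot cut between two equal scores
        left_n = i + 1
        right_n = n - left_n
        right_pos = total_pos - left_pos
        right_neg = total_neg - left_neg
        # Weighted entropy of the two child partitions.
        children = (left_n / n) * entropy(left_pos, left_neg) + (
            right_n / n
        ) * entropy(right_pos, right_neg)
        gain = parent - children
        if gain > best_gain:
            best_gain, best_t = gain, (pairs[i][0] + pairs[i + 1][0]) / 2
    return best_t


def threshold_decision(score: float, t: float) -> int:
    """Shifted rule: +1 if f(x) > t, otherwise -1."""
    return 1 if score > t else -1


# Made-up decision values from an SVM biased toward the majority (-1) class.
scores = [-2.0, -1.5, -1.2, -1.0, -0.8, -0.6, -0.3, -0.2, 0.4]
labels = [-1, -1, -1, -1, -1, -1, 1, 1, 1]

t = info_gain_threshold(scores, labels)
sign_preds: List[int] = [sign_decision(s) for s in scores]
shifted_preds: List[int] = [threshold_decision(s, t) for s in scores]
print(f"learned threshold: {t:.2f}")
print("sign rule:   ", sign_preds)
print("shifted rule:", shifted_preds)
```

On these toy values the sign rule labels only one of the three minority samples correctly, while the learned cut (about -0.45) separates all of them. In practice the threshold would presumably be chosen on held-out or cross-validated decision values rather than on the training scores, to avoid overfitting the cut.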
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, H., Zheng, H. (2008). An Improved Support Vector Machine for the Classification of Imbalanced Biological Datasets. In: Huang, DS., Wunsch, D.C., Levine, D.S., Jo, KH. (eds) Advanced Intelligent Computing Theories and Applications. With Aspects of Theoretical and Methodological Issues. ICIC 2008. Lecture Notes in Computer Science, vol 5226. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87442-3_9
DOI: https://doi.org/10.1007/978-3-540-87442-3_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87440-9
Online ISBN: 978-3-540-87442-3
eBook Packages: Computer Science, Computer Science (R0)