Abstract
Classifying biased datasets with linearly non-separable features has long been a challenge in pattern recognition because traditional classifiers, which are usually skewed towards the majority class, often produce sub-optimal results. If biased or unbalanced data is not processed appropriately, any information extracted from it risks being compromised. Least Squares Support Vector Machines (LS-SVM) is known for its computational advantage over SVM; however, it suffers from a lack of sparsity in its support vectors: it learns the separating hyper-plane from the whole dataset and often produces biased hyper-planes on imbalanced datasets. To address the supervised classification of imbalanced datasets, we propose Barricaded Boundary Minority Oversampling (BBMO), which oversamples the minority samples at the boundary in the direction of the closest majority samples to remove LS-SVM’s bias due to data imbalance. Two variations of BBMO are studied: BBMO1, for the linearly separable case, uses the Lagrange multipliers to extract boundary samples from both classes; the generalized BBMO2, for the non-linear case, uses the kernel matrix to extract the closest majority samples to each minority sample. In either case, BBMO computes weighted means as new synthetic minority samples and appends them to the dataset. Experiments on synthetic and real-world datasets show that BBMO with LS-SVM improves on other methods in the literature and motivate follow-on research.
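The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how boundary samples are selected or how the weighted means are weighted, so the function names, the RBF kernel, the fixed `weight=0.25`, and the choice to oversample every minority sample are all assumptions made for illustration. The LS-SVM training step follows the standard linear-system formulation of Suykens and Vandewalle.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def nearest_majority_oversample(X, y, gamma=1.0, weight=0.25):
    """Hypothetical helper: for each minority sample (label +1), pick the
    majority sample (label -1) with the largest kernel value and append a
    weighted mean of the pair as a synthetic minority sample.
    The weight is an assumed constant, not a value taken from the paper."""
    X_min, X_maj = X[y == 1], X[y == -1]
    K = rbf_kernel(X_min, X_maj, gamma)      # minority-by-majority kernel matrix
    nearest = X_maj[K.argmax(axis=1)]        # closest majority sample in kernel space
    synthetic = (1.0 - weight) * X_min + weight * nearest
    X_new = np.vstack([X, synthetic])
    y_new = np.concatenate([y, np.ones(len(synthetic))])
    return X_new, y_new

def lssvm_fit(X, y, C=10.0, gamma=1.0):
    """Train an LS-SVM classifier by solving its (n+1) x (n+1) linear system."""
    n = len(y)
    omega = np.outer(y, y) * rbf_kernel(X, X, gamma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = omega + np.eye(n) / C        # ridge term from the squared-error loss
    rhs = np.concatenate([[0.0], np.ones(n)])
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]                   # bias b, multipliers alpha

def lssvm_predict(X_train, y_train, b, alpha, X_test, gamma=1.0):
    """Sign of the LS-SVM decision function on the test points."""
    K = rbf_kernel(X_test, X_train, gamma)
    return np.sign(K @ (alpha * y_train) + b)

# Toy imbalanced data: 200 majority points and 20 minority points.
rng = np.random.default_rng(0)
X_maj = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(200, 2))
X_min = rng.normal(loc=[3.0, 3.0], scale=1.0, size=(20, 2))
X = np.vstack([X_maj, X_min])
y = np.concatenate([-np.ones(200), np.ones(20)])

X_aug, y_aug = nearest_majority_oversample(X, y, gamma=0.5, weight=0.25)
b, alpha = lssvm_fit(X_aug, y_aug, C=10.0, gamma=0.5)
pred = lssvm_predict(X_aug, y_aug, b, alpha, X, gamma=0.5)
print(X_aug.shape)  # (240, 2): one synthetic sample per minority sample
```

Because every training sample receives a multiplier in the LS-SVM system, the appended synthetic samples directly shift the learned hyper-plane; this is the lack of sparsity that the abstract identifies as the source of bias and that the oversampling is meant to counteract.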
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Partamian, H., Rizk, Y., Awad, M. (2018). Barricaded Boundary Minority Oversampling LS-SVM for a Biased Binary Classification. In: Soldatova, L., Vanschoren, J., Papadopoulos, G., Ceci, M. (eds) Discovery Science. DS 2018. Lecture Notes in Computer Science(), vol 11198. Springer, Cham. https://doi.org/10.1007/978-3-030-01771-2_2
DOI: https://doi.org/10.1007/978-3-030-01771-2_2
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01770-5
Online ISBN: 978-3-030-01771-2
eBook Packages: Computer Science, Computer Science (R0)