Barricaded Boundary Minority Oversampling LS-SVM for a Biased Binary Classification

Partamian, Hmayag; Rizk, Yara; Awad, Mariette

doi:10.1007/978-3-030-01771-2_2


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11198)

Included in the following conference series:

International Conference on Discovery Science

Abstract

Classifying biased datasets with linearly non-separable features has been a challenge in pattern recognition because traditional classifiers, which are usually skewed towards the majority class, often produce sub-optimal results. If biased or unbalanced data is not processed appropriately, any information extracted from such data risks being compromised. Least Squares Support Vector Machines (LS-SVM) is known for its computational advantage over SVM, but it suffers from a lack of sparsity in the support vectors: it learns the separating hyperplane from the whole dataset and often produces biased hyperplanes on imbalanced datasets. Motivated to contribute a novel approach for the supervised classification of imbalanced datasets, we propose Barricaded Boundary Minority Oversampling (BBMO), which oversamples the minority samples at the boundary in the direction of the closest majority samples to remove LS-SVM's bias due to data imbalance. Two variations of BBMO are studied: BBMO1, for the linearly separable case, uses the Lagrange multipliers to extract boundary samples from both classes; the generalized BBMO2, for the non-linear case, uses the kernel matrix to extract the closest majority samples to each minority sample. In either case, BBMO computes weighted means as new synthetic minority samples and appends them to the dataset. Experiments on different synthetic and real-world datasets show that BBMO with LS-SVM improved on other methods in the literature, which motivates follow-on research.
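
The oversampling idea described above can be illustrated with a minimal Python sketch. It is not the authors' BBMO implementation: the abstract does not specify the kernel, the number of majority neighbours, or the weights, so the RBF kernel, the single nearest neighbour, and the names `rbf_kernel`, `oversample_toward_majority`, `weight`, and `gamma` are all assumptions made for illustration. The sketch uses a kernel matrix to find the closest majority sample to each minority sample, forms a weighted mean of the pair as a synthetic minority sample, and appends it to the dataset.

```python
import numpy as np


def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dist = (
        (A ** 2).sum(axis=1)[:, None]
        + (B ** 2).sum(axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


def oversample_toward_majority(X, y, minority=1, weight=0.75, gamma=1.0):
    """Append one synthetic minority sample per original minority sample.

    Assumptions (not taken from the paper): an RBF kernel, a single nearest
    majority neighbour, and a fixed mixing weight `weight` in (0, 1).
    """
    X_min = X[y == minority]
    X_maj = X[y != minority]

    # Kernel matrix between minority rows and majority rows.
    K = rbf_kernel(X_min, X_maj, gamma)

    # For an RBF kernel K(x, x) = 1, so the squared feature-space distance is
    # 2 - 2 K(x, z); the closest majority sample is the one with the largest K.
    nearest = K.argmax(axis=1)

    # Weighted mean of each minority sample and its closest majority sample.
    synthetic = weight * X_min + (1.0 - weight) * X_maj[nearest]

    X_new = np.vstack([X, synthetic])
    y_new = np.concatenate([y, np.full(len(synthetic), minority, dtype=y.dtype)])
    return X_new, y_new
```

A `weight` closer to 1 keeps each synthetic point near its minority sample, while a smaller value pushes it further toward the majority class; how BBMO actually chooses these weights is described in the full paper, not in the abstract.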

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, American University of Beirut, Beirut, Lebanon
Hmayag Partamian, Yara Rizk & Mariette Awad

Corresponding author

Correspondence to Hmayag Partamian.

Editor information

Editors and Affiliations

Goldsmiths University of London, London, UK
Larisa Soldatova
Eindhoven University of Technology, Eindhoven, The Netherlands
Joaquin Vanschoren
University of Cyprus, Nicosia, Cyprus
George Papadopoulos
Università degli Studi di Bari Aldo Moro, Bari, Italy
Michelangelo Ceci

About this paper

Cite this paper

Partamian, H., Rizk, Y., Awad, M. (2018). Barricaded Boundary Minority Oversampling LS-SVM for a Biased Binary Classification. In: Soldatova, L., Vanschoren, J., Papadopoulos, G., Ceci, M. (eds) Discovery Science. DS 2018. Lecture Notes in Computer Science, vol 11198. Springer, Cham. https://doi.org/10.1007/978-3-030-01771-2_2

DOI: https://doi.org/10.1007/978-3-030-01771-2_2
Published: 07 October 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01770-5
Online ISBN: 978-3-030-01771-2
eBook Packages: Computer Science, Computer Science (R0)
