ABSTRACT
Classification is a quintessential application of machine learning, and support vector machines (SVMs) have been used ubiquitously for it because of their optimal margins and ease of use. However, they are rarely applied to large datasets due to the cubic time complexity of their training process. This has inspired several papers that attempt to reduce either the number of features or the number of training samples in order to shorten SVM training time. This paper proposes a novel approach to reducing the number of training samples for support vector data description (SVDD) while preserving as much knowledge of the target class as possible, by selecting the most promising candidate support vectors: the farthest boundary points of the data clusters. The proposed algorithm exploits the density gradient across the data distribution to detect boundary points uniformly; these points are sampled as potential support vectors so that the SVM can be trained in less time without significant loss of accuracy. The proposed algorithm is validated on the Human Activity Recognition, Breast Cancer Detection, and Heart Disease Detection datasets.
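The core idea above — keep only the likely boundary points of the target class and train the one-class model on that reduced set — can be sketched as follows. This is a minimal illustration, not the paper's exact algorithm: it approximates "boundary points detected via the density gradient" with the simpler heuristic of keeping the points of lowest local density (largest mean distance to their k nearest neighbors), and the function name, `k`, and `keep_fraction` are assumptions introduced for this sketch.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.svm import OneClassSVM

def density_based_sample_reduction(X, k=10, keep_fraction=0.2):
    """Keep the fraction of points with the lowest local density,
    i.e. those most likely to lie on the cluster boundary."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dists, _ = nn.kneighbors(X)                  # column 0 is the point itself
    density = 1.0 / (dists[:, 1:].mean(axis=1) + 1e-12)
    n_keep = max(1, int(keep_fraction * len(X)))
    boundary_idx = np.argsort(density)[:n_keep]  # lowest-density points first
    return X[boundary_idx]

# Toy target class: train the one-class model on the reduced sample only.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
X_reduced = density_based_sample_reduction(X, k=10, keep_fraction=0.2)
model = OneClassSVM(kernel="rbf", nu=0.1, gamma="scale").fit(X_reduced)
print(X_reduced.shape)
```

Because SVM training cost grows roughly cubically with the number of samples, fitting on the 20% retained here rather than the full set is what yields the training-time savings the paper targets; the accuracy claim rests on the retained points being the ones most likely to become support vectors.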