ABSTRACT
Classification is a quintessential application of machine learning, and support vector machines (SVMs) have been used ubiquitously for it because of their optimal margins and ease of use. However, they are rarely applied to large datasets due to the cubic time complexity of their training process. This has inspired several papers that attempt to reduce either the number of features or the number of training samples in order to shorten SVM training time. This paper proposes a novel approach to reducing the number of training samples for support vector data description (SVDD) while preserving as much knowledge of the target class as possible, by selecting the most promising candidate support vectors: the farthest boundary points of the data clusters. The proposed algorithm exploits the density gradient across the data distribution to detect boundary points uniformly; these points are sampled as potential support vectors so that the SVM can be trained in less time without significant loss of accuracy. The proposed algorithm is validated on the Human Activity Recognition, Breast Cancer Detection, and Heart Disease Detection datasets.
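The core idea above — keep only the likely boundary points of the target class and train the one-class model on that reduced set — can be sketched as follows. This is a minimal illustration, not the paper's exact algorithm: it approximates "boundary points detected via the density gradient" with the simpler heuristic of keeping the points of lowest local density (largest mean distance to their k nearest neighbors), and the function name, `k`, and `keep_fraction` are assumptions introduced for this sketch.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.svm import OneClassSVM

def density_based_sample_reduction(X, k=10, keep_fraction=0.2):
    """Keep the fraction of points with the lowest local density,
    i.e. those most likely to lie on the cluster boundary."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dists, _ = nn.kneighbors(X)                  # column 0 is the point itself
    density = 1.0 / (dists[:, 1:].mean(axis=1) + 1e-12)
    n_keep = max(1, int(keep_fraction * len(X)))
    boundary_idx = np.argsort(density)[:n_keep]  # lowest-density points first
    return X[boundary_idx]

# Toy target class: train the one-class model on the reduced sample only.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
X_reduced = density_based_sample_reduction(X, k=10, keep_fraction=0.2)
model = OneClassSVM(kernel="rbf", nu=0.1, gamma="scale").fit(X_reduced)
print(X_reduced.shape)
```

Because SVM training cost grows roughly cubically with the number of samples, fitting on the 20% retained here rather than the full set is what yields the training-time savings the paper targets; the accuracy claim rests on the retained points being the ones most likely to become support vectors.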