ABSTRACT
Semi-supervised learning is an essential approach to classification when the available labeled data is insufficient and we also need to make use of unlabeled data in the learning process. Numerous research efforts have focused on designing algorithms that improve the F1 score, but these algorithms lack any mechanism to control precision or recall individually. However, many applications have precision/recall preferences. For instance, an email spam classifier may require a precision of 0.9 so that useful emails are rarely dismissed as spam. In this paper, we propose a method that allows users to specify a precision/recall preference while maximising the F1 score. Our key idea is to divide the semi-supervised learning process into multiple rounds of supervised learning, where the classifier learned in each round is calibrated on a subset of the labeled dataset before it is applied to the unlabeled dataset to enlarge the training dataset. Our idea is applicable to a number of learning models, such as Support Vector Machines (SVMs), Bayesian networks and neural networks; we focus our research and implementation on SVMs. We conduct extensive experiments to validate the effectiveness of our method. The experimental results show that our method can train classifiers with a precision/recall preference, while the popular semi-supervised SVM training algorithm (which we use as the baseline) cannot. When the precision preference and the recall preference are set to be equal, which amounts to maximising the F1 score alone as the baseline does, our method achieves F1 scores better than or similar to those of the baseline. An additional advantage of our method is that it converges much faster than the baseline.
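The abstract gives no implementation, so the following is only an illustrative sketch of the round-based idea, not the authors' code. It assumes scikit-learn's LinearSVC as the supervised learner, binary labels in {0, 1}, a held-out labeled subset (X_cal, y_cal) reserved for calibration, and hypothetical names such as calibrated_self_training, precision_target, and margin. Each round fits an SVM on the current labeled pool, shifts the decision threshold so the calibration-set precision reaches the target, pseudo-labels unlabeled points far from the shifted boundary, and enlarges the training pool with them.

```python
# Illustrative sketch (not the paper's method): self-training with an
# SVM whose threshold is calibrated to a precision target each round.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.metrics import precision_recall_curve

def calibrated_self_training(X_lab, y_lab, X_cal, y_cal, X_unlab,
                             precision_target=0.9, rounds=5, margin=1.0):
    X_train, y_train = X_lab.copy(), y_lab.copy()
    clf, threshold = None, 0.0
    for _ in range(rounds):
        # One round of supervised learning on the current labeled pool.
        clf = LinearSVC().fit(X_train, y_train)

        # Calibrate on the held-out labeled subset: pick the smallest
        # score threshold whose precision reaches the target, which
        # leaves recall as high as the precision constraint permits.
        scores = clf.decision_function(X_cal)
        prec, _, thr = precision_recall_curve(y_cal, scores)
        ok = np.flatnonzero(prec[:-1] >= precision_target)
        threshold = thr[ok[0]] if ok.size else 0.0

        if len(X_unlab) == 0:
            break
        # Pseudo-label unlabeled points far from the shifted boundary
        # and move them into the training pool for the next round.
        u = clf.decision_function(X_unlab) - threshold
        confident = np.abs(u) >= margin
        if not confident.any():
            break
        X_train = np.vstack([X_train, X_unlab[confident]])
        y_train = np.concatenate([y_train,
                                  (u[confident] >= 0).astype(int)])
        X_unlab = X_unlab[~confident]
    return clf, threshold
```

Choosing the smallest threshold that satisfies the precision target keeps recall as high as the constraint allows; a recall preference could be enforced symmetrically by thresholding on the recall curve instead.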