A sparse \({\varvec{L}}_{2}\)-regularized support vector machines for efficient natural language learning

  • Regular Paper
  • Published in: Knowledge and Information Systems

Abstract

Linear kernel support vector machines (SVMs) using either the \(L_{1}\)-norm or the \(L_{2}\)-norm have emerged as an important and widely used classification algorithm for many applications such as text chunking, part-of-speech tagging, information retrieval, and dependency parsing. \(L_{2}\)-norm SVMs usually provide slightly better accuracy than \(L_{1}\)-SVMs in most tasks. However, \(L_{2}\)-norm SVMs produce many near-zero but nonzero feature weights, and computing these nonsignificant weights is highly time-consuming. In this paper, we present a cutting-weight algorithm that guides the optimization process of the \(L_{2}\)-SVMs toward a sparse solution. Before checking optimality, our method automatically discards a set of near-zero but nonzero feature weights. The final objective can then be reached by the remaining features and hypothesis once the objective function is satisfied. One characteristic of our cutting-weight algorithm is that it requires no change to the original learning objective. To verify this concept, we conduct experiments on three well-known benchmarks, i.e., CoNLL-2000 text chunking, SIGHAN-3 Chinese word segmentation, and Chinese word dependency parsing. Our method achieves feature parameter reduction rates of 1–10 times compared with the original \(L_{2}\)-SVMs, with slightly better accuracy and a lower training time cost. In terms of run-time efficiency, our method is reasonably faster than the original \(L_{2}\)-regularized SVMs; for example, our sparse \(L_{2}\)-SVM is 2.55 times faster than the original \(L_{2}\)-SVM at the same accuracy.
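
To make the cutting-weight idea concrete, the following is a minimal, illustrative Python sketch of a linear \(L_{2}\)-regularized SVM trained with a simple primal sub-gradient method that zeroes out near-zero feature weights after each epoch. It is not the authors' published algorithm: the per-example objective, pruning threshold, and learning-rate schedule are assumptions made only for illustration.

    import numpy as np

    def train_sparse_l2_svm(X, y, C=0.1, epochs=20, lr=0.1, prune_threshold=1e-3):
        """Illustrative sketch only: a primal sub-gradient trainer for a linear
        L2-regularized SVM that prunes near-zero feature weights after each
        epoch, in the spirit of the cutting-weight idea. The objective split,
        threshold, and step-size schedule are assumptions, not the paper's method.

        X : (n_samples, n_features) array of feature values
        y : (n_samples,) array of labels in {-1, +1}
        """
        n, d = X.shape
        w = np.zeros(d)
        rng = np.random.default_rng(0)
        for epoch in range(epochs):
            for i in rng.permutation(n):
                margin = y[i] * X[i].dot(w)
                # Sub-gradient of the simplified per-example objective
                #   0.5 * ||w||^2 / n + C * max(0, 1 - y_i * w . x_i)
                grad = w / n
                if margin < 1.0:
                    grad = grad - C * y[i] * X[i]
                w = w - lr * grad
            # Cutting-weight step: discard near-zero (but nonzero) weights so the
            # remaining model stays sparse before the next convergence check.
            w[np.abs(w) < prune_threshold] = 0.0
            lr *= 0.9  # simple decaying step size (an assumption)
        return w

In such a sketch, only the features whose weights survive the threshold contribute at prediction time, which is the source of the reduction in nonzero parameters and the faster run-time that the abstract reports for the actual method.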

Notes

  1. Similar to the IOB2 tagging scheme, the six tags S/B/BI/I/IE/E indicate the Single/Begin/SecondBegin/Interior/BeforeEnd/End position of a token within a chunk (see the illustrative sketch after these notes).

  2. http://lcg-www.uia.ac.be/conll2000/chunking/conlleval.txt.

  3. \(C=0.1\) for the CoNLL-2000 chunking and Chinese word dependency parsing tasks and \(C=1\) for the SIGHAN-3 datasets.

  4. The hyper-parameter settings are the same as in previous literature [28, 34].

  5. http://140.115.112.118/bcbb/CMM-CoNLL/sparseSVM.htm.
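
As an illustration of note 1, here is a small Python sketch that maps a chunk of a given length to its S/B/BI/I/IE/E tag sequence. The note does not spell out which tags are used for two- and three-token chunks, so those two cases are assumptions.

    def encode_chunk(length):
        """Map a chunk of `length` tokens to the S/B/BI/I/IE/E scheme of note 1.
        The choices for 2- and 3-token chunks are assumptions, since the note
        only defines the roles of the six tags."""
        if length == 1:
            return ["S"]
        if length == 2:
            return ["B", "E"]        # assumed: too short for the BI/IE tags
        if length == 3:
            return ["B", "BI", "E"]  # assumed: BI kept, IE dropped
        return ["B", "BI"] + ["I"] * (length - 4) + ["IE", "E"]

    # For example, a five-token chunk is tagged B, BI, I, IE, E.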

References

  1. Ando RK, Zhang T (2005) A high-performance semi-supervised learning method for text chunking. In: Proceedings of the annual meeting of the association of computational linguistics, pp 1–9

  2. Boser BE, Guyon IM, Vapnik VN (1992) A training algorithm for optimal margin classifiers. In: Proceedings of the annual ACM workshop on computational learning theory, pp 144–152

  3. Buchholz S, Marsi E (2006) CoNLL-X shared task on multilingual dependency parsing. In: Proceedings of the conference on computational natural language learning, pp 149–164

  4. Collins M (2002) Discriminative training methods for hidden Markov models: theory and experiments with perceptron algorithms. In: Proceedings of the conference on empirical methods in natural language processing, pp 1–8

  5. Daumé H, Marcu D (2005) Learning as search optimization: approximate large margin methods for structured prediction. In: Proceedings of international conference on machine learning, pp 169–176

  6. Dhir CS, Lee J, Lee SY (2012) Extraction of independent discriminant features for data with asymmetric distribution. J Knowl Inf Syst 30(2):359–375

  7. Druck G, McCallum A (2010) High-performance semi-supervised learning using discriminatively constrained generative models. In: Proceedings of the international conference on machine learning, pp 319–326

  8. Fan TK, Chang CH (2010) Sentiment-oriented contextual advertising. J Knowl Inf Syst 23(3):321–344

  9. Frommer A, Maaß P (1999) Fast CG-based methods for Tikhonov-Phillips regularization. J Sci Comput 20(5):1831–1850

  10. Gao J, Andrew G (2007) Scalable training of L1-regularized log-linear models. In: Proceedings of international conference on machine learning, pp 33–40

  11. Gao J, Andrew G, Johnson M, Toutanova K (2007) A comparative study of parameter estimation methods for statistical natural language processing. In: Proceedings of the annual meeting of the association of computational linguistics, pp 824–831

  12. Giménez J, Márquez L (2004) SVMTool: a general POS tagger generator based on support vector machines. In: Proceedings of the 4th international conference on language resources and evaluation, pp 43–46

  13. Hsieh CJ, Chang KW, Lin CJ, Keerthi SS, Sundararajan S (2008) A dual coordinate descent method for large-scale linear SVM. In: Proceedings of international conference on machine learning, pp 408–415

  14. Joachims T (2006) Training linear SVMs in linear time. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining, pp 217–226

  15. Joachims T, Finley T, Yu CN (2009) Cutting-plane training of structural SVMs. Mach Learn 77(1):27–59

  16. Katakis I, Tsoumakas G, Vlahavas I (2010) Tracking recurring contexts using ensemble classifiers: an application to email filtering. J Knowl Inf Syst 22(3):371–391

  17. Keerthi SS, Sundararajan S, Chang KW, Hsieh CJ, Lin CJ (2008) A sequential dual method for large scale multi-class linear SVMs. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining, pp 408–416

  18. Keerthi SS, DeCoste D (2005) A modified finite Newton method for fast solution of large scale linear SVMs. J Mach Learn Res 6:341–361

  19. Kudo T, Matsumoto Y (2001) Chunking with support vector machines. In: Proceedings of the North American chapter of the association for computational linguistics on language technologies, pp 192–199

  20. Kudo T, Yamamoto K, Matsumoto Y (2004) Applying conditional random fields to Japanese morphological analysis. In: Proceedings of conference on empirical methods in natural language processing, pp 230–237

  21. Lafferty J, McCallum A, Pereira F (2001) Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of international conference on machine learning, pp 282–289

  22. Lee YS, Wu YC (2007) A robust multilingual portable phrase chunking system. Expert Syst Appl 33(3):1–26

  23. Mangasarian OL, Musicant DR (2001) Lagrangian support vector machines. J Mach Learn Res 1:161–177

  24. Manning CD, Raghavan P, Schutze H (2008) Introduction to information retrieval. Cambridge University Press, Cambridge

  25. Nivre J, Hall J, Kübler S, McDonald R, Nilsson J, Riedel S, Yuret D (2007) The CoNLL 2007 shared task on dependency parsing. In: Proceedings of the conference on computational natural language learning, pp 915–932

  26. Ng HT, Low JK (2004) Chinese part-of-speech tagging: one-at-a-time or all-at-once? Word-based or character-based? In: Proceedings of conference on empirical methods in natural language processing, pp 277–284

  27. Suzuki J, Fujino A, Isozaki H (2007) Semi-supervised structured output learning based on a hybrid generative and discriminative approach. In: Proceedings of the annual meeting of the association of computational linguistics, pp 791–800

  28. Suzuki J, Isozaki H (2008) Semi-supervised sequential labeling and segmentation using giga-word scale unlabeled data. In: Proceedings of the annual meeting of the association of computational linguistics, pp 665–673

  29. Tsai RTH (2010) Chinese text segmentation: a hybrid approach using transductive learning and statistical association measures. Expert Syst Appl 37(5):3553–3560

  30. Tjong Kim Sang EF, Buchholz S (2000) Introduction to the CoNLL-2000 shared task: chunking. In: Proceedings of the conference on computational natural language learning, pp 127–132

  31. Wu YC, Lee YS, Yang JC (2008) Robust and efficient Chinese word dependency analysis with linear kernel support vector machines. In: Proceedings of international conference on computational linguistics poster session, pp 135–138

  32. Zhang Y, Clark S (2007) Chinese segmentation with a word-based perceptron algorithm. In: Proceedings of the annual meeting of the association of computational linguistics, pp 840–847

  33. Zhang T, Damerau F, Johnson DE (2002) Text chunking based on a generalization of Winnow. J Mach Learn Res 2:615–637

  34. Zhao H, Kit C (2007) Incorporating global information into supervised learning for Chinese word segmentation. In: Proceedings of the conference of the pacific association for computational linguistics, pp 66–74

Acknowledgments

The authors acknowledge support under NSC Grants NSC 101-2221-E-130-027- and NSC 101-2622-E-130-006-CC3.

Author information

Corresponding author

Correspondence to Yu-Chieh Wu.

About this article

Cite this article

Wu, YC. A sparse \({\varvec{L}}_{2}\)-regularized support vector machines for efficient natural language learning. Knowl Inf Syst 39, 305–328 (2014). https://doi.org/10.1007/s10115-013-0615-0
