Abstract
Linear kernel support vector machines (SVMs) using either the \(L_{1}\)-norm or the \(L_{2}\)-norm have become an important and widely used classification algorithm for many applications such as text chunking, part-of-speech tagging, information retrieval, and dependency parsing. \(L_{2}\)-norm SVMs usually provide slightly better accuracy than \(L_{1}\)-norm SVMs on most tasks. However, \(L_{2}\)-norm SVMs produce many near-but-nonzero feature weights, and evaluating these nonsignificant weights at prediction time is costly. In this paper, we present a cutting-weight algorithm that guides the optimization process of an \(L_{2}\)-SVM toward a sparse solution. Before checking optimality, our method automatically discards a set of near-but-nonzero feature weights; training then terminates once the objective is satisfied by the remaining features and the current hypothesis. One characteristic of our cutting-weight algorithm is that it requires no change to the original learning objective. To verify this concept, we conduct experiments on three well-known benchmarks: CoNLL-2000 text chunking, SIGHAN-3 Chinese word segmentation, and Chinese word dependency parsing. Our method reduces the number of feature parameters by a factor of 1–10 relative to the original \(L_{2}\)-SVM while achieving slightly better accuracy at a lower training cost. In terms of run-time efficiency, our method is also noticeably faster than the original \(L_{2}\)-regularized SVM; for example, our sparse \(L_{2}\)-SVM is 2.55 times faster than the original \(L_{2}\)-SVM at the same accuracy.
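To make the idea concrete, the following is a minimal sketch of the cutting-weight heuristic layered on a standard dual coordinate descent solver for the L2-loss linear SVM. The inner-loop updates follow the usual dual coordinate descent scheme; the threshold rule `tau` and the placement of the cut immediately before the optimality check are illustrative assumptions, since the abstract does not specify the exact schedule.

```python
import numpy as np

def sparse_l2_svm(X, y, C=1.0, epsilon=1e-3, max_iter=50):
    """Sketch of an L2-loss linear SVM trained by dual coordinate descent,
    with a 'cutting-weight' step that zeroes near-but-nonzero weights
    before each optimality check.
    X: (n_samples, n_features) array; y: labels in {-1, +1}.
    The threshold rule below is an assumed, illustrative choice."""
    n, d = X.shape
    alpha = np.zeros(n)
    w = np.zeros(d)
    # Diagonal of the dual Hessian for the L2-loss formulation.
    Qii = (X ** 2).sum(axis=1) + 1.0 / (2.0 * C)
    for _ in range(max_iter):
        max_violation = 0.0
        for i in np.random.permutation(n):
            g = y[i] * w.dot(X[i]) - 1.0 + alpha[i] / (2.0 * C)
            # Projected gradient: alpha_i is only constrained to be >= 0.
            pg = g if alpha[i] > 0 else min(g, 0.0)
            max_violation = max(max_violation, abs(pg))
            if pg != 0.0:
                new_ai = max(alpha[i] - g / Qii[i], 0.0)
                w += (new_ai - alpha[i]) * y[i] * X[i]
                alpha[i] = new_ai
        # Cutting-weight step: discard near-zero weights *before* testing
        # optimality, steering the solver toward a sparse solution. Note
        # that zeroing w perturbs the usual primal-dual relation, which is
        # why optimality is re-checked on the surviving weights.
        tau = epsilon * np.abs(w).max()   # assumed threshold rule
        w[np.abs(w) < tau] = 0.0
        if max_violation < epsilon:       # optimality check on survivors
            break
    return w
```

Under this sketch, the returned weight vector has its small entries pinned to zero, so the stored model and the dot products at prediction time involve only the surviving features.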
Notes
Similar to the IOB2 tagging scheme, the six tags S/B/BI/I/IE/E indicate the Single/Begin/SecondBegin/Interior/BeforeEnd/End positions of a token within a chunk (illustrated in the sketch below).
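For illustration, a hypothetical helper that encodes an n-token chunk under this scheme (the function and the tie-break for three-token chunks, where a token is both second and before-end, are our assumptions, not from the paper):

```python
def six_tag_encode(n):
    """Tags for an n-token chunk under the S/B/BI/I/IE/E scheme.
    Hypothetical sketch; for n == 3 we arbitrarily prefer BI over IE."""
    if n == 1:
        return ["S"]
    if n == 2:
        return ["B", "E"]
    if n == 3:
        return ["B", "BI", "E"]   # assumed tie-break
    return ["B", "BI"] + ["I"] * (n - 4) + ["IE", "E"]

# e.g. six_tag_encode(5) -> ['B', 'BI', 'I', 'IE', 'E']
```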
\(C=0.1\) for the CoNLL-2000 chunking and Chinese word dependency parsing tasks, and \(C=1\) for the SIGHAN-3 datasets.
Acknowledgments
The authors acknowledge support under NSC Grants NSC 101-2221-E-130-027- and NSC 101-2622-E-130-006-CC3.
Cite this article
Wu, Y.C. A sparse \(L_{2}\)-regularized support vector machines for efficient natural language learning. Knowl Inf Syst 39, 305–328 (2014). https://doi.org/10.1007/s10115-013-0615-0