Abstract
As a powerful sequence labeling model, conditional random fields (CRFs) have been applied successfully to many natural language processing (NLP) tasks. However, the high computational cost of CRF training permits only a very small tag (or label) set, because training becomes intractable as the tag set grows. This paper proposes an improved decomposed training and joint decoding algorithm for CRF learning. Instead of training a single CRF model for all tags, it trains a binary sub-CRF independently for each tag. An optimal tag sequence is then produced by a joint decoding algorithm based on the probabilistic output of all the sub-CRFs involved. To test its effectiveness, we apply this approach to Chinese word segmentation (CWS), formulated as a sequence labeling problem. Our evaluation shows that it can reduce the computational cost of this language processing task by 40–50% without any significant performance loss on various large-scale data sets.
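The abstract does not spell out the joint decoding step, so the following is only a minimal sketch of one way it could work, not the authors' algorithm. It assumes a four-tag B/M/E/S scheme for CWS, assumes each binary sub-CRF yields a per-character probability for its own tag, and scores a tag sequence as the product of those probabilities under a hard constraint on legal tag transitions. The names `TAGS`, `ALLOWED`, `START`, `END` and `joint_decode` are hypothetical.

```python
import math

# Assumed tag scheme: Begin, Middle, End of a word, and Single-character word.
TAGS = ("B", "M", "E", "S")
# Assumed legal transitions between adjacent character tags.
ALLOWED = {"B": ("M", "E"), "M": ("M", "E"), "E": ("B", "S"), "S": ("B", "S")}
START = ("B", "S")  # tags that may open a sentence
END = ("E", "S")    # tags that may close a sentence


def joint_decode(marginals):
    """Viterbi search over per-tag probabilities from independent sub-CRFs.

    marginals[i][tag] is the probability that the sub-CRF for `tag`
    assigns to character i. Rows need not sum to 1, because each
    sub-CRF is trained on its own.
    """
    n = len(marginals)
    if n == 0:
        return []
    neg_inf = float("-inf")

    def log_prob(p):
        return math.log(p) if p > 0 else neg_inf

    score = [{t: neg_inf for t in TAGS} for _ in range(n)]
    back = [{t: None for t in TAGS} for _ in range(n)]
    for t in START:
        score[0][t] = log_prob(marginals[0][t])

    for i in range(1, n):
        for cur in TAGS:
            best_prev, best = None, neg_inf
            for prev in TAGS:
                if cur in ALLOWED[prev] and score[i - 1][prev] > best:
                    best_prev, best = prev, score[i - 1][prev]
            if best_prev is not None:
                score[i][cur] = best + log_prob(marginals[i][cur])
                back[i][cur] = best_prev

    last = max(END, key=lambda t: score[n - 1][t])
    if score[n - 1][last] == neg_inf:
        raise ValueError("no legal tag sequence has non-zero probability")

    # Follow back-pointers from the best final tag to the first character.
    path = [last]
    for i in range(n - 1, 0, -1):
        last = back[i][last]
        path.append(last)
    path.reverse()
    return path


# Illustrative (made-up) sub-CRF outputs for a three-character sentence.
example = [
    {"B": 0.6, "M": 0.1, "E": 0.1, "S": 0.3},
    {"B": 0.5, "M": 0.1, "E": 0.4, "S": 0.1},
    {"B": 0.1, "M": 0.1, "E": 0.7, "S": 0.2},
]
print(joint_decode(example))
```

In this made-up example, picking the most probable tag at each character independently would give B, B, E, which is not a legal segmentation. The constrained search instead returns the best legal sequence, which is the motivation for decoding jointly over all sub-CRF outputs.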
Additional information
The research in this paper was supported by the Research Grants Council of Hong Kong S.A.R., China, through the CERG under Grant No. 9040861 (CityU 1318/03H) and by City University of Hong Kong through the Strategic Research under Grant No. 7002037.
*Dr. Hai Zhao was supported by a Postdoctoral Research Fellowship in the Department of Chinese, Translation and Linguistics, City University of Hong Kong.
Cite this article
Zhao, H., Kit, C. Scaling Conditional Random Fields by One-Against-the-Other Decomposition. J. Comput. Sci. Technol. 23, 612–619 (2008). https://doi.org/10.1007/s11390-008-9157-4