
Scaling Conditional Random Fields by One-Against-the-Other Decomposition

  • Short Paper
  • Published in: Journal of Computer Science and Technology

Abstract

As a powerful sequence labeling model, conditional random fields (CRFs) have been applied successfully to many natural language processing (NLP) tasks. However, the high computational complexity of CRF training restricts it to very small tag (or label) sets, because training becomes intractable as the tag set grows. This paper proposes an improved decomposed training and joint decoding algorithm for CRF learning. Instead of training a single CRF model over all tags, it trains a binary sub-CRF independently for each tag. An optimal tag sequence is then produced by a joint decoding algorithm that combines the probabilistic outputs of all the sub-CRFs involved. To test its effectiveness, we apply this approach to Chinese word segmentation (CWS) cast as a sequence labeling problem. Our evaluation shows that it reduces the computational cost of this task by 40–50% without any significant performance loss on various large-scale data sets.
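The decomposition described above can be illustrated with a minimal sketch. This is not the authors' algorithm: the paper's joint decoder presumably searches for a globally optimal sequence over the sub-CRFs' outputs, whereas the toy `joint_decode` below simply picks, at each position, the tag whose binary sub-model assigns it the highest probability. The 4-tag B/M/E/S scheme (word-Begin, -Middle, -End, Single-character word) and the probability values are hypothetical, chosen only to show the shape of the data flow.

```python
def joint_decode(sub_probs):
    """Combine the per-position outputs of independently trained binary
    sub-models, one per tag.

    sub_probs: dict mapping each tag to a list of probabilities, where
    sub_probs[t][i] is that tag's sub-model estimate P(tag t at position i).
    Returns one tag per position, taking the most confident sub-model.
    """
    tags = list(sub_probs)
    length = len(next(iter(sub_probs.values())))
    return [max(tags, key=lambda t: sub_probs[t][i]) for i in range(length)]

# Toy marginals for a 5-character sentence under the B/M/E/S tagging scheme,
# as if produced by four binary sub-CRFs (one per tag):
probs = {
    "B": [0.90, 0.10, 0.20, 0.80, 0.10],
    "M": [0.05, 0.70, 0.10, 0.10, 0.10],
    "E": [0.03, 0.15, 0.60, 0.05, 0.70],
    "S": [0.02, 0.05, 0.10, 0.05, 0.10],
}
print(joint_decode(probs))  # ['B', 'M', 'E', 'B', 'E']
```

A real joint decoder would also enforce sequence-level constraints (e.g. `M` may only follow `B` or `M`), typically via a Viterbi-style search over the combined scores; the per-position maximum here is only the simplest possible combination rule.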



Author information

Corresponding author

Correspondence to Chunyu Kit.

Additional information

The research in this paper was supported by the Research Grants Council of Hong Kong S.A.R., China, through CERG Grant No. 9040861 (CityU 1318/03H), and by City University of Hong Kong through Strategic Research Grant No. 7002037.

*Dr. Hai Zhao was supported by a Postdoctoral Research Fellowship in the Department of Chinese, Translation and Linguistics, City University of Hong Kong.

Electronic supplementary material

Electronic supplementary material is available for this article (PDF, 65.1 kB).


About this article

Cite this article

Zhao, H., Kit, C. Scaling Conditional Random Fields by One-Against-the-Other Decomposition. J. Comput. Sci. Technol. 23, 612–619 (2008). https://doi.org/10.1007/s11390-008-9157-4

