
Semantic separator learning and its applications in unsupervised Chinese text parsing

  • Research Article

Frontiers of Computer Science

Abstract

Grammar learning has long been a bottleneck problem. In this paper, we propose a method for semantic separator learning, a special case of grammar learning. The method is based on the hypothesis that certain classes of words, called semantic separators, split a sentence into several constituents. Semantic separators are represented by words together with their part-of-speech tags and other information, so that rich semantic information can be incorporated. The method first identifies semantic separators with the help of noun phrase boundaries, called subseparators. Next, the argument classes of the separators are learned from a corpus by generalizing argument instances in a hypernym space. Finally, to evaluate the learned semantic separators, we apply them to unsupervised Chinese text parsing. Experiments on a manually labeled test set show that the proposed method outperforms previous methods of unsupervised text parsing.
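The abstract describes two steps only at a high level: splitting a sentence into constituents at separator words, and generalizing the observed arguments of a separator to a class in a hypernym space. The sketch below illustrates both ideas under loudly stated assumptions: tokens are (word, POS tag) pairs, each separator forms its own constituent, the hypernym space is a toy tree given as a child-to-parent dictionary, and an argument class is taken to be the lowest hypernym shared by all instances. The function names, tags, and data are hypothetical; this is not the paper's algorithm, and its use of subseparators is not reproduced here.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical representation: a token is a (word, POS tag) pair.
Token = Tuple[str, str]


def split_at_separators(sentence: List[Token], separators: Set[Token]) -> List[List[Token]]:
    """Split a tagged sentence at separator tokens (assumption: each separator stands alone)."""
    constituents: List[List[Token]] = []
    current: List[Token] = []
    for tok in sentence:
        if tok in separators:
            if current:  # close the constituent preceding the separator
                constituents.append(current)
                current = []
            constituents.append([tok])
        else:
            current.append(tok)
    if current:  # trailing constituent after the last separator
        constituents.append(current)
    return constituents


def hypernym_path(word: str, hypernym_of: Dict[str, str]) -> List[str]:
    """Return [word, parent, grandparent, ...] up to the root of the toy hypernym tree."""
    path = [word]
    seen = {word}
    while path[-1] in hypernym_of:
        parent = hypernym_of[path[-1]]
        if parent in seen:  # guard against cycles in noisy hypernym data
            break
        path.append(parent)
        seen.add(parent)
    return path


def generalize(instances: List[str], hypernym_of: Dict[str, str]) -> Optional[str]:
    """Lowest hypernym shared by all argument instances, or None if there is none."""
    if not instances:
        return None
    paths = [hypernym_path(w, hypernym_of) for w in instances]
    common = set(paths[0]).intersection(*map(set, paths[1:]))
    # In a tree, the first shared node met while walking up from any instance is the lowest one.
    for node in paths[0]:
        if node in common:
            return node
    return None


# Toy, hypothetical hypernym tree (child -> parent), with English glosses.
HYPERNYM_OF = {
    "钢琴": "键盘乐器",    # piano -> keyboard instrument
    "小提琴": "弦乐器",    # violin -> string instrument
    "吉他": "弦乐器",      # guitar -> string instrument
    "键盘乐器": "乐器",    # keyboard instrument -> instrument
    "弦乐器": "乐器",      # string instrument -> instrument
    "乐器": "物体",        # instrument -> object
}

if __name__ == "__main__":
    # "My teacher teaches piano at Beijing University", with ICTCLAS-style tags.
    sentence = [("我", "r"), ("的", "u"), ("老师", "n"), ("在", "p"),
                ("北京", "ns"), ("大学", "n"), ("教", "v"), ("钢琴", "n")]
    separators = {("在", "p"), ("教", "v")}  # hypothetical separator set
    for constituent in split_at_separators(sentence, separators):
        print(" ".join(word for word, _ in constituent))
    print(generalize(["钢琴", "小提琴"], HYPERNYM_OF))
```

Taking the lowest common hypernym is one simple way to generalize instances without over-generalizing to the root; a learned method would also need to weigh how many instances support each candidate class.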



Author information


Correspondence to Yuming Wu.

Additional information

Yuming Wu received his BS from Dalian Jiaotong University in 2002 and his MS from Capital Normal University in 2008. He is currently a PhD candidate at the Institute of Computing Technology, Chinese Academy of Sciences. His research interests include grammar learning, natural language processing, large-scale knowledge processing, and topic modeling.

Xiaodong Luo received his MS in Telecom Engineering from the University of South Australia in 2003. He has more than 15 years of experience in telecommunications operations, maintenance, and engineering. He has served as a technical support engineer for network planning and platform establishment at Shanghai Telecom, especially in the area of next-generation call centers (NGCC).

Zhen Yang is a senior engineer at the Shanghai Research Institute of China Telecom Corporation Limited. He received his BS from Harbin Institute of Technology, his MS from the Chinese Academy of Sciences, and his PhD from Dalian University of Technology. His research interests include the characteristics, concepts, methods, and algorithms of personalized information retrieval, the theory of personal data mining and personal information pattern recognition, and the application and development of search engine technologies.


About this article

Cite this article

Wu, Y., Luo, X. & Yang, Z. Semantic separator learning and its applications in unsupervised Chinese text parsing. Front. Comput. Sci. 7, 55–68 (2013). https://doi.org/10.1007/s11704-013-2072-z


  • DOI: https://doi.org/10.1007/s11704-013-2072-z
