Abstract
Grammar learning has long been a bottleneck problem. In this paper, we propose a method for semantic separator learning, a special case of grammar learning. The method rests on the hypothesis that certain classes of words, called semantic separators, split a sentence into several constituents. Each semantic separator is represented by a word together with its part-of-speech tag and other information, so that rich semantic information can be incorporated. We first identify the semantic separators with the help of noun phrase boundaries, called subseparators. Next, the argument classes of the separators are learned from a corpus by generalizing argument instances in a hypernym space. Finally, to evaluate the learned semantic separators, we use them in unsupervised Chinese text parsing. Experiments on a manually labeled test set show that the proposed method outperforms previous methods of unsupervised text parsing.
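The abstract does not give the algorithm itself, so the following is only a minimal sketch of the two ideas it names: splitting a tagged sentence at separator tokens, and generalizing argument instances to a shared hypernym. All names (`split_by_separators`, `generalize`, `TOY_PARENT`), the English toy sentence, and the tiny hypernym tree are hypothetical illustrations, not the authors' data or implementation.

```python
from typing import Dict, List, Optional, Set, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag)

# Hypothetical toy hypernym tree: child concept -> parent concept (assumed acyclic).
TOY_PARENT: Dict[str, str] = {
    "piano": "instrument",
    "violin": "instrument",
    "instrument": "artifact",
    "artifact": "entity",
    "composer": "person",
    "person": "entity",
}


def split_by_separators(tokens: List[Token], separators: Set[Token]) -> List[List[Token]]:
    """Split a tagged sentence into constituents at each separator token."""
    constituents: List[List[Token]] = []
    current: List[Token] = []
    for token in tokens:
        if token in separators:
            # A separator closes the constituent collected so far.
            if current:
                constituents.append(current)
            current = []
        else:
            current.append(token)
    if current:
        constituents.append(current)
    return constituents


def ancestors(concept: str, parent: Dict[str, str]) -> List[str]:
    """Return the chain from a concept up to the root, concept first."""
    chain = [concept]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def generalize(instances: List[str], parent: Dict[str, str]) -> Optional[str]:
    """Return the most specific concept that covers every argument instance."""
    if not instances:
        return None
    common = ancestors(instances[0], parent)
    for inst in instances[1:]:
        chain = set(ancestors(inst, parent))
        # Keep only concepts shared with this instance, preserving specificity order.
        common = [c for c in common if c in chain]
    return common[0] if common else None


if __name__ == "__main__":
    sentence = [("he", "PRON"), ("plays", "VERB"), ("piano", "NOUN"),
                ("in", "PREP"), ("the", "DET"), ("hall", "NOUN")]
    seps = {("plays", "VERB"), ("in", "PREP")}
    print(split_by_separators(sentence, seps))
    print(generalize(["piano", "violin"], TOY_PARENT))
```

In this sketch a separator is matched as an exact (word, tag) pair, mirroring the abstract's point that separators carry more than the bare word; a real hypernym space would likely be a graph with multiple parents rather than a tree.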
Author information
Yuming Wu received his BS from Dalian Jiaotong University in 2002 and his MS from Capital Normal University in 2008. He is now a PhD candidate at the Institute of Computing Technology, Chinese Academy of Sciences. His research interests include grammar learning, natural language processing, large-scale knowledge processing, and topic modeling.
Xiaodong Luo received his MS in Telecom Engineering from the University of South Australia in 2003. He has more than 15 years of experience in telecommunications operations, maintenance, and engineering. He has served as a technical support engineer for network planning and platform establishment at Shanghai Telecom, especially in the area of next-generation call centers (NGCC).
Zhen Yang is a senior engineer at the Shanghai Research Institute of China Telecom Corporation Limited. He received his BS from Harbin Institute of Technology, his MS from the Chinese Academy of Sciences, and his PhD from Dalian University of Technology. His research interests include the characteristics, conception, methods, and algorithms of individuality or personal information retrieval, the theory of personal data mining and personal information pattern recognition, and the application and development of search engine technologies.
About this article
Cite this article
Wu, Y., Luo, X. & Yang, Z. Semantic separator learning and its applications in unsupervised Chinese text parsing. Front. Comput. Sci. 7, 55–68 (2013). https://doi.org/10.1007/s11704-013-2072-z
DOI: https://doi.org/10.1007/s11704-013-2072-z