Abstract
We present a study of the effect of adding morphological tags to the training corpus of a grammar inductor. We carried out several experiments using the grammar induction system Alignment-Based Learning (ABL), with the syntactically tagged Spanish corpus CAST-3LB for training and testing. ABL produces a set of possible constituents through a word alignment process. We developed an algorithm that converts the hypotheses generated by ABL into ordered production rules and then groups them into possible phrase groups (constituents). These phrase groups correspond to the syntactic tagging of the unannotated text. We compared the phrase groups obtained by our algorithm with the manually tagged groups of CAST-3LB. The grammar induction experiments consisted of trying three variants of the training corpus: (1) using words only; (2) using morphological tags only; and (3) adding morphological tags to words. Our experiments show that including morphological tags in the grammar induction process significantly improves the performance of ABL.
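The three corpus variants and the comparison against manually tagged groups can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `build_variant` and `bracket_scores`, the underscore used to join a word with its tag, the short tag strings, and the use of precision, recall, and F1 over constituent spans are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import List, Set, Tuple

# A sentence is a list of (word, morphological tag) pairs.
TaggedSentence = List[Tuple[str, str]]
# A constituent is a half-open token span (start, end).
Span = Tuple[int, int]


def build_variant(sentence: TaggedSentence, variant: str) -> List[str]:
    """Return the token sequence for one training-corpus variant.

    The variant names and the "_" joiner are hypothetical choices.
    """
    if variant == "words":
        return [word for word, _ in sentence]
    if variant == "tags":
        return [tag for _, tag in sentence]
    if variant == "words+tags":
        return [f"{word}_{tag}" for word, tag in sentence]
    raise ValueError(f"unknown variant: {variant}")


def bracket_scores(predicted: Set[Span], gold: Set[Span]) -> Tuple[float, float, float]:
    """Compare induced spans with gold spans (assumed metric: P, R, F1)."""
    matched = len(predicted & gold)
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Illustrative sentence; the tags are placeholders, not actual CAST-3LB tags.
example: TaggedSentence = [("el", "DA"), ("gato", "NC"), ("duerme", "VM")]
for name in ("words", "tags", "words+tags"):
    print(name, build_variant(example, name))
```

Each variant would be written out as one sentence per line and passed to the grammar inductor; the spans recovered from its output could then be scored against the treebank spans with `bracket_scores`.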
Work done under support of the Mexican Government (CONACYT, SNI, PIFI, and SIP-IPN).
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Juárez Gambino, O., Calvo, H. (2007). On the Usage of Morphological Tags for Grammar Induction. In: Gelbukh, A., Kuri Morales, Á.F. (eds) MICAI 2007: Advances in Artificial Intelligence. MICAI 2007. Lecture Notes in Computer Science, vol 4827. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-76631-5_87
DOI: https://doi.org/10.1007/978-3-540-76631-5_87
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-76630-8
Online ISBN: 978-3-540-76631-5
eBook Packages: Computer Science, Computer Science (R0)