A computational grammar for Persian based on GPSG

  • Original Paper
Language Resources and Evaluation

Abstract

In this paper, we present our work on designing and implementing a large-coverage computational grammar for the Persian language based on the Generalized Phrase Structure Grammar (GPSG) model. The grammar was developed for continuous speech recognition (CSR) applications, but it is suitable for other applications that require syntactic analysis of Persian. We investigate the syntactic structures of modern Persian and describe them according to a phrase structure model. Noun (N), Verb (V), Adjective (ADJ), Adverb (ADV), and Preposition (P) are taken as the basic syntactic categories, and X-bar theory is used to define noun phrases, verb phrases, adjective phrases, adverbial phrases, and prepositional phrases. However, we had to extend the noun phrase levels of X-bar theory to four because of certain complexities in the structure of Persian noun phrases. We extracted a set of 120 grammatical rules describing the phrase structures of Persian, a few of which are presented in this paper. These rules cover the major syntactic structures of modern Persian. For evaluation, the grammar was used in a bottom-up chart parser to parse 100 Persian sentences, and it accounted for 89 of them. Incorporating the grammar into a Persian CSR system led to a 31% reduction in word error rate.
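
The evaluation setup described above (a bottom-up chart parser run over a set of test sentences) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two binary rules, the three-word lexicon of English glosses, and the helper names `parse_chart`, `accepts`, and `coverage` are hypothetical and are not taken from the paper's 120 rules. The sketch uses a CKY-style chart over plain category symbols and does not model GPSG features.

```python
from collections import defaultdict

# Hypothetical toy rules: (left child, right child) -> possible parents.
# The verb-final VP mirrors Persian's SOV order; these are NOT the paper's rules.
BINARY_RULES = {
    ("NP", "VP"): {"S"},
    ("NP", "V"): {"VP"},
}

# Hypothetical lexicon of English glosses, used only for illustration.
LEXICON = {
    "ali": {"NP"},
    "book": {"NP"},
    "read": {"V"},
}


def parse_chart(tokens):
    """Fill a chart bottom-up: chart[(i, j)] holds categories spanning tokens[i:j]."""
    n = len(tokens)
    chart = defaultdict(set)
    # Width-1 spans come straight from the lexicon.
    for i, token in enumerate(tokens):
        chart[(i, i + 1)] |= LEXICON.get(token, set())
    # Wider spans are built by combining two adjacent, already-filled spans.
    for width in range(2, n + 1):
        for start in range(0, n - width + 1):
            end = start + width
            for mid in range(start + 1, end):
                for left in chart[(start, mid)]:
                    for right in chart[(mid, end)]:
                        chart[(start, end)] |= BINARY_RULES.get((left, right), set())
    return chart


def accepts(tokens, goal="S"):
    """Return True if the whole token sequence can be analysed as `goal`."""
    if not tokens:
        return False
    return goal in parse_chart(tokens)[(0, len(tokens))]


def coverage(sentences):
    """Fraction of sentences the grammar accepts."""
    return sum(accepts(s) for s in sentences) / len(sentences)


if __name__ == "__main__":
    test_set = [["ali", "book", "read"], ["read", "ali", "book"]]
    print(coverage(test_set))
```

Under this reading, the reported figure of 89 out of 100 sentences corresponds to a `coverage` value of 0.89 on the paper's test set.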

Notes

  1. This feature is often called the Number feature, with the values s (singular) and p (plural). In this paper, we refer to it as the Plurality (PLU) feature, with the values {+} (plural) and {−} (singular).

  2. N1¯ is read as "N1 minus" and denotes the level just before N1.

  3. In our grammar, the GAP feature is used only for object gaps. We have not defined a subject gap because subject deletion is very common in Persian sentences.

  4. Other FSD features are COMP, POBJ, and COR.

  5. Meaningful features that are not specified in a rule (such as PER and PLU) can take any possible value, unless they are subject to an FSD instruction, in which case they take their default value.

  6. The copula represents the enclitic form of the verb 'بودن' [budan] ‘to be’ in the present indicative.

  7. This simulates the output of a CSR system. In CSR systems, the Ezafe marker is supplied by the system.
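
Notes 1, 4, and 5 together describe how unspecified features are resolved: a feature left out of a rule is free to take any of its values, unless a default (FSD) applies to it. A minimal sketch of that behaviour follows. The value sets for PER and COMP, the default value chosen for COMP, and the function name `expand` are assumptions made for illustration; the notes give only the PLU values and the names of the FSD features.

```python
from itertools import product

# Possible values per feature. PLU follows Note 1 ("+" plural, "-" singular);
# the PER and COMP value sets are assumptions for illustration.
FEATURE_VALUES = {
    "PLU": ["+", "-"],
    "PER": [1, 2, 3],
    "COMP": ["+", "-"],
}

# Features governed by a default. COMP is named as an FSD feature in Note 4;
# the default value "-" is an assumption, not taken from the paper.
FSD_DEFAULTS = {"COMP": "-"}


def expand(rule_spec):
    """List every full feature assignment compatible with a partial rule spec.

    A feature fixed in `rule_spec` keeps that value, an unspecified feature
    with a default takes the default, and any other unspecified feature
    ranges over all of its possible values.
    """
    names = sorted(FEATURE_VALUES)
    choices = []
    for name in names:
        if name in rule_spec:
            choices.append([rule_spec[name]])
        elif name in FSD_DEFAULTS:
            choices.append([FSD_DEFAULTS[name]])
        else:
            choices.append(FEATURE_VALUES[name])
    return [dict(zip(names, combo)) for combo in product(*choices)]


if __name__ == "__main__":
    # A rule that fixes only PLU: PER stays free, COMP falls back to its default.
    for assignment in expand({"PLU": "+"}):
        print(assignment)
```

In this sketch a rule that mentions only PLU expands to one assignment per PER value, each carrying the default COMP value, which is the distinction Note 5 draws between free features and defaulted ones.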

Acknowledgments

This research was supported by a grant from the Iran Telecommunication Research Center (ITRC).

Author information

Correspondence to Mohammad Bahrani or Hossein Sameti.

Appendix: Test sentences

[Figures a and b: test sentences]

About this article

Cite this article

Bahrani, M., Sameti, H. & Hafezi Manshadi, M. A computational grammar for Persian based on GPSG. Lang Resources & Evaluation 45, 387–408 (2011). https://doi.org/10.1007/s10579-011-9144-1

  • DOI: https://doi.org/10.1007/s10579-011-9144-1