research-article

PAMR: Persian Abstract Meaning Representation Corpus

Published: 09 March 2024

Abstract

Abstract Meaning Representation (AMR) is one of the most widely used and best-known semantic representation models, and in recent years it has been applied to numerous natural language processing tasks. Large annotated corpora are currently available for English and Chinese, and smaller corpora have been built for some low-resource languages. To the best of our knowledge, however, no AMR corpus exists yet for the Persian/Farsi language. The aim of this article is therefore to create a Persian AMR (PAMR) corpus by translating English sentences and adapting the AMR guidelines, and to address the challenges that arise in doing so. The result is a corpus of 1,020 Persian sentences with their corresponding AMRs, which can be used in various natural language processing tasks. To investigate the feasibility of using the corpus, we applied it to two such tasks: sentiment analysis and text summarization.

Published In

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 23, Issue 3
March 2024
277 pages
EISSN: 2375-4702
DOI: 10.1145/3613569

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 09 March 2024
Online AM: 19 January 2024
Accepted: 14 December 2023
Received: 05 August 2023
Published in TALLIP Volume 23, Issue 3

Author Tags

  1. Abstract Meaning Representation
  2. Persian
  3. text
  4. corpus
  5. low-resource language
  6. natural language processing

Qualifiers

  • Research-article
