Abstract
In this article, we investigated machine translation of Aramaic into other spoken languages in a cultural heritage domain for two primary purposes: evaluating the quality of ancient translations and preserving Aramaic (an endangered language). First, we detailed the construction of a publicly available Biblical parallel Aramaic-Hebrew corpus based on two ancient (early 2nd to late 4th century) Hebrew-Aramaic translations: Targum Onkelos and Targum Jonathan. Then, using the statistical machine translation approach, which in our use case significantly outperforms neural machine translation, we validated the expected high quality of the translations. The trained model failed to translate Aramaic texts of other dialects. However, when we trained the same statistical machine translation model on another Aramaic-Hebrew corpus of a different dialect (Zohar, 13th century), a very high translation score was achieved. We also examined an additional important cultural heritage source of Aramaic texts, the Babylonian Talmud (early 3rd to late 5th century). Since we do not have a parallel Aramaic-Hebrew corpus of the Talmud, we used the model trained on the Bible corpus for translation. We analyzed the results and suggest some promising directions for future research.
Index Terms
- Machine Translation for Historical Research: A Case Study of Aramaic-Ancient Hebrew Translations