research-article

Machine Translation for Historical Research: A Case Study of Aramaic-Ancient Hebrew Translations

Authors:
Chaya Liebeskind

Department of Computer Science, Jerusalem College of Technology, Jerusalem, Israel

Department of Computer Science, Jerusalem College of Technology, Jerusalem, Israel

0000-0003-0476-3796
View Profile

,
Shmuel Liebeskind

Department of Data Mining, Jerusalem College of Technology, Jerusalem, Israel

Department of Data Mining, Jerusalem College of Technology, Jerusalem, Israel

0009-0006-3205-7038
View Profile

,
Dan Bouhnik

Department of Computer Science, Jerusalem College of Technology, Jerusalem, Israel

Department of Computer Science, Jerusalem College of Technology, Jerusalem, Israel

0000-0002-3141-8819
View Profile

Authors Info & Claims

Journal on Computing and Cultural Heritage Volume 17 Issue 2Article No.: 20pp 1–23https://doi.org/10.1145/3627168

Published:23 February 2024Publication History

Journal on Computing and Cultural Heritage

Abstract

In this article, by the ability to translate Aramaic to another spoken languages, we investigated machine translation in a cultural heritage domain for two primary purposes: evaluating the quality of ancient translations and preserving Aramaic (an endangered language). First, we detailed the construction of a publicly available Biblical parallel Aramaic-Hebrew corpus based on two ancient (early 2^nd to late 4^th century) Hebrew-Aramaic translations: Targum Onkelus and Targum Jonathan. Then using the statistical machine translation approach, which in our use case significantly outperforms neural machine translation, we validated the excepted high quality of the translations. The trained model failed to translate Aramaic texts of other dialects. However, when we trained the same statistical machine translation model on another Aramaic-Hebrew corpus of a different dialect (Zohar, 13^th century), a very high translation score was achieved. We examined an additional important cultural heritage source of Aramaic texts, the Babylonian Talmud (early 3^rd to late 5^th century). Since we do not have a parallel Aramaic-Hebrew corpus of the Talmud, we used the model trained on the Bible corpus for translation. We performed an analysis of the results and suggest some potential promising future research.

REFERENCES

[1] Adelani David Ifeoluwa, Ruiter Dana, Alabi Jesujoba O., Adebonojo Damilola, Ayeni Adesina, Adeyemi Mofetoluwa, Awokoya Ayodele, and España-Bonet Cristina. 2021. MENYO-20k: A multi-domain English-Yorubá corpus for machine translation and domain adaptation. CoRR abs/2103.08647 (2021).Google Scholar
[2] Ahia Orevaoghene, Kreutzer Julia, and Hooker Sara. 2021. The low-resource double bind: An empirical study of pruning for low-resource machine translation. arXiv preprint arXiv:2110.03036 (2021).Google Scholar
[3] Aji Alham Fikri, Bogoychev Nikolay, Heafield Kenneth, and Sennrich Rico. 2020. In neural machine translation, what does transfer learning transfer? In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 7701–7710. DOI:Google ScholarCross Ref
[4] Alexander Philip S.. 1992. “Targum, Targumim.” In Anchor Bible Dictionary, David Noel Freedman (Ed.). Anchor Bible, 320–31.Google ScholarCross Ref
[5] Alvarez-Melis David and Jaakkola Tommi. 2017. A causal framework for explaining the predictions of black-box sequence-to-sequence models. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. 412–421.Google ScholarCross Ref
[6] Araabi Ali and Monz Christof. 2020. Optimizing Transformer for low-resource neural machine translation. In Proceedings of the 28th International Conference on Computational Linguistics. 3429–3435.Google ScholarCross Ref
[7] Armaselu Florentina, Apostol Elena-Simona, Chiarcos Christian, Khan Anas Fahad, Liebeskind Chaya, McGillivray Barbara, Truică Ciprian-Octavian, and Valūnaitė-Oleškevičienė Giedrė. 2022. Tracing semantic change with multilingual LLOD and diachronic word embeddings. In Proceedings of the International Scientific Interdisciplinary Conference.Google Scholar
[8] Armaselu Florentina, Apostol Elena-Simona, Khan Anas Fahad, Liebeskind Chaya, McGillivray Barbara, Truică Ciprian-Octavian, and Oleškevičienė Giedrė Valūnaitė. 2021. HISTORIAE, history of socio-cultural transformation as linguistic data science. A humanities use case. In Proceedings of the 3rd Conference on Language, Data, and Knowledge (LDK ’21).Google Scholar
[9] Asgari Ehsaneddin and Schütze Hinrich. 2017. Past, present, future: A computational investigation of the typology of tense in 1000 languages. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. 113–124.Google ScholarCross Ref
[10] Assael Yannis, Sommerschield Thea, and Prag Jonathan. 2019. Restoring ancient text using deep learning: A case study on Greek epigraphy. arXiv preprint arXiv:1910.06262 (2019).Google Scholar
[11] Axelrod Amittai, He Xiaodong, and Gao Jianfeng. 2011. Domain adaptation via pseudo in-domain data selection. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing. 355–362.Google ScholarDigital Library
[12] Ayan Necip Fazil and Dorr Bonnie J.. 2006. A maximum entropy approach to combining word alignments. In Proceedings of the Main Conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics. 96–103.Google ScholarDigital Library
[13] Ballier Nicolas, Amari Nabil, Merat Laure, and Yunès Jean-Baptiste. 2020. The learnability of the annotated input in NMT replicating (Vanmassenhove and Way, 2018) with OpenNMT. In Proceedings of the 12th Language Resources and Evaluation Conference. 5631–5640.Google Scholar
[14] Bareket Dan and Tsarfaty Reut. 2021. Neural modeling for named entities and morphology (NEMO2). Transactions of the Association for Computational Linguistics 9 (2021), 909–928.Google ScholarCross Ref
[15] Basmatkar Pranjali, Holani Hemant, and Kaushal Shivani. 2019. Survey on neural machine translation for multilingual translation system. In Proceedings of the 2019 3rd International Conference on Computing Methodologies and Communication (ICCMC ’19). IEEE, Los Alamitos, CA, 443–448.Google ScholarCross Ref
[16] Birch Alexandra, Callison-Burch Chris, Osborne Miles, and Koehn Philipp. 2006. Constraining the phrase-based, joint probability statistical translation model. In Proceedings of the Workshop on Statistical Machine Translation. 154–157.Google ScholarDigital Library
[17] Bouma Gerlof, Coussé Evie, Dijkstra Trude, and Sijs Nicoline van der. 2020. The EDGeS Diachronic Bible Corpus. In Proceedings of the 12th Language Resources and Evaluation Conference. 5232–5239.Google Scholar
[18] Brown Peter F., Pietra Vincent J. Della, Pietra Stephen A. Della, and Mercer Robert L.. 1993. The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics 19, 2 (1993), 263–311.Google ScholarDigital Library
[19] Cho Kyunghyun, Merriënboer Bart van, Bahdanau Dzmitry, and Bengio Yoshua. 2014. On the properties of neural machine translation: Encoder–decoder approaches. In Proceedings of the 8th Syntax, Semantics, and Structure in Statistical Translation (SSST ’14). 103.Google ScholarCross Ref
[20] Chriqui Avihay and Yahav Inbal. 2022. HeBERT and HebEMO: A Hebrew BERT model and a tool for polarity analysis and emotion recognition. INFORMS Journal on Data Science 1, 1 (2022), 81–95.Google ScholarCross Ref
[21] Christodouloupoulos Christos and Steedman Mark. 2015. A massively parallel corpus: The Bible in 100 languages. Language Resources and Evaluation 49, 2 (2015), 375–395.Google ScholarDigital Library
[22] Chu Chenhui and Wang Rui. 2018. A survey of domain adaptation for neural machine translation. In Proceedings of the 27th International Conference on Computational Linguistics. 1304–1319.Google Scholar
[23] Chu Chenhui and Wang Rui. 2020. A survey of domain adaptation for machine translation. Journal of Information Processing 28 (2020), 413–426.Google ScholarCross Ref
[24] Clanuwat Tarin, Lamb Alex, and Kitamoto Asanobu. 2019. KuroNet: Pre-modern Japanese Kuzushiji character recognition with deep learning. In Proceedings of the 2019 International Conference on Document Analysis and Recognition (ICDAR ’19). IEEE, Los Alamitos, CA, 607–614.Google ScholarCross Ref
[25] Crater David T.. 2022. Hebrew Transformed: Machine Translation of Hebrew Using the Transformer Architecture. Ph.D. Dissertation. Harvard University.Google Scholar
[26] Dabre Raj, Chu Chenhui, and Kunchukuttan Anoop. 2020. A survey of multilingual neural machine translation. ACM Computing Surveys 53, 5 (2020), 1–38.Google ScholarDigital Library
[27] DeNero John, Gillick Dan, Zhang James, and Klein Dan. 2006. Why generative phrase models underperform surface heuristics. In Proceedings of the Workshop on Statistical Machine Translation. 31–38.Google ScholarDigital Library
[28] Devlin Jacob, Chang Ming-Wei, Lee Kenton, and Toutanova Kristina. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).Google Scholar
[29] Fraser Alexander and Marcu Daniel. 2007. Getting the structure right for word alignment: LEAF. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL ’07).Google Scholar
[30] Garcia Eva Martínez and Tejedor Álvaro García. 2020. Latin-Spanish neural machine translation: From the Bible to Saint Augustine. In Proceedings of the 2020 1st Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA ’20). 94–99.Google Scholar
[31] Germann Ulrich, Jahr Michael, Knight Kevin, Marcu Daniel, and Yamada Kenji. 2004. Fast and optimal decoding for machine translation. Artificial Intelligence 154, 1-2 (2004), 127–143.Google ScholarDigital Library
[32] Guzmán Francisco, Chen Peng-Jen, Ott Myle, Pino Juan, Lample Guillaume, Koehn Philipp, Chaudhary Vishrav, and Ranzato Marc’Aurelio. 2019. The FLORES evaluation datasets for low-resource machine translation: Nepali–English and Sinhala–English. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP ’19). 6100–6113.Google ScholarCross Ref
[33] HaCohen-Kerner Yaakov, Kass Ariel, and Peretz Ariel. 2008. Combined one sense disambiguation of abbreviations. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers (HLT-Short ’08). 61–64. http://dl.acm.org/citation.cfm?id=1557690.1557707Google ScholarDigital Library
[34] HaCohen-Kerner Yaakov, Kass Ariel, and Peretz Ariel. 2010. HAADS: A Hebrew Aramaic abbreviation disambiguation system. Journal of the American Society for Information Science and Technology 61, 9 (2010), 1923–1932.Google ScholarDigital Library
[35] HaCohen-Kerner Yaakov, Kass Ariel, and Peretz Ariel. 2013. Initialism disambiguation: Man versus machine. Journal of the American Society for Information Science and Technology 64, 10 (2013), 2133–2148.Google ScholarCross Ref
[36] HaCohen-Kerner Yaakov, Schweitzer Nadav, and Mughaz Dror. 2011. Automatically identifying citations in Hebrew-Aramaic documents. Cybernetics and Systems: An International Journal 42, 3 (2011), 180–197.Google ScholarDigital Library
[37] Ide Nancy. 1998. Encoding linguistic corpora. In Proceedings of the 6th Workshop on Very Large Corpora.Google Scholar
[38] Irvine Ann and Callison-Burch Chris. 2013. Combining bilingual and comparable corpora for low resource machine translation. In Proceedings of the 8th Workshop on Statistical Machine Translation. 262–270.Google Scholar
[39] Ittycheriah Abraham and Roukos Salim. 2005. A maximum entropy word aligner for Arabic-English machine translation. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing. 89–96.Google ScholarDigital Library
[40] Jelinek Frederick. 1997. Statistical Methods for Speech Recognition. MIT Press, Cambridge, MA.Google ScholarDigital Library
[41] Johnson Melvin, Schuster Mike, Le Quoc V., Krikun Maxim, Wu Yonghui, Chen Zhifeng, Thorat Nikhil, Viégas Fernanda, Wattenberg Martin, Corrado Greg, et al. 2017. Google’s multilingual neural machine translation system: Enabling zero-shot translation. Transactions of the Association for Computational Linguistics 5 (2017), 339–351.Google ScholarCross Ref
[42] Jurafsky Dan and Martin James H.. 2009. Speech and Language Processing: An Introduction to Natural LanguageProcessing, Computational Linguistics, and Speech Recognition (2nd ed.). Prentice Hall.Google Scholar
[43] Kalchbrenner Nal and Blunsom Phil. 2013. Recurrent continuous translation models. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. 1700–1709.Google Scholar
[44] Kindt Bastien. 2018. Processing tools for Greek and other languages of the Christian Middle East. Journal of Data Mining & Digital Humanities. Special Issue on Computer-Aided Processing of Intertextuality in Ancient Languages.Google ScholarCross Ref
[45] Klein Guillaume, Hernandez François, Nguyen Vincent, and Senellart Jean. 2020. The OpenNMT neural machine translation toolkit: 2020 edition. In Proceedings of the 14th Conference of the Association for Machine Translation in the Americas (AMTA ’20). 102–109.Google Scholar
[46] Klein Guillaume, Hernandez François, Nguyen Vincent, and Senellart Jean. 2020. The OpenNMT neural machine translation toolkit: 2020 edition. In Proceedings of the 14th Conference of the Association for Machine Translation in the Americas (AMTA ’20). 102–109.Google Scholar
[47] Klein Guillaume, Kim Yoon, Deng Yuntian, Nguyen Vincent, Senellart Jean, and Rush Alexander. 2018. OpenNMT: Neural machine translation toolkit. In Proceedings of the 13th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Papers). 177–184. https://www.aclweb.org/anthology/W18-1817Google Scholar
[48] Klein Guillaume, Kim Yoon, Deng Yuntian, Senellart Jean, and Rush Alexander. 2017. OpenNMT: Open-source toolkit for neural machine translation. In Proceedings of ACL 2017, System Demonstrations. Association for Computational Linguistics, Vancouver, Canada, 67–72. https://www.aclweb.org/anthology/P17-4012Google ScholarCross Ref
[49] Klein Guillaume, Zhang Dakun, Chouteau Clément, Crego Josep M., and Senellart Jean. 2020. Efficient and high-quality neural machine translation with OpenNMT. In Proceedings of the 4th Workshop on Neural Generation and Translation. 211–217.Google ScholarCross Ref
[50] Klein Michael L.. 1976. Converse translation: A Targumic technique. Biblica 57, 4 (1976), 515–537.Google Scholar
[51] Klein Stav and Tsarfaty Reut. 2020. Getting the## life out of living: How adequate are word-pieces for modelling complex morphology? In Proceedings of the 17th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology. 204–209.Google ScholarCross Ref
[52] Knight Kevin. 1997. Automating knowledge acquisition for machine translation. AI Magazine 18, 4 (1997), 81.Google Scholar
[53] Knight Kevin. 1999. A Statistical MT Tutorial Workbook. Kevin Knight.Google Scholar
[54] Kocmi Tom and Bojar Ondřej. 2018. Trivial transfer learning for low-resource neural machine translation. In Proceedings of the 3rd Conference on Machine Translation: Research Papers. 244–252.Google ScholarCross Ref
[55] Koehn Philipp. 2004. Pharaoh: A beam search decoder for phrase-based statistical machine translation models. In Proceedings of the Conference of the Association for Machine Translation in the Americas. 115–124.Google ScholarCross Ref
[56] Koehn Philipp. 2009. Statistical Machine Translation. Cambridge University Press.Google ScholarCross Ref
[57] Koehn Philipp, Hoang Hieu, Birch Alexandra, Callison-Burch Chris, Federico Marcello, Bertoldi Nicola, Cowan Brooke, Shen Wade, Moran Christine, Zens Richard, et al. 2007. Moses: Open source toolkit for statistical machine translation. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions. 177–180.Google ScholarCross Ref
[58] Koehn Philipp, Och Franz Josef, and Marcu Daniel. 2003. Statistical phrase-based translation. In Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics. 127–133.Google ScholarCross Ref
[59] Koppel Moshe. 2011. The Responsa Project: Some promising future directions. In Language, Culture, Computation, Computing of the Humanities, Law, and Narratives. Lecture Notes in Computer Science, Vol. 8002. Springer, 1–8.Google Scholar
[60] Koppel Moshe, Mughaz Dror, and Akiva Navot. 2003. CHAT: A system for stylistic classification of Hebrew-Aramaic texts. In Proceedings of the 3rd KDD Workshop on Operational Text Categorization.Google Scholar
[61] Koppel Moshe, Mughaz Dror, and Akiva Navot. 2006. New methods for attribution of rabbinic literature. Hebrew Linguistics: A Journal for Hebrew Descriptive, Computational and Applied Linguistics 57 (2006), 5–18.Google Scholar
[62] Koppel Moshe and Schweitzer Nadav. 2014. Measuring direct and indirect authorial influence in historical corpora. Journal of the Association for Information Science and Technology 65, 10 (2014), 2138–2144.Google ScholarDigital Library
[63] Lacoste-Julien Simon, Taskar Ben, Klein Dan, and Jordan Michael I.. 2006. Word alignment via quadratic assignment. In Proceedings of the Main Conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics. 112–119.Google ScholarDigital Library
[64] Lankford Séamus, Afli Haithem, and Way Andy. 2021. Transformers for low-resource languages: Is Féidir Linn! In Proceedings of the Machine Translation Summit XVIII: Research Track.Google Scholar
[65] Laskar Sahinur Rahman, Singh Rahul, Karim Md. Faizal, Manna Riyanka, Pakray Partha, and Bandyopadhyay Sivaji. 2022. Investigation of English to Hindi multimodal neural machine translation using transliteration-based phrase pairs augmentation. In Proceedings of the 9th Workshop on Asian Translation. 117–122. https://aclanthology.org/2022.wat-1.15Google Scholar
[66] Lavie Alon, Peterson Erik, Probst Katharina, Wintner Shuly, and Eytani Yaniv. 2004. Rapid prototyping of a transfer-based Hebrew-to-English machine translation system. In Proceedings of the 10th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages.Google Scholar
[67] Lee Jason, Cho Kyunghyun, and Hofmann Thomas. 2017. Fully character-level neural machine translation without explicit segmentation. Transactions of the Association for Computational Linguistics 5 (2017), 365–378.Google ScholarCross Ref
[68] Liang Percy, Taskar Ben, and Klein Dan. 2006. Alignment by agreement. In Proceedings of the Main Conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics. 104–111.Google ScholarDigital Library
[69] Liebeskind Chaya, Dagan Ido, and Schler Jonathan. 2012. Statistical thesaurus construction for a morphologically rich language. In Proceedings of the 1st Joint Conference on Lexical and Computational Semantics—Volume 1: Proceedings of the Main Conference and the Shared Task (*SEM ’12), and Volume 2: Proceedings of the 6th International Workshop on Semantic Evaluation (SemEval ’12). 59–64.Google Scholar
[70] Liebeskind Chaya, Dagan Ido, and Schler Jonathan. 2016. Semiautomatic construction of cross-period thesaurus. Journal on Computing and Cultural Heritage 9, 4 (2016), 22.Google ScholarDigital Library
[71] Liebeskind Chaya, Dagan Ido, and Schler Jonathan. 2019. An algorithmic scheme for statistical thesaurus construction in a morphologically rich language. Applied Artificial Intelligence 33, 6 (2019), 483–496.Google ScholarCross Ref
[72] Liebeskind Chaya and Liebeskind Shmuel. 2020. Automatic construction of Aramaic-Hebrew translation lexicon. In Proceedings of the 1st Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA ’20). 10–16.Google Scholar
[73] Liebeskind Chaya and Liebeskind Shmuel. 2020. Deep learning for period classification of historical Hebrew texts. Journal of Data Mining & Digital Humanities 20 (2020), 1–22.Google ScholarCross Ref
[74] Ling Jeffrey and Rush Alexander M.. 2017. Coarse-to-fine attention models for document summarization. In Proceedings of the Workshop on New Frontiers in Summarization. 33–42.Google ScholarCross Ref
[75] Liu Yang, Liu Qun, and Lin Shouxun. 2005. Log-linear models for word alignment. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics. 459–466.Google ScholarDigital Library
[76] Lopez Adam. 2008. Statistical machine translation. ACM Computing Surveys 40, 3 (2008), 8.Google ScholarDigital Library
[77] Malaviya Chaitanya, Neubig Graham, and Littell Patrick. 2017. Learning language representations for typology prediction. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. 2529–2535.Google ScholarCross Ref
[78] Marcu Daniel and Wong William. 2002. A phrase-based, joint probability model for statistical machine translation. In Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing—Volume 10. 133–139.Google ScholarDigital Library
[79] Mayer Thomas and Cysouw Michael. 2014. Creating a massively parallel Bible corpus. In Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC ’14). 3158–3163.Google Scholar
[80] McCarthy Arya D., Wicks Rachel, Lewis Dylan, Mueller Aaron, Wu Winston, Adams Oliver, Nicolai Garrett, Post Matt, and Yarowsky David. 2020. The Johns Hopkins University Bible Corpus: 1600+ tongues for typological exploration. In Proceedings of the 12th Language Resources and Evaluation Conference. 2884–2892.Google Scholar
[81] Melamed Ezra Zion. 2005. Aramaic-Hebrew-English Dictionary of the Babylonian Talmud. Feldheim Publishers.Google Scholar
[82] Moghaz Dror, Hacohen-Kerner Yaakov, and Gabbay Dov. 2019. Text mining for evaluating authors’ birth and death years. ACM Transactions on Knowledge Discovery from Data 13, 1 (2019), 7.Google ScholarDigital Library
[83] Moore Robert C., Yih Wen-Tau, and Bode Andreas. 2006. Improved discriminative bilingual word alignment. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics. 513–520.Google ScholarDigital Library
[84] Mortensen Beverly P.. 2006. The Priesthood in Targum Pseudo-Jonathan. Renewing the Profession. Vol. 1. Brill.Google Scholar
[85] Mughaz Dror. 2003. Classification of Hebrew Texts According to Style. Unpublished Master’s Thesis [in Hebrew]. Bar-Ilan University, Ramat-Gan, Israel.Google Scholar
[86] Mughaz Dror, HaCohen-Kerner Yaakov, and Gabbay Dov. 2017. Mining and using key-words and key-phrases to identify the era of an anonymous text. In Transactions on Computational Collective Intelligence XXVI. Springer, 119–143.Google ScholarCross Ref
[87] Murao Kazuma, Kobayashi Ken, Kobayashi Hayato, Yatsuka Taichi, Masuyama Takeshi, Higurashi Tatsuru, and Tabuchi Yoshimune. 2019. A case study on neural headline generation for editing support. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Industry Papers). 73–82.Google ScholarCross Ref
[88] Nguyen Toan Q. and Chiang David. 2017. Transfer learning across low-resource, related languages for neural machine translation. In Proceedings of the 8th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). 296–301.Google Scholar
[89] Nießen Sonja, Vogel Stephan, Ney Hermann, and Tillmann Christoph. 1998. A DP based search algorithm for statistical machine translation. In Proceedings of the 17th International Conference on Computational Linguistics—Volume 2. 960–967.Google Scholar
[90] Och Franz Josef. 2003. Statistical Machine Translation: From Single Word Models to Alignment Templates. Ph.D. Dissertation. Aachen, Technische Hochschule.Google Scholar
[91] Och Franz Josef and Ney Hermann. 2000. A comparison of alignment models for statistical machine translation. In Proceedings of the 18th Conference on Computational Linguistics—Volume 2.1086–1090.Google ScholarDigital Library
[92] Och Franz Josef and Ney Hermann. 2001. Statistical multi-source translation. In Proceedings of the Machine Translation Summit, Vol. 8. 253–258.Google Scholar
[93] Och Franz Josef and Ney Hermann. 2003. A systematic comparison of various statistical alignment models. Computational Linguistics 29, 1 (2003), 19–51.Google ScholarDigital Library
[94] Och Franz Josef, Tillmann Christoph, and Ney Hermann. 1999. Improved alignment models for statistical machine translation. In Proceedings of the 1999 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora.Google Scholar
[95] Östling Robert and Tiedemann Jörg. 2017. Continuous multilinguality with language vectors. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, Vol. 2. 644–649.Google ScholarCross Ref
[96] Pagé-Perron Émilie, Sukhareva Maria, Khait Ilya, and Chiarcos Christian. 2017. Machine translation and automated analysis of the Sumerian language. In Proceedings of the Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities, and Literature. 10–16.Google ScholarCross Ref
[97] Papineni Kishore, Roukos Salim, Ward Todd, and Zhu Wei-Jing. 2002. BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. 311–318.Google ScholarDigital Library
[98] Park Chanjun, Lee Chanhee, Yang Yeongwook, and Lim Heuiseok. 2020. Ancient Korean neural machine translation. IEEE Access 8 (2020), 116617–116625.Google ScholarCross Ref
[99] Popović Maja. 2015. chrF: Character n-gram F-score for automatic MT evaluation. In Proceedings of the 10th Workshop on Statistical Machine Translation. 392–395. DOI:Google ScholarCross Ref
[100] Popović Maja. 2017. chrF++: Words helping character n-grams. In Proceedings of the 2nd Conference on Machine Translation. 612–618.Google ScholarCross Ref
[101] Resnik Philip, Olsen Mari Broman, and Diab Mona. 1999. The Bible as a parallel corpus: Annotating the ‘Book of 2000 Tongues.’ Computers and the Humanities 33, 1 (1999), 129–153.Google ScholarCross Ref
[102] Ribeiro Marco Tulio, Singh Sameer, and Guestrin Carlos. 2018. Semantically equivalent adversarial rules for debugging NLP models. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 856–865.Google ScholarCross Ref
[103] Riyadh Rashed Rubby and Kondrak Grzegorz. 2019. Joint approach to deromanization of code-mixed texts. In Proceedings of the 6th Workshop on NLP for Similar Languages, Varieties, and Dialects. 26–34.Google ScholarCross Ref
[104] Seker Amit, Bandel Elron, Bareket Dan, Brusilovsky Idan, Greenfeld Refael Shaked, and Tsarfaty Reut. 2021. AlephBERT: A Hebrew large pre-trained language model to start-off your Hebrew NLP application with. arXiv preprint arXiv:2104.04052 (2021).Google Scholar
[105] Sen Sukanta, Hasanuzzaman Mohammed, Ekbal Asif, Bhattacharyya Pushpak, and Way Andy. 2021. Neural machine translation of low-resource languages using SMT phrase pair injection. Natural Language Engineering 27, 3 (2021), 271–292.Google ScholarCross Ref
[106] Sennrich Rico, Haddow Barry, and Birch Alexandra. 2016. Controlling politeness in neural machine translation via side constraints. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 35–40.Google ScholarCross Ref
[107] Sennrich Rico and Zhang Biao. 2019. Revisiting low-resource neural machine translation: A case study. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 211–221.Google ScholarCross Ref
[108] Shinan Avigdor. 1990. Dating Targum Pseudo-Jonathan: Some more comments. Journal of Jewish Studies 41, 1 (1990), 57–61.Google ScholarCross Ref
[109] Shmidman Avi, Guedalia Joshua, Shmidman Shaltiel, Shmidman Cheyn Shmuel, Handel Eli, and Koppel Moshe. 2022. Introducing BEREL: BERT embeddings for rabbinic-encoded language. arXiv preprint arXiv:2208.01875 (2022).Google Scholar
[110] Shmidman Avi, Koppel Moshe, and Porat Ely. 2018. Identification of parallel passages across a large Hebrew/Aramaic corpus. Journal of Data Mining & Digital Humanities. Special Issue on Computer-Aided Processing of Intertextuality in Ancient Languages.Google ScholarCross Ref
[111] Shmidman Avi, Shmidman Shaltiel, Koppel Moshe, and Goldberg Yoav. 2020. Nakdan: Professional Hebrew diacritizer. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations. 197–203.Google ScholarCross Ref
[112] Skadiņa Inguna, Aker Ahmet, Mastropavlos Nikos, Su Fangzhong, Tufis Dan, Verlic Mateja, Vasiļjevs Andrejs, Babych Bogdan, Clough Paul, Gaizauskas Robert, et al. 2012. Collecting and using comparable corpora for statistical machine translation. In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC ’12).Google Scholar
[113] Snover Matthew, Dorr Bonnie, Schwartz Richard, Micciulla Linnea, and Makhoul John. 2006. A study of translation edit rate with targeted human annotation. In Proceedings of the 7th Conference of the Association for Machine Translation in the Americas: Technical Papers. 223–231.Google Scholar
[114] Snyder Benjamin and Barzilay Regina. 2008. Unsupervised multilingual learning for morphological segmentation. In Proceedings of ACL-08: HLT. Association for Computational Linguistics, Columbus, OH, 737–745. https://www.aclweb.org/anthology/P08-1084Google ScholarDigital Library
[115] Stringham Nathan and Izbicki Mike. 2020. Evaluating word embeddings on low-resource languages. In Proceedings of the 1st Workshop on Evaluation and Comparison of NLP Systems. 176–186.Google ScholarCross Ref
[116] Suissa Omri, Elmalech Avshalom, and Zhitomirsky-Geffet Maayan. 2022. Text analysis using deep neural networks in digital humanities and information science. Journal of the Association for Information Science and Technology 73, 2 (2022), 268–287.Google ScholarCross Ref
[117] Suissa Omri, Zhitomirsky-Geffet Maayan, and Elmalech Avshalom. 2022. Toward a period-specific optimized neural network for OCR error correction of historical Hebrew texts. ACM Journal on Computing and Cultural Heritage 15, 2 (2022), 1–20. Google ScholarDigital Library
[118] Sutskever Ilya, Vinyals Oriol, and Le Quoc V.. 2014. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems. 3104–3112.Google ScholarDigital Library
[119] Swanson Daniel and Tyers Francis. 2022. A universal dependencies treebank of ancient Hebrew. In Proceedings of the 13th Language Resources and Evaluation Conference. 2353–2361.Google Scholar
[120] Tillmann Christoph and Ney Hermann. 2003. Word reordering and a dynamic programming beam search algorithm for statistical machine translation. Computational Linguistics 29, 1 (2003), 97–133.Google ScholarDigital Library
[121] Tribble Alicia, Vogel Stephan, and Waibel Alex. 2003. Overlapping phrase-level translation rules in an SMT engine. In Proceedings of the 2003 International Conference on Natural Language Processing and Knowledge Engineering. IEEE, Los Alamitos, CA, 574–579.Google ScholarCross Ref
[122] Biljon Elan Van, Pretorius Arnu, and Kreutzer Julia. 2020. On optimal Transformer depth for low-resource language translation. arXiv preprint arXiv:2004.04418 (2020).Google Scholar
[123] Vaswani Ashish, Shazeer Noam, Parmar Niki, Uszkoreit Jakob, Jones Llion, Gomez Aidan N., Kaiser Lukasz, and Polosukhin Illia. 2017. Attention is all you need. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS ’17). 1–11.Google ScholarDigital Library
[124] Venugopal Ashish, Vogel Stephan, and Waibel Alex. 2003. Effective phrase translation extraction from alignment models. In Proceedings of the 41st Annual Meeting on Association for Computational Linguistics—Volume 1. 319–326.Google ScholarDigital Library
[125] Vogel Stephan, Ney Hermann, and Tillmann Christoph. 1996. HMM-based word alignment in statistical translation. In Proceedings of the 16th Conference on Computational Linguistics—Volume 2. 836–841.Google ScholarDigital Library
[126] Vogel Stephan, Zhang Ying, Huang Fei, Tribble Alicia, Venugopal Ashish, Zhao Bing, and Waibel Alex. 2003. The CMU statistical machine translation system. In Proceedings of the Machine Translation Summit, Vol. 9. 54–61.Google Scholar
[127] Wang Longyue, Wong Derek F., Chao Lidia S., Lu Yi, and Xing Junwen. 2014. A systematic comparison of data selection criteria for SMT domain adaptation. Scientific World Journal 2014 (2014), 745485.Google Scholar
[128] Wang Xinyi, Pham Hieu, Yin Pengcheng, and Neubig Graham. 2018. A tree-based decoder for neural machine translation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 4772–4777.Google ScholarCross Ref
[129] Wang Ye-Yi and Waibel Alex. 1997. Decoding algorithm in statistical machine translation. In Proceedings of the 8th Conference of the European Chapter of the Association for Computational Linguistics. 366–372.Google Scholar
[130] Wołk Krzysztof, Rejmund Emilia, and Marasek Krzysztof. 2016. Multi-domain machine translation enhancements by parallel data extraction from comparable corpora. arXiv preprint arXiv:1603.06785 (2016).Google Scholar
[131] Wu Dekai. 1997. Stochastic inversion transduction grammars and bilingual parsing of parallel corpora. Computational Linguistics 23, 3 (1997), 377–403.Google ScholarDigital Library
[132] Yang Shuoheng, Wang Yuxin, and Chu Xiaowen. 2020. A survey of deep learning techniques for neural machine translation. arXiv preprint arXiv:2002.07526 (2020).Google Scholar
[133] Yang Wei and Lepage Yves. 2014. Inflating a training corpus for SMT by using unrelated unaligned monolingual data. In Proceedings of the International Conference on Natural Language Processing. 236–248.Google ScholarCross Ref
[134] Zens Richard, Matusov Evgeny, and Ney Hermann. 2004. Improved word alignment using a symmetric lexicon model. In Proceedings of the 20th International Conference on Computational Linguistics. 36.Google ScholarDigital Library
[135] Zens Richard, Och Franz Josef, and Ney Hermann. 2002. Phrase-based statistical machine translation. In Proceedings of the Annual Conference on Artificial Intelligence. 18–32.Google ScholarCross Ref
[136] Zhang Ying, Vogel Stephan, and Waibel Alex. 2003. Integrated phrase segmentation and alignment algorithm for statistical machine translation. In Proceedings of the 2003 International Conference on Natural Language Processing and Knowledge Engineering. IEEE, Los Alamitos, CA, 567–573.Google ScholarCross Ref
[137] Zohar Hadas, Liebeskind Chaya, Schler Jonathan, and Dagan Ido. 2013. Automatic thesaurus construction for cross generation corpus. Journal on Computing and Cultural Heritage 6, 1 (2013), 4.Google ScholarDigital Library
[138] Zoph Barret, Yuret Deniz, May Jonathan, and Knight Kevin. 2016. Transfer learning for low-resource neural machine translation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. 1568–1575.Google ScholarCross Ref

Index Terms

Machine Translation for Historical Research: A Case Study of Aramaic-Ancient Hebrew Translations
1. Applied computing
  1. Arts and humanities
    1. Language translation
2. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Language resources
      2. Machine translation
  2. Machine learning
    1. Learning paradigms
      1. Supervised learning
        Structured outputs

Recommendations

A Survey of Orthographic Information in Machine Translation
Abstract
Machine translation is one of the applications of natural language processing which has been explored in different languages. Recently researchers started paying attention towards machine translation for resource-poor languages and closely related ...
Read More
Using Translation Memory to Improve Neural Machine Translations
ICDLT '22: Proceedings of the 2022 6th International Conference on Deep Learning Technologies

In this paper, we describe a way of using translation memory (TM) to improve the translation quality and stability of neural machine translation (NMT) systems, especially when the sentences to be translated have high similarity with sentences stored in ...
Read More
Machine translation and its evaluation: a study
Abstract
Machine translation (namely MT) has been one of the most popular fields in computational linguistics and Artificial Intelligence (AI). As one of the most promising approaches, MT can potentially break the language barrier of people from all over ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
Journal on Computing and Cultural Heritage Volume 17, Issue 2
June 2024
355 pages
ISSN:1556-4673
EISSN:1556-4711
DOI:10.1145/3613557
Editor:
Karina Rodriguez-Echavarria
University of Brighton, UK
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 23 February 2024
- Online AM: 16 October 2023
- Accepted: 18 August 2023
- Revised: 27 June 2023
- Received: 28 February 2023
Published in jocch Volume 17, Issue 2

Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
Bible translation
neural machine translation
statistical machine translation
low-resource languages
Aramaic-Hebrew
Qualifiers
- research-article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 0
  Total Citations
  View Citations
- 337
  Total Downloads
- Downloads (Last 12 months)337
- Downloads (Last 6 weeks)45
Other Metrics
View Author Metrics
Cited By
This publication has not been cited yet

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Full Text

View this article in Full Text.

View Full Text

Machine Translation for Historical Research: A Case Study of Aramaic-Ancient Hebrew Translations

Journal on Computing and Cultural Heritage

Abstract

REFERENCES

Cited By

Index Terms

Recommendations

A Survey of Orthographic Information in Machine Translation

Using Translation Memory to Improve Neural Machine Translations

Machine translation and its evaluation: a study