
Machine Translation for Historical Research: A Case Study of Aramaic-Ancient Hebrew Translations

Published: 23 February 2024

Abstract

In this article, we investigated machine translation in a cultural heritage domain for two primary purposes: evaluating the quality of ancient translations and preserving Aramaic, an endangered language, by enabling its translation into living languages. First, we detailed the construction of a publicly available parallel Aramaic-Hebrew Biblical corpus based on two ancient (early 2nd to late 4th century) Aramaic translations of the Hebrew Bible: Targum Onkelos and Targum Jonathan. Then, using a statistical machine translation approach, which in our use case significantly outperforms neural machine translation, we validated the expected high quality of these translations. The trained model failed to translate Aramaic texts of other dialects. However, when we trained the same statistical machine translation model on another Aramaic-Hebrew corpus of a different dialect (the Zohar, 13th century), a very high translation score was achieved. We also examined an additional important cultural heritage source of Aramaic texts, the Babylonian Talmud (early 3rd to late 5th century). Since no parallel Aramaic-Hebrew corpus of the Talmud exists, we used the model trained on the Bible corpus for translation. We analyzed the results and suggest some promising directions for future research.
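The abstract reports comparing translation scores across corpora and dialects. Automatic MT evaluation for morphologically rich languages such as Hebrew and Aramaic is often done with character n-gram F-scores (in the spirit of chrF) rather than word-level BLEU alone. The following is a minimal illustrative sketch of such a metric in plain Python; it is not the authors' exact evaluation pipeline, and the function name `chrf_like` and its defaults are assumptions for illustration.

```python
from collections import Counter

def char_ngrams(text, n):
    # Character n-grams with whitespace removed, as a simplification
    # of chrF-style preprocessing.
    text = text.replace(" ", "")
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def chrf_like(hypothesis, reference, max_n=6, beta=2.0):
    """Minimal character n-gram F-score (illustrative only).

    Averages n-gram precision and recall over orders 1..max_n,
    then combines them into an F-beta score that weights recall.
    """
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        hyp_total, ref_total = sum(hyp.values()), sum(ref.values())
        if hyp_total == 0 or ref_total == 0:
            continue  # strings shorter than n contribute nothing
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / hyp_total)
        recalls.append(overlap / ref_total)
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p == 0.0 and r == 0.0:
        return 0.0
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```

A perfect hypothesis scores 1.0 and a fully disjoint one scores 0.0; character-level matching gives partial credit for shared stems and affixes, which matters for languages with rich inflection.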

  122. [122] Biljon Elan Van, Pretorius Arnu, and Kreutzer Julia. 2020. On optimal Transformer depth for low-resource language translation. arXiv preprint arXiv:2004.04418 (2020).Google ScholarGoogle Scholar
  123. [123] Vaswani Ashish, Shazeer Noam, Parmar Niki, Uszkoreit Jakob, Jones Llion, Gomez Aidan N., Kaiser Lukasz, and Polosukhin Illia. 2017. Attention is all you need. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS ’17). 1–11.Google ScholarGoogle ScholarDigital LibraryDigital Library
  124. [124] Venugopal Ashish, Vogel Stephan, and Waibel Alex. 2003. Effective phrase translation extraction from alignment models. In Proceedings of the 41st Annual Meeting on Association for Computational Linguistics—Volume 1. 319326.Google ScholarGoogle ScholarDigital LibraryDigital Library
  125. [125] Vogel Stephan, Ney Hermann, and Tillmann Christoph. 1996. HMM-based word alignment in statistical translation. In Proceedings of the 16th Conference on Computational Linguistics—Volume 2. 836841.Google ScholarGoogle ScholarDigital LibraryDigital Library
  126. [126] Vogel Stephan, Zhang Ying, Huang Fei, Tribble Alicia, Venugopal Ashish, Zhao Bing, and Waibel Alex. 2003. The CMU statistical machine translation system. In Proceedings of the Machine Translation Summit, Vol. 9. 5461.Google ScholarGoogle Scholar
  127. [127] Wang Longyue, Wong Derek F., Chao Lidia S., Lu Yi, and Xing Junwen. 2014. A systematic comparison of data selection criteria for SMT domain adaptation. Scientific World Journal 2014 (2014), 745485.Google ScholarGoogle Scholar
  128. [128] Wang Xinyi, Pham Hieu, Yin Pengcheng, and Neubig Graham. 2018. A tree-based decoder for neural machine translation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 47724777.Google ScholarGoogle ScholarCross RefCross Ref
  129. [129] Wang Ye-Yi and Waibel Alex. 1997. Decoding algorithm in statistical machine translation. In Proceedings of the 8th Conference of the European Chapter of the Association for Computational Linguistics. 366372.Google ScholarGoogle Scholar
  130. [130] Wołk Krzysztof, Rejmund Emilia, and Marasek Krzysztof. 2016. Multi-domain machine translation enhancements by parallel data extraction from comparable corpora. arXiv preprint arXiv:1603.06785 (2016).Google ScholarGoogle Scholar
  131. [131] Wu Dekai. 1997. Stochastic inversion transduction grammars and bilingual parsing of parallel corpora. Computational Linguistics 23, 3 (1997), 377403.Google ScholarGoogle ScholarDigital LibraryDigital Library
  132. [132] Yang Shuoheng, Wang Yuxin, and Chu Xiaowen. 2020. A survey of deep learning techniques for neural machine translation. arXiv preprint arXiv:2002.07526 (2020).Google ScholarGoogle Scholar
  133. [133] Yang Wei and Lepage Yves. 2014. Inflating a training corpus for SMT by using unrelated unaligned monolingual data. In Proceedings of the International Conference on Natural Language Processing. 236248.Google ScholarGoogle ScholarCross RefCross Ref
  134. [134] Zens Richard, Matusov Evgeny, and Ney Hermann. 2004. Improved word alignment using a symmetric lexicon model. In Proceedings of the 20th International Conference on Computational Linguistics. 36.Google ScholarGoogle ScholarDigital LibraryDigital Library
  135. [135] Zens Richard, Och Franz Josef, and Ney Hermann. 2002. Phrase-based statistical machine translation. In Proceedings of the Annual Conference on Artificial Intelligence. 1832.Google ScholarGoogle ScholarCross RefCross Ref
  136. [136] Zhang Ying, Vogel Stephan, and Waibel Alex. 2003. Integrated phrase segmentation and alignment algorithm for statistical machine translation. In Proceedings of the 2003 International Conference on Natural Language Processing and Knowledge Engineering. IEEE, Los Alamitos, CA, 567573.Google ScholarGoogle ScholarCross RefCross Ref
  137. [137] Zohar Hadas, Liebeskind Chaya, Schler Jonathan, and Dagan Ido. 2013. Automatic thesaurus construction for cross generation corpus. Journal on Computing and Cultural Heritage 6, 1 (2013), 4.Google ScholarGoogle ScholarDigital LibraryDigital Library
  138. [138] Zoph Barret, Yuret Deniz, May Jonathan, and Knight Kevin. 2016. Transfer learning for low-resource neural machine translation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. 15681575.Google ScholarGoogle ScholarCross RefCross Ref


            • Published in

              Journal on Computing and Cultural Heritage, Volume 17, Issue 2 (June 2024), 355 pages
              ISSN: 1556-4673
              EISSN: 1556-4711
              DOI: 10.1145/3613557

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

            Publisher

            Association for Computing Machinery, New York, NY, United States

            Publication History

            • Published: 23 February 2024
            • Online AM: 16 October 2023
            • Accepted: 18 August 2023
            • Revised: 27 June 2023
            • Received: 28 February 2023
