Transliteration Pair Extraction from Classical Chinese Buddhist Literature Using Phonetic Similarity Measurement

New Generation Computing

Abstract

Transliteration pair extraction, the identification of transliterations of foreign loanwords in literature, is a challenging task in research fields such as historical linguistics and digital humanities. In this paper, we focus on one important type of historical literature: classical Chinese Buddhist texts. We propose an approach that automatically identifies transliteration pairs in classical Chinese texts. Our approach comprises two stages: transliteration extraction and transliteration pair identification. To extract as many candidate transliterations as possible without introducing too many false positives, we adopt a hybrid method consisting of a suffix-array-based extraction step and a language-model-based filtering step. Using the ALINE algorithm, we then compare the extracted transliteration candidates for phonetic similarity based on their pronunciations in the Middle Chinese rime book Guangyun (廣韻). Pairs whose similarity exceeds a given threshold are considered transliteration pairs. To evaluate our method, we constructed an evaluation set from several Buddhist texts, such as the Samyuktagama and the Mahavibhasa, which were translated into Chinese in different eras. We measure the effectiveness of our method in terms of precision and recall.
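The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions: the suffix array is built naively by sorting suffixes, the language-model filtering step is omitted entirely, and a normalized edit distance over phoneme sequences stands in for ALINE, which scores phonetic features rather than exact symbol matches. The function names, the default threshold of 0.6, and the toy phoneme sequences are all hypothetical and are not Guangyun readings.

```python
from itertools import combinations


def repeated_substrings(text, min_len=2, max_len=4, min_count=2):
    """Return {substring: count} for substrings repeated in text.

    Uses a naive suffix array (sorted suffix start positions); equal
    n-character prefixes of sorted suffixes are adjacent, so runs can
    be counted in one pass per length.
    """
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    found = {}
    for n in range(min_len, max_len + 1):
        prev, run = None, 0
        for i in sa + [None]:  # trailing None flushes the last run
            gram = text[i:i + n] if i is not None else None
            if gram is not None and len(gram) < n:
                continue  # suffix too short to yield an n-gram
            if gram == prev:
                run += 1
            else:
                if prev is not None and run >= min_count:
                    found[prev] = run
                prev, run = gram, 1
    return found


def edit_distance(a, b):
    """Levenshtein distance between two sequences (stand-in for ALINE)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]


def similarity(a, b):
    """Similarity in [0, 1]; 1.0 means identical sequences."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - edit_distance(a, b) / longest


def pair_candidates(pronunciations, threshold=0.6):
    """Pair candidates whose phoneme sequences meet the threshold.

    pronunciations maps each candidate to its phoneme sequence; in the
    paper's setting these would come from a Guangyun lookup (assumed).
    """
    return [(a, b)
            for a, b in combinations(sorted(pronunciations), 2)
            if similarity(pronunciations[a], pronunciations[b]) >= threshold]


# Toy data only: placeholder candidates with made-up phoneme sequences.
toy = {"x": list("nipan"), "y": list("nepan"), "z": list("sutra")}
print(repeated_substrings("abcabcab", 2, 3))
print(pair_candidates(toy))
```

In a fuller pipeline, the output of `repeated_substrings` would pass through the language-model filter before any pronunciation lookup, and the threshold would be tuned on an evaluation set instead of being fixed in advance.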



Author information

Corresponding author

Correspondence to Richard Tzong-Han Tsai.

About this article

Cite this article

Wang, YC., Wu, CK., Tsai, R.TH. et al. Transliteration Pair Extraction from Classical Chinese Buddhist Literature Using Phonetic Similarity Measurement. New Gener. Comput. 31, 265–283 (2013). https://doi.org/10.1007/s00354-013-0402-1


  • DOI: https://doi.org/10.1007/s00354-013-0402-1
