Transliteration Pair Extraction from Classical Chinese Buddhist Literature Using Phonetic Similarity Measurement

Wang, Yu-Chun; Wu, Chun-Kai; Tsai, Richard Tzong-Han; Hsiang, Jieh

doi:10.1007/s00354-013-0402-1

Published: 29 October 2013

New Generation Computing, Volume 31, pages 265–283 (2013)



Abstract

Transliteration pair extraction, the identification of transliterations of foreign loanwords in literature, is a challenging task in research fields such as historical linguistics and digital humanities. In this paper, we focus on one important type of historical literature: classical Chinese Buddhist texts. We propose an approach that identifies transliteration pairs automatically in classical Chinese texts. Our approach comprises two stages: transliteration extraction and transliteration pair identification. To extract as many transliterations as possible without introducing too many false positives, we adopt a hybrid method consisting of a suffix-array-based extraction step and a language-model-based filtering process. Using the ALINE algorithm, we then compare the extracted transliteration candidates for phonetic similarity based on their pronunciations in the Middle Chinese rime book Guangyun (廣韻). Pairs with similarity above a certain threshold are considered transliteration pairs. To evaluate our method, we constructed an evaluation set from several Buddhist texts, such as the Samyuktagama and the Mahavibhasa, which were translated into Chinese in different eras. We report precision and recall to measure the effectiveness of our method.
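The pair identification and evaluation steps described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions: a normalised Levenshtein similarity stands in for ALINE (which scores phonetic features rather than whole symbols), the candidate readings are toy ASCII strings rather than Guangyun pronunciations, and the names `similarity`, `find_pairs`, `precision_recall` and the threshold value are hypothetical, not taken from the paper.

```python
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Normalised similarity in [0, 1] based on plain Levenshtein distance.

    Assumption: this is only a stand-in for ALINE, which aligns
    phonetic feature vectors instead of comparing raw symbols.
    """
    if not a and not b:
        return 1.0
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return 1.0 - prev[-1] / max(len(a), len(b))


def find_pairs(readings, threshold):
    """Return candidate pairs whose reading similarity exceeds the threshold.

    `readings` maps a candidate id to its (toy) pronunciation string.
    """
    return {
        frozenset((x, y))
        for x, y in combinations(readings, 2)
        if similarity(readings[x], readings[y]) > threshold
    }


def precision_recall(predicted, gold):
    """Precision and recall of predicted pairs against a gold set of pairs."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy data (hypothetical): ids t1..t3 with made-up romanised readings.
readings = {"t1": "saliputra", "t2": "sariputla", "t3": "ananda"}
predicted = find_pairs(readings, threshold=0.7)
gold = {frozenset(("t1", "t2")), frozenset(("t1", "t3"))}
precision, recall = precision_recall(predicted, gold)
```

In this sketch every candidate is compared with every other candidate, which is quadratic in the number of candidates. The threshold trades precision against recall: raising it keeps only near-identical readings, lowering it admits more pairs at the cost of more false positives.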


Author information

Authors and Affiliations

Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
Yu-Chun Wang & Jieh Hsiang
Telecommunication Laboratories, Chunghwa Telecom, Taipei, Taiwan
Yu-Chun Wang
Department of Computer Science and Engineering, Yuan Ze University, Zhongli, Taiwan
Chun-Kai Wu
Department of Computer Science and Information Engineering, National Central University, Zhongli, Taiwan
Richard Tzong-Han Tsai

Corresponding author

Correspondence to Richard Tzong-Han Tsai.

About this article

Cite this article

Wang, YC., Wu, CK., Tsai, R.TH. et al. Transliteration Pair Extraction from Classical Chinese Buddhist Literature Using Phonetic Similarity Measurement. New Gener. Comput. 31, 265–283 (2013). https://doi.org/10.1007/s00354-013-0402-1

Received: 29 March 2013
Revised: 12 July 2013
Published: 29 October 2013
Issue Date: October 2013
DOI: https://doi.org/10.1007/s00354-013-0402-1
