Asian language resources: the state-of-the-art

Tokunaga, Takenobu; Huang, Chu-Ren; Lee, Sophia Yat Mei

doi:10.1007/s10579-008-9071-y

Introduction
Published: 16 July 2008

Volume 42, pages 109–116, (2008)
Language Resources and Evaluation

Takenobu Tokunaga¹,
Chu-Ren Huang² &
Sophia Yat Mei Lee²

References

Allan, K. (1977). Classifiers. Language, 53(2), 285–311.
Bird, S., & Simons, G. (2003). Seven dimensions of portability for language documentation and description. Language, 79(4), 557–582.
Bond, F., & Paik, K. (2000). Reusing an ontology to generate numerical classifiers. In Proceedings of the 18th International Conference on Computational Linguistics (COLING 2000), pp. 90–96.
Brants, T., & Franz, A. (2006). Web 1T 5-gram Version 1. LDC Catalog No. LDC2006T13.
Butt, M., & King, T. H. (2007). Urdu in a parallel grammar development environment. Language Resources and Evaluation, 41(2), 191–207.
Clarke, C., Craswell, N., & Soboroff, I. (2004). Overview of the TREC 2004 terabyte track. In Proceedings of the 13th Text Retrieval Conference (TREC 2004).
Huang, C.-R., Tokunaga, T., & Lee, S. Y. M. (2006). Special issue on: Asian language processing: state-of-the-art resources and processing. Language Resources and Evaluation, 40(3–4).
Kilgarriff, A. (2007). Googleology is bad science. Computational Linguistics, 33(1), 147–151.
Kilgarriff, A., & Grefenstette, G. (2003). Introduction to the special issue on the web as corpus. Computational Linguistics, 29(3), 333–347.
Nakamura, J., & Nagao, M. (1988). Extraction of semantic information from an ordinary English dictionary and its evaluation. In Proceedings of the 12th International Conference on Computational Linguistics (COLING 1988), pp. 459–464.
Naseem, T., & Hussain, S. (2007). A novel approach for ranking spelling error corrections for Urdu. Language Resources and Evaluation, 41(2), 117–128.
Pantel, P., & Pennacchiotti, M. (2006). Espresso: leveraging generic patterns for automatically harvesting semantic relations. In Proceedings of the 21st International Conference on Computational Linguistics/the 44th Annual Meeting of the Association for Computational Linguistics (COLING/ACL 2006), pp. 113–120.
Ringlstetter, C., Schulz, K. U., & Mihov, S. (2006). Orthographic errors in web pages: toward cleaner web corpora. Computational Linguistics, 32(3), 295–340.
Shirai, K., Tokunaga, T., Huang, C.-R., Hsieh, S.-K., Kuo, T.-Y., Sornlertlamvanich, V., & Charoenporn, T. (2008). Constructing taxonomy of numerative classifiers for Asian languages. In Proceedings of the 3rd International Joint Conference on Natural Language Processing (IJCNLP 2008), pp. 397–402.
Tanaka, K., & Iwasaki, H. (1996). Extraction of lexical translations from non-aligned corpora. In Proceedings of the 16th International Conference on Computational Linguistics (COLING 1996), pp. 580–585.
Tsurumaru, H., Hitaka, T., & Yoshida, S. (1986). An attempt to automatic thesaurus construction from an ordinary Japanese language dictionary. In Proceedings of the 11th International Conference on Computational Linguistics (COLING 1986), pp. 445–447.

Resources

British National Corpus. http://www.natcorp.ox.ac.uk/.
Brown Corpus. http://icame.uib.no/brown/bcm.html.
Cobuild Project. http://www.collins.co.uk/corpus/CorpusSearch.aspx.
Sinica Corpus. http://www.sinica.edu.tw/SinicaCorpus.
Chinese Gigaword. http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2003T09.
English Gigaword. http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2003T05.
Tagged Chinese Gigaword. http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2007T03.

Acknowledgements

We would like to thank all the authors, who submitted 74 papers on a wide range of research topics in Asian languages. We had the privilege of going through all these papers and wish that the full range of resources and topics could have been presented. We would also like to thank all the reviewers, whose prompt action and helpful comments helped us work through all the submitted papers. We would like to thank AFNLP for its support of the initiative to promote Asian language processing. Various colleagues helped us process all the papers, including Dr. Sara Goggi at CNR-Italy and Liwu Chen at Academia Sinica. Finally, we would like to thank four people at LRE and Springer who made this special issue possible. Without the generous support of the chief editors Nancy Ide and Nicoletta Calzolari, this volume would not have been possible. In addition, without the diligent work of both Estella La Jappon and Jenna Cataluna at Springer, we would never have been able to negotiate all the steps of publication. For this introductory chapter, we would like to thank Kathleen Ahrens, Nicoletta Calzolari, and Nancy Ide for their detailed comments. Any remaining errors are, of course, ours.

Author information

Authors and Affiliations

Department of Computer Science, Graduate School of Information Science and Engineering, Tokyo Institute of Technology, 2-12-1 Ôokayama, Meguro, Tokyo, 152-8552, Japan
Takenobu Tokunaga
Institute of Linguistics, Academia Sinica, Nankang, Taipei, 115, Taiwan
Chu-Ren Huang & Sophia Yat Mei Lee

Corresponding author

Correspondence to Takenobu Tokunaga.

About this article

Cite this article

Tokunaga, T., Huang, CR. & Lee, S.Y.M. Asian language resources: the state-of-the-art. Lang Resources & Evaluation 42, 109–116 (2008). https://doi.org/10.1007/s10579-008-9071-y

Published: 16 July 2008
Issue Date: May 2008
DOI: https://doi.org/10.1007/s10579-008-9071-y
