Predicting Chinese Abbreviations from Definitions: An Empirical Learning Approach Using Support Vector Regression

  • Regular Paper
  • Journal of Computer Science and Technology

Abstract

In Chinese, phrases and named entities play a central role in information retrieval. Abbreviations, however, make keyword-based approaches less effective. This paper presents an empirical learning approach to Chinese abbreviation prediction. Each abbreviation is treated as a reduced form of its definition (expanded form), and abbreviation prediction is formalized as a scoring and ranking problem over abbreviation candidates that are generated automatically from the definition. Employing Support Vector Regression (SVR) for scoring yields multiple abbreviation candidates together with their SVR values, which are used to rank the candidates. Experimental results show that the SVR method performs better than the popular heuristic rule for abbreviation prediction and also outperforms the hidden Markov model (HMM) on this task.
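
The generate, score, and rank formulation described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes that candidates are all ordered character subsequences of the definition, which the abstract does not specify. The names `generate_candidates`, `rank_candidates`, and `toy_score` are hypothetical, and `toy_score` is a hand-written stand-in for a trained SVR model. The example definition 北京大学 (Peking University), commonly abbreviated 北大, is an illustration and is not taken from the paper.

```python
from itertools import combinations
from typing import Callable, List, Tuple


def generate_candidates(definition: str, min_len: int = 2) -> List[str]:
    """Return each distinct ordered character subsequence of `definition`
    that has at least `min_len` characters and is shorter than the definition.
    (Assumption: the paper's actual candidate generator may differ.)"""
    n = len(definition)
    seen = set()
    candidates = []
    for k in range(min_len, n):
        for idx in combinations(range(n), k):
            cand = "".join(definition[i] for i in idx)
            if cand not in seen:
                seen.add(cand)
                candidates.append(cand)
    return candidates


def rank_candidates(
    definition: str,
    score: Callable[[str, str], float],
    top_n: int = 3,
) -> List[Tuple[str, float]]:
    """Score every candidate and return the `top_n` highest-scoring ones."""
    scored = [(c, score(definition, c)) for c in generate_candidates(definition)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


def toy_score(definition: str, candidate: str) -> float:
    """Hypothetical stand-in for a trained SVR: rewards keeping the first
    character of the definition and mildly prefers shorter candidates."""
    keeps_first = 1.0 if candidate[0] == definition[0] else 0.0
    return keeps_first - 0.1 * len(candidate)


if __name__ == "__main__":
    for cand, value in rank_candidates("北京大学", toy_score):
        print(cand, round(value, 2))
```

In a real system, the `score` argument would wrap a regression model trained on features of known abbreviation and definition pairs. Keeping the scorer as a plain callable makes it easy to swap in such a model without touching the generation or ranking code.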

Author information

Corresponding author

Correspondence to Xu Sun.

Additional information

Supported by the National Natural Science Foundation of China (Grant Nos. 60473138 and 60675035) and the Beijing Natural Science Foundation (Grant No. 4072012).

About this article

Cite this article

Sun, X., Wang, HF. & Wang, B. Predicting Chinese Abbreviations from Definitions: An Empirical Learning Approach Using Support Vector Regression. J. Comput. Sci. Technol. 23, 602–611 (2008). https://doi.org/10.1007/s11390-008-9156-5

  • DOI: https://doi.org/10.1007/s11390-008-9156-5
