Predicting Chinese Abbreviations from Definitions: An Empirical Learning Approach Using Support Vector Regression

Sun, Xu; Wang, Hou-Feng; Wang, Bo

doi:10.1007/s11390-008-9156-5

Regular Paper
Published: 05 August 2008

Volume 23, pages 602–611, (2008)

Journal of Computer Science and Technology

Xu Sun¹,²,
Hou-Feng Wang¹ &
Bo Wang¹


Abstract

In Chinese, phrases and named entities play a central role in information retrieval. Abbreviations, however, make keyword-based approaches less effective. This paper presents an empirical learning approach to Chinese abbreviation prediction. Each abbreviation is treated as a reduced form of its definition (expanded form), and abbreviation prediction is formalized as a scoring and ranking problem over abbreviation candidates, which are generated automatically from the definition. Employing Support Vector Regression (SVR) for scoring yields multiple abbreviation candidates together with their SVR values, which are used to rank the candidates. Experimental results show that the SVR method outperforms both the popular heuristic rule for abbreviation prediction and the hidden Markov model (HMM).
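The generate-score-rank formulation described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how candidates are generated, so the sketch assumes they are order-preserving character subsequences of the definition, and `toy_score` is a hypothetical stand-in for a trained SVR model (its features and weights are invented for illustration only). The names `generate_candidates`, `rank_candidates`, and `toy_score` are likewise assumptions.

```python
from itertools import combinations
from typing import Callable, List, Tuple


def generate_candidates(definition: str, min_len: int = 2) -> List[str]:
    """Enumerate order-preserving character subsequences of `definition`.

    Assumption: candidates are proper subsequences of the definition's
    characters; the paper's actual generation procedure may differ.
    """
    n = len(definition)
    seen = set()
    candidates = []
    for k in range(min_len, n):  # k < n, so the full definition is excluded
        for idx in combinations(range(n), k):
            cand = "".join(definition[i] for i in idx)
            if cand not in seen:  # skip duplicates from repeated characters
                seen.add(cand)
                candidates.append(cand)
    return candidates


def rank_candidates(
    definition: str, score: Callable[[str, str], float]
) -> List[Tuple[str, float]]:
    """Score every candidate and return (candidate, score) pairs, best first."""
    scored = [(c, score(definition, c)) for c in generate_candidates(definition)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def toy_score(definition: str, candidate: str) -> float:
    """Hypothetical placeholder for a trained SVR's predicted value.

    It rewards keeping the first character and penalizes length; a real
    system would use a regressor learned from abbreviation/definition pairs.
    """
    keeps_first = 1.0 if candidate[:1] == definition[:1] else 0.0
    return keeps_first - 0.1 * len(candidate)


if __name__ == "__main__":
    # 北京大学 (Peking University) is commonly abbreviated as 北大.
    for cand, value in rank_candidates("北京大学", toy_score)[:3]:
        print(cand, round(value, 2))
```

Swapping `toy_score` for the prediction function of a trained regressor keeps the ranking step unchanged, which is the main appeal of treating prediction as scoring plus ranking: the top candidate gives a single prediction, while the full ranked list provides alternatives.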



Author information

Authors and Affiliations

Institute of Computational Linguistics, School of Electronics Engineering and Computer Science, Peking University, Beijing, 100871, China
Xu Sun, Hou-Feng Wang & Bo Wang
Department of Computer Science, Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, 113-0033, Japan
Xu Sun

Corresponding author

Correspondence to Xu Sun.

Additional information

Supported by the National Natural Science Foundation of China (Grant Nos. 60473138 and 60675035) and the Beijing Natural Science Foundation (Grant No. 4072012).


About this article

Cite this article

Sun, X., Wang, HF. & Wang, B. Predicting Chinese Abbreviations from Definitions: An Empirical Learning Approach Using Support Vector Regression. J. Comput. Sci. Technol. 23, 602–611 (2008). https://doi.org/10.1007/s11390-008-9156-5

Received: 08 May 2007
Revised: 02 April 2008
Published: 05 August 2008
Issue Date: July 2008
DOI: https://doi.org/10.1007/s11390-008-9156-5
