
Using SVM to Extract Acronyms from Text

Soft Computing (Focus)

Abstract

The paper addresses the problem of extracting acronyms and their expansions from text. We propose an approach based on support vector machines (SVM). First, all likely acronyms are identified using heuristic rules. Second, expansion candidates are generated from the text surrounding each acronym. Finally, an SVM model is used to select the genuine expansions. Analysis shows that the proposed approach offers savings over conventional rule-based approaches. Experimental results show that our approach outperforms a rule-based baseline. We also show that the trained SVM model is generic and adapts easily to other domains.
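The abstract outlines a three-stage pipeline: rule-based acronym detection, candidate generation from the surrounding text, and SVM classification of the candidates. The Python sketch below illustrates that pipeline under our own assumptions and is not the authors' implementation. The acronym heuristic (`is_likely_acronym`), the candidate window of up to the acronym's length plus three preceding words (`expansion_candidates`), the four features in `features`, and every other function name are hypothetical. The abstract does not specify the solver, kernel, or features the authors used. To keep the sketch dependency-free, it trains a linear SVM in the primal with Pegasos-style stochastic subgradient descent instead of calling an SVM library.

```python
import random
import re

# Assumed heuristic (stage 1): a likely acronym is 2-10 characters long,
# starts with an uppercase letter and contains at least two uppercase letters.
ACRONYM_RE = re.compile(r"^[A-Z][A-Za-z0-9]{1,9}$")

def is_likely_acronym(token):
    return bool(ACRONYM_RE.match(token)) and sum(c.isupper() for c in token) >= 2

def tokenize(text):
    # Alphanumeric words, plus each punctuation character as its own token.
    return re.findall(r"[A-Za-z0-9]+|[^\sA-Za-z0-9]", text)

def find_acronyms(tokens):
    return [i for i, tok in enumerate(tokens) if is_likely_acronym(tok)]

def expansion_candidates(tokens, idx, extra=3):
    """Stage 2 (assumed window): word spans ending just before the acronym,
    skipping an opening parenthesis, of length 1 .. len(acronym) + extra."""
    end = idx - 1 if idx > 0 and tokens[idx - 1] == "(" else idx
    candidates = []
    for n in range(1, len(tokens[idx]) + extra + 1):
        start = end - n
        if start < 0:
            break
        span = tokens[start:end]
        if not all(w.isalnum() for w in span):
            break  # do not extend a candidate across punctuation
        candidates.append(span)
    return candidates

def features(acronym, words):
    """Hypothetical feature vector for an (acronym, candidate) pair."""
    letters = acronym.lower()
    initials = "".join(w[0].lower() for w in words)
    # Greedy left-to-right match of acronym letters against word initials.
    j = 0
    for c in initials:
        if j < len(letters) and c == letters[j]:
            j += 1
    in_order = j / len(letters)
    first_match = 1.0 if initials[:1] == letters[:1] else 0.0
    length_gap = abs(len(words) - len(letters)) / len(letters)
    return [in_order, first_match, length_gap, 1.0]  # constant 1.0 acts as a bias

def score(w, x):
    return sum(wk * xk for wk, xk in zip(w, x))

def train_linear_svm(X, y, lam=0.01, epochs=3000, seed=0):
    """Stage 3: linear SVM trained with Pegasos-style stochastic subgradient
    descent on lam/2 * ||w||^2 + mean hinge loss (the bias is regularized too)."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * score(w, X[i]) < 1  # margin under the current w
            w = [(1 - eta * lam) * wk for wk in w]  # shrink step from the regularizer
            if violated:
                w = [wk + eta * y[i] * xk for wk, xk in zip(w, X[i])]
    return w

def build_training_set(examples):
    """examples: (text, acronym, true_expansion) triples; other candidates are negatives."""
    X, y = [], []
    for text, acronym, expansion in examples:
        tokens = tokenize(text)
        for idx in find_acronyms(tokens):
            if tokens[idx] != acronym:
                continue
            for cand in expansion_candidates(tokens, idx):
                X.append(features(acronym, cand))
                y.append(1 if " ".join(cand) == expansion else -1)
    return X, y

def extract(text, w):
    """Return (acronym, expansion) pairs whose best candidate scores above zero."""
    tokens = tokenize(text)
    pairs = []
    for idx in find_acronyms(tokens):
        acronym = tokens[idx]
        scored = [(score(w, features(acronym, c)), c)
                  for c in expansion_candidates(tokens, idx)]
        if scored:
            best_score, best = max(scored, key=lambda p: p[0])
            if best_score > 0:
                pairs.append((acronym, " ".join(best)))
    return pairs

if __name__ == "__main__":
    train = [("We use support vector machines (SVM) here.", "SVM", "support vector machines")]
    X, y = build_training_set(train)
    w = train_linear_svm(X, y)
    print(extract("The model uses conditional random fields (CRF) for tagging.", w))
```

Each candidate is scored independently, and the highest-scoring candidate is kept only if its score is positive. This turns a binary classifier into a selector over candidates. The one-sentence training set in the `__main__` block is a toy. A real evaluation would need a labeled corpus and, most likely, a richer feature set.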



Author information

Correspondence to Jun Xu.


Cite this article

Xu, J., Huang, Y. Using SVM to Extract Acronyms from Text. Soft Comput 11, 369–373 (2007). https://doi.org/10.1007/s00500-006-0091-5
