A Language Modeling Approach for Acronym Expansion Disambiguation

Ahmed, Akram Gaballah; Hady, Mohamed Farouk Abdel; Nabil, Emad; Badr, Amr

doi:10.1007/978-3-319-18111-0_21

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9041)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

Nonstandard words such as proper nouns, abbreviations, and acronyms are a major obstacle in natural language text processing and information retrieval. Acronyms, in particular, are difficult to read and process because they are often domain-specific and highly polysemous. In this paper, we propose a language modeling approach for the automatic disambiguation of acronym senses using context information. First, a dictionary of all possible expansions of acronyms is generated automatically. This dictionary is used to look up all candidate expansions (senses) of a given acronym. The extracted dictionary consists of about 17 thousand acronym-expansion pairs covering 1,829 acronyms from different fields, with an average of 9.47 expansions per acronym. Training data is collected automatically from documents identified in the results of search engine queries. The collected data is used to build a unigram language model of the context of each candidate expansion. In the in-context expansion prediction phase, the relevance of each candidate expansion is computed from the similarity between the context of a specific acronym occurrence and the language model of that candidate. Unlike other work in the literature, our approach can decline to expand an acronym when it is not confident in the disambiguation. We evaluated the performance of our language modeling approach and compared it with a discriminative tf-idf approach.
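The abstract does not state the exact similarity measure or rejection rule. The following is therefore only a minimal Python sketch under stated assumptions, not the authors' implementation. Each candidate expansion gets a unigram language model with add-alpha (Laplace) smoothing. A context is scored by its average per-token log-likelihood under each candidate model. The system declines to expand when the best candidate does not beat the runner-up by a fixed margin. The toy training snippets, the `margin` value, and the names `UnigramLM`, `expand`, `TRAINING`, and `MODELS` are hypothetical. In the paper, the training text for each expansion comes from documents retrieved through search-engine queries.

```python
import math
from collections import Counter

# Hypothetical toy training snippets. In the paper, the context text for each
# candidate expansion is collected from documents found via search-engine queries.
TRAINING = {
    "ACL": {
        "Association for Computational Linguistics": [
            "annual meeting of computational linguistics papers on parsing and translation",
            "conference proceedings natural language processing research",
        ],
        "anterior cruciate ligament": [
            "knee injury surgery ligament tear in athletes",
            "knee reconstruction surgery recovery physical therapy",
        ],
    },
}


def tokenize(text):
    """Lowercase whitespace tokenization (a simplifying assumption)."""
    return text.lower().split()


class UnigramLM:
    """Unigram model of one expansion's context, with add-alpha smoothing (assumed)."""

    def __init__(self, docs, alpha=1.0):
        self.counts = Counter(tok for doc in docs for tok in tokenize(doc))
        self.total = sum(self.counts.values())
        self.alpha = alpha

    def log_prob(self, token, vocab_size):
        # Smoothed log P(token | expansion); unseen tokens still get non-zero mass.
        num = self.counts[token] + self.alpha
        den = self.total + self.alpha * vocab_size
        return math.log(num / den)


def expand(acronym, context, models, margin=0.1):
    """Return the best expansion, or None when the decision is not confident."""
    tokens = tokenize(context)
    candidates = models.get(acronym)
    if not candidates or not tokens:
        return None
    # Shared vocabulary so every candidate model is smoothed over the same events.
    vocab = set(tokens)
    for lm in candidates.values():
        vocab.update(lm.counts)
    v = len(vocab)
    # Average per-token log-likelihood of the context under each candidate model.
    scores = {
        exp: sum(lm.log_prob(t, v) for t in tokens) / len(tokens)
        for exp, lm in candidates.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    # Rejection option (assumed rule): decline when the best candidate does not
    # beat the runner-up by at least `margin` nats per token.
    if len(ranked) > 1 and ranked[0][1] - ranked[1][1] < margin:
        return None
    return ranked[0][0]


MODELS = {
    acr: {exp: UnigramLM(docs) for exp, docs in senses.items()}
    for acr, senses in TRAINING.items()
}

if __name__ == "__main__":
    print(expand("ACL", "knee surgery", MODELS))           # anterior cruciate ligament
    print(expand("ACL", "the results were good", MODELS))  # None (rejected)
```

A context such as "knee surgery" matches the medical model far better, so it is expanded. The generic sentence "the results were good" contains no words seen in either model, so the two scores differ only through smoothing and the sketch declines to expand. The averaged log-likelihood and the margin rule are illustrative choices. The paper also evaluates a discriminative tf-idf approach, which this sketch does not reproduce.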

Author information

Authors and Affiliations

Microsoft, Redmond, WA, USA
Mohamed Farouk Abdel Hady
Faculty of Computers and Information, Cairo University, Cairo, Egypt
Akram Gaballah Ahmed, Emad Nabil & Amr Badr

Corresponding author

Correspondence to Akram Gaballah Ahmed.

Editor information

Editors and Affiliations

Centro de Investigación en Computación, Instituto Politécnico Nacional, Mexico DF, Mexico
Alexander Gelbukh

About this paper

Cite this paper

Ahmed, A.G., Hady, M.F.A., Nabil, E., Badr, A. (2015). A Language Modeling Approach for Acronym Expansion Disambiguation. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2015. Lecture Notes in Computer Science, vol 9041. Springer, Cham. https://doi.org/10.1007/978-3-319-18111-0_21

DOI: https://doi.org/10.1007/978-3-319-18111-0_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18110-3
Online ISBN: 978-3-319-18111-0
eBook Packages: Computer Science, Computer Science (R0)
