Authors: Nagesh Yadav ; Alessandro Dibari ; Miao Wei ; John Segrave-Daly ; Conor Cullen ; Denisa Moga ; Jillian Scalvini ; Ciaran Hennessy ; Morten Kristiansen and Omar O’Sullivan

Affiliation: International Business Machines, U.S.A.

Keyword(s): Information Retrieval, Query Expansion, Word Embedding, Thesaurus.

Abstract: Query expansion is an extensively researched topic in information retrieval that helps to mitigate the vocabulary mismatch problem, i.e., the way users express concepts differs from the way those concepts appear in the corpus. In this paper, we propose a query-expansion technique for searching a corpus that contains a mix of terminology from several domains, some of which have well-curated thesauri and some of which do not. An iterative fusion technique is proposed that exploits thesauri for the domains that have them and word embeddings for those that do not. For our experiments, we used a corpus of Medicaid healthcare policies that contains a mix of terminology from the medical and insurance domains. The Unified Medical Language System (UMLS) thesaurus was used to expand medical concepts, and a word embeddings model was used to expand non-medical concepts. The technique was evaluated against Elasticsearch with no expansion. The results show an 8% improvement in recall and a 12% improvement in mean average precision.
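The routing idea in the abstract (thesaurus where one exists, word embeddings otherwise) can be illustrated with a minimal sketch. This is not the authors' iterative fusion technique, whose details are not given on this page; it is a simplified single-pass, per-term routing. The names `THESAURUS`, `EMBEDDINGS`, `embedding_neighbors`, and `expand_query`, the toy vectors, and the 0.95 similarity threshold are all assumptions made for illustration, and the dictionary only stands in for a curated resource such as UMLS.

```python
import math

# Hypothetical stand-in for a curated thesaurus such as UMLS (toy entries).
THESAURUS = {
    "heart attack": ["myocardial infarction"],
    "kidney": ["renal"],
}

# Hypothetical word vectors standing in for a trained embedding model
# (values are made up for illustration).
EMBEDDINGS = {
    "copay": [0.9, 0.1, 0.0],
    "copayment": [0.88, 0.12, 0.02],
    "deductible": [0.5, 0.5, 0.1],
    "kidney": [0.0, 0.2, 0.9],
    "renal": [0.05, 0.25, 0.85],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def embedding_neighbors(term, threshold=0.95):
    """Return vocabulary terms whose vectors are close to `term`'s vector."""
    if term not in EMBEDDINGS:
        return []
    query_vec = EMBEDDINGS[term]
    scored = [
        (cosine(query_vec, vec), other)
        for other, vec in EMBEDDINGS.items()
        if other != term
    ]
    # Most similar first; keep only neighbours above the assumed threshold.
    return [other for score, other in sorted(scored, reverse=True) if score >= threshold]


def expand_query(terms):
    """Expand each term via the thesaurus if it is covered, else via embeddings."""
    expanded = []
    for term in terms:
        expanded.append(term)
        if term in THESAURUS:
            expanded.extend(THESAURUS[term])
        else:
            expanded.extend(embedding_neighbors(term))
    # Remove duplicates while preserving first-seen order.
    return list(dict.fromkeys(expanded))
```

With these toy data, `expand_query(["kidney", "copay"])` returns `["kidney", "renal", "copay", "copayment"]`: the medical term is expanded from the thesaurus and the insurance term from its nearest embedding neighbour. A term covered by neither resource is passed through unchanged.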

CC BY-NC-ND 4.0

Paper citation in several formats:
Yadav, N.; Dibari, A.; Wei, M.; Segrave-Daly, J.; Cullen, C.; Moga, D.; Scalvini, J.; Hennessy, C.; Kristiansen, M. and O’Sullivan, O. (2020). Mitigating Vocabulary Mismatch on Multi-domain Corpus using Word Embeddings and Thesaurus. In Proceedings of the 12th International Conference on Agents and Artificial Intelligence - Volume 1: NLPinAI; ISBN 978-989-758-395-7; ISSN 2184-433X, SciTePress, pages 441-445. DOI: 10.5220/0009090804410445

@conference{nlpinai20,
author={Nagesh Yadav and Alessandro Dibari and Miao Wei and John Segrave{-}Daly and Conor Cullen and Denisa Moga and Jillian Scalvini and Ciaran Hennessy and Morten Kristiansen and Omar O’Sullivan},
title={Mitigating Vocabulary Mismatch on Multi-domain Corpus using Word Embeddings and Thesaurus},
booktitle={Proceedings of the 12th International Conference on Agents and Artificial Intelligence - Volume 1: NLPinAI},
year={2020},
pages={441-445},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0009090804410445},
isbn={978-989-758-395-7},
issn={2184-433X},
}

TY - CONF

JO - Proceedings of the 12th International Conference on Agents and Artificial Intelligence - Volume 1: NLPinAI
TI - Mitigating Vocabulary Mismatch on Multi-domain Corpus using Word Embeddings and Thesaurus
SN - 978-989-758-395-7
IS - 2184-433X
AU - Yadav, N.
AU - Dibari, A.
AU - Wei, M.
AU - Segrave-Daly, J.
AU - Cullen, C.
AU - Moga, D.
AU - Scalvini, J.
AU - Hennessy, C.
AU - Kristiansen, M.
AU - O’Sullivan, O.
PY - 2020
SP - 441
EP - 445
DO - 10.5220/0009090804410445
PB - SciTePress