
Authors: Chris J. Lu 1 ; Destinee Tormey 1 ; Lynn McCreedy 1 and Allen C. Browne 2

Affiliations: 1 National Library of Medicine, Medical Science & Computing, LLC, United States ; 2 National Library of Medicine, United States

Keyword(s): MEDLINE N-Gram Set, Multiwords, Medical Language Processing, Natural Language Processing, the SPECIALIST Lexicon.

Related Ontology Subjects/Areas/Topics: Artificial Intelligence ; Biomedical Engineering ; Data Mining ; Databases and Information Systems Integration ; Enterprise Information Systems ; Health Information Systems ; Practice-based Research Methods for Healthcare IT ; Sensor Networks ; Signal Processing ; Soft Computing

Abstract: Multiwords are vital to better Natural Language Processing (NLP) systems: they support more effective and efficient parsers, refine information retrieval searches, and enhance precision and recall in Medical Language Processing (MLP) applications. The Lexical Systems Group has enhanced the coverage of multiwords in the Lexicon to provide a more comprehensive resource for such applications. This paper describes a new systematic approach to lexical multiword acquisition from MEDLINE through filters and matchers based on empirical models. The design goal, function description, various tests and applications of filters, matchers, and data are discussed. Results include: 1) Generating a smaller (38%) distilled MEDLINE n-gram set with better precision and similar recall to the MEDLINE n-gram set; 2) Establishing a system for generating high-precision multiword candidates for effective Lexicon building. We believe the MLP/NLP community can benefit from access to these big data (MEDLINE n-gram) sets. We also anticipate an accelerated growth of multiwords in the Lexicon with this system. Ultimately, improvement in recall or precision can be anticipated in NLP projects using the MEDLINE distilled n-gram set, the SPECIALIST Lexicon, and its applications.

CC BY-NC-ND 4.0

Paper citation in several formats:
Lu, C.; Tormey, D.; McCreedy, L. and Browne, A. (2017). Generating a Distilled N-Gram Set - Effective Lexical Multiword Building in the SPECIALIST Lexicon. In Proceedings of the 10th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2017) - HEALTHINF; ISBN 978-989-758-213-4; ISSN 2184-4305, SciTePress, pages 77-87. DOI: 10.5220/0006142000770087

@conference{healthinf17,
author={Chris J. Lu and Destinee Tormey and Lynn McCreedy and Allen C. Browne},
title={Generating a Distilled N-Gram Set - Effective Lexical Multiword Building in the SPECIALIST Lexicon},
booktitle={Proceedings of the 10th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2017) - HEALTHINF},
year={2017},
pages={77-87},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006142000770087},
isbn={978-989-758-213-4},
issn={2184-4305},
}

TY - CONF

JO - Proceedings of the 10th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2017) - HEALTHINF
TI - Generating a Distilled N-Gram Set - Effective Lexical Multiword Building in the SPECIALIST Lexicon
SN - 978-989-758-213-4
IS - 2184-4305
AU - Lu, C.
AU - Tormey, D.
AU - McCreedy, L.
AU - Browne, A.
PY - 2017
SP - 77
EP - 87
DO - 10.5220/0006142000770087
PB - SciTePress