Building a Spanish MMTx by Using Automatic Translation and Biomedical Ontologies

Carrero, Francisco; Cortizo, José Carlos; Gómez, José María

doi:10.1007/978-3-540-88906-9_44

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5326)

Included in the following conference series:

International Conference on Intelligent Data Engineering and Automated Learning

Abstract

The use of domain ontologies is becoming increasingly popular in medical natural language processing systems. A wide variety of knowledge bases in multiple languages has been integrated into the Unified Medical Language System (UMLS) to create a huge knowledge source that can be accessed with diverse lexical tools. MetaMap (and its Java version, MMTx) is a tool for extracting medical concepts from free text, but no Spanish version currently exists. Our ongoing research centers on the application of biomedical concepts to cross-lingual text classification, which makes a Spanish MMTx necessary. We have combined automatic translation techniques with biomedical ontologies and the existing English MMTx to produce a Spanish version of MMTx. We have evaluated different approaches and applied several types of evaluation according to different concept representations for text classification. Our results show that existing translation tools such as Google Translate produce translations that are highly similar to the original texts in terms of extracted concepts.
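The comparison "in terms of extracted concepts" can be illustrated with a minimal sketch. It assumes that concepts have already been extracted from an original English text and from a machine-translated text, and that each concept is represented by an identifier string. The function name `concept_overlap`, the choice of set-based precision, recall, F1 and Jaccard, and the identifiers in the usage example are all assumptions made for illustration; the abstract does not state which similarity measures the authors used, and no MMTx call is shown.

```python
from typing import Dict, Iterable


def concept_overlap(original: Iterable[str], translated: Iterable[str]) -> Dict[str, float]:
    """Compare two collections of concept identifiers as sets.

    `original` holds the concepts extracted from the source-language text,
    `translated` those extracted from the machine-translated text.
    The measures below are an illustrative choice, not the paper's method.
    """
    ref, hyp = set(original), set(translated)
    shared = ref & hyp
    union = ref | hyp

    # Share of concepts from the translated text that also occur in the original.
    precision = len(shared) / len(hyp) if hyp else 0.0
    # Share of concepts from the original text that survive translation.
    recall = len(shared) / len(ref) if ref else 0.0
    # Harmonic mean of precision and recall; zero when nothing is shared.
    f1 = 2 * precision * recall / (precision + recall) if shared else 0.0
    # Two empty concept sets are treated as identical.
    jaccard = len(shared) / len(union) if union else 1.0

    return {"precision": precision, "recall": recall, "f1": f1, "jaccard": jaccard}


# Hypothetical concept identifiers, invented for this example only.
english_concepts = ["C0000001", "C0000002", "C0000003", "C0000004"]
translated_concepts = ["C0000001", "C0000002", "C0000003", "C0000005"]

scores = concept_overlap(english_concepts, translated_concepts)
print(scores)
```

Treating the concepts as sets ignores how often each concept occurs; a frequency-weighted representation would need a different measure, such as cosine similarity over concept counts.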

Author information

Authors and Affiliations

Universidad Europea de Madrid, C/Tajo s/n, Villaviciosa de Odón, 28670, Madrid, Spain
Francisco Carrero & José Carlos Cortizo
Artificial Intelligence & Network Solutions S.L., Spain
José Carlos Cortizo
Departamento de I+D, Optenet, Parque Empresarial Alvia, Las Rozas, 28230, Madrid, Spain
José María Gómez

Editor information

Editors and Affiliations

University of West Scotland, PA1 2BE, Paisley, Scotland
Colin Fyfe
KAIST, Daejeon, Korea
Dongsup Kim
Brain Science Research Center and Department of Bio & Brain Engineering, Korea Advanced Institute of Science and Technology, 373-1 Guseong-dong, Yuseong-gu, 305-701, Daejeon, Korea
Soo-Young Lee
School of Electrical and Electronic Engineering, University of Manchester, UK
Hujun Yin

About this paper

Cite this paper

Carrero, F., Cortizo, J.C., Gómez, J.M. (2008). Building a Spanish MMTx by Using Automatic Translation and Biomedical Ontologies. In: Fyfe, C., Kim, D., Lee, SY., Yin, H. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2008. IDEAL 2008. Lecture Notes in Computer Science, vol 5326. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88906-9_44

DOI: https://doi.org/10.1007/978-3-540-88906-9_44
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88905-2
Online ISBN: 978-3-540-88906-9
eBook Packages: Computer Science, Computer Science (R0)