Abstract
Domain ontologies are increasingly used in medical natural language processing systems. A wide variety of knowledge bases in multiple languages have been integrated into the Unified Medical Language System (UMLS), creating a large knowledge source that can be accessed with diverse lexical tools. MetaMap (and its Java version, MMTx) is a tool for extracting medical concepts from free text, but no Spanish version currently exists. Our ongoing research centers on applying biomedical concepts to cross-lingual text classification, which makes a Spanish MMTx necessary. We have combined automatic translation techniques with biomedical ontologies and the existing English MMTx to produce a Spanish version of MMTx. We have evaluated different approaches and applied several types of evaluation, according to different concept representations for text classification. Our results show that existing translation tools such as Google Translate produce translations that are highly similar to the original texts in terms of extracted concepts.
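As an illustration of what "similarity in terms of extracted concepts" could mean, the following minimal sketch compares the set of concept identifiers extracted from an original English text with the set extracted from a machine-translated version of its Spanish counterpart. The abstract does not state which similarity measure was used; Jaccard overlap is an assumption chosen here for simplicity, the function name `concept_overlap` is hypothetical, and the identifiers are placeholders rather than real UMLS concepts or MMTx output.

```python
def concept_overlap(original_concepts, translated_concepts):
    """Jaccard overlap between two collections of concept identifiers.

    Assumption: each text is represented as a bag of concept IDs,
    and duplicates are ignored by converting to sets.
    """
    original = set(original_concepts)
    translated = set(translated_concepts)
    if not original and not translated:
        # Two empty concept sets are treated as identical.
        return 1.0
    return len(original & translated) / len(original | translated)


# Placeholder identifiers, not real UMLS concept IDs.
from_original = ["CUI_A", "CUI_B", "CUI_C"]
from_translation = ["CUI_B", "CUI_C", "CUI_D"]

score = concept_overlap(from_original, from_translation)
print(f"concept overlap: {score:.2f}")
```

A score near 1 would mean that translation preserved most of the concepts found in the original text; a set-based measure like this ignores concept frequency, which a weighted representation would capture.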
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Carrero, F., Cortizo, J.C., Gómez, J.M. (2008). Building a Spanish MMTx by Using Automatic Translation and Biomedical Ontologies. In: Fyfe, C., Kim, D., Lee, SY., Yin, H. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2008. IDEAL 2008. Lecture Notes in Computer Science, vol 5326. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88906-9_44
DOI: https://doi.org/10.1007/978-3-540-88906-9_44
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88905-2
Online ISBN: 978-3-540-88906-9
eBook Packages: Computer Science, Computer Science (R0)