
Building a Spanish MMTx by Using Automatic Translation and Biomedical Ontologies

  • Conference paper
Intelligent Data Engineering and Automated Learning – IDEAL 2008 (IDEAL 2008)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5326)

Abstract

The use of domain ontologies is becoming increasingly popular in medical natural language processing systems. A wide variety of knowledge bases in multiple languages has been integrated into the Unified Medical Language System (UMLS) to create a large knowledge source that can be accessed with diverse lexical tools. MetaMap (and its Java version, MMTx) is a tool that extracts medical concepts from free text, but no Spanish version currently exists. Our ongoing research centers on applying biomedical concepts to cross-lingual text classification, which makes a Spanish MMTx necessary. We have combined automatic translation techniques with biomedical ontologies and the existing English MMTx to produce a Spanish version of MMTx. We have evaluated different approaches and applied several types of evaluation according to different concept representations for text classification. Our results show that existing translation tools such as Google Translate produce translations that are highly similar to the original texts in terms of extracted concepts.
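The abstract describes a translate-then-extract architecture (Spanish text is machine-translated into English and passed to the English MMTx) and an evaluation that compares the concepts extracted from translated and original texts. The following Python sketch illustrates that idea under stated assumptions; it is not the authors' implementation. `spanish_mmtx`, `toy_translate` and `toy_extract` are hypothetical names: the two toy functions stand in for a machine translation service such as Google Translate and for the English MMTx, and the `CUI_*` strings are placeholder concept identifiers, not real UMLS CUIs. Jaccard overlap (for a set-of-concepts representation) and cosine similarity (for a concept-frequency representation) are assumed here as two possible similarity measures; the paper's own measures may differ.

```python
from collections import Counter
from math import sqrt
from typing import Callable


def spanish_mmtx(text_es: str,
                 translate: Callable[[str], str],
                 extract: Callable[[str], list[str]]) -> list[str]:
    """Hypothetical 'Spanish MMTx': translate to English, then extract concepts."""
    return extract(translate(text_es))


def toy_translate(text_es: str) -> str:
    # Hypothetical word-for-word lexicon standing in for a real MT service.
    lexicon = {"dolor": "pain", "cabeza": "head", "fiebre": "fever"}
    return " ".join(lexicon.get(w, w) for w in text_es.lower().split())


def toy_extract(text_en: str) -> list[str]:
    # Hypothetical keyword lookup standing in for the English MMTx.
    # CUI_* values are placeholders, not real UMLS identifiers.
    concepts = {"pain": "CUI_PAIN", "head": "CUI_HEAD",
                "fever": "CUI_FEVER", "headache": "CUI_HEADACHE"}
    return [concepts[w] for w in text_en.lower().split() if w in concepts]


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap of two concept sets (binary representation)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two concept-frequency vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Concepts from an English original vs. from its Spanish counterpart run
    # through the translate-then-extract pipeline.
    reference = toy_extract("headache and fever")
    translated = spanish_mmtx("Dolor de cabeza y fiebre", toy_translate, toy_extract)
    print("Jaccard:", round(jaccard(set(reference), set(translated)), 3))  # 0.25
    print("Cosine:", round(cosine(Counter(reference), Counter(translated)), 3))  # 0.408
```

In this toy case the word-for-word translation splits "headache" into "pain" and "head", so only one concept is shared; a real evaluation would run MMTx over a corpus rather than a single phrase, and the choice of representation (set vs. frequency) changes how that mismatch is scored.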




Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Carrero, F., Cortizo, J.C., Gómez, J.M. (2008). Building a Spanish MMTx by Using Automatic Translation and Biomedical Ontologies. In: Fyfe, C., Kim, D., Lee, SY., Yin, H. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2008. IDEAL 2008. Lecture Notes in Computer Science, vol 5326. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88906-9_44


  • DOI: https://doi.org/10.1007/978-3-540-88906-9_44

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-88905-2

  • Online ISBN: 978-3-540-88906-9

  • eBook Packages: Computer Science, Computer Science (R0)
