Advanced Clustering Technique for Medical Data Using Semantic Information

  • Conference paper
MICAI 2004: Advances in Artificial Intelligence (MICAI 2004)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2972)

Abstract

MEDLINE is a representative collection of medical documents. Each document comes with its original full-text natural-language abstract and with representative keywords (called MeSH terms) that expert annotators manually select from a predefined ontology and structure according to their relation to the document. We show how these structured, manually assigned semantic descriptions can be combined with the original full-text abstracts to improve the quality of clustering the documents into a small number of clusters. As baselines, we use clustering on abstracts only and on MeSH terms only. Our experiments show 36% to 47% higher cluster coherence, as well as more refined keywords for the produced clusters.
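
The abstract does not say how the two descriptions are combined, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It assumes each document is represented by two L2-normalised term-frequency vectors (one for abstract words, one for MeSH terms) that are mixed with a hypothetical weight `mesh_weight` and compared by cosine similarity. All function names, the weight, and the toy documents are invented for illustration.

```python
import math
from collections import Counter


def unit_vector(tokens, prefix):
    """L2-normalised term-frequency vector; keys carry a prefix so the
    abstract vocabulary and the MeSH vocabulary never collide."""
    counts = Counter(prefix + t.lower() for t in tokens)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()} if norm else {}


def combined_vector(abstract_tokens, mesh_terms, mesh_weight=0.5):
    """Weighted concatenation of the two views (assumed scheme),
    re-normalised to unit length."""
    text = unit_vector(abstract_tokens, "w:")
    mesh = unit_vector(mesh_terms, "m:")
    vec = {t: (1.0 - mesh_weight) * v for t, v in text.items()}
    vec.update({t: mesh_weight * v for t, v in mesh.items()})
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors (a dot product)."""
    return sum(v * b.get(t, 0.0) for t, v in a.items())


# Toy documents (hypothetical): no shared abstract words, one shared MeSH term.
doc_a = ("insulin therapy in diabetic patients".split(),
         ["Diabetes Mellitus", "Insulin"])
doc_b = ("glucose control outcomes".split(),
         ["Diabetes Mellitus", "Blood Glucose"])

text_only = cosine(combined_vector(*doc_a, mesh_weight=0.0),
                   combined_vector(*doc_b, mesh_weight=0.0))
combined = cosine(combined_vector(*doc_a, mesh_weight=0.5),
                  combined_vector(*doc_b, mesh_weight=0.5))
print(text_only, combined)
```

In this toy case the abstracts share no words, so the abstracts-only similarity is zero, while the combined representation gives the pair a positive similarity through the shared MeSH term. A clustering algorithm that works on cosine similarity could then group the two documents together.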

Work done under partial support of the ITRI of Chung-Ang University, the Korean Government (KIPA Professorship for Visiting Faculty Positions in Korea), and the Mexican Government (CONACyT, SNI, IPN). The third author is currently on sabbatical leave at Chung-Ang University.



Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Shin, K., Han, SY., Gelbukh, A. (2004). Advanced Clustering Technique for Medical Data Using Semantic Information. In: Monroy, R., Arroyo-Figueroa, G., Sucar, L.E., Sossa, H. (eds) MICAI 2004: Advances in Artificial Intelligence. MICAI 2004. Lecture Notes in Computer Science, vol 2972. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24694-7_33

  • DOI: https://doi.org/10.1007/978-3-540-24694-7_33

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-21459-5

  • Online ISBN: 978-3-540-24694-7

  • eBook Packages: Springer Book Archive
