Advanced Clustering Technique for Medical Data Using Semantic Information

Shin, Kwangcheol; Han, Sang-Yong; Gelbukh, Alexander

doi:10.1007/978-3-540-24694-7_33

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2972)

Included in the following conference series:

Mexican International Conference on Artificial Intelligence

Abstract

MEDLINE is a representative collection of medical documents. Each document is supplied with its original full-text natural-language abstract as well as with representative keywords (called MeSH terms) that expert annotators select manually from a predefined ontology and structure according to their relation to the document. We show how these structured, manually assigned semantic descriptions can be combined with the original full-text abstracts to improve the quality of clustering the documents into a small number of clusters. As a baseline, we compare our results with clustering that uses only abstracts or only MeSH terms. Our experiments show 36% to 47% higher cluster coherence, as well as more refined keywords for the produced clusters.
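The abstract does not say how the two sources are combined, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes each document is a unit-length term vector mixing abstract term frequencies with MeSH terms, and it assumes coherence is measured as the length of the mean of a cluster's unit vectors. The function names, the `mesh:` prefix, and the `mesh_weight` parameter are all hypothetical, and the two toy documents are invented for illustration.

```python
import math
from collections import Counter


def unit(vec):
    """Scale a sparse term vector (a dict) to unit Euclidean length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def combined_vector(abstract_tokens, mesh_terms, mesh_weight=0.5):
    """Mix abstract term frequencies with MeSH terms into one unit vector.

    mesh_weight is a hypothetical mixing parameter in [0, 1];
    it is an assumption of this sketch, not a value from the paper.
    """
    text = unit(Counter(abstract_tokens))
    # Prefix MeSH terms so they never collide with abstract words.
    mesh = unit(Counter("mesh:" + m for m in mesh_terms))
    mixed = {t: (1 - mesh_weight) * w for t, w in text.items()}
    for t, w in mesh.items():
        mixed[t] = mixed.get(t, 0.0) + mesh_weight * w
    return unit(mixed)


def coherence(vectors):
    """Length of the mean of unit vectors; 1.0 when all vectors coincide."""
    total = Counter()
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    return math.sqrt(sum(w * w for w in total.values())) / len(vectors)


# Toy cluster: two abstracts with no shared words but the same MeSH term.
docs = [
    (["insulin", "glucose", "patients"], ["Diabetes Mellitus"]),
    (["glycemic", "control", "therapy"], ["Diabetes Mellitus"]),
]
text_only = [combined_vector(a, m, mesh_weight=0.0) for a, m in docs]
with_mesh = [combined_vector(a, m, mesh_weight=0.5) for a, m in docs]

print("abstracts only:", coherence(text_only))
print("abstracts + MeSH:", coherence(with_mesh))
```

In this toy case the shared MeSH term pulls the two vectors closer together, so the combined representation scores higher than abstracts alone. That is the kind of effect the abstract reports, although the 36% to 47% figures come from the authors' experiments and cannot be reproduced from this sketch.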

Work done under partial support of the ITRI of Chung-Ang University, the Korean Government (KIPA Professorship for Visiting Faculty Positions in Korea), and the Mexican Government (CONACyT, SNI, IPN). The third author is currently on sabbatical leave at Chung-Ang University.

Author information

Authors and Affiliations

School of Computer Science and Engineering, Chung-Ang University, 221 HukSuk-Dong, DongJak-Ku, Seoul, 156-756, Korea
Kwangcheol Shin, Sang-Yong Han & Alexander Gelbukh
Center for Computing Research, National Polytechnic Institute, Mexico City, Mexico
Alexander Gelbukh

Editor information

Editors and Affiliations

Computer Science Department, Tecnológico de Monterrey, Campus Estado de México, Carretera al lago de Guadalupe, Km 3.5, Atizapán, 52926, Mexico
Raúl Monroy
Instituto de Investigaciones Electricas, Reforma # 113, Col. Palmira, 62490, Morelos, Cuernavaca, Mexico
Gustavo Arroyo-Figueroa
Instituto Nacional de Astrofísica, Óptica y Electrónica, Luis Enrique Erro No. 1, 72840, Puebla, México
Luis Enrique Sucar
Centro de Investigación en Computación – IPN, Av. Juan de Dios Batíz, esquina con Miguel Othón de Mendizábal, Ciudad de México, 07738, México
Humberto Sossa

About this paper

Cite this paper

Shin, K., Han, SY., Gelbukh, A. (2004). Advanced Clustering Technique for Medical Data Using Semantic Information. In: Monroy, R., Arroyo-Figueroa, G., Sucar, L.E., Sossa, H. (eds) MICAI 2004: Advances in Artificial Intelligence. MICAI 2004. Lecture Notes in Computer Science, vol 2972. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24694-7_33

DOI: https://doi.org/10.1007/978-3-540-24694-7_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21459-5
Online ISBN: 978-3-540-24694-7
eBook Packages: Springer Book Archive
