DOI:10.1145/1150402.1150505
Article

Integration of semantic-based bipartite graph representation and mutual refinement strategy for biomedical literature clustering

Published: 20 August 2006 Publication History

Abstract

We introduce a novel document clustering approach that combines a semantic-based bipartite graph representation with a mutual refinement strategy. The primary contributions of this paper are the following. First, we introduce a new representation of documents: a bipartite graph between the documents and the co-occurrence concepts they contain. Second, we show how to enhance clustering quality by applying the mutual refinement strategy to the initial clustering results. Third, through experiments on MEDLINE documents, we show that our integrated method significantly enhances cluster quality and clustering reliability compared to existing clustering methods. On average, our approach improves cluster quality by 29.5% and clustering reliability by 26.3%, in terms of misclassification index, over Bisecting K-means with its best parameters.
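The bipartite representation named in the abstract can be illustrated with a small sketch. The abstract does not give the construction details, so everything below is an assumption for illustration only: each document is taken to arrive with an already-extracted list of concepts, a "co-occurrence concept" is modeled as an unordered pair of concepts appearing in the same document, and the function name `build_bipartite_graph` and the `min_docs` threshold are hypothetical. It is not the authors' implementation.

```python
from collections import defaultdict
from itertools import combinations


def build_bipartite_graph(doc_concepts, min_docs=2):
    """Link documents to the concept pairs that co-occur in them.

    doc_concepts: dict mapping a document id to an iterable of concept
    names (assumed to be extracted beforehand, e.g. via an ontology).
    min_docs: hypothetical threshold; a concept pair is kept only if it
    appears in at least this many documents.

    Returns (doc_to_pairs, pair_to_docs), the two sides of the graph.
    """
    pair_to_docs = defaultdict(set)
    for doc_id, concepts in doc_concepts.items():
        # Sort so that each unordered pair has one canonical form.
        for pair in combinations(sorted(set(concepts)), 2):
            pair_to_docs[pair].add(doc_id)

    # Drop pairs that occur in too few documents to connect anything.
    kept = {p: d for p, d in pair_to_docs.items() if len(d) >= min_docs}

    # Build the reverse adjacency: document -> co-occurrence concepts.
    doc_to_pairs = defaultdict(set)
    for pair, docs in kept.items():
        for doc_id in docs:
            doc_to_pairs[doc_id].add(pair)
    return dict(doc_to_pairs), kept


# Toy input; the concept lists are made up for illustration.
docs = {
    "d1": ["insulin", "diabetes", "glucose"],
    "d2": ["insulin", "diabetes"],
    "d3": ["asthma", "inhaler"],
}
doc_to_pairs, pair_to_docs = build_bipartite_graph(docs)
```

In this toy input, only the pair (diabetes, insulin) is shared by two documents, so it is the single concept-side node and it links d1 and d2. A clustering step could then group documents by the co-occurrence concepts they share.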

      Published In

      KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
      August 2006
      986 pages
      ISBN:1595933395
      DOI:10.1145/1150402
      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. bipartite graph representation
      2. document clustering
      3. mutual refinement strategy
      4. ontology

      Qualifiers

      • Article

      Conference

      KDD06

      Acceptance Rates

      Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
