Abstract
In this paper, we explore how a global ranking method, used in conjunction with a local density method, helps identify meaningful term clusters from an ontology-enriched graph representation of a biomedical literature corpus. A major problem in document clustering is how to discount the effects of class-unspecific general terms and strengthen the effects of class-specific core terms. We claim that a well-constructed term graph can help improve the global ranking of class-specific core terms. We first apply PageRank and HITS to a directed abstract-title term graph to target class-specific core terms. Then k dense term clusters (graphs) are identified from these terms. Finally, each document is assigned to its closest core term graph. A series of experiments is conducted on a document corpus collected from PubMed. Experimental results show that our approach is effective at identifying class-specific core terms and thus helps document clustering.
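The global ranking step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the abstract-title term graph is built, so the edge direction used here (from each abstract term to each title term of the same document) is an assumption, as are the toy documents, the function names `build_term_graph` and `pagerank`, and the damping factor of 0.85. HITS, the ontology enrichment, the dense-cluster detection, and the document assignment are not covered.

```python
from collections import defaultdict


def build_term_graph(docs):
    """Build a directed term graph from (title_terms, abstract_terms) pairs.

    Assumption (not stated in the abstract): every abstract term points
    to every title term of the same document.
    """
    nodes = set()
    edges = defaultdict(set)
    for title_terms, abstract_terms in docs:
        nodes.update(title_terms)
        nodes.update(abstract_terms)
        for a in abstract_terms:
            for t in title_terms:
                if a != t:  # skip self-loops
                    edges[a].add(t)
    return nodes, edges


def pagerank(nodes, edges, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over an unweighted directed graph."""
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not edges.get(v))
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n
                    for v in nodes}
        for v in nodes:
            targets = edges.get(v)
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical toy corpus: (title terms, abstract terms) per document.
docs = [
    ({"gene", "expression"}, {"gene", "expression", "study", "result"}),
    ({"gene", "mutation"}, {"gene", "mutation", "study", "patient"}),
]
nodes, edges = build_term_graph(docs)
rank = pagerank(nodes, edges)

# Print terms from highest to lowest rank.
for term in sorted(rank, key=rank.get, reverse=True):
    print(f"{term}: {rank[term]:.3f}")
```

In this toy graph the general term "study" receives no incoming edges and ends up at the bottom of the ranking, while "gene", which appears in both titles, ranks first. This mirrors the intuition in the abstract that a suitably constructed term graph pushes class-specific terms up and general terms down.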
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, X., Hu, X., Xia, J., Zhou, X., Achananuparp, P. (2007). Utilization of Global Ranking Information in Graph-Based Biomedical Literature Clustering. In: Song, I.Y., Eder, J., Nguyen, T.M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2007. Lecture Notes in Computer Science, vol 4654. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74553-2_29
DOI: https://doi.org/10.1007/978-3-540-74553-2_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74552-5
Online ISBN: 978-3-540-74553-2
eBook Packages: Computer Science (R0)