Finding cohesive clusters for analyzing knowledge communities

Kandylas, Vasileios; Upham, S. Phineas; Ungar, Lyle H.

doi:10.1007/s10115-008-0135-5

Regular Paper
Published: 04 April 2008

Volume 17, pages 335–354, (2008)
Knowledge and Information Systems

Vasileios Kandylas¹,
S. Phineas Upham² &
Lyle H. Ungar¹

Abstract

Documents and authors can be clustered into “knowledge communities” based on the overlap in the papers they cite. We introduce a new clustering algorithm, Streemer, which finds cohesive foreground clusters embedded in a diffuse background, and use it to identify knowledge communities as foreground clusters of papers which share common citations. To analyze the evolution of these communities over time, we build predictive models with features based on the citation structure, the vocabulary of the papers, and the affiliations and prestige of the authors. Findings include that scientific knowledge communities tend to grow more rapidly if their publications build on diverse information and if they use a narrow vocabulary.
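
The idea of grouping papers by the overlap in what they cite can be illustrated with a small sketch. This is not Streemer, whose details the abstract does not give. It is a simple stand-in that assumes Jaccard overlap between citation sets, links papers whose overlap reaches a hypothetical `threshold`, and treats linked groups of two or more papers as foreground and everything else as background. The function names, the threshold value, and the toy data are all assumptions made for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap of two sets of cited references."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cohesive_groups(citations, threshold=0.5):
    """Split papers into foreground groups and a background set.

    citations: dict mapping paper id -> set of cited reference ids.
    Papers are linked when their citation overlap reaches `threshold`
    (an assumed cutoff). Linked components with at least two papers
    are reported as foreground; unlinked papers go to the background.
    """
    papers = sorted(citations)
    neighbors = {p: set() for p in papers}
    for p, q in combinations(papers, 2):
        if jaccard(citations[p], citations[q]) >= threshold:
            neighbors[p].add(q)
            neighbors[q].add(p)

    seen, foreground, background = set(), [], set()
    for start in papers:
        if start in seen:
            continue
        # Depth-first walk to collect the component containing `start`.
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        if len(component) >= 2:
            foreground.append(component)
        else:
            background |= component
    return foreground, background


# Toy data: A, B and C share most of their references; D and E do not.
toy = {
    "A": {"r1", "r2", "r3"},
    "B": {"r1", "r2", "r4"},
    "C": {"r1", "r2", "r3"},
    "D": {"r9"},
    "E": {"r7", "r8"},
}
foreground, background = cohesive_groups(toy)
```

On the toy data, papers A, B and C form one foreground group, while D and E fall into the background. A single fixed threshold is the crudest possible choice. It is used here only to make the foreground/background distinction concrete.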

Author information

Authors and Affiliations

Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, 19104, USA
Vasileios Kandylas & Lyle H. Ungar
Wharton School, University of Pennsylvania, Philadelphia, PA, USA
S. Phineas Upham

Corresponding author

Correspondence to Lyle H. Ungar.

About this article

Cite this article

Kandylas, V., Upham, S.P. & Ungar, L.H. Finding cohesive clusters for analyzing knowledge communities. Knowl Inf Syst 17, 335–354 (2008). https://doi.org/10.1007/s10115-008-0135-5

Received: 29 October 2007
Revised: 23 December 2007
Accepted: 29 January 2008
Published: 04 April 2008
Issue Date: December 2008
DOI: https://doi.org/10.1007/s10115-008-0135-5
