A dynamic bibliometric model for identifying online communities

Wang, Xin; Kabán, Ata

doi:10.1007/s10618-007-0081-y

Published: 25 August 2007

Volume 16, pages 67–107, (2008)
Data Mining and Knowledge Discovery

Abstract

Predictive modelling of online dynamic user-interaction recordings, and community identification from such data, are becoming increasingly important with the widespread use of online communication technologies. Despite the time-dependent nature of the problem, existing approaches to community identification are based on static or fully observed network connections. Here we present a new, dynamic generative model for inferring communities from a sequence of temporal events produced through online computer-mediated interactions. The distinctive feature of our approach is that it tries to model the process more realistically, accounting for possible random temporal delays between the intended connections. Inferring these delays from the data then forms an integral part of our state-clustering methodology, so that the most likely communities are found on the basis of the likely intended connections rather than just the observed ones. We derive a maximum likelihood estimation algorithm for identifying our model, which turns out to be computationally efficient for the analysis of historical data and scales linearly with the number of non-zero observed (L + 1)-grams, where L is the Markov memory length. In addition, we derive an incremental version of the algorithm, which could be used for real-time analysis. Results obtained on both synthetic and real-world data sets demonstrate that the approach is flexible and able to reveal novel and insightful structural aspects of online interactions. In particular, the analysis of a full-day synchronous Internet Relay Chat participation sequence reveals the formation of an extremely clear community structure.
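The linear-scaling claim refers to the number of distinct (L + 1)-grams that actually occur in the data, not to all possible ones. The sketch below is illustrative only and is not the authors' estimation algorithm. It counts the non-zero (L + 1)-grams in a participation sequence, that is, an L-step history followed by the next participant. It assumes, as the scaling statement suggests, that an iteration of the estimator needs to visit each distinct observed (L + 1)-gram once. The function name `count_ngrams` and the example user IDs are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def count_ngrams(events: Sequence[Hashable], L: int) -> Counter:
    """Count the (L + 1)-grams observed in an event sequence.

    Each key is a tuple of L history events followed by the current event.
    Only (L + 1)-grams that actually occur are stored, so the size of the
    result is the number of non-zero (L + 1)-grams.
    """
    if L < 0:
        raise ValueError("Markov memory length L must be non-negative")
    n = L + 1
    # Slide a window of length L + 1 over the sequence.
    return Counter(tuple(events[i:i + n]) for i in range(len(events) - n + 1))


# Hypothetical chat participation sequence: each entry is the ID of the
# user who posted the next message.
sequence = ["ann", "bob", "ann", "bob", "cy", "ann", "bob", "ann"]
counts = count_ngrams(sequence, L=1)

n_users = len(set(sequence))
dense_size = n_users ** 2  # all possible bigrams over the observed users
print(f"{len(counts)} non-zero of {dense_size} possible bigrams")
# prints "4 non-zero of 9 possible bigrams"
```

With U participants, a dense transition table has U^(L+1) cells. The sparse counter holds at most one entry per observed window, so the work per pass stays proportional to the data rather than to the full history space as L grows.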

Author information

Authors and Affiliations

School of Computer Science, The University of Birmingham, Birmingham, B15 2TT, UK
Xin Wang & Ata Kabán

Corresponding author

Correspondence to Xin Wang.

Additional information

Communicated by Chang-shing Perng.

About this article

Cite this article

Wang, X., Kabán, A. A dynamic bibliometric model for identifying online communities. Data Min Knowl Disc 16, 67–107 (2008). https://doi.org/10.1007/s10618-007-0081-y

Received: 27 February 2006
Accepted: 28 June 2007
Published: 25 August 2007
Issue Date: February 2008
DOI: https://doi.org/10.1007/s10618-007-0081-y

Similar content being viewed by others

A novel framework for community modeling and characterization in directed temporal networks

Interpreting communities based on the evolution of a dynamic attributed network

Modeling Community Structure and Topics in Dynamic Text Networks

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Keywords

Navigation

A dynamic bibliometric model for identifying online communities

Abstract

Access this article

Similar content being viewed by others

A novel framework for community modeling and characterization in directed temporal networks

Interpreting communities based on the evolution of a dynamic attributed network

Modeling Community Structure and Topics in Dynamic Text Networks

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation