Abstract
Recent years have witnessed increased interest in topic detection and tracking (TDT). However, existing work focuses mainly on overall trend analysis and is not designed to explain how topics evolve. To this end, this paper aims to reveal the underlying process of, and reasons for, topic formation and development (TFD). Building on community partitioning in social networks, we propose a core-group model that explains the dynamics of topic development and segments it into stages. The model is inspired by the cell division mechanism in biology. According to the division phase and interphase in the life cycle of a core group, a topic is separated into four states: birth, extending, saturation, and shrinkage. We focus on scientific topic formation and development, using the citation network structure among scientific papers. Experimental results on two real-world data sets show that the division of a core group gives rise to a new scientific topic. The results also reveal that the progress of an entire scientific topic is closely correlated with the growth of a core group during its interphase. Finally, we demonstrate the effectiveness of the proposed method in several real-life scenarios.
Cite this article
Qian, T., Li, Q., Liu, B. et al. Topic formation and development: a core-group evolving process. World Wide Web 17, 1343–1373 (2014). https://doi.org/10.1007/s11280-013-0245-1
DOI: https://doi.org/10.1007/s11280-013-0245-1