Topic formation and development: a core-group evolving process

World Wide Web

Abstract

Recent years have witnessed increased interest in topic detection and tracking (TDT). However, existing work focuses mainly on overall trend analysis and is not designed to explain how topics evolve. To this end, this paper aims to reveal the underlying process of, and reasons for, topic formation and development (TFD). Building on community partitioning in social networks, we propose a core-group model, inspired by the cell division mechanism in biology, to explain the dynamics of topic development and to segment it into stages. According to the division phase and interphase in the life cycle of a core group, a topic's development is divided into four states: birth, extending, saturation, and shrinkage. We focus on the formation and development of scientific topics, using the citation network structure among scientific papers. Experimental results on two real-world data sets show that the division of a core group gives rise to a new scientific topic. The results also reveal that the progress of an entire scientific topic is closely correlated with the growth of a core group during its interphase. Finally, we demonstrate the effectiveness of the proposed method in several real-life scenarios.

Author information

Corresponding author

Correspondence to Tieyun Qian.

About this article

Cite this article

Qian, T., Li, Q., Liu, B. et al. Topic formation and development: a core-group evolving process. World Wide Web 17, 1343–1373 (2014). https://doi.org/10.1007/s11280-013-0245-1

  • DOI: https://doi.org/10.1007/s11280-013-0245-1
