Abstract
In textual analysis, network-based procedures for topic detection are gaining attention as an alternative to classical topic models. These procedures rest on the idea that documents can be represented as word co-occurrence networks, in which topics are defined as groups of strongly connected words. Although many works have used network-based procedures for topic detection, a systematic analysis is lacking of how different design choices, such as the construction of the word co-occurrence matrix and the selection of the community detection algorithm, affect the detected topics. In this work, we present the results obtained by analysing a widely used corpus of news articles, showing how and to what extent the choices made during the design phase affect the results.
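To make the pipeline described in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes document-level co-occurrence counts, a hypothetical `min_weight` threshold for keeping edges, and connected components as a deliberately simple stand-in for a community detection algorithm such as those the paper compares. The toy documents and all function names are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs):
    """For each unordered word pair, count the documents containing both words."""
    counts = Counter()
    for doc in docs:
        words = sorted(set(doc.lower().split()))  # unique words, stable pair order
        counts.update(combinations(words, 2))
    return counts


def build_graph(counts, min_weight=2):
    """Keep only word pairs that co-occur in at least `min_weight` documents."""
    graph = {}
    for (u, v), weight in counts.items():
        if weight >= min_weight:
            graph.setdefault(u, {})[v] = weight
            graph.setdefault(v, {})[u] = weight
    return graph


def connected_groups(graph):
    """Connected components, used here as a simple stand-in for community detection."""
    seen, groups = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(n for n in graph[node] if n not in component)
        seen |= component
        groups.append(sorted(component))
    return groups


# Toy corpus (assumption): two finance-like and two sports-like documents.
docs = [
    "market stocks trade",
    "stocks market bank",
    "match goal team",
    "team goal league",
]

counts = cooccurrence_counts(docs)
strict_topics = connected_groups(build_graph(counts, min_weight=2))
loose_topics = connected_groups(build_graph(counts, min_weight=1))
print(strict_topics)
print(loose_topics)
```

Changing only the edge threshold changes which words end up in each group. On this toy corpus, the stricter setting keeps two words per group while the looser one keeps four, which is a small-scale instance of the kind of design-choice sensitivity the abstract refers to.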
Acknowledgements
The authors acknowledge the financial support provided by the “Dipartimenti Eccellenti 2018–2022” ministerial funds. This work has also been partly funded by eSSENCE, an e-Science collaboration funded as a strategic research area of Sweden, and by EU CEF grant number 2394203 (NORDIS—NORdic observatory for digital media and information DISorder).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Galluccio, C., Magnani, M., Vega, D., Ragozini, G., Petrucci, A. (2023). Robustness and Sensitivity of Network-Based Topic Detection. In: Cherifi, H., Mantegna, R.N., Rocha, L.M., Cherifi, C., Micciche, S. (eds) Complex Networks and Their Applications XI. COMPLEX NETWORKS 2022. Studies in Computational Intelligence, vol 1078. Springer, Cham. https://doi.org/10.1007/978-3-031-21131-7_20
DOI: https://doi.org/10.1007/978-3-031-21131-7_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21130-0
Online ISBN: 978-3-031-21131-7
eBook Packages: Engineering, Engineering (R0)