Graph Community Detection: Normalized Compression Distance Based Implementation for Text Data

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 565)

Abstract

Community detection algorithms are widely used to study the structural properties of real-world networks. In this paper, we experimentally evaluate the quality of several community detection algorithms based on normalized compression distance (NCD), using diverse datasets such as documents, feeds, articles and blogs. We compare the algorithms using the F-score measure. NCD performs better when given text data as input than when given a conventional representation such as a document-term matrix. Finally, we show that the label propagation community detection algorithm is better suited to clustering text data than the other community detection algorithms, and that it creates more distinct communities on diverse data.
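As a rough illustration of the distance the abstract refers to, the sketch below computes NCD between two byte strings using the standard definition NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length. The choice of zlib as the compressor, the helper names, and the sample documents are assumptions made for this example; the abstract does not say which compressor or tooling the authors used.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input, a stand-in for C(x)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Hypothetical sample documents, made up for illustration only.
doc_a = b"the quick brown fox jumps over the lazy dog " * 20
doc_b = b"graph community detection with label propagation on networks " * 20

print(ncd(doc_a, doc_a))  # a document compared with itself: small distance
print(ncd(doc_a, doc_b))  # unrelated documents: larger distance
```

Pairwise distances of this kind can be turned into a weighted graph (for example, by linking documents whose distance falls below a chosen threshold), which is the sort of input a community detection algorithm such as label propagation operates on.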

Author information

Corresponding author

Correspondence to Abhishek Sanwaliya.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Sanwaliya, A., Chinnamgari, S.K., Desai, A., Saha, A. (2018). Graph Community Detection: Normalized Compression Distance Based Implementation for Text Data. In: Abraham, A., Haqiq, A., Ella Hassanien, A., Snasel, V., Alimi, A. (eds) Proceedings of the Third International Afro-European Conference for Industrial Advancement — AECIA 2016. AECIA 2016. Advances in Intelligent Systems and Computing, vol 565. Springer, Cham. https://doi.org/10.1007/978-3-319-60834-1_5

  • DOI: https://doi.org/10.1007/978-3-319-60834-1_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-60833-4

  • Online ISBN: 978-3-319-60834-1

  • eBook Packages: Engineering, Engineering (R0)
