An approach based on mixed hierarchical clustering and optimization for graph analysis in social media network: toward globally hierarchical community structure

  • Regular Paper
Knowledge and Information Systems

Abstract

As the massive size of contemporary social networks poses a serious challenge both to the scalability of traditional graph clustering algorithms and to the evaluation of the communities they discover, this manuscript develops an approach for discovering hierarchical community structure in large networks. The proposed hybrid technique combines the strengths of bottom-up (agglomerative) hierarchical clustering with those of top-down (divisive) hierarchical clustering: the former is effective at identifying small clusters, while the latter is better at identifying large ones. Our mixed hierarchical clustering technique assumes that an initial solution composed of k classes exists and, by combining the two methods, modifies the distribution of nodes among these initial classes without changing the number of partitions. At the end of this clustering process, a fixed point is reached that represents a local optimum of the cost function, which measures the degree of importance between two partitions. Consequently, the combined model by itself yields a locally optimal community structure. To escape this local optimum and detect community structures that converge to the global optimum of the cost function, this study treats community detection not only as a clustering problem but also as an optimization problem. To this end, a novel mixed hierarchical clustering algorithm based on swarm intelligence is proposed for identifying community structures in social networks. To validate the proposed method, its performance is evaluated on both real and artificial networks using internal and external clustering evaluation criteria.
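The abstract's claim that a fixed-k reassignment process stops at a fixed point, i.e. a local optimum of its cost function, can be illustrated with a small sketch. It assumes an undirected, unweighted graph stored as neighbour sets and uses Newman–Girvan modularity as a stand-in cost function, because the paper's importance-based cost function is not defined in this section; the function names are hypothetical. It does not reproduce the authors' bottom-up and top-down steps: it only moves single nodes between the existing classes, never empties a class (so k stays fixed), and stops when no single move improves the score.

```python
def modularity(adj, labels):
    """Newman-Girvan modularity of a partition of an undirected, unweighted graph.

    adj: dict mapping each node to the set of its neighbours.
    labels: dict mapping each node to an integer community id.
    """
    m = sum(len(nbrs) for nbrs in adj.values()) / 2  # number of edges
    if m == 0:
        return 0.0
    q = 0.0
    for i in adj:
        for j in adj:
            if labels[i] == labels[j]:
                a_ij = 1.0 if j in adj[i] else 0.0
                q += a_ij - len(adj[i]) * len(adj[j]) / (2 * m)
    return q / (2 * m)


def reassign_fixed_k(adj, labels, eps=1e-12):
    """Hypothetical fixed-k local search: move single nodes between the existing
    k classes while modularity improves; stop at a fixed point (local optimum)."""
    labels = dict(labels)
    best = modularity(adj, labels)
    improved = True
    while improved:
        improved = False
        for node in sorted(adj):
            current = labels[node]
            # Skip nodes that are alone in their class so k never decreases.
            if list(labels.values()).count(current) == 1:
                continue
            for target in sorted(set(labels.values()) - {current}):
                labels[node] = target
                q = modularity(adj, labels)
                if q > best + eps:
                    best, current, improved = q, target, True  # keep the move
                else:
                    labels[node] = current  # undo the move
    return labels, best


# Toy graph: triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
initial = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1, 5: 1}  # node 2 starts in the wrong class
labels, q = reassign_fixed_k(adj, initial)
print(labels, round(q, 4))  # {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1} 0.3571
```

On this toy graph, moving node 2 back into its triangle is the only improving move, after which no single move helps and the loop stops. On larger graphs the same stopping rule can halt at a partition that is not globally best, which is the limitation the swarm-based search is meant to address.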


Figs. 1–19

Notes

  1. http://140dev.com/free-twitter-api-source-code-library/.

Abbreviations

SHC:

Similarity-based hierarchical community

HAMUHI-CODE:

Heuristic algorithm for multi-scale hierarchical community detection

PMAC:

Partial matrix approximation convergence

SN:

Social network

JS:

Jaccard similarity measure

AgA:

Agglomerative algorithm

DST:

Dependence similarity table

AHL:

Ascendant hierarchical level

DivA:

Divisive algorithm

DHL:

Descendant hierarchical level

MHA:

Mixed hierarchical algorithm

T-D-H-L:

Top-down hierarchical level

B-U-H-L:

Bottom-up hierarchical level

MHAS:

Swarm-based mixed hierarchical algorithm

AntCDivA:

Ant colony-based divisive algorithm

BeeCAgA:

Bee colony-based agglomerative algorithm

LFR benchmark:

Lancichinetti–Fortunato–Radicchi benchmark

CEC:

Cross-entropy clustering

NMI:

Normalized mutual information

DBI:

Davies–Bouldin index

PGP:

Pretty good privacy

SI:

Swarm intelligence

\(Q_\mathrm{comb}\) :

Combined modularity function

\(Q_\mathrm{comb}\) :

Separated modularity function

\(\mathrm{SN} = (V; E; \mu )\) :

Graph modeling SN

V :

Nodes representing social network members

E :

Edges modeling the relationship between social network members

\(\mu \) :

Weight of edges

n :

Number of nodes

\(\ell \) :

Hierarchical level

k :

Number of sub-partitions detected at each hierarchical level

\(P=\{p_{1},p_{2},\ldots ,p_{s}\}\), \(G=\{g_{1},g_{2},\ldots ,g_{r}\}\), \(C=\{c_{1},c_{2},\ldots ,c_{s}\}\) :

SN detected partitions

\(p_{1},p_{2},\ldots ,p_{s}\), \(g_{1},g_{2},\ldots ,g_{r}\), \(c_{1},c_{2},\ldots ,c_{s}\) :

Sub-partitions

m :

Social network member

D :

Any element contained in SN partitions

A[ij]:

The adjacency matrix of SN

\(\overline{A}{[}i{]}\) :

Average of the vector A[i]

cov(\(E_{i,j}\)):

Covariance function

Op(\(V_{i}\)):

Extracted opinions from the node \(V_{i}\)

Op(\(V_{j}\)):

Extracted opinions from the node \(V_{j}\)

\(N_{i}\) :

Neighbors of node i

\(N_{j}\) :

Neighbors of node j

\(Score_{importantOp}\) :

Function measuring the degree of importance of nodes

\(GScore_{importantOp}\) :

General \(Score_{importantOp}\)

\(MoyScore_{importantOp}\) :

Average of \(Score_{importantOp}\) of sub-partitions

Initpart:

Initial partition

cordMin:

Function returning the member m with the lowest \(Score_{importantOp}\) value

cordMax:

Function returning the member m with the highest \(Score_{importantOp}\) value

\(Q_{DS}\) :

Dependence similarity-based modularity

\(AgQ_{DS}\) :

\(Q_{DS}\) function for BeeCAgA

\(DivQ_{DS}\) :

\(Q_{DS}\) function for AntCDivA

\(MixQ_{DS}\) :

\(Q_{DS}\) function for MHAS

E :

Energy function
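Several of the symbols listed above (JS, \(N_{i}\), \(N_{j}\), A[ij], \(\overline{A}{[}i{]}\), cov, NMI) name standard quantities without giving their formulas. The sketch below shows one common way to compute each on a small graph; it is a hedged illustration under stated assumptions, not the authors' code. It assumes an undirected, unweighted graph stored as neighbour sets, Jaccard similarity taken over the neighbour sets \(N_{i}\) and \(N_{j}\), a population covariance between adjacency rows A[i] and A[j] centred on their averages, and NMI in the normalized form \(2I(A;B)/(H(A)+H(B))\). How the paper combines these into its dependence similarity table (DST) and \(Q_{DS}\) is not stated in this section, so neither is reproduced; all function names are hypothetical.

```python
import math
from collections import Counter


def jaccard(adj, i, j):
    """JS(i, j) = |N_i ∩ N_j| / |N_i ∪ N_j| on the neighbour sets of nodes i and j."""
    union = adj[i] | adj[j]
    return len(adj[i] & adj[j]) / len(union) if union else 0.0


def adjacency_rows(adj):
    """Dense adjacency matrix A as a list of rows A[i], in sorted node order."""
    nodes = sorted(adj)
    return [[1.0 if v in adj[u] else 0.0 for v in nodes] for u in nodes]


def row_covariance(a_i, a_j):
    """Population covariance of rows A[i] and A[j], each centred on its own average."""
    n = len(a_i)
    mean_i, mean_j = sum(a_i) / n, sum(a_j) / n
    return sum((x - mean_i) * (y - mean_j) for x, y in zip(a_i, a_j)) / n


def nmi(labels_a, labels_b):
    """NMI = 2 I(A; B) / (H(A) + H(B)) between two partitions of the same nodes."""
    n = len(labels_a)
    count_a = Counter(labels_a.values())
    count_b = Counter(labels_b[v] for v in labels_a)
    joint = Counter((labels_a[v], labels_b[v]) for v in labels_a)
    h_a = -sum(c / n * math.log(c / n) for c in count_a.values())
    h_b = -sum(c / n * math.log(c / n) for c in count_b.values())
    if h_a + h_b == 0:
        return 1.0  # both partitions put every node in a single community
    mi = sum(c / n * math.log(c * n / (count_a[x] * count_b[y]))
             for (x, y), c in joint.items())
    return 2 * mi / (h_a + h_b)


# Toy graph: triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
rows = adjacency_rows(adj)
truth = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
found = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1, 5: 1}  # node 2 misplaced
print(jaccard(adj, 0, 1), jaccard(adj, 2, 3))  # shared vs. no shared neighbours
print(row_covariance(rows[0], rows[1]), row_covariance(rows[0], rows[4]))
print(nmi(truth, truth), nmi(truth, found))
```

In the toy graph, nodes 0 and 1 share a neighbour and have positively covarying adjacency rows, whereas nodes 0 and 4 sit in different triangles and covary negatively; NMI equals 1 for identical partitions (even under relabelling) and drops below 1 when node 2 is misplaced.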


Author information

Correspondence to Radhia Toujani.

About this article

Cite this article

Toujani, R., Akaichi, J. An approach based on mixed hierarchical clustering and optimization for graph analysis in social media network: toward globally hierarchical community structure. Knowl Inf Syst 60, 907–947 (2019). https://doi.org/10.1007/s10115-019-01329-2

  • DOI: https://doi.org/10.1007/s10115-019-01329-2
