Two-level interactive identification and derivation of topic clusters in complex networks

World Wide Web

Abstract

As complex networks become ubiquitous, increasing emphasis is being placed on identifying topic clusters from the hidden networks. The identification results can provide valuable support for topic-based entity search. However, current identification models consider only a limited set of factors (either the meta-descriptions or the entity features of networks) and lack a derivation process for topic clusters, which often leads to poor results. In this paper, we present a two-level identification model called MetaEntity that identifies both meta-level and entity-level topic clusters. Unlike traditional approaches, it makes full use of both meta-descriptions and entity features to improve identification accuracy. In particular, MetaEntity supports identification from multiple points of view. In addition, we propose an interactive derivation algorithm that incrementally maintains the identified topic clusters. As a result, the meta-level and entity-level topic clusters improve each other's quality, so that both become more accurate and more complete. Experiments demonstrate the feasibility and effectiveness of our algorithms compared with traditional approaches.

Author information

Correspondence to Yue Kou.

About this article

Cite this article

Kou, Y., Shen, D., Xu, H. et al. Two-level interactive identification and derivation of topic clusters in complex networks. World Wide Web 18, 1093–1122 (2015). https://doi.org/10.1007/s11280-014-0310-4

  • DOI: https://doi.org/10.1007/s11280-014-0310-4
