Abstract
As complex networks become ubiquitous, increasing emphasis is being placed on identifying topic clusters from hidden networks. The identification results can provide valuable support for topic-based entity search. However, current identification models consider only a limited set of factors (either the meta-descriptions or the entity features of networks) and lack a derivation process for topic clusters, which often leads to inaccurate or incomplete results. In this paper, we present a two-level identification model called MetaEntity that identifies both meta-level topic clusters and entity-level topic clusters. Unlike traditional approaches, it uses both meta-descriptions and entity features to improve identification accuracy. In particular, MetaEntity supports identification from multiple points of view. In addition, we propose an interactive derivation algorithm that incrementally maintains the identified topic clusters. As a result, the meta-level and entity-level topic clusters improve each other's quality, so that both become more accurate and complete. Experiments demonstrate the feasibility and effectiveness of our algorithms compared with traditional approaches.
Cite this article
Kou, Y., Shen, D., Xu, H. et al. Two-level interactive identification and derivation of topic clusters in complex networks. World Wide Web 18, 1093–1122 (2015). https://doi.org/10.1007/s11280-014-0310-4
DOI: https://doi.org/10.1007/s11280-014-0310-4