Abstract
Online social networks (OSNs) offer people the opportunity to join communities in which they share a common interest or objective. Such communities are useful for studying human behavior, the diffusion of information, and group dynamics. Because the membership of a community changes constantly, an efficient solution is needed to query information in real time. This paper introduces the Follow Model to represent the basic relationship between users in OSNs, and combines it with the MapReduce solution to develop new parallel algorithms for querying. Two models, one for the reverse relation and one for the high-order relation of users, were implemented in the Hadoop system. Based on 75 GB of message data and 26 GB of relation network data from Twitter, a case study was carried out using two dynamic discussion communities: #musicmonday and #beatcancer. The querying performance demonstrates that the new solution, as implemented in Hadoop, significantly improves the ability to find useful information in OSNs.
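To make the idea of reverse and high-order relation queries more concrete, the following is a minimal in-memory sketch of how such queries could be phrased as map and reduce steps. It is not the paper's Follow Model or its Hadoop implementation: the `map_reduce` helper, the toy `follows` edge list, the user names, and the reading of "reverse relation" as "followers of a user" and "high-order relation" as "followers of followers" are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_reduce(records, mapper, reducer):
    """Run one in-memory map/shuffle/reduce pass (stand-in for a Hadoop job)."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # shuffle: group values by key
    return {key: reducer(key, values) for key, values in groups.items()}


# Hypothetical toy data: (follower, followee) pairs.
follows = [("ann", "bob"), ("cat", "bob"), ("bob", "dan"), ("ann", "dan")]


def reverse_mapper(edge):
    """Emit each edge keyed by the followee, so followers group together."""
    follower, followee = edge
    yield followee, follower


def collect(_key, values):
    """Deduplicate and sort the grouped user names."""
    return sorted(set(values))


# Assumed "reverse relation": who follows each user.
followers_of = map_reduce(follows, reverse_mapper, collect)


def join_mapper(edge):
    """Tag each edge twice so that both ends meet at the middle user."""
    follower, followee = edge
    yield followee, ("in", follower)   # follower points into followee
    yield follower, ("out", followee)  # follower points out to followee


def join_reducer(_middle, tagged):
    """Pair every user following the middle user with everyone it follows."""
    ins = [user for tag, user in tagged if tag == "in"]
    outs = [user for tag, user in tagged if tag == "out"]
    return [(a, c) for a in ins for c in outs if a != c]


# Assumed "high-order relation": two-hop followers, computed in two passes.
two_hop_pairs = chain.from_iterable(
    map_reduce(follows, join_mapper, join_reducer).values()
)
second_order_followers = map_reduce(
    two_hop_pairs, lambda pair: [(pair[1], pair[0])], collect
)
```

In this sketch, `followers_of["bob"]` lists the users who follow `bob`, and `second_order_followers["dan"]` lists the users who follow someone who follows `dan`. On a real cluster each `map_reduce` call would correspond to a separate job over partitioned edge files rather than a Python list.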
Additional information
Project supported by the Brazilian National Council for Scientific and Technological Development (CNPq) (No. 304058/2010-6)
About this article
Cite this article
Weigang, L., Sandes, E.F.O., Zheng, J. et al. Querying dynamic communities in online social networks. J. Zhejiang Univ. - Sci. C 15, 81–90 (2014). https://doi.org/10.1631/jzus.C1300281
DOI: https://doi.org/10.1631/jzus.C1300281