Abstract
Social media offer a golden opportunity for information mining. However, extracting useful knowledge from such massive data is a challenge. This paper describes a solution to a common problem, Top-X querying in Online Social Networks (OSNs). Seventy-five GB of data were collected from Twitter and reorganized into a dataset on a distributed computing platform running Hadoop. Adopting the Aggregate-Rank-Delete algorithm, we developed a MapReduce algorithm, called MapFollowee & ReduceFollower, to query the Top-X members who retweet and have the largest number of followers. The proposed approach effectively accelerates the querying process: it is faster than both the matrix algorithm and the stand-alone version of the original algorithm. It also reduces the storage required relative to the original dataset. This result is important because it provides a new parallel paradigm, as an application of MapReduce, for efficiently resolving this practical problem in OSNs.
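The kind of query described above can be illustrated with a small single-process sketch of the map, shuffle, and reduce phases. This is not the MapFollowee & ReduceFollower algorithm itself, whose details the abstract does not give. The input layout (a list of (follower, followee) pairs plus a set of retweeting members), the function names, and the tie-breaking rule are all assumptions made for illustration.

```python
import heapq
from collections import defaultdict


def map_edge(edge):
    """Map step (hypothetical): key each follow relation by the followed account."""
    follower, followee = edge
    yield followee, follower


def shuffle(pairs):
    """Group mapped values by key, as a MapReduce framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_followers(followee, followers):
    """Reduce step (hypothetical): count the distinct followers of one account."""
    return followee, len(set(followers))


def top_x_retweeters(edges, retweeters, x):
    """Return up to x (member, follower_count) pairs among members who retweet."""
    mapped = (pair for edge in edges for pair in map_edge(edge))
    counts = (reduce_followers(k, v) for k, v in shuffle(mapped).items())
    # Members with no followers never appear as a key, so they are not ranked.
    candidates = (c for c in counts if c[0] in retweeters)
    # Largest follower count first; ties broken by name (an assumed rule).
    return heapq.nsmallest(x, candidates, key=lambda c: (-c[1], c[0]))


# Toy data, invented for this example only.
edges = [
    ("bob", "alice"), ("carol", "alice"), ("dave", "alice"),
    ("alice", "bob"), ("carol", "bob"),
    ("alice", "carol"),
    ("bob", "alice"),  # duplicate relation, counted once
]
retweeters = {"alice", "carol", "dave"}

print(top_x_retweeters(edges, retweeters, 2))  # [('alice', 3), ('carol', 1)]
```

On a real Hadoop cluster the shuffle is performed by the framework and the map and reduce functions run in parallel across nodes; the sketch only mirrors the data flow.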
Copyright information
© 2014 Springer Science+Business Media Dordrecht
About this paper
Cite this paper
Jianya, Z., Weigang, L., Uden, L. (2014). Top-X Querying in Online Social Networks with MapReduce Solution. In: Uden, L., Wang, L., Corchado Rodríguez, J., Yang, HC., Ting, IH. (eds) The 8th International Conference on Knowledge Management in Organizations. Springer Proceedings in Complexity. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-7287-8_32
DOI: https://doi.org/10.1007/978-94-007-7287-8_32
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-007-7286-1
Online ISBN: 978-94-007-7287-8
eBook Packages: Computer Science (R0)