Top-X Querying in Online Social Networks with MapReduce Solution

  • Conference paper
The 8th International Conference on Knowledge Management in Organizations

Part of the book series: Springer Proceedings in Complexity (SPCOM)

Abstract

Social media offer a golden opportunity for information mining. However, finding useful knowledge in such massive data is a challenge. This paper describes a solution to a common problem, top-x querying in Online Social Networks (OSNs). Seventy-five GB of data were collected from Twitter and reorganized into a dataset on a distributed computing platform running Hadoop. Adopting the Aggregate-Rank-Delete algorithm, we used MapReduce to develop an algorithm, called MapFollowee & ReduceFollower, that queries the Top-X members who retweet and have the largest number of followers. The proposed approach effectively accelerates the querying process: it is faster than both the matrix algorithm and the original algorithm in its stand-alone version. It also reduces the data storage required relative to the original dataset. This result is important because it provides a new parallel paradigm, as an application of MapReduce, that efficiently resolves a practical problem in OSNs.
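
The abstract names the MapFollowee & ReduceFollower algorithm but does not spell out its steps, so the following is only a minimal in-memory sketch of the general idea, not the authors' implementation. It assumes a hypothetical record layout of (follower, followee) pairs and a precomputed set of users who have retweeted; the function names `map_followee`, `reduce_follower`, and `top_x` are illustrative. The reduce step stands in for "aggregate" and `top_x` for "rank"; the "delete" step is not modeled here.

```python
import heapq
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Set, Tuple

# Hypothetical record layout: (follower, followee). Not taken from the paper.
Edge = Tuple[str, str]


def map_followee(edges: Iterable[Edge], retweeters: Set[str]) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (followee, 1) for each follow edge whose followee has retweeted."""
    for _follower, followee in edges:
        if followee in retweeters:
            yield followee, 1


def reduce_follower(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Reduce phase: sum the emitted counts per followee."""
    counts: Dict[str, int] = defaultdict(int)
    for user, n in pairs:
        counts[user] += n
    return dict(counts)


def top_x(counts: Dict[str, int], x: int) -> List[Tuple[str, int]]:
    """Rank step: the x users with the most followers, ties broken by user name."""
    return heapq.nsmallest(x, counts.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    edges = [("a", "u1"), ("b", "u1"), ("c", "u2"),
             ("a", "u3"), ("b", "u3"), ("c", "u3")]
    retweeters = {"u1", "u3"}
    counts = reduce_follower(map_followee(edges, retweeters))
    print(top_x(counts, 2))
```

On a real Hadoop cluster the map and reduce functions would run as distributed tasks over partitioned input; the sketch keeps everything in one process only to show the data flow.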

Author information

Corresponding author

Correspondence to Zheng Jianya.

Copyright information

© 2014 Springer Science+Business Media Dordrecht

About this paper

Cite this paper

Jianya, Z., Weigang, L., Uden, L. (2014). Top-X Querying in Online Social Networks with MapReduce Solution. In: Uden, L., Wang, L., Corchado Rodríguez, J., Yang, HC., Ting, IH. (eds) The 8th International Conference on Knowledge Management in Organizations. Springer Proceedings in Complexity. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-7287-8_32

  • DOI: https://doi.org/10.1007/978-94-007-7287-8_32

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-94-007-7286-1

  • Online ISBN: 978-94-007-7287-8

  • eBook Packages: Computer Science, Computer Science (R0)
