
CUVIM: Extracting Fresh Information from Social Network

  • Conference paper
Web-Age Information Management (WAIM 2013)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7923)


Abstract

Social networks record the lives of their users and offer great potential to journalists, sociologists and business analysts. Crawling data from social networks is a basic step in analyzing and processing social network information. As these networks grow huge and their information updates faster than ordinary web pages, crawling becomes more difficult because of limited bandwidth, politeness constraints and limited computational power. To extract fresh information from social networks efficiently and effectively, this paper presents a novel crawling method for social networks. To discover the characteristics of social networks, we gather data from a real social network, analyze it and build a model that describes the regularities of users' behavior. Based on this model, we propose methods to predict users' behavior, and we use these predictions to schedule the crawler more sensibly so that it extracts more fresh information. Experimental results demonstrate that our strategies obtain information from social networking services (SNS) efficiently and effectively.
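The abstract does not specify the behavior model or the scheduling rule. As a rough illustration of the general idea (predict user activity, then spend a limited crawl budget where the most new content is expected), the Python sketch below assumes that each user posts as a Poisson process whose rate is estimated from past posts, and greedily crawls the users with the largest expected number of unseen posts. The Poisson assumption, the greedy rule and the names `UserState`, `estimate_rate` and `schedule` are all hypothetical; this is not the CUVIM algorithm itself.

```python
import heapq
from dataclasses import dataclass


@dataclass
class UserState:
    """Hypothetical per-user crawl state (not taken from the paper)."""
    user_id: str
    rate: float        # assumed Poisson posting rate, in posts per hour
    last_crawl: float  # time of the last crawl of this user, in hours


def estimate_rate(post_times, window):
    """Maximum-likelihood Poisson rate: posts observed divided by window length."""
    if window <= 0:
        raise ValueError("window must be positive")
    return len(post_times) / window


def schedule(users, now, budget):
    """Return up to `budget` user ids with the most expected unseen posts.

    Under the Poisson assumption, the expected number of posts made since the
    last crawl is rate * (now - last_crawl); ties are broken by user id.
    """
    expected = ((u.rate * (now - u.last_crawl), u.user_id) for u in users)
    return [uid for _, uid in heapq.nlargest(budget, expected)]


if __name__ == "__main__":
    users = [
        UserState("a", rate=2.0, last_crawl=0.0),  # 12 expected new posts at t=6
        UserState("b", rate=0.5, last_crawl=0.0),  # 3 expected new posts at t=6
        UserState("c", rate=1.0, last_crawl=5.0),  # 1 expected new post at t=6
    ]
    print(schedule(users, now=6.0, budget=2))  # ['a', 'b']
```

In this sketch a user's expected backlog grows linearly with the time since the last crawl, so active accounts that have not been visited recently rise to the top; the bandwidth and politeness limits mentioned in the abstract enter only through `budget`.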




Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Guo, R., Wang, H., Li, K., Li, J., Gao, H. (2013). CUVIM: Extracting Fresh Information from Social Network. In: Wang, J., Xiong, H., Ishikawa, Y., Xu, J., Zhou, J. (eds) Web-Age Information Management. WAIM 2013. Lecture Notes in Computer Science, vol 7923. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38562-9_36


  • DOI: https://doi.org/10.1007/978-3-642-38562-9_36

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-38561-2

  • Online ISBN: 978-3-642-38562-9

  • eBook Packages: Computer Science, Computer Science (R0)
