CUVIM: Extracting Fresh Information from Social Network

Guo, Rui; Wang, Hongzhi; Li, Kaiyu; Li, Jianzhong; Gao, Hong

doi:10.1007/978-3-642-38562-9_36


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7923)

Included in the following conference series:

International Conference on Web-Age Information Management


Abstract

Social networks record the lives of their users and offer great potential for journalists, sociologists and business analysts. Crawling data from a social network is a basic step in social network information analysis and processing. As networks grow large and their content is updated faster than ordinary web pages, crawling becomes more difficult because of limits on bandwidth, politeness etiquette and computation power. To extract fresh information from social networks efficiently and effectively, this paper presents a novel crawling method for social networks. To discover the characteristics of social networks, we gather and analyze real social network data and build a model that describes the regularities in users' behavior. Based on this model, we propose methods to predict users' behavior. Using these predictions, we schedule our crawler more sensibly and extract more fresh information. Experimental results demonstrate that our strategies can obtain information from social network services (SNS) efficiently and effectively.
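The general idea of prediction-driven crawl scheduling can be illustrated with a minimal sketch. It is not the paper's model, which this preview does not show. It assumes, purely for illustration, that each user posts at a constant average rate estimated from past observations, and that the crawler has a fixed budget of users it may visit per round. All names (`UserStats`, `posting_rate`, `expected_new_posts`, `pick_users`) are hypothetical.

```python
import heapq
from dataclasses import dataclass


@dataclass
class UserStats:
    user_id: str
    posts_seen: int         # posts observed during the observation window
    window_hours: float     # length of the observation window, in hours
    last_crawl_hour: float  # time of the most recent crawl, in hours


def posting_rate(u: UserStats) -> float:
    """Average posts per hour (assumption: the rate is constant over time)."""
    if u.window_hours <= 0:
        raise ValueError("observation window must be positive")
    return u.posts_seen / u.window_hours


def expected_new_posts(u: UserStats, now_hour: float) -> float:
    """Predicted number of posts published since the user's last crawl."""
    elapsed = max(0.0, now_hour - u.last_crawl_hour)
    return posting_rate(u) * elapsed


def pick_users(users: list[UserStats], now_hour: float, budget: int) -> list[str]:
    """Return the `budget` users predicted to hold the most fresh posts."""
    best = heapq.nlargest(
        budget, users, key=lambda u: expected_new_posts(u, now_hour)
    )
    return [u.user_id for u in best]


users = [
    UserStats("a", posts_seen=48, window_hours=24.0, last_crawl_hour=10.0),
    UserStats("b", posts_seen=6, window_hours=24.0, last_crawl_hour=0.0),
    UserStats("c", posts_seen=24, window_hours=24.0, last_crawl_hour=11.0),
]
selected = pick_users(users, now_hour=12.0, budget=2)
```

In this toy data a slow poster that has not been visited for a long time ("b") is ranked ahead of a faster poster that was crawled recently ("c"). That is the kind of trade-off a behavior-based scheduler has to make under bandwidth and rate limits.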



Author information

Authors and Affiliations

Harbin Institute of Technology, China
Rui Guo, Hongzhi Wang, Kaiyu Li, Jianzhong Li & Hong Gao


Editor information

Editors and Affiliations

Department of Computer Science and Technology, Tsinghua University, 100084, Beijing, China
Jianyong Wang
Management Science and Information Systems Department, Rutgers, the State University of New Jersey, 1, Washington Park, 07102, Newark, NJ, USA
Hui Xiong
Department of Information Engineering, Nagoya University, 464-8601, Nagoya, Japan
Yoshiharu Ishikawa
Department of Computer Science, Hong Kong Baptist University, Hong Kong
Jianliang Xu
School of Information Science and Engineering, Yanshan University, Qinhuangdao, China
Junfeng Zhou


About this paper

Cite this paper

Guo, R., Wang, H., Li, K., Li, J., Gao, H. (2013). CUVIM: Extracting Fresh Information from Social Network. In: Wang, J., Xiong, H., Ishikawa, Y., Xu, J., Zhou, J. (eds) Web-Age Information Management. WAIM 2013. Lecture Notes in Computer Science, vol 7923. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38562-9_36


DOI: https://doi.org/10.1007/978-3-642-38562-9_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38561-2
Online ISBN: 978-3-642-38562-9
eBook Packages: Computer Science, Computer Science (R0)
