ABSTRACT
The emergence and popularization of on-line social networks have suddenly made available a large amount of data on social organization, interaction, and human behavior. This information opens new perspectives and challenges for the study of social systems and is of interest to many fields. Although most on-line social networks are recent, a large number of scientific papers have already been published on this topic, covering a broad range of analytical methods and applications. A tool capable of gathering tailored information from social networks can therefore help many researchers in their work, especially in the area of Natural Language Processing (NLP). Today, social networks are precisely the medium in which people most often use written language on a daily basis, so the ubiquitous crawling of social networks is of the utmost importance for researchers. Such a tool lets researchers obtain the relevant information they need and focus on the research that really matters, without losing time developing their own crawler. In this paper, we present an extensive analysis of the existing social networks and their APIs, and describe the conception and design of a social network crawler intended to help NLP researchers.
Index Terms
- SocialNetCrawler: Online Social Network Crawler