DOI: 10.1145/3297662.3365805
research-article

SocialNetCrawler: Online Social Network Crawler

Published: 10 January 2020

ABSTRACT

The emergence and popularization of online social networks has suddenly made available a large amount of data on social organization, interaction, and human behavior. This information opens new perspectives and challenges for the study of social systems and is of interest to many fields. Although most online social networks are recent, a vast number of scientific papers have already been published on the topic, covering a broad range of analytical methods and applications. A tool capable of gathering tailored information from social networks can therefore help many researchers in their work, especially in the area of Natural Language Processing (NLP). Nowadays, social networks are precisely the medium in which people most often use written language in daily life, so the ubiquitous crawling of social networks is of the utmost importance for researchers. Such a tool allows researchers to obtain the relevant information they need and to focus on what really matters, without losing time developing their own crawler. In this paper, we present an extensive analysis of the existing social networks and their APIs, and describe the conception and design of a social network crawler intended to help NLP researchers.


      • Published in

        MEDES '19: Proceedings of the 11th International Conference on Management of Digital EcoSystems
        November 2019
        350 pages
        ISBN: 9781450362382
        DOI: 10.1145/3297662

        Copyright © 2019 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Qualifiers

        • research-article
        • Research
        • Refereed limited

        Acceptance Rates

        MEDES '19 Paper Acceptance Rate: 41 of 102 submissions, 40%
        Overall Acceptance Rate: 267 of 682 submissions, 39%