DOI: 10.1145/2740908.2745397
research-article

The World Conversation: Web Page Metadata Generation From Social Sources

Published: 18 May 2015

ABSTRACT

Over the past couple of years, social networks such as Twitter and Facebook have become the primary source for consuming information on the Internet. One of the main differences between this content and traditional information sources on the Web is that social networks surface individuals' perspectives. When social media users post and share updates with friends and followers, some of those short fragments of text contain a link and a personal comment about the linked web page, image, or video. We are interested in mining the text around those links to better understand what people are saying about the object they refer to. The salient keywords captured from the crowd provide rich metadata that we can use to augment a web page. This metadata can be used in many applications, such as ranking signals, query augmentation, indexing, and organizing and categorizing content. In this paper, we present a technique called social signatures that, given a link to a web page, pulls the most important keywords from the social chatter around it, yielding a high-level representation of the web page from a social media perspective. Our findings indicate that the content of social signatures differs from that derived from the web page itself and therefore provides new insights. This difference becomes more prominent as the number of link shares increases. To showcase our work, we present the results of processing a dataset that contains around 1 billion unique URLs shared on Twitter and Facebook over a two-month period. We also provide data points that shed some light on the dynamics of content sharing in social media.
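
As a rough illustration of the idea in the abstract, the following Python sketch groups posts by the link they share and keeps the most frequent terms from the surrounding text as a signature for that link. The abstract does not say how keywords are actually selected, so the plain term counting, the tiny stopword list, the regular expressions, and the names social_signatures and jaccard are all assumptions made for this example, not the authors' method.

```python
import re
from collections import Counter, defaultdict

# Assumed patterns and stopword list, for illustration only.
URL_RE = re.compile(r"https?://\S+")
TOKEN_RE = re.compile(r"[a-z0-9']+")
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "on", "for",
             "is", "are", "this", "that", "it", "at", "with", "by", "rt", "via"}


def social_signatures(posts, top_k=5):
    """Map each shared URL to the top_k most frequent terms around it."""
    counts = defaultdict(Counter)
    for text in posts:
        urls = URL_RE.findall(text)
        if not urls:
            continue  # posts without a link say nothing about any page
        comment = URL_RE.sub(" ", text).lower()
        # Count each term once per post so a single repetitive post
        # cannot dominate; dict.fromkeys keeps first-seen order.
        terms = list(dict.fromkeys(
            t for t in TOKEN_RE.findall(comment)
            if t not in STOPWORDS and len(t) > 1
        ))
        for url in urls:
            counts[url].update(terms)
    return {url: [term for term, _ in c.most_common(top_k)]
            for url, c in counts.items()}


def jaccard(a, b):
    """Overlap between two keyword lists (1.0 means identical sets)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 1.0


posts = [
    "Great read on solar power http://example.com/a",
    "solar panels are getting cheap! http://example.com/a",
    "RT solar power explained http://example.com/a",
    "no link here",
]
print(social_signatures(posts, top_k=2))
```

A low jaccard score between a signature and keywords taken from the page's own text would be one simple way to quantify the kind of difference the abstract reports between social and on-page content.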

Published in

WWW '15 Companion: Proceedings of the 24th International Conference on World Wide Web
May 2015
1602 pages
ISBN: 9781450334730
DOI: 10.1145/2740908

Copyright © 2015 Copyright is held by the International World Wide Web Conference Committee (IW3C2)

Publisher: Association for Computing Machinery, New York, NY, United States

Qualifiers: research-article, refereed limited

Overall acceptance rate: 1,899 of 8,196 submissions, 23%
