DOI: 10.1145/2740908.2745397

The World Conversation: Web Page Metadata Generation From Social Sources

Published: 18 May 2015

Abstract

Over the past couple of years, social networks such as Twitter and Facebook have become the primary source for consuming information on the Internet. One of the main differentiators between this content and traditional information sources on the Web is that social networks surface individuals' perspectives. When social media users post and share updates with friends and followers, some of those short fragments of text contain a link together with a personal comment about the web page, image, or video it points to. We are interested in mining the text around those links to better understand what people are saying about the object they refer to. The salient keywords captured from the crowd form rich metadata that can be used to augment a web page, with applications such as ranking signals, query augmentation, indexing, and organizing and categorizing content. In this paper, we present a technique called social signatures that, given a link to a web page, extracts the most important keywords from the social chatter around it, that is, a high-level representation of the web page from a social media perspective. Our findings indicate that the content of social signatures differs from the content of the web page itself and therefore provides new insights. This difference becomes more prominent as the number of link shares increases. To showcase our work, we present the results of processing a dataset that contains around 1 billion unique URLs shared on Twitter and Facebook over a two-month period. We also provide data points that shed some light on the dynamics of content sharing in social media.
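The abstract describes social signatures only at a high level: given a URL, gather the posts that share it and keep the most important keywords from that chatter. The sketch below is a hypothetical illustration of that idea under our own assumptions, not the authors' method. It swaps in a generic TF-IDF weighting in which each URL's pooled chatter is treated as one document, and it measures how much a signature overlaps a page's own keywords with Jaccard similarity. The names `tokenize`, `social_signature`, `jaccard`, and `STOPWORDS`, the tokenizer, and the example posts and URLs are all invented for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical sketch: generic TF-IDF keyword scoring, NOT the paper's method.
# Tiny illustrative stopword list (assumption); "rt" drops retweet markers.
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "on", "for",
             "is", "it", "this", "that", "with", "my", "i", "you", "rt"}
URL_RE = re.compile(r"https?://\S+")
TOKEN_RE = re.compile(r"[a-z0-9#@']+")


def tokenize(post: str) -> set[str]:
    """Lowercase a post, strip links, and return its distinct non-stopword tokens."""
    text = URL_RE.sub(" ", post.lower())
    return {t for t in TOKEN_RE.findall(text) if t not in STOPWORDS and len(t) > 1}


def social_signature(posts_by_url: dict[str, list[str]], url: str, k: int = 5) -> list[str]:
    """Return the top-k terms from the chatter around `url`.

    tf:  number of posts sharing `url` that mention the term (counted once
         per post, so a single repetitive post cannot dominate).
    idf: smoothed; down-weights terms common to the chatter of many URLs.
    """
    df = Counter()  # how many URLs' pooled chatter contains each term
    for posts in posts_by_url.values():
        df.update(set().union(*(tokenize(p) for p in posts)))
    n_urls = len(posts_by_url)

    tf = Counter()
    for post in posts_by_url[url]:
        tf.update(tokenize(post))

    scores = {t: c * (math.log((1 + n_urls) / (1 + df[t])) + 1) for t, c in tf.items()}
    # Highest score first; ties broken alphabetically for deterministic output.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


def jaccard(a, b) -> float:
    """Overlap between two keyword collections (0 = disjoint, 1 = identical)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


posts = {
    "http://example.com/eclipse": [
        "Amazing photos of the solar eclipse http://example.com/eclipse",
        "solar eclipse over Europe, stunning http://example.com/eclipse",
        "RT stunning eclipse shots",
    ],
    "http://example.com/recipe": [
        "Best pancake recipe ever http://example.com/recipe",
        "stunning pancakes, trying this recipe tonight",
    ],
}
sig = social_signature(posts, "http://example.com/eclipse", k=3)
print(sig)  # ['eclipse', 'solar', 'stunning']
print(jaccard(sig, ["eclipse", "moon", "sun"]))  # 0.2
```

Because the IDF factor penalizes "stunning", which also appears in the chatter around the recipe URL, "solar" outranks it even though both terms occur in two posts about the eclipse page.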

Information

Published In

WWW '15 Companion: Proceedings of the 24th International Conference on World Wide Web
May 2015
1602 pages
ISBN:9781450334730
DOI:10.1145/2740908

Sponsors

  • IW3C2: International World Wide Web Conference Committee

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Facebook
  2. annotation
  3. metadata
  4. social media
  5. twitter
  6. web page augmentation

Qualifiers

  • Research-article
  • Refereed limited

Conference

WWW '15
Sponsor:
  • IW3C2

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 4
  • Downloads (last 6 weeks): 0
Reflects downloads up to 05 Mar 2025

Cited By

  • (2024) Product Query Recommendation for Enriching Suggested Q&As. Proceedings of the 2024 Conference on Human Information Interaction and Retrieval, pp. 352-357. DOI: 10.1145/3627508.3638314. Online publication date: 10-Mar-2024.
  • (2019) A Lightweight Representation of News Events on Social Media. Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1049-1052. DOI: 10.1145/3331184.3331300. Online publication date: 18-Jul-2019.
  • (2018) How it Happened. Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries, pp. 193-202. DOI: 10.1145/3197026.3197034. Online publication date: 23-May-2018.
  • (2018) Social Search. Social Information Access, pp. 213-276. DOI: 10.1007/978-3-319-90092-6_7. Online publication date: 3-May-2018.
  • (2017) Automatic Generation of Event Timelines from Social Data. Proceedings of the 2017 ACM on Web Science Conference, pp. 207-211. DOI: 10.1145/3091478.3091519. Online publication date: 25-Jun-2017.
  • (2017) What's Happening and What Happened. Proceedings of the 2017 ACM on Web Science Conference, pp. 191-200. DOI: 10.1145/3091478.3091484. Online publication date: 25-Jun-2017.
  • (2017) Gaining historical and international relations insights from social media: spatio-temporal real-world news analysis using Twitter. EPJ Data Science, 6:1. DOI: 10.1140/epjds/s13688-017-0122-8. Online publication date: 6-Oct-2017.
  • (2016) Proposal of a New Social Signal for Excluding Common Web Pages in Multiple Social Networking Services. Computational Social Networks, pp. 239-248. DOI: 10.1007/978-3-319-42345-6_21. Online publication date: 12-Jul-2016.
  • (2015) Short video metadata acquisition game. Proceedings of the 2015 10th International Workshop on Semantic and Social Media Adaptation and Personalization (SMAP), pp. 1-5. DOI: 10.1109/SMAP.2015.7370092. Online publication date: 5-Nov-2015.
