tutorial

Information organization and retrieval with collaboratively generated content

Authors:
Eugene Agichtein (Emory University, Atlanta, GA, USA)
Evgeniy Gabrilovich (Yahoo! Research, Santa Clara, CA, USA)

SIGIR '11: Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval, July 2011, Pages 1307–1308, https://doi.org/10.1145/2009916.2010173

Published: 24 July 2011

ABSTRACT

Proliferation of ubiquitous access to the Internet enables millions of Web users to collaborate online on a variety of activities. Many of these activities result in the construction of large repositories of knowledge, either as their primary aim (e.g., Wikipedia) or as a by-product (e.g., Yahoo! Answers). In this tutorial, we will discuss organizing and exploiting Collaboratively Generated Content (CGC) for information organization and retrieval. Specifically, we intend to cover two complementary areas of the problem: (1) using such content as a powerful enabling resource for knowledge-enriched, intelligent representations and new information retrieval algorithms, and (2) development of supporting technologies for extracting, filtering, and organizing collaboratively created content.

The unprecedented amounts of information in CGC enable new, knowledge-rich approaches to information access, which are significantly more powerful than the conventional word-based methods. Considerable progress has been made in this direction over the last few years. Examples include explicit manipulation of human-defined concepts and their use to augment the bag of words (cf. Explicit Semantic Analysis), using large-scale taxonomies of topics from Wikipedia or the Open Directory Project to construct additional class-based features, or using Wikipedia for better word sense disambiguation.
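As a rough illustration of the concept-based representations mentioned above, the following is a minimal sketch in the spirit of Explicit Semantic Analysis, not the tutorial authors' implementation. It assumes a tiny, hypothetical set of "concept articles" (`CONCEPTS`) standing in for Wikipedia; the function names and the simplified TF-IDF weighting are likewise illustrative assumptions.

```python
import math
from collections import Counter

# Hypothetical toy "concept articles"; a real system would use a large
# collaboratively built repository such as Wikipedia.
CONCEPTS = {
    "Jaguar (animal)": "jaguar big cat predator jungle prey cat",
    "Jaguar Cars": "jaguar car luxury vehicle engine car",
    "Jungle": "jungle forest rain trees predator prey",
}


def tokenize(text):
    """Lowercase and split on whitespace (deliberately simplistic)."""
    return text.lower().split()


def build_inverted_index(concepts):
    """Map each word to {concept name: TF-IDF weight}."""
    n = len(concepts)
    tf = {name: Counter(tokenize(body)) for name, body in concepts.items()}
    # Document frequency: in how many concept articles each word occurs.
    df = Counter(word for counts in tf.values() for word in counts)
    index = {}
    for name, counts in tf.items():
        for word, count in counts.items():
            idf = math.log(n / df[word])
            if idf > 0:  # skip words that occur in every concept article
                index.setdefault(word, {})[name] = count * idf
    return index


def concept_vector(text, index):
    """Represent a text as a weighted vector over concepts."""
    vec = Counter()
    for word in tokenize(text):
        for concept, weight in index.get(word, {}).items():
            vec[concept] += weight
    return dict(vec)


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    index = build_inverted_index(CONCEPTS)
    a = concept_vector("vehicle", index)
    b = concept_vector("engine", index)
    # The two texts share no words, yet both map to the same concept.
    print(cosine(a, b))
```

In this toy setup, "vehicle" and "engine" have no words in common, so a plain bag-of-words comparison would treat them as unrelated, whereas their concept vectors coincide. The concept weights could also be appended to the word features rather than replacing them, which is one way to read "augmenting the bag of words".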

However, the quality and comprehensiveness of collaboratively created content vary widely, and in order for this resource to be useful, a significant amount of preprocessing, filtering, and organization is necessary. Consequently, new methods for analyzing CGC and the corresponding user interactions are required to effectively harness the resulting knowledge. Thus, not only can the content repositories be used to improve IR methods, but the reverse pollination is also possible, as better information extraction methods can be used for automatically collecting more knowledge or verifying the contributed content. This natural connection between modeling the generation process of CGC and effectively using the accumulated knowledge suggests covering both areas together in a single tutorial.

The intended audience of the tutorial includes IR researchers and graduate students who would like to learn about the recent advances and research opportunities in working with collaboratively generated content. The emphasis of the tutorial is on comparing the existing approaches and presenting practical techniques that IR practitioners can use in their research. We also cover open research challenges and survey available resources (software tools and data) for getting started in this research field.

Index Terms

Information organization and retrieval with collaboratively generated content
1. Information systems
  1. Information retrieval
    1. Document representation

Published in
SIGIR '11: Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
July 2011
1374 pages
ISBN: 9781450307574
DOI: 10.1145/2009916
General Chairs: Wei-Ying Ma (Microsoft Research Asia, China), Jian-Yun Nie (University of Montreal, Canada)
Program Chairs: Ricardo Baeza-Yates (Yahoo! Research, Spain), Tat-Seng Chua (National University of Singapore), W. Bruce Croft (University of Massachusetts, Amherst, USA)
Copyright © 2011 Authors
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
collaboratively generated content
Qualifiers
- tutorial
Acceptance Rates
Overall Acceptance Rate: 792 of 3,983 submissions, 20%