It is our great pleasure to welcome you to the 2nd International Workshop on Search and Mining User-generated Contents (SMUC 2010). SMUC 2010 aims to become a forum for researchers from several Information and Knowledge Management areas, such as data/text mining, information retrieval, and semantics, who apply their work to the fields of Social Media and Opinion/Sentiment Analysis, where the main goal is to process user-generated content.
User-generated content provides an excellent scenario to apply the metaphor of mining any kind of information. In a social media context, users create a huge amount of data in which we can look for valuable nuggets of knowledge by applying several search techniques (information retrieval) or mining techniques (data mining, text mining, web mining, opinion mining, etc.). In this kind of data we can find both structured information (ratings, tags, links, etc.) and unstructured information (text, audio, video, etc.), and we must learn to combine existing techniques in order to take advantage of this heterogeneity while extracting useful knowledge.
The call for papers attracted 25 submissions from Asia, North America, Europe and Africa. The program committee accepted 8 full papers and 7 posters that cover a variety of topics, including data and text mining, opinion mining, search, spam filtering and tagging. The accepted papers use several information sources for experimentation, such as Twitter data, product reviews, webpages, Wikipedia, ProgrammableWeb, Delicious and blogs. In addition, the program includes a panel on Industrial Applications of Search and Mining User-generated Contents technologies, and a keynote speech by Bing Liu. We hope that these proceedings will serve as a valuable reference for researchers and developers.
Proceeding Downloads
Exploiting tag and word correlations for improved webpage clustering
Automatic clustering of webpages helps a number of information retrieval tasks, such as improving user interfaces, collection clustering, introducing diversity in search results, etc. Typically, webpage clustering algorithms only use features extracted ...
A knowledge-rich approach to feature-based opinion extraction from product reviews
Feature-based opinion extraction is a task related to information extraction, which consists of extracting structured opinions on features of some object from reviews or other subjective textual sources. Over the last years, this problem has been ...
Entity-relationship queries over Wikipedia
Wikipedia is the largest user-generated knowledge base. We propose a structured query mechanism, entity-relationship query, for searching entities in the Wikipedia corpus by their properties and inter-relationships. An entity-relationship query consists of ...
A formal study of classification techniques on entity discovery and their application to opinion mining
Entity discovery has become an important topic of study in recent years due to its wide range of applications. In this paper, we focus on examining the effectiveness of various classification techniques on entity discovery and their application to the ...
Classifying latent user attributes in Twitter
Social media outlets such as Twitter have become an important forum for peer interaction. Thus the ability to classify latent user attributes, including gender, age, regional origin, and political orientation solely from Twitter user language or similar ...
Spam detection with a content-based random-walk algorithm
In this work we tackle the problem of spam detection on the Web. Spam web pages have become a problem for Web search engines, due to the negative effects that this phenomenon can cause in their retrieval results. Our approach is based on a random-...
Exploiting web reviews for generating customer service surveys
Traditional customer satisfaction analysis relies on the work of designing, distributing, collecting and analyzing surveys. Surveys that are designed by humans may be subjective, and it is hard to know what service aspects are the most important for ...
Characterization of the Twitter @replies network: are user ties social or topical?
In recent years, social media services have become a global phenomenon on the Internet. The popularity of these services provides an opportunity to study the characteristics of online social networks and the communities that emerge in them. This paper ...
Mining social tags to predict mashup patterns
In the past few years, tagging has gained large momentum as a user-driven approach for categorizing and indexing content on the Web. Mashups have recently joined the list of Web resources targeted for social tagging. In the context of the social Web, a ...
A weighted tag similarity measure based on a collaborative weight model
The problem of measuring semantic relatedness between social tags remains largely open. Given the structure of social bookmarking systems, similarity measures need to be addressed from a social bookmarking systems perspective. We address the fundamental ...
How to interpret the helpfulness of online product reviews: bridging the needs between customers and designers
Helpful reviews are the valuable voice of the customer which benefit both consumers and product designers. On e-commerce websites, consumers are usually encouraged to rate whether a review is helpful or not. As consumers are not obligated to vote ...
On the difficulty of clustering company tweets
Twitter is a new successful technology of the Web 2.0 genre which is used by millions of people and companies to publish brief messages ("tweets") with the purpose of sharing experiences and/or opinions about a product or service. Due to the huge amount ...
Web-based statistical fact checking of textual documents
User generated content has been growing tremendously in recent years. This content reflects the interests and the diversity of online users. In turn, the diversity among internet users is also reflected in the quality of the content being published ...
Cross-media impact on Twitter in Japan
Twitter, a microblogging service, is now attracting attention as a new channel. For a deeper understanding of this new service, this paper reports the characteristics of Twitter users in Japan, and the impact of media such as publications, and TV ...
Extracting emotion topics from blog sentences: use of voting from multi-engine supervised classifiers
This paper presents a supervised multi-engine classifier approach followed by voting to identify emotion topic(s) from English blog sentences. Manual annotation of the English blog sentences in the training set has shown a satisfactory agreement with ...
- Proceedings of the 2nd international workshop on Search and mining user-generated contents
Acceptance Rates
Year | Submitted | Accepted | Rate |
---|---|---|---|
SMUC '10 | 25 | 15 | 60% |
Overall | 25 | 15 | 60% |