Authors:
Martin Leginus¹, Leon Derczynski² and Peter Dolog¹
Affiliations:
¹ Aalborg University, Denmark; ² University of Sheffield, United Kingdom
Keyword(s):
Word Clouds, Recognized Named Entities, User Evaluation, Social Streams Access.
Related Ontology Subjects/Areas/Topics:
Multimedia and User Interfaces; Searching and Browsing; Social Media Analytics; Social Networks and Organizational Culture; Society, e-Business and e-Government; Web Information Systems and Technologies; Web Interfaces and Applications
Abstract:
Intuitive and effective access to large volumes of information is increasingly important. As social media
grows into a useful source of information, methods are needed to access its large volumes of user-generated
content. Word clouds are an effective information access tool. However, word clouds generated over
social media data often contain redundant and mis-ranked entries, which limits users' ability to browse and
explore datasets. This paper proposes a method for improving word cloud generation over social streams.
Named entity expressions in tweets are detected, disambiguated and aggregated into entity clusters. A word
cloud is then generated from terms that represent the most relevant entity clusters. We find that word clouds with
grouped named entities attain significantly broader coverage and significantly less content duplication.
Further, access to relevant entries in the collection is improved. We also performed an extrinsic crowdsourced
user evaluation of the generated word clouds. Word clouds with grouped named entities are rated as significantly
more relevant and more diverse than the baseline. In addition, we find that word clouds with
higher Mean Average Precision (MAP) are more likely to be rated by users as relevant to the
concepts reflected. Critically, this supports MAP as a tool for predicting word cloud quality without requiring
a human in the loop.
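The abstract does not say exactly how MAP is computed for a word cloud. The sketch below assumes a common setup: each cloud term is issued as a query that returns a ranked list of tweets, each tweet carries a binary relevance judgement, average precision (AP) is computed per term, and MAP is the mean AP over all terms in the cloud. The function names `average_precision` and `mean_average_precision` and the optional `total_relevant` parameter are hypothetical and illustrative only, not the authors' implementation.

```python
from typing import Optional, Sequence


def average_precision(relevance: Sequence[bool],
                      total_relevant: Optional[int] = None) -> float:
    """AP for one ranked result list (assumed: one list per word cloud term).

    relevance[k-1] is True if the tweet at rank k is relevant.
    If total_relevant is None, we assume every relevant tweet appears in
    the list, so the number of retrieved relevant tweets is the normaliser.
    """
    hits = 0
    precision_sum = 0.0
    for k, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / k  # precision at this relevant rank
    denom = total_relevant if total_relevant is not None else hits
    return precision_sum / denom if denom else 0.0


def mean_average_precision(rankings: Sequence[Sequence[bool]]) -> float:
    """MAP: mean of per-term AP values over all terms in a word cloud."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings) / len(rankings)


if __name__ == "__main__":
    # Hypothetical judgements for a two-term word cloud.
    cloud = {
        "term_a": [True, False, True],  # AP = (1/1 + 2/3) / 2 = 5/6
        "term_b": [False, True],        # AP = (1/2) / 1 = 1/2
    }
    print(round(mean_average_precision(list(cloud.values())), 4))  # 0.6667
```

Under this assumption, a cloud whose terms retrieve relevant tweets near the top of their result lists scores a higher MAP, which is the property the abstract links to user-perceived relevance.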