ABSTRACT
The growing popularity of Web 2.0 applications has greatly increased the amount of social media content available on the Internet. However, the unsupervised, user-oriented nature of this source of information, and thus its potential lack of quality, poses a challenge to information retrieval (IR) services. Previous work has focused mostly on tags, although no consensus has yet been reached about their effectiveness as supporting information for IR services. Moreover, other textual features of the Web 2.0 have generally been overlooked by previous research.
In this context, this work aims to assess the relative quality of distinct textual features available on the Web 2.0. Towards this goal, we analyzed four features (title, tags, description and comments) in four popular applications (CiteULike, Last.FM, Yahoo! Video, and YouTube). First, we characterized data from these applications to extract evidence of the quality of each feature with respect to usage, amount of content, and descriptive and discriminative power, as well as evidence of content diversity across features. We then conducted a series of classification experiments as a case study for quality evaluation. The characterization and classification results indicate that: 1) when considered separately, tags is the most promising feature, achieving the best classification results, although its absence from a non-negligible fraction of objects may limit its potential use; and 2) different features may contribute different pieces of information, and combining their contents can improve classification.
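To make the two findings concrete, the following is a minimal sketch, not the authors' method, of how content diversity across features and feature combination could be explored. It assumes each object is a dictionary with the four textual fields as space-separated strings (tags included), uses naive whitespace tokenization, uses Jaccard similarity between term sets as a stand-in diversity measure, and combines features by simple concatenation into bag-of-words counts. The names `FEATURES`, `terms`, `jaccard`, `feature_diversity`, and `combined_vector` are hypothetical.

```python
from collections import Counter

# The four textual features analyzed in the abstract.
FEATURES = ("title", "tags", "description", "comments")


def terms(text):
    """Lowercase whitespace tokenization (a deliberately naive assumption)."""
    return text.lower().split()


def jaccard(a, b):
    """Jaccard similarity between two term collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def feature_diversity(obj):
    """Pairwise Jaccard similarity between the term sets of every pair of features.

    Low similarity suggests the two features carry different information.
    A missing or empty feature (e.g. an object with no tags) is treated as
    an empty string, so its similarity to any other feature is 0.0.
    """
    result = {}
    for i, f1 in enumerate(FEATURES):
        for f2 in FEATURES[i + 1:]:
            result[(f1, f2)] = jaccard(terms(obj.get(f1, "")), terms(obj.get(f2, "")))
    return result


def combined_vector(obj, features=FEATURES):
    """Bag-of-words term counts over the concatenation of the selected features."""
    counts = Counter()
    for f in features:
        counts.update(terms(obj.get(f, "")))
    return counts


# Hypothetical example object; a real dataset would be crawled from the applications.
video = {
    "title": "Funny cat video",
    "tags": "cat funny pets",
    "description": "A cat plays piano",
    "comments": "",
}
print(combined_vector(video)["cat"])                  # 3
print(feature_diversity(video)[("title", "tags")])    # 0.5
```

A vector built from tags alone (`combined_vector(video, ("tags",))`) can then be compared against the all-features vector as input to any text classifier; concatenation is only one possible combination strategy, chosen here for simplicity.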
Index Terms
- Evidence of quality of textual features on the web 2.0