Analyzing and Mining Comments and Comment Ratings on the Social Web

Published: 08 July 2014

Abstract

An analysis of the social video sharing platform YouTube and the news aggregator Yahoo! News reveals vast amounts of community feedback, both through comments on published videos and news stories and through metaratings of these comments. This article presents an in-depth study of commenting and comment-rating behavior on a sample of more than 10 million user comments on YouTube and Yahoo! News. In this study, comment ratings are considered first-class citizens. Their dependencies on textual content, the thread structure of comments, and associated content (e.g., videos and their metadata) are analyzed to obtain a comprehensive understanding of community commenting behavior. Furthermore, this article explores the applicability of machine learning and data mining to detect acceptance of comments by the community, comments likely to trigger discussions, controversial and polarizing content, and users exhibiting offensive commenting behavior. Results from this study have potential applications in guiding the design of community-oriented online discussion platforms.

    Published In

    ACM Transactions on the Web  Volume 8, Issue 3
    June 2014
    256 pages
    ISSN:1559-1131
    EISSN:1559-114X
    DOI:10.1145/2639948
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 08 July 2014
    Accepted: 01 March 2014
    Revised: 01 November 2013
    Received: 01 September 2012
    Published in TWEB Volume 8, Issue 3

    Author Tags

    1. Comment ratings
    2. Yahoo! News
    3. YouTube
    4. community feedback

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 86
    • Downloads (Last 6 weeks): 3
    Reflects downloads up to 16 Feb 2025

    Citations

    Cited By

    • (2024) Commenting on local politics: An analysis of YouTube video comments for local government videos. Research in Corpus Linguistics 13:1 (1-25). DOI: 10.32714/ricl.13.01.02. Online publication date: 2024
    • (2024) Analysing Emotional and Topical Patterns in Conspiracy Theory Narratives: a Discourse Comparative Study on the 2023 Hawaii Wildfires. 2024 14th International Conference on Pattern Recognition Systems (ICPRS) (1-7). DOI: 10.1109/ICPRS62101.2024.10677804. Online publication date: 15-Jul-2024
    • (2024) Evaluating service quality of express logistics service based on online reviews using LDA-LSTM. Journal of Management Science and Engineering 9:3 (308-327). DOI: 10.1016/j.jmse.2024.02.001. Online publication date: Sep-2024
    • (2024) The impact of social media reports on nurses' job satisfaction: A cross-sectional survey. Computers in Human Behavior Reports 16 (100529). DOI: 10.1016/j.chbr.2024.100529. Online publication date: Dec-2024
    • (2023) Machine Learning based Sentiment Analysis of YouTube Video Comments. 2023 First International Conference on Advances in Electrical, Electronics and Computational Intelligence (ICAEECI) (1-6). DOI: 10.1109/ICAEECI58247.2023.10370907. Online publication date: 19-Oct-2023
    • (2023) Hass im Netz – Aggressivität und Toxizität von Hasskommentaren und Postings, Detektion und Analyse. Handbuch Cyberkriminologie 1 (261-292). DOI: 10.1007/978-3-658-35439-8_13. Online publication date: 29-Aug-2023
    • (2022) An Estimation of Online Video User Engagement From Features of Time- and Value-Continuous, Dimensional Emotions. Frontiers in Computer Science 4. DOI: 10.3389/fcomp.2022.773154. Online publication date: 23-Mar-2022
    • (2022) Hass im Netz – Aggressivität und Toxizität von Hasskommentaren und Postings, Detektion und Analyse. Handbuch Cyberkriminologie (1-32). DOI: 10.1007/978-3-658-35450-3_13-1. Online publication date: 4-Nov-2022
    • (2022) Feature Relevance Analysis of Product Reviews to Support Online Shopping. Information Integration and Web Intelligence (441-446). DOI: 10.1007/978-3-031-21047-1_40. Online publication date: 20-Nov-2022
    • (2021) Regulating algorithmic filtering on social media. Proceedings of the 35th International Conference on Neural Information Processing Systems (6997-7011). DOI: 10.5555/3540261.3540797. Online publication date: 6-Dec-2021
