DOI: 10.1145/3428658.3431756
Short paper

Automatic Content Quality Estimation Using Deep Neural Networks in Collaborative Encyclopedias on the Web

Published: 30 November 2020

Abstract

Wikipedia is based on user-generated content: anyone with internet access can write and edit its articles, and for this reason the quality of its information is often criticized. Assigning the correct quality class to Wikipedia articles is therefore crucial to the experience of both authors and readers of this large repository of information. In this paper, we present an approach to this problem based on deep neural networks. Our experiments tested three different models in search of the best possible result: the first uses a conventional deep learning architecture, while the other two use sets of semantically related quality indicators (views) to better exploit their different properties and thus improve the final prediction. Finally, we compared our results with the state-of-the-art method (Support Vector Regression with views) and achieved comparable results, with room for further improvement.
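
To make the multi-view idea from the abstract concrete, the sketch below shows one plausible way to build such a model with Keras: each group of semantically related quality indicators feeds a separate sub-network, and the branches are merged before the final quality-class prediction. Everything here is illustrative; the view names, feature counts, layer sizes, number of quality classes, and optimizer are assumptions, not details taken from the paper.

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical views: groups of semantically related quality indicators.
# The names and feature counts are placeholders, not the paper's feature sets.
VIEW_SIZES = {"structure": 20, "style": 15, "history": 10}
NUM_CLASSES = 6  # e.g. a Wikipedia quality scale from Stub to Featured Article

def build_multiview_model():
    inputs, branches = [], []
    for name, size in VIEW_SIZES.items():
        # One small sub-network per view, so each group of indicators is
        # transformed independently before the views are combined.
        view_input = keras.Input(shape=(size,), name=name)
        x = layers.Dense(32, activation="relu")(view_input)
        x = layers.Dense(16, activation="relu")(x)
        inputs.append(view_input)
        branches.append(x)
    merged = layers.concatenate(branches)
    merged = layers.Dense(32, activation="relu")(merged)
    output = layers.Dense(NUM_CLASSES, activation="softmax")(merged)
    model = keras.Model(inputs=inputs, outputs=output)
    # Adam is an assumed optimizer choice here, not necessarily the paper's.
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

if __name__ == "__main__":
    model = build_multiview_model()
    # Random placeholder data, only to illustrate the expected input format:
    # one feature matrix per view, keyed by the input layer names.
    features = {name: np.random.rand(8, size).astype("float32")
                for name, size in VIEW_SIZES.items()}
    labels = np.random.randint(0, NUM_CLASSES, size=(8,))
    model.fit(features, labels, epochs=1, verbose=0)
    model.summary()

This sketch frames quality assessment as classification into discrete quality classes; the Support Vector Regression baseline mentioned in the abstract instead treats the quality scale as a continuous target, so a regression variant would end with a single linear output unit and a mean-squared-error loss.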


Cited By

  • (2024) Leveraging Large Language Models and Deep Learning for Wikipedia Quality Assessment. 2024 6th International Conference on Advancements in Computing (ICAC), 558-563. https://doi.org/10.1109/ICAC64487.2024.10851135. Online publication date: 12 December 2024.


Published In

WebMedia '20: Proceedings of the Brazilian Symposium on Multimedia and the Web
November 2020
364 pages
ISBN:9781450381963
DOI:10.1145/3428658
Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Sponsors

In-Cooperation

  • SBC: Brazilian Computer Society
  • CNPq: Conselho Nacional de Desenvolvimento Científico e Tecnológico
  • CGIBR: Comitê Gestor da Internet no Brasil
  • CAPES: Brazilian Higher Education Funding Council

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 November 2020


Author Tags

  1. Machine Learning
  2. Neural Networks
  3. Quality Assessment
  4. Wikipedia

Qualifiers

  • Short-paper
  • Research
  • Refereed limited

Funding Sources

  • Conselho Nacional de Desenvolvimento Científico e Tecnológico
  • Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
  • Fundação de Amparo à Pesquisa do Estado de Minas Gerais

Conference

WebMedia '20: Brazilian Symposium on Multimedia and the Web
November 30 - December 4, 2020
São Luís, Brazil

Acceptance Rates

WebMedia '20 Paper Acceptance Rate 34 of 87 submissions, 39%;
Overall Acceptance Rate 270 of 873 submissions, 31%


