DOI: 10.1145/2184305.2184308
research-article

Measuring the quality of web content using factual information

Published: 16 April 2012

Abstract

Many decisions today are based on information found on the Web. For the most part, the disseminating sources are not certified, so assessing the quality and credibility of Web content has become more important than ever. We present factual density, a simple statistical quality measure based on facts extracted from Web content using Open Information Extraction. In a first case study, we use this measure to identify featured/good articles in Wikipedia. We compare factual density with word count, a measure that has been applied successfully to this task in the past. Our evaluation corroborates the good performance of word count in Wikipedia, since featured/good articles are often longer than non-featured ones. However, for articles of similar length the word count measure fails, while factual density separates them with an F-measure of 90.4%. We also investigate the use of relational features for categorizing Wikipedia articles into featured/good versus non-featured ones. With relational features we achieve an F-measure of 86.7% for articles of similar length and 84% otherwise.
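
The abstract does not give the formula for factual density, so the following is only a minimal sketch under stated assumptions: that the measure is the number of extracted facts divided by the number of words, and that facts arrive as (argument, relation, argument) triples from some external Open Information Extraction tool. The names `factual_density`, `is_high_quality`, and `f_measure`, and the threshold parameter, are hypothetical and chosen for illustration; `f_measure` is the standard balanced F-measure used to report results such as those above.

```python
from typing import List, Tuple

# A fact as produced by an Open IE system: (argument1, relation, argument2).
# The extraction step itself is assumed to happen elsewhere.
Fact = Tuple[str, str, str]


def factual_density(text: str, facts: List[Fact]) -> float:
    """Assumed definition: extracted facts per whitespace-separated word."""
    words = text.split()
    if not words:
        return 0.0
    return len(facts) / len(words)


def is_high_quality(density: float, threshold: float) -> bool:
    """Hypothetical decision rule: flag an article whose density reaches a threshold."""
    return density >= threshold


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative input only, not data from the paper.
text = "Vienna is the capital of Austria. Vienna lies on the Danube."
facts = [
    ("Vienna", "is the capital of", "Austria"),
    ("Vienna", "lies on", "the Danube"),
]
density = factual_density(text, facts)  # 2 facts over 11 words
```

Normalizing by length is what would let such a measure compare articles of similar size, the setting in which the abstract reports that word count fails.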

Published In

WebQuality '12: Proceedings of the 2nd Joint WICOW/AIRWeb Workshop on Web Quality
April 2012
71 pages
ISBN: 9781450312370
DOI: 10.1145/2184305
Publisher

Association for Computing Machinery, New York, NY, United States
