DOI: 10.1145/3025171.3025201
research-article

WikiLyzer: Interactive Information Quality Assessment in Wikipedia

Published: 07 March 2017

Abstract

Digital libraries and services enable users to access large amounts of data on demand. Yet quality assessment of information encountered on the Internet remains an elusive open issue. For example, Wikipedia, one of the most visited platforms on the Web, hosts thousands of user-generated articles and undergoes 12 million edits/contributions per month. User-generated content is undoubtedly one of the keys to its success, but also a hindrance to good quality: contributions can be of poor quality because anyone, even anonymous users, can participate. Though Wikipedia has defined guidelines as to what makes the perfect article, authors find it difficult to ascertain whether their contributions comply with them, and reviewers cannot cope with the ever-growing number of articles pending review. Great effort has been invested in algorithmic methods for automatic classification of Wikipedia articles (as featured or non-featured) and for quality flaw detection. However, little has been done to support quality assessment of user-generated content through interactive tools that combine automatic methods and human intelligence. We developed WikiLyzer, a Web toolkit comprising three interactive applications designed to assist (i) knowledge discovery experts in creating and testing metrics for quality measurement, (ii) Wikipedia users searching for good articles, and (iii) Wikipedia authors who need to identify weaknesses to improve a particular article. A design study sheds light on how experts could create complex quality metrics with our tool, while a user study reports on its usefulness in identifying high-quality content.

      Published In

      IUI '17: Proceedings of the 22nd International Conference on Intelligent User Interfaces
      March 2017
      654 pages
      ISBN:9781450343480
      DOI:10.1145/3025171
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. text analytics
      2. text quality
      3. user-generated content
      4. visual analytics
      5. wikipedia

      Qualifiers

      • Research-article

      Funding Sources

      • WIQ-EI project funded by European Commission under FP7-PEOPLE
      • Austrian COMET Program

      Conference

      IUI'17
      Acceptance Rates

      IUI '17 Paper Acceptance Rate 63 of 272 submissions, 23%;
      Overall Acceptance Rate 746 of 2,811 submissions, 27%

      Cited By

      • (2023) Automatic Quality Assessment of Wikipedia Articles—A Systematic Literature Review. ACM Computing Surveys 56:4 (1-37). DOI: 10.1145/3625286. Online publication date: 10-Nov-2023
      • (2021) A Web GIS-Based Integration of 3D Digital Models with Linked Open Data for Cultural Heritage Exploration. ISPRS International Journal of Geo-Information 10:10 (684). DOI: 10.3390/ijgi10100684. Online publication date: 11-Oct-2021
      • (2020) Modeling Popularity and Reliability of Sources in Multilingual Wikipedia. Information 11:5 (263). DOI: 10.3390/info11050263. Online publication date: 13-May-2020
      • (2019) Multilingual Ranking of Wikipedia Articles with Quality and Popularity Assessment in Different Topics. Computers 8:3 (60). DOI: 10.3390/computers8030060. Online publication date: 14-Aug-2019
      • (2019) Interactive Quality Analytics of User-generated Content. ACM Transactions on Interactive Intelligent Systems 9:2-3 (1-42). DOI: 10.1145/3150973. Online publication date: 27-Mar-2019
      • (2019) Measures for Quality Assessment of Articles and Infoboxes in Multilingual Wikipedia. Business Information Systems Workshops (619-633). DOI: 10.1007/978-3-030-04849-5_53. Online publication date: 3-Jan-2019
      • (2018) Ontology-Based Classifiers for Wikipedia Article Quality Classification. Advances in Intelligent Informatics, Smart Technology and Natural Language Processing (68-81). DOI: 10.1007/978-3-319-94703-7_7. Online publication date: 19-Dec-2018
      • (2017) The Impact of Topic Characteristics and Threat on Willingness to Engage with Wikipedia Articles: Insights from Laboratory Experiments. Frontiers in Psychology 8. DOI: 10.3389/fpsyg.2017.01960. Online publication date: 7-Nov-2017
      • (2017) From Search to Discovery with Visual Exploration Tools. Proceedings of the 2017 ACM Workshop on Exploratory Search and Interactive Data Analytics (1-1). DOI: 10.1145/3038462.3038872. Online publication date: 13-Mar-2017
