DOI: 10.1145/1458527.1458538

A "quick and dirty" website data quality indicator

Published: 30 October 2008


This short paper outlines a research study in progress, which is motivated by the perception that the spelling error rate of a document can serve as a rudimentary proxy for the degree of quality control exercised in its creation, and, subsequently, indicate its quality. One objective of this research is to validate this understanding. Ultimately, the goal of this research is to take advantage of such an association. In particular, we propose a simple, "quick and dirty" metric for assisting in the evaluation of the quality of websites. This metric utilizes the reported hit counts of search engine queries on a pre-determined set of commonly misspelled words.
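As a rough illustration of how such a hit-count metric might be computed, the sketch below restricts each query to one site and reports the share of hits that go to the misspelled forms. The abstract does not specify a formula, so the ratio used here, the word pairs, the `site:` query form, and the `hit_count` callable are all assumptions made for this example; the stubbed counts are invented and do not come from any search engine.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical word list for illustration: (misspelling, correct spelling) pairs.
WORD_PAIRS: Sequence[Tuple[str, str]] = (
    ("recieve", "receive"),
    ("seperate", "separate"),
    ("definately", "definitely"),
)


def misspelling_rate(
    site: str,
    hit_count: Callable[[str], int],
    word_pairs: Sequence[Tuple[str, str]] = WORD_PAIRS,
) -> Optional[float]:
    """Return misspelled hits / (misspelled + correct hits) for one site.

    `hit_count` is a caller-supplied function mapping a query string to the
    hit count a search engine reports for it (an assumed interface).
    Returns None when no hits are reported for any word, so that "no data"
    is not confused with "no errors".
    """
    wrong = 0
    total = 0
    for bad, good in word_pairs:
        # Assumes the engine supports a site: restriction in the query.
        bad_hits = hit_count(f"site:{site} {bad}")
        good_hits = hit_count(f"site:{site} {good}")
        wrong += bad_hits
        total += bad_hits + good_hits
    return wrong / total if total else None


# Invented counts standing in for a real search engine.
FAKE_COUNTS = {
    "site:example.org recieve": 5,
    "site:example.org receive": 95,
    "site:example.org seperate": 10,
    "site:example.org separate": 90,
    "site:example.org definately": 0,
    "site:example.org definitely": 100,
}


def fake_hit_count(query: str) -> int:
    """Stub lookup; unknown queries report zero hits."""
    return FAKE_COUNTS.get(query, 0)


print(misspelling_rate("example.org", fake_hit_count))
```

Passing the hit-count lookup in as a function keeps the sketch independent of any particular search engine. Reported hit counts are typically estimates, so a score like this is better suited to comparing sites against one another than to reading as an exact error rate.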



Published In

WICOW '08: Proceedings of the 2nd ACM workshop on Information credibility on the web
October 2008
100 pages



Association for Computing Machinery

New York, NY, United States


Author Tags

  1. data quality
  2. data quality indicator
  3. indicator
  4. information credibility
  5. information quality
  6. quick and dirty
  7. web data


  • Short-paper


CIKM08: Conference on Information and Knowledge Management
October 30, 2008
Napa Valley, California, USA

Acceptance Rates

Overall Acceptance Rate 9 of 19 submissions, 47%


Cited By

  • (2022) Does the crying baby always get the milk? An analysis of government responses for online requests. Chinese Sociological Review 55:1 (96-125). DOI: 10.1080/21620555.2022.2103667. Online publication date: 5-Aug-2022
  • (2018) A resource of errors written in Spanish by people with dyslexia and its linguistic, phonetic and visual analysis. Language Resources and Evaluation 51:2 (379-408). DOI: 10.1007/s10579-015-9329-0. Online publication date: 17-Dec-2018
  • (2014) Spatio-temporal quality issues for local search. Proceedings of the 25th ACM conference on Hypertext and social media (297-299). DOI: 10.1145/2631775.2631792. Online publication date: 1-Sep-2014
  • (2012) Informing observers. Proceedings of the 2012 Joint EDBT/ICDT Workshops (1-8). DOI: 10.1145/2320765.2320776. Online publication date: 30-Mar-2012
  • (2012) Lexical quality as a proxy for web text understandability. Proceedings of the 21st International Conference on World Wide Web (591-592). DOI: 10.1145/2187980.2188142. Online publication date: 16-Apr-2012
  • (2012) On measuring the lexical quality of the web. Proceedings of the 2nd Joint WICOW/AIRWeb Workshop on Web Quality (1-6). DOI: 10.1145/2184305.2184307. Online publication date: 16-Apr-2012
  • (2012) Lexical quality as a measure for textual web accessibility. Proceedings of the 13th international conference on Computers Helping People with Special Needs - Volume Part I (404-408). DOI: 10.1007/978-3-642-31522-0_61. Online publication date: 11-Jul-2012
  • (2011) Estimating dyslexia in the web. Proceedings of the International Cross-Disciplinary Conference on Web Accessibility (1-4). DOI: 10.1145/1969289.1969300. Online publication date: 28-Mar-2011
  • (2010) Web-based statistical fact checking of textual documents. Proceedings of the 2nd international workshop on Search and mining user-generated contents (103-110). DOI: 10.1145/1871985.1872002. Online publication date: 30-Oct-2010
