Abstract
This paper argues that the effectiveness of automatic tools for evaluating web site accessibility has itself to be evaluated, given the increasingly important role that these tools play. The paper presents a method for comparing a pair of tools that takes into account correctness, completeness and specificity in supporting the task of assessing the conformance of a web site to established guidelines. The paper presents data acquired during a case study comparing LIFT Machine with Bobby; these data are used to assess the strengths and weaknesses of the comparison method. The conclusion is that even though there is room for improving the method, it is already capable of providing accurate and reliable conclusions.
Notes
Comparing these tools is somewhat unfair given their different scope, flexibility, power and price. LIFT Machine is targeted at an enterprise-level quality-assurance team, and its price starts at $6,000; Bobby 4.0 was available for free (it now costs about $300) and is targeted at a single individual wanting to test a relatively limited number of pages. Nevertheless, the comparison is useful as a case study for demonstrating the evaluation method itself.
More specific types of results could be considered. For example, distinguishing among definite errors, probable errors, manual warnings triggered by content, and untriggered manual warnings would yield a finer grid serving as the basis for a richer evaluation of tools, and some testing tools do provide these finer distinctions. However, it may be difficult to classify tool output according to such categories (this information is not always available in the output), and if two tools provide different types of results it becomes difficult to compare them. For this reason, the method proposed in this paper is based on two types of results: those that the tool assumes to be true problems and those that are warnings.
The consequence is that the evaluation method is blind with respect to finer distinctions, and tools that provide intermediate warnings are treated in the same way as tools that provide manual warnings.
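This two-way classification can be sketched as a simple mapping that collapses finer-grained output categories into the coarse scheme the method uses. The fine-grained category names below are illustrative only and are not taken from any particular tool's output:

```python
# Collapse finer-grained tool-output categories into the two result
# types the method uses: reported problems vs. warnings.
# The fine-grained category names here are hypothetical examples.
FINE_TO_COARSE = {
    "definite_error": "problem",
    "probable_error": "problem",
    "triggered_manual_warning": "warning",
    "untriggered_manual_warning": "warning",
}

def coarsen(results):
    """Map each (page, fine_category) result to the two-type scheme."""
    return [(page, FINE_TO_COARSE[cat]) for page, cat in results]

results = [("index.html", "definite_error"),
           ("about.html", "untriggered_manual_warning")]
print(coarsen(results))
# → [('index.html', 'problem'), ('about.html', 'warning')]
```

Because the mapping is many-to-one, any distinction a tool draws within the "warning" side is deliberately lost, which is exactly the blindness described above.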
It is advisable to use the same limits for both systems; otherwise several pages might be tested by one tool only, reducing the effectiveness of the comparison method, since the issues associated with those pages are excluded from any further analysis. This happened in the case study, owing to differences in the crawling methods adopted by the two tools.
This is Bobby’s terminology corresponding to what we earlier referred to as manual warning triggered by content (for Partial or PartialOnce) and untriggered manual warning (for AskOnce).
This is a case where the values reported in Table 1 affect the FN percentages. In particular, since FN for Bobby is defined with reference to the behavior of LIFT, the larger the number of issues generated by LIFT, the greater the chance of finding an FN for Bobby. Therefore, FN for Bobby is correct, while FN for LIFT is underestimated.
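Under this relative definition, a false negative for one tool is an issue that the reference tool reports as a true problem but the first tool misses. A minimal sketch (the function, issue names and counts are hypothetical, for illustration only):

```python
def fn_rate(tool_problems, reference_problems):
    """FN percentage for a tool, defined relative to the issues the
    reference tool reports as true problems: the fraction of the
    reference's problems that the tool fails to report."""
    missed = reference_problems - tool_problems
    return 100 * len(missed) / len(reference_problems)

# Hypothetical issue sets for the two tools.
lift_problems = {"img-no-alt", "table-no-headers", "low-contrast"}
bobby_problems = {"img-no-alt"}

# Two of LIFT's three problems are missed by Bobby, so Bobby's FN
# rate relative to LIFT is high; the more issues LIFT generates,
# the more chances there are to find an FN for Bobby.
print(fn_rate(bobby_problems, lift_problems))
```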
A confidence interval of a parameter around a value, with a given significance level α, describes the possible variability of the parameter when a different sample of data is analysed: 1−α gives the probability that the parameter stays within the interval.
For example, a claim about HFP that is valid at significance level α=0.01 means that, in 99 cases out of 100, the data gathered in this experiment support the claim that A produces fewer FPs than B.
Acknowledgements
Many thanks to Jim Thatcher and Daniela Ortner for their detailed reading of a draft of this paper. I would also like to thank the participants of the first face-to-face meeting of EuroAccessibility Task Force 2, held in London in November 2003, for their feedback on the method. Of course, the author alone is responsible for the content of this paper. My thanks also go to the editorial staff of the journal for their help in improving the English style of this paper.
Additional information
Giorgio Brajnik is a scientific advisor for UsableNet Inc., manufacturer of LIFT Machine, one of the tools used in the case study reported in this paper.
Cite this article
Brajnik, G. Comparing accessibility evaluation tools: a method for tool effectiveness. Univ Access Inf Soc 3, 252–263 (2004). https://doi.org/10.1007/s10209-004-0105-y