In this section, we analyse the answers that the users provided to the questions included in the survey, distinguishing between those given to the open questions and those given to the closed questions (S1–S6). The answers to the open questions were analysed in the following manner: they were saved in an Excel file, their content was read and searched for common themes, and the various answers were then coded and grouped according to those themes. The analysis was done by one of the authors and then reviewed by the others.
5.3.1 Answers to the Open Questions.
Do you use automated accessibility assessment tools to support your work? (Y/N) If yes, which one(s)?
Ninety people answered that they use accessibility tools for their work, while the remaining 48 do not use them. Regarding the tools they use most often for their work, they answered as follows: MAUVE++ is used by 32 users (16.49%); WAVE by 27 users (13.92%); Siteimprove by 21 users (10.82%); W3C Markup Validation Service by 14 users (7.22%); Lighthouse by 11 users (5.67%); axe by 11 users (5.67%); AChecker by 10 users (5.15%); Vamola by 10 users (5.15%); Accessibility Insights by 6 users (3.09%); IBM Equal Access Accessibility Checker by 6 users (3.09%); ARC Toolkit by 3 users (1.55%); Cynthia Says by 3 users (1.55%); tota11y by 3 users (1.55%); and WebAIM by 3 users (1.55%). In addition, FAE, aCe, PDF checker, and TPGi Colour Contrast Analyzer (CCA) were each mentioned by 2 users, and 2 users declared that they use a mix of browser extensions and add-ons. Other tools (such as Jigsaw and Imergo) were mentioned by just one user. We also collected information about the frequency with which the users exploit validation tools for their job (the survey asked them to indicate a maximum of three tools). 29 users (32.22% of the 90 abovementioned users) declared that they use at least one tool once a month, 26 users (28.89%) once a day, 18 users (20%) once a year, and the remaining 17 (18.89%) once a week.
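For clarity, the two kinds of percentages reported above can be recomputed as in the following minimal sketch. It assumes that the tool percentages are taken over the total number of tool mentions (a denominator of 194 mentions, inferred from the reported figures rather than stated explicitly, and larger than the number of respondents because each respondent could name up to three tools), while the frequency-of-use percentages are taken over the 90 tool users.

```python
# Minimal sketch recomputing the reported percentages. The total of 194 tool
# mentions is an assumption inferred from the reported figures, not a value
# stated in the survey data; only the most frequently mentioned tools are listed.
tool_mentions = {
    "MAUVE++": 32, "WAVE": 27, "Siteimprove": 21,
    "W3C Markup Validation Service": 14, "Lighthouse": 11, "axe": 11,
    "AChecker": 10, "Vamola": 10,
    # ... tools with fewer mentions omitted for brevity
}
TOTAL_MENTIONS = 194  # assumed denominator (respondents could name up to three tools)

for tool, count in tool_mentions.items():
    print(f"{tool}: {count} users ({count / TOTAL_MENTIONS:.2%})")

# Frequency of use, expressed over the 90 respondents who use at least one tool.
frequency_of_use = {"once a day": 26, "once a week": 17, "once a month": 29, "once a year": 18}
tool_users = sum(frequency_of_use.values())  # = 90
for freq, count in frequency_of_use.items():
    print(f"{freq}: {count} users ({count / tool_users:.2%})")
```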
How would you define the transparency of automatic accessibility assessment tools?
The answers provided by the users are grouped according to the criteria that we identified previously. Note that users sometimes mentioned more than one aspect/criterion in their responses.
What standards, success criteria, and techniques are supported. The standard(s) with which the tool is compliant were mentioned by 16 users as a way to characterize the transparency of tools. Seventeen users mentioned that when a tool explicitly highlights the criteria used for the evaluation and how many of them are covered, this strongly contributes to increasing its transparency. One user explicitly mentioned that one factor that affects the transparency of a tool is “when the tool highlights the methods and the parameters that characterize the evaluation criteria”. Seven users highlighted that another factor affecting the transparency of a tool is whether it highlights the specific techniques/tests that it applied (or not) when checking the various success criteria.
How accessibility issues are categorized. Six users mentioned this aspect, in particular the importance of having the results categorized according to different types of content (e.g., according to the implementation language, such as HTML or CSS).
How the validation results are provided by the tool. Forty-one users mentioned aspects associated with this point. In particular, the information that the tool provides to users about errors/violations of accessibility was judged an aspect that strongly characterizes the transparency of a tool. Many users declared that it is important that tools provide correct and clear explanations/visualizations of such errors (possibly both in the page and in the code), in a way that is comprehensible also to non-technical users. Furthermore, tools should offer good coverage of the errors, using relevant references to the page/code to better identify/localize them, as well as relevant references to the corresponding concerned criteria. Finally, they should provide clear explanations about why an issue was pointed out and what its consequences are in terms of accessibility. Other users mentioned that transparency is highly affected by how clear the results of the analysis are and by how comprehensible the way in which the analysis is carried out is. They also referred to broader information on the analysis as a whole, i.e., not just focusing on the errors, but indicating how the results have been obtained, the pages used for the evaluation, the completeness of the analysis, its reliability, and its verifiability/replicability.
Whether the tool provides practical indications about how to solve the identified problems. Fifteen users mentioned as a key aspect that tools provide users with concrete suggestions for possible solutions to the identified accessibility violations, more precisely how and where to intervene in order to solve such violations/issues.
Whether the tool is able to provide information about its limitations. Another aspect that users (N = 5) judged important for the transparency of the tool is that it should clearly highlight the situations that it is not able to address automatically, and therefore the situations for which manual checking is specifically needed. In particular, one user said that one aspect characterizing transparency is when the tool clearly states “what needs a manual validation and what would imply, in concrete terms, performing this manual validation”, thus highlighting that it is important not only to point out the need for manual checking in general (as tools are never exhaustive), but also to provide guidance to users about how to perform such manual checks in concrete terms. One user highlighted that a tool is transparent when it highlights any part of the web site that it was not able to test. Another aspect that users (N = 4) rated highly in terms of transparency related to situations in which the evaluations are ambiguous, or in which false positives and false negatives could occur. In such cases, one user highlighted that it would be better if the tool explained the choices made in order to arrive at the provided results. Four users highlighted that further information about the methodology and objectives of the tool should be provided to users to increase transparency. Moreover, the importance of declaring the limitations of the assessment provided was mentioned by four users as a way to improve transparency. Among further aspects mentioned by participants, two of them highlighted that the inconsistencies occurring between the evaluations provided by different tools can affect transparency. One user mentioned that a tool is transparent when it is actually possible to make some modifications to the validation results (e.g., when it is possible to declare that a “fail” is actually a “pass”).
Additional aspects mentioned. Twenty-four users mentioned some more general characteristics/qualities that tools should have in order to be transparent. Among the qualities mentioned were ease of use, clarity, and reliability; also, the fact that the tool is free or open source, that it is independent of specific stakeholders/vendors, and that it can be used by any user in an “open” manner, as well as the accountability/credibility of the organization that develops the tool. Nine users mentioned specific features of the tool that can affect transparency. Among them, five users highlighted the users’ need to get further details and documentation about the tool, also in terms of its strengths and features. One user highlighted that tools generally do not provide many details, and therefore accessibility experts tend not to trust them. Two users mentioned that the transparency of a tool could be increased by the personalization/configuration possibilities it offers.
Have you ever experienced NOT being able to understand the results of an accessibility evaluation performed by an automated tool? 89 users answered yes, while 49 answered no.
If YES, do you remember what kind of difficulties you encountered? The answers to this question are grouped according to four themes: results, errors, solutions, and lack of clarity.
Seventeen users mentioned difficulties connected with the results provided by tools. The aspects that they mentioned regarding the results can be grouped into five sub-themes (with the most frequent ones appearing first): mismatches between what was reported and what the user observed, the interpretation of results, divergences between the evaluations provided by different tools, unclear/inefficient presentation of results, and lack of completeness in providing such results. In particular, most users complained about mismatches: some users reported experiencing a mismatch between what they observed and what was reported by the tool; this happened, one user said, when a criterion was reported as not satisfied whereas there was actually no error in the page. On the other hand, another user said that, although the page was not accessible, the tool said that it was. Another user highlighted that this mismatch could affect the trust that users have in tools, as users can sometimes have the feeling that the indications are not correct and therefore the tool seems buggy. Indeed, a pair of users reported having actually experienced a bug in the validation tool: “I do recall that the technical support team were able to explain the issues and that some issues were due to bugs in the tool.” Another point mentioned by several users regarded the interpretation of results: people complained about the difficulty of understanding the results provided by the tool. One user said that sometimes tools are limited to providing a technical summary, which makes it difficult to understand their results, especially for non-technical people. Another user said that “it is sometimes difficult to understand what success criteria it was mapping to, why it was only picking some techniques over others, or why certain lines of code were tagged as wrong.” Another sub-theme referred to the divergences among different evaluations. In particular, two users highlighted that the provided results are not always in line with the results provided by other tools. A pair of users highlighted that sometimes tools present the results in a hard-to-read and inefficient manner. In particular, one user highlighted that sometimes tools provide “a long list of results which is not useful if you are willing to prioritize due to lack of resources”. A final sub-theme regarded the lack of completeness in providing the results: one user said that she would have preferred to see in detail also the criteria that successfully passed the evaluation, and not only those which failed.
Sixteen users mentioned difficulties connected with errors: in many cases the reported difficulty lies in understanding why a point is reported as a violation of accessibility, as the indications (error messages) are sometimes ambiguous, generic (i.e., they do not specifically indicate where the error is), do not always actually relate to a real accessibility violation, are not always correctly associated with the concerned element, and are overall unclear and not exhaustive. One user complained about an unclear distinction between errors and warnings. Fourteen users provided further comments about reported difficulties concerning the solutions provided by the tools. Most of them highlighted the difficulty of getting specific, clear and correct indications about how to solve the issues, reporting that such indications are currently scarce. A pair of users highlighted that sometimes the proposed suggestions about how to solve a specific accessibility issue are not correct or do not work. One user suggested providing “more contextualized solutions, perhaps with small examples, to understand whether the tool has understood the context of the analysed content or not”. Eleven users complained, more generally, about the lack of clarity of the information provided by the tool: sometimes tools provide messages that are generic and ambiguous, or they use overly technical language, which is not suitable for unskilled people.
Are there any other features you think an automated accessibility evaluation tool should have in order to be transparent? 42 users (30.43% of the total) answered yes, while the remaining 96 answered no.
If YES, which ones? The answers from users were grouped into six themes.
Seventeen users mentioned some features and/or characteristics that the tool should have. Aspects highlighted by users included: the possibility of also evaluating PDFs, being open source, including further localization options (i.e., being able to select the language so as to more easily understand the errors, or to select a specific country, as accessibility norms can change across countries), and the possibility of exporting the reports. One user suggested having a blog reporting the updates made to the tool over time, which could help, the user noted, in understanding why some evaluation results change over time. Another user suggested making available some practical tutorials about how to write HTML and CSS. Another user highlighted that it would be important to know who is developing and releasing the tool, as well as its mission and goals, to better evaluate its reliability and degree of confidence; the same user highlighted that another added value could be the availability of an effective support service. One user highlighted that it could be useful to know who supports/promotes the tool. One user suggested indicating how often the tool is updated. Another user highlighted that some tools (e.g., aCe) have a manual checking model that should be followed by other tools, as it helps in doing manual checks. In this regard, another user said that tools should suggest which manual tests can be done in order to verify semi-automated success criteria. A user suggested publishing the list of the tests that tools perform, highlighting that some tools already do this. One user suggested validating the results delivered by the tool against a standardized set of examples, like the ones provided by the ACT rules. Another user highlighted the need to solve the inconsistencies that can be found among different tools. One user declared that it would be useful to understand whether a site/app meets the WAD requirements. Another user said that tools should state outright the known statistics of how many accessibility problems can be determined through automated scans. Thirteen users mentioned the need for further information on the results/analysis. Among the most relevant comments provided, one highlighted the need for further information to precisely replicate the tests; another person suggested clearly indicating what was not tested and what needs to be tested manually. Another user highlighted the clarity of the results as a key aspect. A pair of users highlighted that the tool should give a precise indication of the conformance level (i.e., A, AA, AAA) that it considers. In addition, one of them highlighted that it would be useful to indicate what kind of problems people with specific disabilities would face if an error is not corrected or a criterion is not met. Another user would appreciate further information about the ARIA rules the tool evaluated. A pair of users highlighted the need for more references to WCAG, i.e., which WCAG checkpoints are covered and how, as well as more code snippets with a hint on what to fix and what is missing. A user highlighted that, if the tool provides a general overall measure, it should make explicit how it is calculated and what its limitations are. Two users emphasized the importance of having some visual indications directly in the concerned Web page (e.g., showing the layout of the Web page and highlighting key parts of it, such as tables).
In addition, another user highlighted the importance of having further information about how the check has been done by the tool, in order to make it easier for the user to verify whether there is a bug in the tool. Five users highlighted the need to provide concrete and actionable indications for solving the identified errors, also by showing one or more examples of the solution, especially for the most common errors. Among them, one user highlighted the need to provide hands-on examples especially when specific assistive technologies (e.g., screen readers) are involved, as in such situations it is not obvious that all the users of the tool know all their implications and how to solve possible problems connected with their use. Four users also pointed out the need to provide better support to non-technical users. This would imply, for instance, providing users with easily understandable results, possibly accompanied by visual graphs, as well as easy-to-understand explanations of why an accessibility violation was found by the tool. One user suggested having an icon that allows users to easily keep track of the current state of the evaluation. Three users mentioned the need to emphasize that manual checking is always needed. These users highlighted that tools should clearly state that the automatic checks they provide should in any case be complemented by manual validation. One, in particular, declared: “They should state outright the known statistics of how many accessibility problems can be determined through automated scans and should also make clear that it is not possible for an automated tool to identify or remediate the vast majority of websites to 100% WCAG conformance.” Finally, three users pointed out the need to highlight and address the occurrence of false positives, false negatives, and possible errors in the analysis. They said that sometimes tools highlight issues that are not real ones or, on the contrary, fail to identify actual accessibility violations, and thus it would be better to address this issue. One user, in particular, suggested that tools should highlight the possibility of false positives for the most critical WCAG criteria. Another one suggested that tools should report the confidence level associated with a specific error when they cannot be sure that it is an actual error.
5.3.2 Answers to the Closed Questions.
On a scale of 1 (not very useful) to 5 (very useful), how useful do you rate the following features in automated accessibility validation tools in terms of transparency? That the tool:
• states what standards, success criteria and techniques it supports in the assessment? (S1)
• specifies how it categorizes evaluation results (errors, warnings, etc.)? (S2)
• is able to provide general measures that make explicit the level of accessibility of the website/mobile app? (S3)
• presents the evaluation results in a summarized (e.g., graphs, tables) and in a detailed way (e.g., code view)? (S4)
• gives some practical indications on how to resolve the detected problem? (S5)
• gives some indication of its limitations? (S6)
For example, on what types of limitations should the tool provide indications? (follow-up question to S6)
The answers addressed various themes: (i) the aspects that the tool is not able to automatically evaluate, (ii) the preferences and parameters that users can specify for the analysis, (iii) the situations in which the results can be wrong (e.g., false positives or false negatives) or ambiguous, (iv) the lack of clear indications highlighting the need for manual checks (with possible guidance on such checks), and (v) further aspects.
Thirty-six users mentioned aspects that the tool is not able to automatically evaluate as a limitation. Many of them generically pointed out that tools should indicate the situations that they cannot automatically assess, e.g., because they do not cover the corresponding criterion or because the success criterion is only partially checked. Other users were more specific in identifying such cases: when a tool is not able to access URLs that are protected by login; when tools are not able to perform their assessment on specific technologies or on dynamic pages; when issues are in the content of the page rather than in its structure; when checking colour contrast on pseudo-elements; when they have to analyse mobile apps or different types of documents/formats (e.g., SVG, PDF); and other specific situations (e.g., shadow DOMs, content inside frames). Also, one user mentioned the inability of tools to perfectly emulate a braille reader; another user mentioned the inability of tools to evaluate how properly images and alternative texts are used in a website. One of the users mentioned that tools are not able to cover all WCAG success criteria, and that it would be useful to know the rules (testing algorithms) used and which published ACT rules are covered. Another theme regarded limitations concerning the preferences and parameters that users can specify for the analysis (8 users mentioned this point). Some users mentioned as a limitation the number of pages considered for the evaluation; one user mentioned the depth of the analysis. One user highlighted as a limitation the lack of compliance with specific standards. One user highlighted that some tools are not kept very up to date. Another comment indicated that the versions of the various languages (e.g., JavaScript) and frameworks (e.g., Bootstrap) that the tool is able to address could represent a limitation, and thus should be clearly indicated. Another type of limitation regarded the occurrence of situations in which the results can be wrong (i.e., false positives or false negatives) or ambiguous. Seven users mentioned that tools should clearly indicate situations that can generate false positives/negatives in the evaluation, or indicate when the evaluation could lend itself to multiple interpretations. In this regard, one user mentioned that the ARC Toolkit issues warnings for cases that may or may not be a problem depending on the context. Another theme regarded the lack of a clear declaration highlighting the need for manual checks (with possible guidance). Five users emphasized the fact that an automatic validation is never complete and exhaustive, and therefore tools should clearly highlight this, possibly also providing guidance for carrying out manual checks. In addition, one user said: “An automatic evaluation is not enough to guarantee that a site is accessible. Heads up for manual checks would be appreciated; some tools do that.” Finally, as further aspects, one user suggested that it would be useful to know in advance the behaviour of the tool compared to a benchmark (in terms of false positives, false negatives, and coverage), acknowledging that this would require a ‘normalized’ corpus and process to assess evaluation tools. One user highlighted that tools should be more explicit about their pricing options. One user would like to have more information about situations in which different tools return different results.
One user mentioned that tools should also indicate possible improvements. Another aspect, mentioned by one user, is that sometimes tools are too “code-based”.
Regarding the users’ ratings, as can be seen from Table 4, since the median (Mdn) values are higher than the mean (M) values, the data distribution is more concentrated on the right side, corresponding to the higher scores. In addition, while the range (which measures how spread out the entire data set is) is wide (Min = 1, Max = 5), the interquartile range (which gives the range of the middle half of the data set) is low (IQR = 1), which means that the middle half of the data shows little variability.
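As a worked illustration of these descriptive statistics, the following minimal sketch computes the mean, median, range, and interquartile range for a hypothetical set of 1–5 ratings (illustrative values only, not the actual survey responses): when the median exceeds the mean, the scores are concentrated at the high end of the scale, and a small IQR indicates that the middle half of the ratings varies little.

```python
import statistics

# Hypothetical 1-5 ratings for one feature (illustrative only, not the survey data):
# most respondents give 4 or 5, with a few low outliers pulling the mean down.
ratings = [5, 5, 5, 5, 5, 5, 4, 5, 5, 4, 5, 2, 5, 5, 1, 4, 5, 5]

mean = statistics.mean(ratings)      # M
median = statistics.median(ratings)  # Mdn
data_range = (min(ratings), max(ratings))

# Quartiles with the inclusive method; IQR = Q3 - Q1 is the spread of the middle half.
q1, _, q3 = statistics.quantiles(ratings, n=4, method="inclusive")
iqr = q3 - q1

print(f"M = {mean:.2f}, Mdn = {median}, range = {data_range}, IQR = {iqr}")
# Here Mdn (5) > M (~4.44): the distribution is skewed towards the higher scores,
# while the small IQR shows little variability in the middle half of the data.
```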