1 SpokenText Reader Prototype

The SpokenText Reader prototype is a hybrid iOS application. The user interface was written in JavaScript and HTML, bundled using Apache Cordova, and deployed to an iOS device.
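
A user interface of this kind is ordinary HTML and JavaScript running inside Cordova's web view, which signals when the native iOS container is ready. The snippet below is a minimal illustrative sketch of such an entry point, not code taken from the prototype:

```javascript
// Minimal illustrative sketch of a Cordova entry point (not the prototype's
// actual code). Cordova fires the 'deviceready' event once the native iOS
// container and its plugins have finished loading.
document.addEventListener('deviceready', onDeviceReady, false);

function onDeviceReady() {
  // From here the HTML/JavaScript user interface can safely be initialized,
  // e.g. rendering an initial screen such as the prototype's Library screen.
  console.log('Cordova container ready');
}
```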

SpokenText Reader allows users to listen to recorded audio, take notes and bookmark key points in a recording. If a user leaves a recording and later reopens it, playback resumes from where they left off. The application also provides a form of annotation for “marking up” audio files, similar to how people “mark up” printed documents. Figure 1 shows how people who study with print highlight, underline and write notes in the margins; SpokenText Reader provides similar abilities to people who study from audio recordings.

Fig. 1. Example of text being highlighted on a page while studying (Color figure online)

1.1 Prototype Screenshots

The following figures show the key screens of SpokenText Reader that were evaluated during the usability testing sessions (Figs. 2 and 3).

Fig. 2. Annotated player screen

Fig. 3. Left to right: library, player and add note screens

2 Usability Test

The SpokenText Reader prototype was evaluated using a usability test. The usability test consisted of three parts: a set of pre-test questions, a list of tasks to be performed, and three post-test questions to gauge each participant’s overall impressions of the prototype based on their experiences during the test.

To determine whether the tool meets basic human-computer interaction standards, and to counter the researcher’s own biases, it needed to be tested by other users. If the researcher had designed only around his own limitations, the result could pose problems for other users. At the same time, being a member of the target population offered a considerable advantage: most designers lack continuous user involvement at every step of the design process, something that is rarely possible. Therefore, to see whether the advantages outweighed the biases, a usability test was necessary.

The usability test was conducted in a lab setting and was guided by the extensive guidelines for conducting a usability test outlined by Dumas and Redish [1], Nielsen [2, 3] and Rubin [4].

In the end, five students completed the usability test, which is not ideal. Nevertheless, given the small number of students attending Carleton University who have a visual or learning disability, it is a good result and is in keeping with Nielsen [5], who states that valuable insights into a software product’s usability can be achieved with just five participants, if they are used wisely.

During the usability test, participants were asked questions and the prototype system was shown to each participant on an iPhone 6s Plus. Each participant tried to complete the assigned tasks, and their experiences, opinions and ideas for improving the prototype were noted. After the test, each was asked a series of questions aimed at determining how usable they found the prototype and whether they would use it, given the chance.

The sessions were audio and video recorded to aid the researcher in reviewing the sessions and discovering insights.

2.1 Tasks

Table 1 lists the tasks used during the usability testing sessions. The tasks were focused on the key screens of SpokenText Reader. The screen flow diagram in Fig. 4 provides an overview of how the evaluated parts of SpokenText Reader connect. Figure 3 shows the three key screens that were available to participants when taking part in the study.

Table 1. List of usability testing tasks
Fig. 4. Primary SpokenText Reader screen flow diagram

2.2 Device Used for Testing

An iPhone 6s Plus running iOS 9 was used by all participants during the test. SpokenText Reader was installed on the device and, before each testing session, was launched and navigated to the Library screen.

This device was used for the testing because it was the one the researcher was most familiar with and because it matched the platform used by most participants who owned a mobile phone. This matters because the screen readers used by people with visual impairments vary between platforms, so it was important to use a device that could accommodate these participants and with which they would also be familiar.

2.3 Testing Room Setup

The testing room was set up to be comfortable for participants and to allow the researcher to see the actions taken by each participant during the testing sessions. Additionally, it was designed to allow high-quality audio and video recordings to be captured. A 60-inch television was connected to a MacBook Air via an HDMI cable, which allowed the researcher to see in detail what any given participant was doing. The television and MacBook Air were angled as much as possible so that they were not in a participant’s direct line of sight. This was done to help mitigate any issue participants might have with seeing themselves while taking part in the session, while at the same time accommodating the researcher’s visual disability (the researcher is legally blind).

The MacBook Air captured two video feeds: one from a USB webcam mounted on the testing sled and pointed at the iPhone 6s Plus screen, and a second from a USB webcam placed on the desk and pointed at the participant. The MacBook Air then recorded the whole screen using QuickTime’s built-in screen recording feature and mirrored the full desktop to the 60-inch television via the HDMI cable.

An HD video camera and an iPhone 5s were used as backups in case the main recording system malfunctioned. The HD video camera captured both audio and video, whereas the iPhone 5s captured audio only, using the built-in voice recorder application.

The desk was positioned in the room in such a way as to ensure that the overhead lights did not reflect off the screen of the iPhone 6s Plus (Figs. 5, 6 and 7).

Fig. 5. Usability testing room setup for testing session

Fig. 6. Layout of equipment used for the usability testing sessions.

Fig. 7. Screen capture of recording being made of a test session

2.4 Custom Testing Sled Designed for Testing with Visually Impaired People

To facilitate testing with participants with visual disabilities, a custom usability testing sled was designed and built. It allowed high-quality video and audio to be captured during the testing sessions while still letting participants act as naturally as possible, regardless of their visual acuity (the sled could be held close to the face without blocking the camera) (Fig. 8).

Fig. 8. Sled used during testing.

3 Usability Test Findings

Usability test findings were analyzed using a time on task analysis and by examining overall task completion rates.

The time on task analysis was performed by reviewing the video recorded for each participant session and using a stopwatch to track how long the participant took to complete each task. Only actual time on task values are reported: if a participant stopped mid-task to talk about something unrelated to the task, the stopwatch was paused and only restarted when they returned to the task they had been asked to perform.

Task completion rates from the usability testing sessions were analyzed as follows: each task was scored as a pass or fail, the passes for each task were summed, and the sum was divided by the total number of participants to obtain a completion percentage for that task.
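
As a minimal sketch of this calculation (the pass/fail values below are illustrative only, not the study’s actual scores), the per-task completion percentage can be computed as follows:

```javascript
// Illustrative sketch only: the pass/fail values are made up, not the
// study's data. Each array records whether each of the five participants
// completed the named task.
const taskResults = {
  task6: [true, false, true, true, true],
  task10: [true, true, false, true, true],
};

// Completion rate = number of passes / total participants, as a percentage.
function completionRate(results) {
  const passes = results.filter(Boolean).length;
  return (passes / results.length) * 100;
}

for (const [task, results] of Object.entries(taskResults)) {
  console.log(`${task}: ${completionRate(results).toFixed(0)}% completion`);
}
```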

The following sections discuss the findings from the usability test and, where appropriate, note where these findings may be affected by potential biases due to the limited sample size, participant mix or limitations of the prototype.

3.1 General Participant Demographics

In total five participants (Table 2) took part in the usability test, four females and one male. Three of the participants (P1, P2 and P4) had no vision or very limited vision, while the other two (P3 and P5) were learning disabled and had normal vision. All participants had extensive experience with mobile technology, except P1 who had very limited exposure to mobile technology and did not own a smartphone.

Table 2. Participants’ disabilities and experience with mobile technology

Figure 9 shows the age ranges of participants who took part in the study. All but one of the participants were between the ages of 18 and 24; the remaining participant was between 31 and 35. There were no participants between the ages of 25 and 30, or aged 40 and over.

Fig. 9. Participant age ranges

Participants (Fig. 10) were split between the Faculty of Public Affairs and the Faculty of Arts and Social Sciences, with 60% belonging to the former and 40% to the latter.

Fig. 10. Total participants by faculty

Regarding participants’ current year of study (Fig. 11), 40% were in second year, while the remaining participants were in first, third or fourth year.

Fig. 11. Current year of study

Regarding course load per term (Fig. 12), only one participant took a full load of five classes per term, with the rest taking three or four classes per term.

Fig. 12. Participant course load per term

3.2 Time on Task

Figures 13 and 14 present the results of the time on task analysis. The data were split into two figures to make them easier to interpret, because the task completion times for P1 were so much longer than those of the other participants.

Fig. 13. Time on task for P1 and P4

Fig. 14. Time on task for P2, P3 and P5

From Fig. 13 we can see that P1 and P4 struggled with a few of the tasks. Tasks 2, 6 and 8 proved challenging for P1. Task 6 proved challenging for P4, as did task 10, which P4 failed to complete even after trying for 1 min and 8 s.

From Fig. 14 we can see that P2 took significantly longer to complete most of the tasks, especially tasks 3, 4, 6, 7, 9 and 10. This is to be expected, given that P2 is blind, was using a screen reader, could not simply glance at the user interface to see all of its controls, and had to navigate control by control. The expectation is that, given time and the spatial way ‘Voice Over’ exposes on-screen controls, P2 and the other participants using ‘Voice Over’ (P1 and P4) would come to remember the controls each SpokenText Reader screen offered, and would then be able to tap directly on the control they wanted to activate instead of navigating control by control to find it.

Memorizing a user interface is common practice among users with low vision, since it is often much faster than putting one’s face close to the screen to see it or navigating control by control with a screen reader. If participants found SpokenText Reader useful, there is no reason they would not take the time to memorize it, as they do with other applications they want to use. This may be worth investigating further, since time and practice may give a better indication of how these users incorporate a new technology.

One final factor contributing to the slow times of P1, P2 and P4, who all used ‘Voice Over’, was the labeling of some of the controls. The controls for setting the in and out points of a clip note, and the controls for playing and saving a clip note, caused some problems. This appears to be because ‘Voice Over’ did not immediately read the extra information added to the set in point and set out point buttons; the issue is discussed in more detail in a later section of this paper.

The following table (Table 3) shows time on task values by participant, in minutes and seconds, for all users. It is clear that P1 struggled on many of the tasks, taking minutes to complete tasks that other participants completed in seconds. As this participant had neither used a smartphone nor used ‘Voice Over’ on a mobile device before attending the testing session, it makes sense that they took longer to complete each task: they were not only learning how SpokenText Reader was structured, but at the same time learning how to use ‘Voice Over’ and the mobile user interface design patterns employed by SpokenText Reader (which can be quite different from the patterns used in desktop applications).

From Table 3 we can see that the overall average time spent on a task was 24 s, and that most tasks were completed in under 15 s when the values for P1 are excluded.

Table 3. Time on task values by participant for all users in minutes and seconds

Table 4 shows how the times change when the values for P1 are removed: the overall average time spent on a task drops from 24 s to 6 s. It is worth noting that tasks 6 and 10 took P4 significantly longer to complete (1 min and 15 s for task 6 and one minute for task 10). Even so, the registered times were quite comparable, which is surprising given that both P2 and P4 were using only ‘Voice Over’ to access SpokenText Reader and could not see the screen.

Table 4. Time on task values by participant excluding those for P1 in minutes and seconds

Reviewing Tables 5 and 6, where the times for failed tasks have been omitted, reveals that the overall completion time across all tasks and all users changes only slightly, from an average of 24 s to 21 s when the failed tasks are removed. A similarly small improvement is seen when P1’s times are also removed: the overall average completion time improves from 6 s to 5 s.

Table 5. Time on task values by participant for all users in minutes and seconds excluding times for failed tasks
Table 6. Time on task values by participant excluding failed tasks and times for P1 in minutes and seconds

The biggest changes are in the completion times for tasks 3, 7 and 10: task 3’s average completion time falls from 16 s to 3 s, task 7’s drops from 7 s to 3 s, and task 10’s drops from 18 s to 8 s.

Discussion

Time on task is a common technique for analyzing usability testing results, and this study suggests it is a useful measurement even with a diverse set of participants. The blind participants varied greatly in their exposure to and knowledge of mobile access technology and mobile technology in general: some were expert users, while others had very little exposure, and this greatly influenced how long they took to complete tasks. However, if the results for P1 are excluded, as in Tables 4 and 6, the remaining participants’ times are, for the most part, not all that different, and might improve greatly with more exposure to the application. This was a surprise to the researcher, who expected otherwise when first setting out to conduct the time on task analysis.

If screen reader users had been given more time to “look over” a screen before being given a task, they might have completed tasks in times even closer to their peers’. This is typically not done in usability testing, but based on the researcher’s (admittedly limited) experience testing with blind people, it might be good practice to let blind users navigate a screen with their screen reader for a period of time to “see it” before asking them to perform tasks; more research would be needed to confirm this. The idea is worthwhile given the slower interaction speed available to screen reader users versus fully sighted participants, who can quickly scan a user interface visually to determine the affordances it offers, but how this extra time would affect the testing results would need to be considered.

4 Results

Although not all participants were successful in completing all tasks presented to them during the usability test, an overall task completion rate of 92% is quite encouraging. A few points that emerged during the testing sessions are worth discussing in more detail.

4.1 Setting in and Out Points for Audio Clip Notes

As expected, screen reader users took longer to complete each task, given that interacting with a screen reader is inherently slower: it provides no means to glance at an interface. Instead, users must navigate a screen control by control, inspecting each one to determine whether it is the one they want before acting.

Participants would typically learn where the controls are on the screen and what they do as they build a mental model of how the application works, but this takes time, exposure to the application, and a willingness on the user’s part to commit the interface to memory.

The long delay before ‘Voice Over’ reported the extra contextual information describing a control’s purpose resulted in blind participants taking longer to familiarize themselves with the controls on the player screen of SpokenText Reader. It could be argued that this affected the mental models they built of how the application worked and what it could do: they could not quickly swipe left and right, but instead had to rest on a control for a few seconds to determine its purpose.

Interestingly, the sighted participants had no issues setting in and out points for a clip note. They appeared to understand the visual grouping that distinguished the clip note features from the controls used to play and pause the recording.

The tasks to set in and out points for a clip note were slower for ‘Voice Over’ users, since they had to navigate the user interface to find the controls, affecting their time on task for tasks 6 and 7. The hybrid application also caused issues: the extra contextual information intended only for ‘Voice Over’ users was spoken only after a delay of roughly 10 s. Had the application been a truly native iOS application, there would have been no such delay.

‘Voice Over’ users initially moved so fast that they never heard the extra information added to these controls until they slowed down, often navigating back and forth over all the controls until they happened to rest on one long enough for the extra information to be reported. The learning disabled participants had no problems finding the controls needed to set in and out points, and set them with ease, since they could see the visual relationship between the various groups of controls on the screen. The ‘Voice Over’ users, by contrast, confused the controls for playing recordings with those used to set clip in and out points. This was exacerbated by the delay in speaking the contextual information meant to give the clip in and out point buttons a more descriptive label than just “in” or “out”: they should have been reported as “Set clip in point” and “Set clip out point” respectively.
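
One likely remedy, sketched below under the assumption that the hybrid UI is plain HTML rendered in the web view (the element ids are hypothetical, not the prototype’s actual markup), is to expose the grouping and the intended descriptive names through standard WAI-ARIA attributes, so that ‘Voice Over’ announces them without relying on delayed title text:

```javascript
// Minimal sketch; element ids are hypothetical, not the prototype's markup.
// Group the clip note controls so a screen reader announces the grouping
// that sighted participants perceived visually.
const clipControls = document.getElementById('clip-note-controls');
clipControls.setAttribute('role', 'group');
clipControls.setAttribute('aria-label', 'Clip note controls');

// Give each button the descriptive accessible name it should have reported.
document.getElementById('clip-in').setAttribute('aria-label', 'Set clip in point');
document.getElementById('clip-out').setAttribute('aria-label', 'Set clip out point');
```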

4.2 Control Labels Causing Confusion

Having two buttons labeled “Play” confused ‘Voice Over’ users, which is understandable. The main button to play or pause the recording was reported as “Play” when navigated to by ‘Voice Over’, and so was the “Play Clip” button. The “Play Clip” button should have been reported as “Play Clip”, not just “Play”. The difference between the two buttons is visually clear, but not clear from the audio stream presented to ‘Voice Over’ users.

4.3 Issues with Using ‘Voice Over’ with Hybrid Applications

‘Voice Over’ reported table and ARIA information that is intended for web pages and is not relevant to a native application. This confused the blind participants who had previous experience with ‘Voice Over’: they expected to be in a native application and to have ‘Voice Over’ behave accordingly. In addition, delays in reporting the title text placed on elements caused participants to misunderstand what some controls did.

If the same interface design were delivered as a true native iOS application, the aforementioned issues would no longer be a problem.
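
Alternatively, some of the web-page behaviour could be suppressed within the hybrid approach itself. For example, if any screens used HTML tables purely for layout (an assumption; the prototype’s markup is not shown here), the standard role="presentation" technique removes the table semantics that ‘Voice Over’ was announcing. A minimal sketch with a hypothetical id:

```javascript
// Minimal sketch, assuming a layout-only HTML table exists in the web view
// (the id is hypothetical). role="presentation" strips the table semantics
// so 'Voice Over' stops announcing rows and columns used purely for layout.
const layoutTable = document.getElementById('player-layout-table');
layoutTable.setAttribute('role', 'presentation');
```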

4.4 Challenges with the Swipe Me Controller

The “Swipe Me” controller was of no use to blind participants, since ‘Voice Over’ takes over all swipe gestures, rendering the control nonfunctional.

It might be possible to use 3D Touch to implement a similar feature, but more research is needed to determine whether this is the case. For example, when the Swipe Me controller received focus while ‘Voice Over’ was running, the user could swipe up and down to rewind and double tap to play and pause the recording. This functionality would be provided only to ‘Voice Over’ users.

4.5 Challenges with the Sled

The testing sled worked well. Participants were given the option of holding the sled to better simulate the way they typically use their smartphones, but most chose to leave it on the desk. This might be due to the large size of the iPhone 6s Plus and its 5.5-inch screen, which is almost as large as a small tablet.

Even with the sled left on the desk, there was no evidence that its use negatively affected the testing in any way.

5 Future Work

This study tested a hybrid application and one approach to labeling form controls for ‘Voice Over’, and demonstrated that hybrid applications can pose problems for ‘Voice Over’ users. Other methods could be used to give the controls context, for example hidden text or ARIA attributes. It would be worthwhile to try several of the available approaches for giving the HTML user interface controls context and, in doing so, determine whether one of these alternatives resolves the issues found during this test.
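
As an illustration of the hidden-text alternative (the ARIA labeling approach was sketched earlier; the element id, text and styling below are illustrative, not the prototype’s actual markup or CSS):

```javascript
// Minimal sketch of the hidden-text approach (hypothetical id and styling).
// The short visible label ("In") is kept for sighted users, while extra
// off-screen text completes the name spoken by screen readers.
const inButton = document.getElementById('clip-in');
const hiddenText = document.createElement('span');
hiddenText.textContent = ' (set clip in point)';
// Clip the span off screen rather than using display:none, so screen
// readers still include it in the button's accessible name.
hiddenText.style.cssText =
  'position:absolute;width:1px;height:1px;overflow:hidden;clip:rect(0, 0, 0, 0);';
inButton.appendChild(hiddenText);
```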

Additionally, the application could be compiled to run on an Android-based device and tested to determine whether the issues found on iOS are also present on Android when using its built-in screen reader.

6 Conclusions

Hybrid mobile applications offer many benefits in terms of speed of development and ease of iteration, but they do present accessibility barriers to users of ‘Voice Over’ in iOS.

Because a hybrid application uses a web view to render the HTML and JavaScript that define the application’s user interface, ‘Voice Over’ interprets the application as a web site and speaks its controls in a manner appropriate for a web site. This can confuse users who expect the application to speak like a true native application under ‘Voice Over’. In addition, how web pages are spoken by ‘Voice Over’ has changed over time as Apple releases newer versions of iOS, so techniques that once worked to provide context to form controls may no longer function.

For all of the above reasons, this author recommends developing mobile applications using native user interfaces and programming languages whenever possible. This will ensure that the form controls within an application are consistently spoken in the manner defined when the application was developed and released to its intended user community.