1 Introduction

An estimated 28 million people in the U.S. are Deaf or Hard of Hearing (DHH) [26, 30]. While many DHH individuals have excellent literacy skills, research has shown that there is great diversity in reading skills within the DHH community, and many DHH individuals exhibit lower English literacy levels than their hearing peers [39]. Prior literacy research with DHH readers has found that some exhibit effortful word recognition, lower vocabulary, slower reading rates, and a more limited repertoire of comprehension strategies, and that some avoid reading activities [24, 27, 28, 31, 37]. When performing question-answering tasks based on a text, some DHH students rely on basic visual matching of the text with words in the questions, rather than considering the meaning of the text [3, 40, 41].

Since most web content is in the form of written text, low literacy can be a barrier to information access for DHH users. Bilal and Kirby [4] found that DHH students often experience difficulties when using internet search engines. Smith [35] observed the internet search behaviors of deaf adolescents who used search engines to complete fact-based tasks, including search-query formation and modification, website identification, and selection; the study concluded that deaf adolescents had difficulty initiating, conducting, or validating effective internet searches in response to fact-based search tasks.

American Sign Language (ASL) is the primary means of communication for over 500,000 people in the U.S. [30]; individuals with lower English fluency may benefit from tools that can convey content in the form of ASL. Some researchers have developed websites or software that include ASL video [5, 6, 12, 15, 36] or animations [1, 2, 17, 19, 21, 22, 25] to make the information on sites more accessible. Most of these projects focus on providing sign language translations of textual information or on enabling website designers to author content in ASL. However, there has been relatively little research on using ASL video to supplement the textual content of websites. In this study, we investigate providing support in the form of ASL videos that give single-word translations of individual English words in a text. Such a tool may help readers understand a text, and it may also improve reading vocabulary skills.

1.1 Our Approach

After examining prior research on the design and development of bilingual interfaces, we constructed design guidelines for interfaces that include text and sign language. Based on these guidelines, we implemented a web plug-in that allows users to click on English words (that the user does not understand) in the text to see an ASL video translation (a single ASL sign) for that word. We also conducted a user study with DHH students to evaluate the system’s usability, measure users’ preferences regarding the tool, and observe whether its use improved reading comprehension.

1.2 Prior Work

Many researchers have studied methods for improving website accessibility for DHH users. For instance, Fajardo et al. [9] explored the effect of substituting graphical items (clip art images) for textual links on webpages during an information retrieval task, but their users (both deaf and hearing) were more successful when using the ordinary text-hypertext interface rather than the novel graphical one. Other researchers have examined methods for incorporating sign language into computer interfaces for DHH users [14, 16, 18], especially in the education domain, e.g., reading/writing software for deaf children [15] or e-learning systems accessible to DHH users [5, 7].

Sign-Language-Only Interfaces.

To produce interfaces that avoid the use of English text, researchers in [12, 34] explored a hyperlinked video system called “sign-linking,” consisting of linking mechanisms within a video itself that take users to other sections of the video or to other pages. Fajardo et al. [10] found that sign language videos added to text hyperlinks improved web search efficiency for DHH users, and Reis et al. [33] are designing an ASL-only learning resource for STEM topics.

Combining Written Language and Sign Language Videos on Websites.

Debevc et al. [6] embedded interactive elements in the text of a web interface that triggered video translations in sign language, and they found that providing ASL video translations of phrases or sentences increased users’ interest in the content of the material. Similarly, Straetz et al. [36] presented an e-learning system for deaf adults who wanted to maintain and improve their math and reading/writing skills; along with text, it displayed German Sign Language videos. Our word-lookup plug-in is similar in premise to these prior projects, but readers should note that the researchers in [6, 36] prepared all sign-language videos in advance, since producing automatic sign-language translations of written text is well beyond the state of the art of machine translation systems. In contrast to [6, 36], we investigate a system that provides single-word translations of individual English words as videos of individual ASL signs, which is much closer to being practical with current computational linguistic technology.

Combining Written Language and Sign Language Animations on Websites.

Researchers have also investigated the use of sign language animations (computer-generated avatars) to make web content or software more accessible to sign language users, e.g. [1, 2, 8]. Kennaway et al. [25] described their eSign system for generating sign language animations to accompany text, to provide access to critical web information content, such as government websites. In their system, the avatar animation was displayed on the body of the page, in the frame of the window, or as a separate pop-up.

Providing ASL Tooltips on Demand.

Other researchers have investigated rapid lookup of English words in ASL dictionaries. Petrie et al. [32] investigated four types of tooltips (small windows that appear when the user hovers their mouse over words) for DHH users, containing: (1) sign language, (2) pictures, (3) video of a human mouth speaking the word, or (4) digital lips. Users preferred the sign-language or picture versions. Jones et al. [23] studied a mobile phone app to enable DHH children to look up ASL definitions by photographing printed English words using the device’s camera.

Evaluating Efficacy of Sign Language Interfaces.

Some researchers have tried to measure how bilingual web content influences signers’ comprehension of hypertext content. Gentry et al. [13] conducted an experiment in which deaf participants were presented with stories in four different formats: printed text, text with pictures, text with sign language, and text with pictures and sign language. The use of signs improved performance compared to the text-only condition, but not compared to the text-with-pictures condition, which yielded the highest scores. In prior work, our laboratory has presented methods for effectively measuring comprehension among DHH study participants, through the use of comprehension questions about information content [20].

Summary of Prior Work.

While there has been limited work on providing sign-language-only interfaces on websites (due to the challenges of providing hyperlinks and other webpage structural content using a dynamic information channel like video), there has been significantly more research on new methods for blending written language content with sign language (either through video or animation). Further, there has been prior work investigating how pop-up tooltips could provide information in ASL on demand. While prior work has investigated easier methods for looking up ASL words in a dictionary, the specific use of pop-up tools to provide ASL translations of individual English words in web content has not previously been investigated through an experimental evaluation with DHH users.

2 Research Questions and Methods

In this study, we compare participants’ subjective preferences and reading comprehension scores when using three versions of a website containing text content:

  1. Normal: a version containing English text without any augmentations,

  2. Dictionary: a version that enables users to click on words to see an English dictionary definition (Google Dictionary, a plug-in for the Google Chrome browser, provided definitions for selected English words in a pop-up box; see Fig. 1), and

  3. ASLPopup: a version that allows users to click on a word to see a small pop-up video window in which a human performs an ASL sign that is a single-word translation of the selected English word (as shown on the right side of Fig. 1).

Fig. 1. On the left, a screen image from the Dictionary condition in the study, which displays a pop-up English definition when an English word is clicked; on the right, a screen image from our ASLPopup condition, providing pop-up ASL videos on demand when a word is clicked.

We investigated the following research questions (RQ1-RQ4) as DHH students used this system to perform reading comprehension tasks:

  1. Do DHH students show a subjective preference for having support (Dictionary or ASLPopup), as compared to no support (Normal)?

  2. Do DHH students show a subjective preference for ASL video in the interface (ASLPopup), as opposed to having support from written definitions (Dictionary)?

  3. Do DHH students have better performance in answering comprehension questions when provided with support (Dictionary or ASLPopup), compared to Normal?

  4. Do DHH students show improved performance in answering comprehension questions when provided with ASL video (ASLPopup), compared to Dictionary?

2.1 Prototype System Design and Implementation

To guide the design of our prototype, we consulted prior work on the design of bilingual interfaces for DHH users containing both text and sign language. For instance, DHH users of educational systems indicated a preference for videos of sign language that would appear on demand [6, 36]. In addition, researchers in [36] found that providing sign language on demand made content accessible to DHH users without interfering with the usability of the system for hearing users.

Fajardo et al. [11] recommended that when providing sign language video translations of text content, designers should use pop-up windows with videos embedded in the same page as the text. This recommendation follows the spatial contiguity principle in the multimedia learning literature [29], i.e., users’ understanding of a message transmitted through words and corresponding pictures increases when the two are presented near each other. However, providing multiple sources of the same information may conflict with the redundancy principle [38], which argues that users’ comprehension is hampered if a multimedia system includes different sources for the same information. In addition, Richards et al. [34] argue that asking users to split their attention across different sources would create literacy barriers due to the continuous switching between their first and second languages. Both concerns can be mitigated if the interface allows the user to access the multimedia content on request only, thereby requiring the user’s attention on only one part of the interface at a given moment.

Users of a sign-language animation system for websites in [25] preferred seeing both written text and signing. Participants in that study preferred animation display windows that appeared in a fixed location, rather than pop-ups, which sometimes blocked the text below. Of course, users in that study were watching sign language content of long duration, rather than short definitions. This concern could be further mitigated by positioning a pop-up so that it does not block the text below.

Based on this prior work, we implemented a webpage for displaying text content (using HTML5, CSS3, JavaScript, and jQuery). To indicate words that users could click, a blue font was used (and an underline appears below a word when the mouse hovers over it). A wide margin (approximately 25% of the page width) was left on the right side of the page so that when a word is clicked, a pop-up window conveying the video can appear on the right side without blocking the text. When a word is clicked, it appears with a highlighted background color, as shown in Fig. 1. The pop-up appears at the same vertical height as the line of text containing the clicked word. The window contains a single video, with media controls including a play button and forward/backward frame controls (which were useful when watching a short video one frame at a time). The pop-up also contained a close button in its top-right corner. ASL videos were provided by the National Technical Institute for the Deaf (NTID) ASL Video Dictionary and Inflection Guide from Rochester Institute of Technology (RIT); that resource contains Flash videos of 2,700 ASL signs corresponding to English words.
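As a concrete illustration of this interaction, the following jQuery sketch shows how such a click-to-popup behavior could be wired up. It is a minimal sketch, not the study’s actual source code: the asl-word class, the #asl-popup container, and the data-video attribute are illustrative assumptions, and an HTML5 video element stands in for the Flash player used by the original dictionary resource.

```javascript
// Minimal sketch of the click-to-popup behavior described above.
// Class names, the #asl-popup container, and the data-video URL scheme
// are illustrative assumptions, not the study's actual implementation.
$(document).ready(function () {
  // Words with an available ASL video are assumed to be wrapped in
  // <span class="asl-word" data-video="..."> and styled blue via CSS;
  // the hover underline is also handled in CSS.
  $('.asl-word').on('click', function () {
    var $word = $(this);

    // Highlight the clicked word with a background color.
    $('.asl-word').removeClass('selected');
    $word.addClass('selected');

    // Show the pop-up in the right margin, at the same vertical height
    // as the line of text containing the clicked word.
    $('#asl-popup')
      .css({ top: $word.offset().top + 'px' })
      .show();

    // Load the ASL video associated with this word.
    $('#asl-popup video')
      .attr('src', $word.data('video'))
      .get(0).load();
  });

  // Close button in the top-right corner of the pop-up.
  $('#asl-popup .close').on('click', function () {
    $('#asl-popup').hide();
  });
});
```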

We had to select which English words in the text should be highlighted to indicate that an ASL video was available for that word. While it is within the state of the art of computational linguistic techniques to morphologically analyze words (e.g., matching the dictionary entry for “absorb” with words like “absorbed” or “absorbing”) or to disambiguate words with multiple meanings based on their surrounding context (e.g., “can” could indicate a container or “to be able”), fully implementing an automatic system was not the focus of our project; rather, we were interested in how users would respond to a system like this. For this reason, we employed a Wizard-of-Oz approach, in which an ASL expert identified the appropriate ASL signs from the dictionary for individual English words in the specific texts that would be displayed during our user study. The three passages used in this study were selected from Graduate Record Exam (GRE) online practice resources, with 5 comprehension questions for each text passage. Each question was multiple-choice, with a single correct answer, and the expert also identified words appearing in the questions or their answer choices that could be highlighted. In total, the expert identified approximately 51.5% of the English words in the text as corresponding to videos that we could display from the NTID dictionary.
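To make concrete the kind of matching an automatic replacement for the human expert would need to perform, here is a minimal, assumption-laden sketch: a naive suffix-stripping matcher that maps inflected English forms onto hypothetical dictionary entries. It handles only simple morphology and cannot disambiguate word senses, which is precisely where the Wizard-of-Oz expert added value.

```javascript
// Naive morphological matcher (illustration only; the study used a human
// expert). The dictionary entries here are hypothetical examples.
var aslDictionary = new Set(['absorb', 'read', 'sign']);

function findDictionaryEntry(word) {
  var w = word.toLowerCase();
  if (aslDictionary.has(w)) return w;

  // Strip common inflectional suffixes: "absorbed"/"absorbing" -> "absorb".
  var suffixes = ['ing', 'ed', 'es', 's'];
  for (var i = 0; i < suffixes.length; i++) {
    if (w.endsWith(suffixes[i])) {
      var stem = w.slice(0, -suffixes[i].length);
      if (aslDictionary.has(stem)) return stem;
    }
  }

  // No match: the word would be left un-highlighted. A real system would
  // also need word-sense disambiguation (e.g. "can": container vs. ability).
  return null;
}

console.log(findDictionaryEntry('Absorbing')); // -> "absorb"
```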

2.2 User Study

To recruit participants for a user study comparing the three conditions (Normal, Dictionary, and ASLPopup), we posted flyers throughout the Rochester Institute of Technology campus asking potential participants if they were Deaf or Hard of Hearing and if they used ASL. Before beginning the study, participants were asked to provide some basic demographic and background information. A total of 18 participants (6 male, 12 female) were recruited, between the ages of 20 and 29. All of the participants reported having become DHH by the age of 3. Of these, 10 reported having learned ASL by age 5, while 6 learned ASL between the ages of 10 and 18. Aside from 2 participants, all reported having used ASL at the elementary/secondary school level, and 14 participants reported that they also used English while communicating at work/school. All of the participants had experience reading English content on a website while browsing from a computer.

Each of our three text passages was prepared in the three conditions (Normal, Dictionary, and ASLPopup); the conditions were assigned to the passages using a Latin square schedule, and the presentation order of the passages was counterbalanced. Thus, each participant completed a total of 3 reading comprehension tasks, seeing each text passage only once, in one of the conditions. At the top of each webpage, brief instructions indicated whether the user could click on words of the text, e.g., “On this page, you can click on highlighted words to see ASL videos.” In addition, the experimenter briefly demonstrated how to click on a word before allowing the participant to begin reading the text and answering the five questions below each passage. Participants wrote their answers to each question on a paper sheet provided to them. After completing the comprehension tasks, participants assigned preference scores to each of the three conditions encountered, on a 0-to-10 scale (10 = high preference).
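For clarity, a cyclic 3×3 Latin square realizes this kind of schedule: each participant is assigned one row, so across every group of three participants each passage appears in each condition exactly once. The rotation below is a plausible sketch of such an assignment, not the study’s actual scheduling code.

```javascript
// Sketch of a cyclic 3x3 Latin square assigning conditions to passages.
// The rotation scheme is an assumption for illustration.
var conditions = ['Normal', 'Dictionary', 'ASLPopup'];

function assignConditions(participantIndex) {
  var row = participantIndex % 3; // one Latin-square row per participant
  // Passage i receives the condition at position (row + i) mod 3, so each
  // passage-condition pairing occurs once per group of three participants.
  return [0, 1, 2].map(function (i) {
    return { passage: i + 1, condition: conditions[(row + i) % 3] };
  });
}

console.log(assignConditions(0));
// -> [{passage: 1, condition: 'Normal'},
//     {passage: 2, condition: 'Dictionary'},
//     {passage: 3, condition: 'ASLPopup'}]
```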

2.3 Results for Each Research Question

One participant indicated that she did not make use of the pop-up videos in the ASLPopup condition, and she therefore refrained from providing a preference score for that condition; this participant was excluded from the analysis for RQ1-RQ2. The median preference score (on the 0-to-10 scale) was 6 for the Normal condition, 8 for Dictionary, and 9 for ASLPopup. The distributions in the three groups differed significantly (Friedman test, p < 0.001); post-hoc Wilcoxon pairwise tests revealed significant differences (p < 0.05) between Normal vs. Dictionary and Normal vs. ASLPopup.

RQ1 considered whether DHH users showed a subjective preference for having Support tools (Dictionary or ASLPopup) in the interface, as opposed to not having any support (Normal). Based on the result of the pairwise tests, we see that participants preferred each of the support conditions to the Normal condition.

RQ2 considered whether DHH users showed a subjective preference for having the ASLPopup support tool in the interface, as opposed to Dictionary support. Based on the results above, the distributions for these two conditions did not differ significantly: no difference in preference scores was observed between Dictionary and ASLPopup.

Participants’ accuracy on the reading comprehension questions was analyzed to answer RQ3 and RQ4, which considered participant performance across the three conditions. A Shapiro-Wilk test for normality was performed on the collected reading comprehension data, which indicated that the data were not normally distributed (Shapiro-Wilk W = 0.87478, Wcritical = 0.95688, p < 0.05). Thus, the non-parametric Kruskal-Wallis test was used to analyze the comprehension performance data. RQ3 considered whether DHH users showed improved performance in answering comprehension questions when provided with support tools (Dictionary or ASLPopup), as opposed to not having any support (Normal). Similarly, RQ4 considered whether participants showed improved performance in answering comprehension questions when provided with the ASLPopup tool as opposed to the Dictionary tool. The Kruskal-Wallis test revealed no significant differences between groups in comprehension scores (Kruskal-Wallis, k = 3, H = 1.709, p > 0.05).

3 Conclusions, Limitations, and Future Work

By providing sign language support in the form of ASL videos for a web interface, this project investigated methods to assist DHH users to better understand information presented as English text on websites. Our intention was to create a tool that could enable DHH users with limited English literacy skills to better understand English words in a text by identifying ASL video translations for unfamiliar English terms. This tool did not attempt to provide English-to-ASL translations of full sentences, but instead allowed users to view ASL signs of individual words. Insights and recommendations from prior relevant work informed our design of this tool, which made use of the NTID ASL Video Dictionary and Inflection Guide resource to display videos of ASL signs when the user clicked on an English word.

Our user study demonstrated that DHH users preferred to use the ASLPopup tool for sign language support, as compared to having no support provided (RQ1), and users also preferred the Google Dictionary tool for word definitions, as compared to having no support tool (RQ1). Prior research in the field has shown that DHH signers perform significantly better when provided with graphical cues or sign language support while dealing with written text. However, in this study, we did not observe any significant difference in user preferences between ASLPopup and Dictionary (RQ2). We speculate that our participants, recruited on a university campus, may have higher levels of English literacy than the target users of a system for providing ASL translations of English words; thus, having support in the form of ASL did not provide a significant benefit. Similarly, no significant difference was observed in participants’ performance in answering reading comprehension questions across the three conditions presented (RQ3 & RQ4). Prior work had suggested that providing DHH readers with English text augmented with sign language or graphical content could improve comprehension [11]; we speculate that our study may have been underpowered (too few participants) to measure whether the ASLPopup tool provided any comprehension benefit.

3.1 Limitations

A possible limitation of our study design is that our reading comprehension tests were taken from Graduate Record Exam (GRE) resources, which have a high difficulty level. We speculate that the high difficulty of the questions could have masked any influence of the conditions on participants’ performance scores. In particular, when selecting reading passages for this study, we found it challenging to tailor the reading difficulty level of our texts to the skill level of the participants we might recruit on the university campus. Furthermore, although the NTID dictionary resource was extremely valuable for our project, it does not contain all possible ASL signs (only 2,700). Thus, it was challenging to identify English text passages that would have a high match rate with our video dictionary resource while at the same time being sufficiently difficult for our participants to encourage them to make use of the word-lookup features.

Another limitation is that we asked an ASL expert to identify these English words in our text – rather than using an automatic word matching tool. Thus, our study reflects a system with a level of accuracy that may be beyond the state-of-the-art of computational linguistic tools for disambiguating the senses of individual words and accurately morphologically analyzing those words to find word matches in a dictionary.

In informal feedback comments provided by participants about their experiences with the systems, two participants provided mixed feedback regarding the ASLPopup tool. While the videos provided for some complicated words were “ok,” one participant did not like the number of ASL video options that were available and felt that too many words on the screen were highlighted in blue as clickable. Another participant commented that although she appreciated the concept of having an ASL support tool, she did not require its use during the study (since she was able to understand all of the English words on the screen).

3.2 Future Work

In future work, to address the limitations above, we intend to conduct a follow-up study using text passages at a variety of English difficulty levels and recruiting participants with greater variation in their English literacy skills, to better explore the space of text complexity and users’ literacy skills. It is likely that tools such as the ASLPopup plug-in may have particular benefits for DHH users who encounter a text that includes some words just beyond their reading skill level.

In addition, based on user comments in our initial study, we may explore variations in the percentage of words that are marked as clickable when the text is displayed. It may be the case that too many words were visually highlighted in the current version of the system, including many easy-to-read words whose clickability did not provide benefits to users.

Our current study used an ASL expert to label words that correspond to items in the video dictionary resource; in future work, we may experiment with implementing an automatic tool for matching English words to ASL dictionary items. Additional user studies would be necessary to determine whether errors in the output of an automatic system influence how users respond to such a system, e.g., if the English word “right” is used in a sentence to convey “correct” but the automatic system links it to an ASL video for the direction opposite of left.

Future studies could also investigate our speculation that an ASL video pop-up tool might have an educational benefit by introducing DHH readers to additional English vocabulary terms as they read. Another potential benefit of this tool is that it may be of interest to hearing students who are learning ASL, since it could expose them to additional ASL vocabulary items; a follow-up study with hearing ASL students could investigate this additional application.