1 Introduction

Commercial off-the-shelf (COTS) quantified-self wearable devices such as smartwatches, pedometers, and associated applications are seen as a potential medium to: (i) support health self-management among older populations [16]; and (ii) improve physical activities through “Quantified Self” [50]. Despite this potential, many older adults face greater challenges than their younger counterparts in adopting such wearable device categories [7]. Previous researchers and practitioners have attributed these challenges to: (i) usability issues related to complex interfaces and extensive functionalities [18] that have not been designed to suit older adults [31]; and (ii) age-related changes in cognitive and physical capabilities [39]. Tedesco et al. [51] state, “wearable technologies are mainly designed to attract a young, sporty and technical affine group of adults.” This is a setback for older adults seeking to take advantage of wearables.

To offset such challenges, research has been emerging on the identification, evaluation, and analysis of usability issues faced by the older population while using smartwatches and pedometers [9, 22, 43]. Researchers have identified usability issues including button size, screen size, interaction with the screen, iconography, battery, reliability, and accuracy [39, 40, 43]. However, these previous studies lack an overall framework that provides a comprehensive set of identified usability issues related to specific wearable device categories. This gap can keep researchers, industry manufacturers, and wearable application developers from understanding the most important usability issues that need to be rectified in order to improve adoption of smartwatches and pedometers by older adults.

While previous studies (see Table 1) have provided insight into various aspects of the usability issues related to wearable devices that the older adults face, they do not directly answer our research questions.

Table 1. Usability issues identified by previous researchers.

RQ1: What usability issues, related to device characteristics of smartwatches and pedometers, can obstruct the motivation of the older adults to adopt these devices, and how have the issues been categorized? Rationale: Identify the range of usability issues of each device category that affect adoption. This enables creation of an overall framework to provide a comprehensive set of usability issues for each device category.

RQ2: What usability issues, related to device characteristics, have a sizable impact on usability needs for smartwatches and pedometers and thus warrant immediate prioritization by technology designers, the research community, and application developers? Rationale: Prioritize the predominant usability issues that need immediate potential solutions to improve adoption of smartwatches and pedometers among older adults.

The aim of this study is therefore to: (i) explore the usability issues of specific wearable device categories, i.e. smartwatches and pedometers, by reviewing the literature and applying Contextual Action Theory (CAT) [49] and the Usability Evaluation Method [23] to this study’s set of usability experiments among older adult users; and (ii) empirically validate quantitative data gathered from older participants in order to prioritize the predominant usability issues of each device category that require an immediate potential solution. The presented framework and empirically validated results may be valuable for researchers, industry manufacturers, and wearable application developers seeking to improve smartwatches and pedometers for older adults.

2 Related Work

In order to answer the above research questions, this section details previously identified usability issues faced by older populations while using COTS wearable devices. Table 1 summarizes recent literature on usability issues associated with wearable devices and their associated applications.

3 Research Design Process

To answer the research questions, we conducted the study as a two-stage research process, namely Identifying and Prioritizing (see Fig. 1), to measure the issue variables and compare their influence on motivation to adopt smartwatches and pedometers. During the identifying stage, we performed a usability evaluation of devices with older participants to determine perceived usability issues and to formulate a usability categorization framework based on the identified issues, whereas in the prioritizing stage, we collected the predominant usability issues and organized them into a categorization framework.

Fig. 1. Flow diagram of research process

3.1 Identifying Stage

The main purpose of this stage was to identify the number of times that usability issues were reported during the usability evaluation of devices with older participants throughout the study. First, a general presentation and the requirements for participation were provided to participants, followed by a recruitment form, which collected preliminary participant information such as age, technological knowledge, current use of external devices like smartphones, and consent. Alshamari and Mayhew [2] suggest performing usability tests so that participants can be classified based on their level of systems experience and other individual characteristics. Black [5] points out, “Ideally participants should fall in the middle of the qualification spectrum to ensure that the tests do not result in excessive false positives or false negatives” (p.7). Based on this suggestion, data obtained from the recruitment form was analyzed and used to select participants for the evaluation study. All participants in the study were presented with an ethical review statement and aspects of informed consent (i.e. participants’ right to confidentiality, risks, data storage, the use of anonymized data, the voluntary nature of participation, and the fact that no health-related data would be collected), and in turn signed consent forms were obtained. The ethical committees of the Lappeenranta University of Technology and California State University, Long Beach, approved the study.

Thirty-three older participants from Finland and the U.S., with a mean age of 62.46 years (SD = 2.295), were voluntarily recruited to participate in the usability test sessions. This sample size is sufficient based on the recommendation [33] that a group size of 3–20 participants is typically valid, with 5–10 participants demonstrating a sensible baseline range. Participants from both countries were living independently and were interested in using new technology to improve their well-being [27]. The contextual action theory (CAT) explained by Stanton [49] and the Usability Evaluation Method (UEM) presented by Ivory and Hearst [23] were applied as foundational methodologies in evaluating COTS wearable devices in this study’s set of usability experiments among older users. Stanton et al. [49] state that CAT explains human actions in terms of coping with technology within a particular context and that five phases are associated with contextual actions.

First phase:

Presentation of actual demands and actual resources to participants, consisting of the device, the tasks to be performed on the device, environmental constraints (e.g. time), and so on. Firstly, participants were presented with functioning wearable COTS devices, i.e. smartwatches and pedometers, to help explore the significance of various types of data for future design, as pointed out by Kanis [26]. No requirements were provided for device selection. Secondly, participants were presented with several experimental tasks (see Appendix A for the presented experimental tasks) along with a timeframe, namely two one-hour, controlled-environment sessions (i.e. the first and final meetings). As stated by [11], “the idea of momentary memory implies that we don’t store our experiences in perfect experimental and temporal fidelity, rather memories are formed from snapshots of the representative moments in an experience” (p. 90). Therefore, participants were asked to use each category of device under real conditions every day for the two weeks between the meet-up sessions (i.e. in a semi-controlled environment) and to capture the usage in a daily log using the diary method. No specific pre-defined activities such as put on/take off, charge, walk, eat, rest, sleep, or exercise [24] were specified. Participants were requested to return to another one-hour, controlled-environment session to return the device and test usability. Finally, participants were told that upon completing the semi-controlled usability evaluation, they would be asked to respond to a survey.

Second phase:

Appraisal of those demands and resources by participants. As stated by [25], the primary appraisal of an interaction event can result in a negative emotional response such as anxiety or frustration. To reduce such negative emotional responses, participants were asked to appraise the demands and resources presented during the first phase, so that their stated perception might help to redirect negative emotional response away from the experiment itself [21].

Third phase:

Comparison of perceived resources with perceived demands. In this phase, participants were asked to compare their own perceived resources with perceived demands to determine any imbalance related to the specific properties of smartwatches and pedometers, which could affect participation in the study [49].

Fourth phase:

Possible degradation of pathways. Participant appraisal and comparison may reflect the potential for degradation of pathways, i.e. emotional responses and behavioral responses. Such emotional responses may include decreases in user satisfaction and motivation, while potential behavioral responses include an increase in errors and inefficiency.

Fifth phase:

Appraisal of the effects of these responses on device usage. The effects of these responses on participant interaction with the devices were gathered through the daily log, which included several kinds of measurements.

Measurements of identifying usability issues

Searches using the strings “usability issue*”, “smartwatch*”, “pedometer*”, and “wearable*” were conducted in the digital databases IEEE Xplore, the ACM Digital Library, Science Direct, and Web of Science. After refining the results from the digital databases, the final lists of usability issues were derived from [3, 16, 37, 39, 40, 43, 48].

Participants were asked to keep a diary of their experiences. The diary included several kinds of data, such as: (i) whether devices were worn (if not, why); (ii) which activities were undertaken (e.g. walking, hiking, running, cycling, etc.); (iii) whether device use motivated physical activity (and why/why not); (iv) which applications were used (if not used, why); (v) usability issues (e.g. screen size, icons, interaction techniques, tap detection, font size, button location, data accuracy, screen resolution, device weight, device shape, device size, lack of screen, battery life, and the option to add any missing usability issues); and (vi) additional comments.

For the purpose of analysis, the usability issues for both device categories (smartwatches and pedometers) have been categorized into two components: hardware and user interface. Specifically, the hardware concerns involve issues related to external look and feel and to internal components such as sensors, processor, memory, power supply, and transceiver [1, 32]. User interface involves issues with various parts through which users interact with the device [1]. Furthermore, the user interface component has been sub-categorized into input and navigation mechanism, based on the work of [1].

The first set of data, gathered from the first and final meet-up sessions, was analyzed using the instant data analysis technique proposed by [29]. The qualitative data obtained from the diaries were analyzed based on the data analysis framework presented by [10], “which offers an eclectic approach for qualitative diary data analysis” (p. 1514).

The final data set of the identifying stage was derived from: (i) the first and final meet-up sessions; and (ii) four weeks of daily log data from the older participants. The analysis was done using a Microsoft Excel spreadsheet, wherein each reported usability issue was assigned a value of 1 in order to count the number of times it was reported by participants during the entire evaluation period. This analysis enabled understanding of the breadth and occurrence of reported usability issues in order to find the most frequent usability issues that could be used as the basis for quantitative analysis.
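The tallying step described above can be illustrated with a minimal sketch. The study itself used an Excel spreadsheet; the Python below is only an assumed equivalent, and the participant identifiers, the `diary_entries` records, and the `tally_issues` helper are hypothetical names and invented data, not material from the study.

```python
from collections import Counter

# Hypothetical diary records: (participant_id, device, reported_issue).
# These values are invented for illustration only.
diary_entries = [
    ("P01", "smartwatch", "screen size"),
    ("P01", "smartwatch", "font size"),
    ("P02", "smartwatch", "screen size"),
    ("P02", "pedometer", "lack of screen"),
    ("P03", "pedometer", "tap detection"),
    ("P03", "smartwatch", "screen size"),
]


def tally_issues(entries):
    """Count reports per (device, issue); each report contributes 1."""
    counts = Counter()
    for _participant, device, issue in entries:
        counts[(device, issue)] += 1
    return counts


issue_counts = tally_issues(diary_entries)

# List issues from most to least frequently reported.
for (device, issue), n in issue_counts.most_common():
    print(f"{device:<11} {issue:<15} {n}")
```

Sorting the tally by frequency mirrors the stated goal of finding the most frequent issues to carry forward into the quantitative analysis.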

3.2 Prioritizing Stage

The main purpose of this stage was to collect quantitative data from the participants using an immediate prioritization scale. This study’s immediate prioritization scale utilized the usability issues most frequently reported by older participants during: (i) the usability test sessions (first and final meet-ups); and (ii) four weeks of participants’ daily logs. In a survey, participants were asked to rate on a 7-point Likert scale (0 = strongly agree to 7 = strongly disagree) how much the identified usability issues correspond with the motivation to adopt. Quantitative data from the survey was analyzed separately from the Excel spreadsheet, using the statistical data analysis language R and the descriptive statistical analysis functions available in R core [42] and the psych library [45].

Data analysis was performed with multiple linear regression [12] in order to test hypotheses to see which variables most influenced the motivation to use the devices. Multiple linear regression modeling was performed using the R core statistics library [42], following the methodological guidelines set out by Weisberg [55, 56] and Laerd Research [30]. Additional multiple linear regression diagnostics were performed using the following R libraries: mctest (multicollinearity diagnostics) [52], MASS (standardized residuals) [53], car (Durbin-Watson Test, outlier testing, Spread-Level and QQ plots) [17], and lmtest (Breusch-Pagan test) [59].
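As a rough illustration of what a multiple linear regression fit computes, the following sketch estimates coefficients by solving the normal equations. It is not the study's analysis, which was carried out in R: the Python functions `solve_linear_system` and `fit_ols`, the two predictor columns, and all data values are assumptions made for this example, and the outcome values were constructed to follow an exact linear rule so that the result is easy to check.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so that row operations update both sides together.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(predictors, outcome):
    """Return [intercept, b1, ..., bk] from the normal equations (X'X) b = X'y."""
    rows = [[1.0] + list(p) for p in predictors]  # prepend intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, outcome)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Invented ratings: (screen_size, font_size) per hypothetical participant.
ratings = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 6), (6, 4)]
# Outcome built as 1 + 0.5 * screen_size + 0.25 * font_size (no noise).
motivation = [1 + 0.5 * s + 0.25 * f for s, f in ratings]

coefficients = fit_ols(ratings, motivation)
print([round(c, 3) for c in coefficients])
```

With real survey data the fit would not be exact, and standard errors and significance tests (as reported by R's regression output) would be needed to judge which variables contribute to the prediction.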

3.3 Results

After analyzing the sets of data from the identifying stage (i.e. the first and final meet-up sessions and four weeks of daily log data from the older participants), we identified 13 usability issues common to pedometers and smartwatches and categorized them into a framework of hardware or user interface related issues (see Fig. 2), with the lack of screen being the only additional issue unique to pedometers. Interaction techniques were a multi-faceted category under user interface. Participants reported that interaction techniques can cause usability issues despite their intended functions of providing feedback to the user that can be perceived without continuous visual attention [19] and engaging users through quantitative or qualitative understanding of underlying data [6] through notification. For example, in this study participants reported usability issues caused by interaction technique sub-categories of both feedback (tactile and kinesthetic) and notification. In addition, on both smartwatches and pedometers, older participants reported issues with data accuracy and connectivity as sub-categories under hardware sensor issues, which was in line with previous research [38, 43].

Fig. 2. Categorization framework of usability issues of pedometers and smartwatches identified from the identifying stage of this study.

To understand the important usability issues, we further analyzed the data based on the number of times usability issues were reported by the participants during the entire evaluation period. Figures 3, 4, 5 and 6 show the means and standard deviations of the scores (frequency) of the usability issues related to hardware and user interface and their sub-components for both smartwatches and pedometers. This outcome indicates that screen size, interaction techniques (i.e. feedback and notifications), font size, tap detection, and button location were the most frequently reported issues.

Fig. 3. Descriptive analyses of usability issues related to hardware and user interface for smartwatches

Fig. 4. Descriptive analyses of usability issues related to sub components of hardware and user interface for smartwatches

Fig. 5. Descriptive analyses of usability issues related to hardware and user interface for pedometers

Fig. 6. Descriptive analyses of usability issues related to hardware and user interface and its sub components for smartwatches

Therefore, we focus on screen size, typography (i.e. font size), tap detection, interaction techniques (i.e. feedback and notifications), and button location to validate and enhance our understanding of the most frequent issues with device characteristics. It is in this way that we pursue our proposed process (see Fig. 1) to measure the issue variables and compare their influence on motivation to adopt smartwatches and pedometers. The following section presents the variables used in the statistical research model and the hypotheses formulated based on the variables.

3.4 Validity of the Measurement

The hypotheses were tested by creating multiple linear regression models from the issue variables that were most frequently cited as affecting usability. Separate models were created for smartwatches and pedometers. First, a multiple regression was run to predict motivation to adopt smartwatches from screen size, font size, wrist feedback, finger feedback, touch controls, interrupting distractions, and button location perspectives. A second multiple regression was run with those same variables in order to predict motivation to adopt pedometers. In both models there was linearity, as assessed by a plot of studentized residuals against the predicted values. There was independence of residuals, as assessed by a Durbin-Watson statistic of 1.86 in the first model and 2.48 in the second model. There was homoscedasticity, as assessed by visual inspection of a plot of studentized residuals versus unstandardized predicted values and the studentized Breusch-Pagan test. There was no evidence of multicollinearity, as assessed by tolerance values greater than 0.1 and VIF testing. There were no studentized deleted residuals greater than ± 3 standard deviations, outlying leverage values, or values for Cook’s distance above 1. In the second model two outliers were removed as guided by the regression model diagnostics. The assumption of normality was met in both models, as assessed by a Q-Q Plot.
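Two of the diagnostics mentioned above have simple closed forms, which the following sketch illustrates. The study used the car and mctest R libraries for these checks; the Python functions here (`durbin_watson`, `tolerance_from_r_squared`, `vif_from_tolerance`) are hypothetical stand-ins written for illustration, and the residual values are invented. The Durbin-Watson statistic is the sum of squared successive residual differences divided by the sum of squared residuals; it ranges from 0 to 4, and values near 2 suggest little first-order autocorrelation. Tolerance is one minus the R² obtained when a predictor is regressed on the remaining predictors, and the variance inflation factor (VIF) is its reciprocal, so a tolerance above 0.1 corresponds to a VIF below 10.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squares."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


def tolerance_from_r_squared(r_squared_j):
    """Tolerance of predictor j; r_squared_j comes from regressing
    predictor j on all other predictors."""
    return 1.0 - r_squared_j


def vif_from_tolerance(tolerance):
    """Variance inflation factor is the reciprocal of tolerance."""
    return 1.0 / tolerance


# Invented residuals that alternate in sign (strong negative autocorrelation).
alternating = [1.0, -1.0, 1.0, -1.0]
# Invented residuals that never change (strong positive autocorrelation).
constant = [1.0, 1.0, 1.0, 1.0]

print(durbin_watson(alternating))
print(durbin_watson(constant))
print(vif_from_tolerance(tolerance_from_r_squared(0.5)))
```

The two extreme residual patterns show how the statistic moves away from 2 in either direction; the reported values of 1.86 and 2.48 lie between these extremes.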

The first multiple regression model statistically significantly predicted motivation to adopt smartwatches, F(7, 23) = 3.733, p < .01, adj. R² = .39. Some variables added statistically significantly to the prediction, confirming part of the hypotheses. Regression coefficients and standard errors can be found in Table 2.

Table 2. Multiple Linear regression result for smartwatches

The second multiple regression model statistically significantly predicted motivation to adopt pedometers, F(7, 23) = 3.74, p < .01, adj. R² = .39. Some variables added statistically significantly to the prediction, confirming part of the hypotheses. Regression coefficients and standard errors can be found in Table 3.

Table 3. Multiple Linear regression result for pedometers
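The reported F statistics and adjusted R² values can be cross-checked against each other with standard identities. The sketch below assumes models that include an intercept, so that the number of observations equals the model degrees of freedom plus the residual degrees of freedom plus one; the function names are hypothetical and the code is not part of the study's R analysis.

```python
def r_squared_from_f(f_stat, df_model, df_resid):
    """Recover R^2 from the overall F test: R^2 = kF / (kF + df_resid)."""
    return (df_model * f_stat) / (df_model * f_stat + df_resid)


def adjusted_r_squared(r_squared, df_model, df_resid):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1),
    assuming an intercept so that n = df_model + df_resid + 1."""
    n = df_model + df_resid + 1
    return 1.0 - (1.0 - r_squared) * (n - 1) / df_resid


# Values reported in the text: F(7, 23) = 3.733 and F(7, 23) = 3.74.
for label, f_stat in (("smartwatches", 3.733), ("pedometers", 3.74)):
    r2 = r_squared_from_f(f_stat, 7, 23)
    adj = adjusted_r_squared(r2, 7, 23)
    print(f"{label}: R2 = {r2:.3f}, adjusted R2 = {adj:.3f}")
```

Under these assumptions, both models give an unadjusted R² of roughly .53 and an adjusted R² that rounds to .39, which agrees with the reported values.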

4 Discussion

The focus of this section is to discuss the results obtained during the identifying and prioritizing stages of this study, based on interpretation and exploration of the retrieved data. The categorization framework shows that there are no major differences between the identified usability issues related to smartwatches and pedometers. As both wearable device categories consist of similar features, the only identified difference was due to the pedometer’s lack of screen. The main advantage of the categorization framework is that it summarizes and structures usability issues of smartwatches and pedometers identified in previous research and during the identifying stage of this study. If additional usability issues of smartwatches and pedometers are found in the future, the tree within the framework could be expanded.

During the identifying stage, older participants reported three kinds of device usage problems on both smartwatches and pedometers: short-term, occasional, and long-term issues. Short-term issues, for example those caused by hardware, such as weight, device shape, resolution, device connectivity, sensors (data inaccuracy), and battery, as well as those associated with user interface, such as button location and iconography, lasted relatively briefly (i.e. the first few days of the study, when participants had their first interactions with the devices) and had minimal effect on device usability. For example, battery could be classified as a short-term issue, because within a few days, participants adjusted to charging the device regularly. Findings regarding short-term usage issues reinforce the statement from [43] that with “increasing time participants were more and more confident in the battery life and thereby decreased the number of charging cycles as well as charged the tracker later and thereby with a lower battery status” (p. 1414). Other identified usability issues, for example those caused by hardware, such as screen size and device size, and by user interface, such as tap detection, font size, interaction techniques, and navigation, appeared either occasionally or throughout the study.

Although participants experienced certain usability issues throughout the study, there was zero drop-out. As stated by [25], “Facing an obstacles during the use of technology doesn’t necessarily lead to frustration because in the face of goal-incongruent events, the user may still cope with the arising emotions” (p. 73). In practice, the smartwatches and pedometers may have: (i) provided immediate accessibility [46]; and/or (ii) acted as facilitators of behavior change for older adults due to motivational aspects and objective control [43]. For example, participant feedback indicated that devices facilitated motivation by providing “daily steps,” that it was “fun to meet challenges,” and that devices “made me aware of sleep patterns” and “aware to move and not to be sedentary for a long period of time.” In addition, this study also found through qualitative feedback that users had a positive intention to use devices that are expected to work well, have good design and wearability, and do not raise privacy concerns.

Hypothesis testing revealed that small screen size is the main device characteristic related to both smartwatches and pedometers that needs immediate prioritization to improve adoption among older adults. Supporting previous research, this study further reveals that screen size plays a significant role in adoption of wearable devices, in that small screen size restricts users [20] in their ability to move beyond the fixed functionality of a traditional watch and to support a variety of apps [58] through input and output capabilities [20, 44, 58]. As perception of utility has been found to be of great importance for older adults [46], options to address screen size include creating smartwatches and pedometers with: (i) non-graphical technology designs with LED arrays [44]; (ii) a larger screen created by curving the screen around the wrist [58]; and (iii) the novel gaze interaction technique that enables hands-free input on smartwatches [15], all of which provide better user experience and can lead to a positive opinion from referents, so that older users actively build a positive attitude towards adoption of devices.

In addition, hypothesis results demonstrated that font size was statistically significantly important for older adults in both categories of wearable devices. However, font size had higher significance for pedometers than for smartwatches. Pedometers currently have very limited amounts of screen space, and their visual displays can easily become cluttered with information and widgets [6]. Furthermore, the human eye reads an individual line of text in discrete chunks by making a series of fixations (i.e. brief moments, around 250 ms, when the eye is stopped on a word or word group, and the brain processes the visual information) and saccades (i.e. fast eye movements, usually forward in the text by around 8–12 characters, to position the eye on the next section of text) [8]. One study [14] asserts that “individual characteristics such as age, impairments may affect movement of the eyes.” Thus, significantly longer fixations for smaller fonts [4] on the pedometers may have contributed to the difference in results between the two device categories.

Both smartwatches and pedometers provide individuals with various types of tailored and quantified self-data supporting daily physical activities, wherever they are and at any time [28], through notification in the form of audio, visual, and haptic signals [34]. However, results from hypothesis testing indicate that older adults are more sensitive towards the disruptions caused by all push notifications. Current smartwatch and pedometer user interfaces may demand users’ attention at inopportune moments [34], e.g. without knowing which context the user is in and featuring repetitiveness in the notification content [35]. Other prioritized, predominant usability issues were button location on smartwatches and tap detection on pedometers. The result related to button location was in line with a previous study [20] indicating that pointing error rate is significantly affected by button size and location on the UI as the index finger taps on a device. Tap detection, however, was a significantly greater issue for pedometers. This may be because of: (i) variance in the touch screen technologies used in the two device categories; and (ii) characteristics of older participants. Regarding (i), the smartwatch devices evaluated in this study used a display with force touch technology, whereas the pedometers used a monochrome liquid crystal display (LCD) touch screen, which detects whether the user is touching the screen in a different way. As tap detection has been found to be of great importance for older adults with regard to pedometers, one option is to improve the touch screen with new sensing technology [54] that could detect how much pressure is being exerted by older users and display the output based on those measurements. Regarding (ii), Culen [13] states, “age-related changes constitute challenges of touch and grip” (p. 464).

The above discussion highlights prioritized needs for immediate attention and further investigation by technology designers, the research community, and application developers regarding the predominant issues older adults face when using smartwatches and pedometers. We see two lines of immediate future work: First, the effect of timing and frequency using intelligent, sensor driven and/or pre-determined, static notification [35] could be analyzed to gain insight into how the older adults prefer to receive push notifications of “quantified self” data from their smartwatches and pedometers. The findings may help in the design of effective user interfaces to reduce usability issues caused by push notifications and thereby increase device adoption. Second, through a longitudinal study using eye-gazing techniques, future research should look into which typographical variables such as font size and font type [4] are most effective for older users of smartwatches and pedometers.

5 Conclusion

This study presented a categorization framework for usability issues of smartwatches and pedometers. Additionally, this paper used multiple linear regression modeling to prioritize the issues predominantly reported during the first “identifying” stage of the study. The “prioritizing” stage of the study found that: (i) for pedometers, issues of screen size, typography (i.e. font size), interaction technique (i.e. notification), and tap detection; and (ii) for smartwatches, issues of screen size, typography (i.e. font size), interaction technique (i.e. notification), and button location warrant immediate attention by technology designers, the research community, and application developers to increase device adoption among older adults. The main limitation of this study is the relatively small and non-random sample, meaning the results cannot be generalized. This study can, however, be used as a basis for further studies to: (i) investigate how prioritized predominant usability issues differ when secondary users, such as caregivers or relatives, use smartwatches and pedometers on behalf of frail older users; (ii) discover how a categorization framework of usability issues related to smartwatches and pedometers varies across different cultures; (iii) provide information that can serve as a basis for improving adoption by enhancing device characteristics; and (iv) identify the prioritized predominant usability issues among higher age and frail older users.