1 Introduction

The increasing demand for mobile computing has led to a proliferation of handheld devices with different screen dimensions, from 4-in phones to 12.9-in tablets [1]. This diversity of sizes gives customers options that suit individual preferences. However, the physical limitations of different dimensions constrain the display area and cause an inconsistent user experience across screens. A previous study indicated that larger displays offer a preferable user experience and a higher sense of immersion [2], yet larger devices often fail the mobility requirements of daily portable usage. It is also a challenge for most mobile applications to offer a consistent user experience across devices under such fragmented screen-size conditions. Although it is straightforward to directly scale and resize content to fit different panels, a proper solution is to optimize the user experience for each display [3]. Therefore, evaluating user feedback to stimulation from various screen dimensions has become an important issue [4]. Instead of resizing content with a fixed ratio between targets and panels, content adjustment between screens could keep the target size fixed, so that larger devices reveal more content, or could even redesign the interface for each panel size. Under the condition of identical target sizes, the concept of flux can be adopted to explore the interaction between users and displays of various sizes.

This research introduces the concept of visual flux to describe the process from the controlled visual stimulation of a mobile device to the brain's recognition of information and explores the interaction between users and portable devices of different screen sizes. The device display creates visual stimulation on the retinas of the user's eyes, and the flow of information from screens of different sizes and orientations is represented in this approach as visual flux. User feedback is then provided to the device after the information carried by the visual flux undergoes the perception process of the human brain. This research analyzes the data of user feedback and visual flux and explores how users receive and convey information across different sizes of portable devices. On mobile devices, visual flux (\( J_{v} \)) can be defined as the amount (\( n \)) of effective information (\( q \)) passing through the area of the user's field of view (\( A \)) (Eq. 1):

$$ J_{v} = n\frac{q}{A} $$
(1)

In this case, \( q \) represents the source of visual stimulation, i.e., the targets. This research controls the dimension of the visual stimulation (\( q \)) and examines user behavior with different numbers (\( n \)) of stimulations on different screen sizes (\( A \)) under the condition of a device-specific unit visual flux (\( q/A \)). From the device perspective, \( A \) can be defined as the effective zone of the display area, excluding the title bar and the control bar. From the user's perspective, considering the individual postures adopted for different sizes of portable devices, \( A \) can instead be represented as the angle of view that a particular device subtends for the user. To capture the relation between the devices and the users, including the distance and orientation between them, the dimensions of the different screens, and the information stimulation from each device, this research incorporates the Kinect v2 motion sensor released by Microsoft Corporation together with the orientation data from the position sensors of the mobile devices and constructs a system for reconstructing the user's interactive behavior across multiple device dimensions. The study thereby investigates, via the concept of visual flux, the process between the images generated by the devices and the information perceived by the human brain under controlled stimulation on multiple sizes of mobile devices.
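
As a minimal illustration of Eq. 1, the following sketch computes \( J_{v} \) for a given number of targets. The function name and the example display area are illustrative rather than taken from the experiments, and \( q \) is read here as the target's area in cm², which is one plausible interpretation of "effective information" for a 1-cm circular target.

```python
import math

def visual_flux(n_targets: int, target_area_cm2: float, display_area_cm2: float) -> float:
    """Visual flux J_v = n * q / A (Eq. 1): n targets of area q over display area A."""
    return n_targets * target_area_cm2 / display_area_cm2

# Unit visual flux (q / A) for a single 1-cm-diameter circular target
# on a hypothetical 60 cm^2 effective display area.
q = math.pi * 0.5 ** 2          # target area in cm^2, ~0.785
print(visual_flux(1, q, 60.0))  # ~0.013, the dimensionless per-target ratio
```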

2 Related Work

In the field of studying the interaction between portable devices and physical user behavior, prior research has examined the viewing distance between handheld electronic devices and different groups of users [5]. In addition to the relation between font size and viewing distance, a detailed study [6] reports the relation between one type of content and the viewing distance of a smartphone. However, a cross-dimensional comparison over multiple screen sizes could offer a more exhaustive understanding of fragmented screen conditions. Research on the effect of different desktop screen sizes on human posture revealed that users tend to position larger displays at a greater distance [7], and it is valuable to extend this analysis to mobile devices. Research on the minimum target size of mobile devices [8] offers a thorough discussion of the effect of target size under one-thumb usage along with a wide range of applications. Studies on human posture with mobile devices, including head restriction while using a cell phone [9], muscle activity related to user posture [10], and the effect of typing on upper body posture [11], are also exhaustive, and studies that utilize user skeleton data [12, 13] are applicable not only in the discipline of human–computer interaction but in a wide range of fields. Regarding the evaluation of user performance, Fitts' Law [14] indicates that the mean time (\( MT \)) of user reaction is a linear function of the index of difficulty (\( ID \)) and can be expressed in the form \( MT = a + b \cdot ID \) (\( a \) and \( b \) are experimental coefficients). The index of difficulty, \( ID = \log_{2} (1 + D/W) \), expresses in bits the ratio between the length of the moving path and the target width, where \( D \) is the length of the moving path and \( W \) is the target width. Hick's Law, a prediction of user performance with a similar form [15], focuses more on the condition of multiple choices. Both predictions are widely applied in the domain of user performance. The computational model proposed in [4] attempts to establish a model of user experience on mobile devices and breaks the user experience down into usability, affect, and user value, which requires authentic quantitative inputs. Studies of user experience on touch devices are also fruitful; for example, research on the distinction between touch and slide gestures [16] provides a comprehensive study, and work on virtual keyboards [17, 18] and different types of input methods [19] is also exhaustive. A valuable cross-platform comparison of virtual keyboards is conducted on three mobile operating systems (iOS, Android, and Windows) [20]. In an era of fragmented devices, there is an urgent need for a comparison of user experience and performance across multiple screen sizes.
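
To make the two performance models concrete, the short sketch below evaluates the Fitts index of difficulty and the predicted movement time. The coefficients \( a \) and \( b \) are placeholders, since no fitted values are reported here.

```python
import math

def fitts_id(distance: float, width: float) -> float:
    """Index of difficulty ID = log2(1 + D / W), in bits."""
    return math.log2(1 + distance / width)

def fitts_mt(distance: float, width: float, a: float, b: float) -> float:
    """Predicted mean movement time MT = a + b * ID (a, b from regression)."""
    return a + b * fitts_id(distance, width)

# Example: an 8 cm movement to a 1 cm target with placeholder coefficients.
print(fitts_id(8.0, 1.0))            # ~3.17 bits
print(fitts_mt(8.0, 1.0, 0.2, 0.3))  # ~1.15 s with a = 0.2 s, b = 0.3 s/bit
```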

3 Method

To capture user posture and gather the orientation of the mobile devices, this research applies a Kinect v2 motion sensor in sync with the orientation data recorded by the position sensors of the mobile devices and establishes an integration system that combines these two types of data to reconstruct the entire physical process of interaction between the users and the mobile devices. An evaluation application based on Android is programmed to control the output stimulation of the mobile devices and to record the user feedback under the specific visual stimulation. Devices of various dimensions are selected as the equipment for this research to facilitate understanding of user interaction and performance on different screens. Figure 1 illustrates the architecture of the approach of this research.

Fig. 1. Architecture of the approach used in this research

3.1 User Posture Capture

Microsoft Kinect v2 sensors are applied in this research to extract the user's skeleton information during the interaction process. As shown in Fig. 2, the Kinect v2 sensor captures 2D RGB color images and 3D depth information and extracts the position coordinates of 25 spatial joints of the individual user, including the head, neck, pelvis, shoulders, elbows, and palms. This research focuses on the interaction between mobile devices and the user's upper body posture and extracts the data of 17 upper-body joints to calculate the parameters of the user's posture for each screen. The Kinect v2 sensor is positioned directly in front of the user at a distance that captures the user's whole body and complies with the design specifications of Microsoft Corporation. The recorded data of the 17 spatial joints and the corresponding time stamps are later merged in the integration system to reconstruct the user's activity.
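
The joint records consumed downstream by the integration system can be thought of as timestamped 3-D coordinates per joint. The minimal data structure below is an assumption for illustration only; the actual logging format of the study is not specified.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# One captured skeleton frame: a timestamp plus 3-D camera-space coordinates
# for each tracked upper-body joint (17 of the 25 Kinect v2 joints are kept).
@dataclass
class SkeletonFrame:
    timestamp_ms: int
    joints: Dict[str, Tuple[float, float, float]]  # e.g. {"Head": (x, y, z), ...}

frame = SkeletonFrame(
    timestamp_ms=1_000,
    joints={"Head": (0.02, 0.45, 1.20), "Neck": (0.02, 0.30, 1.22)},
)
print(frame.joints["Head"])
```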

Fig. 2. Samples of the captured images from Kinect v2: (a) 2D RGB image, (b) 3D depth image, and (c) user skeleton joint data (Color figure online)

3.2 Mobile Application

Two functions of the mobile application are required for this research: (1) deriving and recording the orientation of the mobile device and (2) providing controlled visual stimulation and recording the user feedback. The Android system can calculate the relative angles between the device coordinates and the earth (azimuth, pitch, and roll) via the position sensors, such as the geomagnetic field sensor and the accelerometer (Fig. 3(a)). The recorded orientation data and the corresponding time stamps are later merged with the user skeleton data to reconstruct the user activity during the interaction process. For controlled visual stimulation, the tasks are designed as single touch taps on numbered circular targets on a white background (Fig. 3(b)). Each target has a diameter of exactly 1 cm, representing the unit visual flux (unit \( J_{v} \)) corresponding to each device. A distinct serial number is rendered within each target, and users are asked to touch every target to complete the task. When the target with the largest number (the last target) in a stage is touched, the next stage is immediately generated with refreshed target locations and colors (Fig. 3(c)). Note that the number of targets equals the stage number: the first stage contains one target, the fifth stage contains five targets, and the last stage, the twentieth, contains twenty targets. The targets are distributed separately over the screen and placed at random locations to avoid the learning effect caused by repeated target locations. The colors of the targets are also randomly chosen from red, blue, and green to simulate the daily usage of applications. User feedback, such as completion time, touch locations, successful targets, and errors, is recorded during the process for later analysis.
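
As a sketch of the stage logic described above (stage k shows k numbered 1-cm targets at random, non-overlapping positions in random colors), the following Python fragment shows one way such a generator could look. The function name, pixel units, and rejection-sampling placement are assumptions for illustration, not the actual Android implementation.

```python
import random

COLORS = ("red", "green", "blue")

def generate_stage(stage: int, width_px: int, height_px: int, radius_px: int):
    """Stage k contains k numbered circular targets at random, non-overlapping positions."""
    targets = []  # each entry: (x, y, serial number, color)
    while len(targets) < stage:
        x = random.randint(radius_px, width_px - radius_px)
        y = random.randint(radius_px, height_px - radius_px)
        # reject candidates that would overlap an already placed target
        if all((x - tx) ** 2 + (y - ty) ** 2 >= (2 * radius_px) ** 2
               for tx, ty, _, _ in targets):
            targets.append((x, y, len(targets) + 1, random.choice(COLORS)))
    return targets

print(generate_stage(5, 1080, 1920, 60))  # e.g. five targets on a 1080x1920 screen
```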

Fig. 3. Snapshots of the Android application: (a) orientation data and basic options, (b) stage 5 of the evaluation screen, and (c) stage 15 of the evaluation screen (Color figure online)

3.3 Physical Interaction Reconstruction System

After acquiring the user's skeleton data and the device's orientation data, the user's physical interaction behavior can be reconstructed, and further related information can be calculated. This research introduces the physical interaction reconstruction system, which integrates the skeleton and orientation data and derives corresponding information, such as the angle of view and the angle of skew. The user's horizontal angle of view toward a device is represented as the angle of view (\( \alpha \)), and the angular difference between the user's line of sight to the center of the device and the normal direction of the device screen is represented as the angle of skew (\( \theta \)). The angle of view, \( \alpha \), is defined in Eq. 2, where \( x \) is the width of the device screen and \( l \) is the distance between the user's eyes and the device, i.e., the viewing distance:

$$ \alpha = 2\tan^{ - 1} \frac{x}{2 l} $$
(2)

The other parameter is the angle of skew, \( \theta \), which is defined as the angular difference between the normal direction of the device screen and the user's line of sight, as in Eq. 3. Vector \( \vec{e} \) is defined as the vector pointing from the center of the device to the reference point of the user's eyes, and vector \( \vec{d} \) is defined as the normal vector of the device screen. Due to the constraints of the experimental sensors, this research assumes that the reference point of the user's eyes is the geometric center of the user's head, which is captured by the Kinect v2 sensor as the coordinates of the head joint.

$$ \theta = \cos^{-1} \frac{\vec{e} \cdot \vec{d}}{|\vec{e}|\,|\vec{d}|} $$
(3)
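
A minimal computational sketch of Eqs. 2 and 3 follows, assuming the head-joint coordinates stand in for the eye reference point as stated above. The function names and example values are illustrative, not taken from the study's software.

```python
import math
from typing import Sequence

def angle_of_view(screen_width_cm: float, viewing_distance_cm: float) -> float:
    """Horizontal angle of view alpha = 2 * atan(x / (2 * l)), in degrees (Eq. 2)."""
    return math.degrees(2 * math.atan(screen_width_cm / (2 * viewing_distance_cm)))

def angle_of_skew(e: Sequence[float], d: Sequence[float]) -> float:
    """Angle of skew theta between the device-to-eye vector e and the screen normal d (Eq. 3)."""
    dot = sum(ei * di for ei, di in zip(e, d))
    norm = math.hypot(*e) * math.hypot(*d)
    return math.degrees(math.acos(dot / norm))

# Example: an 11.4-cm-wide screen held at 40 cm, viewed slightly off the screen normal.
print(angle_of_view(11.4, 40.0))                         # ~16.2 degrees
print(angle_of_skew((0.1, 0.2, 0.97), (0.0, 0.0, 1.0)))  # ~13 degrees
```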

The physical interaction reconstruction system integrates the skeleton data and the device orientation data and can clearly reconstruct the moments of physical interaction between the users and the devices. Per-frame parameters, such as the angle of view and the angle of skew, are calculated immediately to support a better understanding of user behavior. The system also allows the observer to freely rotate the reconstructed scene and examine the relation between user posture and device orientation from multiple directions (Fig. 4). Basic playback functions, such as playing at normal speed, fast forwarding, or jumping to a specific frame, further help observers gain a more detailed understanding of the physical interaction process.
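
One plausible way to merge the two independently recorded streams is nearest-timestamp matching, sketched below. This is an assumption for illustration; the paper does not detail the synchronization algorithm of the integration system.

```python
import bisect

def align_streams(skeleton_frames, orientation_samples):
    """Pair each skeleton frame with the orientation sample closest in time.

    Both arguments are lists of (timestamp_ms, payload) tuples sorted by timestamp.
    """
    times = [t for t, _ in orientation_samples]
    paired = []
    for t, joints in skeleton_frames:
        i = bisect.bisect_left(times, t)
        # pick whichever neighbor (i - 1 or i) is nearest in time
        candidates = [j for j in (i - 1, i) if 0 <= j < len(times)]
        j = min(candidates, key=lambda k: abs(times[k] - t))
        paired.append((t, joints, orientation_samples[j][1]))
    return paired

# Example: one skeleton frame at 33 ms matched to the closer of two orientation samples.
print(align_streams([(33, {"Head": (0.0, 0.4, 1.2)})],
                    [(0, (10.0, -5.0, 2.0)), (50, (11.0, -4.5, 2.1))]))
```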

Fig. 4. Screenshots of the physical interaction reconstruction system

3.4 Experimental Procedure

This research focuses on user performance across multiple dimensions of mobile devices, and the experiments were conducted on 4 devices of different sizes: an ASUS ZenFone GO (4.5-in), an ASUS ZenFone 3 Max (5.2-in), an ASUS ZenPad (8-in), and an ASUS ZenPad (10-in). The effective display areas are 46.2, 59.8, 159.84, and 260.55 cm², respectively, and the average effective display ratio is 1.4375 with a standard deviation of 0.067 (all measurements of the effective display area exclude the title bar and the control bar). Fourteen participants between 22 and 35 years of age (average age 26 years, standard deviation 2.75 years) took part in the experiment, 8 males and 6 females. One participant was left-hand dominant, the others were right-hand dominant, and all participants had more than 8 years of experience with touch-sensitive mobile devices. A 7-in non-experimental device was offered to participants for a free trial so that they could familiarize themselves with the operation of the application. Participants were asked to complete the tasks, touching the targets of the evaluation application through all stages with a natural posture, while the Kinect v2 sensor captured their image without any physical contact. Adequate rest was given between tasks to eliminate visual fatigue and any carry-over effect, and two tasks were performed for each device dimension to minimize drift in the skeleton joint coordinates reported by the Kinect v2 sensor. Participants were informed that they would be rewarded after the experiment to increase their willingness to participate. During the process, participants were asked to complete the tasks with the device in portrait orientation, touching with the index finger of the dominant hand and holding the device with the other hand. The experiments were performed in a specially designed compartment with a unified lighting source, with three walls and the ceiling painted black to reduce interference from additional light reflections. Adequate space was ensured so that no participant would feel stressed. Figure 5(a) shows the experimental setup and the controlled environment of this research; Fig. 5(b) shows the equipment used, including the four experimental devices and the non-experimental device for familiarizing users with the application.
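
For reference, the unit visual flux \( q/A \) of each experimental device follows directly from the 1-cm target diameter and the effective display areas listed above; the short computation below reproduces these ratios, again taking \( q \) as the target's area in cm².

```python
import math

q = math.pi * 0.5 ** 2  # area of one 1-cm-diameter target, ~0.785 cm^2
effective_areas_cm2 = {"4.5-in": 46.2, "5.2-in": 59.8, "8-in": 159.84, "10-in": 260.55}

for device, area in effective_areas_cm2.items():
    print(f"{device}: unit J_v = {q / area:.4f}")
# 4.5-in: 0.0170, 5.2-in: 0.0131, 8-in: 0.0049, 10-in: 0.0030
```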

Fig. 5. (a) The experimental setup and (b) the experimental equipment used in this research

4 Result

This research analyzes participant performance on different dimensions of mobile devices under controlled stimulation. The results show that within the experimental time span (less than 5 min), participant performance, measured as the average number of finished targets per second (TPS), decreases with increasing display size (Fig. 6). For the 4.5-in device, the average TPS is 1.622 with a standard deviation of 0.189; for the 5.2-in device, 1.484 with a standard deviation of 0.205; for the 8-in device, 1.209 with a standard deviation of 0.125; and for the 10-in device, 1.063 with a standard deviation of 0.096. A repeated-measures analysis of variance (rmANOVA) indicates that the differences in average TPS among the four screens are statistically significant (F(3,39) = 135.84, p < .001). Post-hoc comparisons with Bonferroni correction reveal that all pairwise differences in average TPS among the four screens are statistically significant (4.5-in/5.2-in p < .001, 5.2-in/8-in p < .001, 8-in/10-in p = .001). Thus, it can be inferred that user performance drops with increasing effective display area under the condition of a fixed target size in this mid-short-term operating scenario.
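
An rmANOVA of this form can be reproduced with standard statistical tooling. The sketch below, which assumes a long-format table of per-participant TPS values with illustrative numbers, uses the AnovaRM class from statsmodels as one possible implementation; the paper does not state which statistical software was actually used.

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Long-format table: one TPS value per participant per device (illustrative values).
data = pd.DataFrame({
    "participant": [1, 1, 1, 1, 2, 2, 2, 2],
    "device": ["4.5-in", "5.2-in", "8-in", "10-in"] * 2,
    "tps": [1.65, 1.50, 1.22, 1.08, 1.60, 1.47, 1.19, 1.05],
})

# One-way repeated-measures ANOVA with device size as the within-subject factor.
result = AnovaRM(data, depvar="tps", subject="participant", within=["device"]).fit()
print(result)
```

The Bonferroni-corrected post-hoc comparisons reported above can likewise be reproduced with pairwise paired t-tests whose p-values are multiplied by the number of comparisons.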

Fig. 6. The averaged TPS among four experimental devices

Regarding the number of touchscreen errors recorded across the different screen sizes of the experimental devices, shown in Fig. 7, the majority of participants tended to commit more errors as the display size increased. Furthermore, the average number of errors is also positively correlated with the dimensions of the devices. The rmANOVA shows that the differences in the average number of errors across the four screens are statistically significant (p < .001).

Fig. 7. The error counts among four experimental devices

Taking a closer look at the task process, the experiment reveals that participant performance declines on larger screens at every stage of the tasks. Figure 8(a) shows the completion time of each stage on the four screens and reveals a negative correlation between participant performance and screen dimensions, which indicates that \( J_{v} \) can describe user performance under the same number (\( n \)) of controlled stimulations (\( q \)) over different screens (\( A \)). The linear correlations between the number of targets and their completion time shown in Fig. 8(b) indicate that the experimental design complies well with both Fitts' Law and Hick's Law.

Fig. 8. User performance on four experimental devices: (a) number of stages as a function of time (ms) and (b) number of targets as a function of time (ms)

The relation between the average time per target in milliseconds (MPT) and the number of completed targets is shown in Fig. 9. The average MPT increases with an increasing number of visual stimulations on all four screens, and the differences among the devices become recognizable after approximately 5 stages (15 target touches). The conjecture is that the unstable behavior before stage 5 is caused by the uneven distribution of the visual stimulations.

Fig. 9. The averaged MPT on four experimental devices as a function of the number of targets

Figure 10 shows the accumulated targets of one task process of a particular participant at stage 5 and stage 10 (Fig. 10(a) and (b), respectively). It is evident that the number of targets before stage 5 is insufficient to cover the entire screen, which is speculated to be the main cause of the fluctuation in average MPT before stage 5 in Fig. 9.

Fig. 10. Target distribution example of: (a) 0 to 5 stages and (b) 0 to 10 stages

The previous results indicate that participant performance under equal stimulation (the same number and size of stimulations) decreases with increasing device dimensions. This research further explored the effect of user posture through the detailed information of the angle of view and the angle of skew, both derived with the physical interaction reconstruction system. Although the pilot experiments showed that an increased angle of skew leads to decreased participant performance, the main experiment reveals that the differences in the average angle of skew among the four devices are not statistically significant (the average is 14.01° with a 0.74° standard deviation across the four screens, p = .516 with rmANOVA). It may be practical for this research to neglect the effect of the angle of skew, but from a user-centered perspective, sudden and abrupt changes in the angle of skew could offer valuable information about fluctuations in user experience, since a dramatic change in the angle of skew strongly affects the angle of view. In addition, the continuous quantitative record of the angle of skew during the interaction process may be valuable for qualitative research on user behavior.

From the user's perspective, \( J_{v} \) can be further represented as the number of stimulations within the user's angle of view. The experimental results show that the average angle of view is 9.43° with a standard deviation of 1.85° for the 4.5-in device, 10.94° with a standard deviation of 2.14° for the 5.2-in device, 16.67° with a standard deviation of 3.38° for the 8-in device, and 19.54° with a standard deviation of 3.13° for the 10-in device. The rmANOVA indicates that the differences in the angle of view among the four devices are statistically significant (p < .001). The MPT of each participant as a function of the angle of view is shown in Fig. 11(a), which reveals a positive correlation between the angle of view and the MPT. The normalized data, in which each value is expressed as a ratio to the average value of the individual participant, show a clear correlation between MPT and angle of view (Fig. 11(b)).
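
The normalization used in Fig. 11(b), dividing each measurement by the participant's own mean, can be expressed compactly as follows; the column names and values are illustrative, not the study's actual data.

```python
import pandas as pd

# Each row: one participant's MPT and angle of view on one device (illustrative values).
df = pd.DataFrame({
    "participant": [1, 1, 2, 2],
    "mpt_ms": [620.0, 940.0, 540.0, 810.0],
    "angle_of_view_deg": [9.5, 19.2, 9.1, 20.1],
})

# Normalize each measure by that participant's own mean so that
# cross-participant speed differences do not mask the trend.
for col in ("mpt_ms", "angle_of_view_deg"):
    df[f"{col}_norm"] = df[col] / df.groupby("participant")[col].transform("mean")

print(df)
```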

Fig. 11. The relation of (a) MPT as a function of angle of view and (b) normalized MPT as a function of normalized angle of view

In addition, the experiment reveals that participants tend to position devices of different dimensions within a relatively narrow range of angles of view, although not narrow enough to yield exactly the same angle of view for every screen: larger screens are positioned at greater distances and vice versa, a tendency consistent with the results of previous research on fixed displays [7]. A reference angle of view for the four devices, derived from the average viewing distance of each user, is depicted in Fig. 12 to contrast with the actual angle of view. Compared with the reference angle of view, which assumes that all devices are positioned at the same distance, the actual angle of view covers a narrower range of angles and better reflects the real conditions of the interactive process. The supposition is that users are familiar with particular viewing distances and tend to position devices within their respective useful field of view (uFOV), and the reduced distance of eye movement facilitates user performance.

Fig. 12. Difference between range of actual angle of view and range of referenced angle of view

5 Discussion

This research introduces the concept of visual flux to describe the process of physical interaction between users and mobile devices and utilizes the unit visual flux as a datum for comparing user performance across different screen dimensions. The approach can depict the condition of receiving information over a mid-short time span (<300 s). Based on the data of device dimensions and the corresponding user posture, this research proposes an alternative perspective for portraying the user's physical interaction process with mobile devices. Under specifically controlled stimulation, the proposed concept offers a quantitative index for cross-platform design in fields where a consistent user experience across different screen sizes is crucial, such as mobile gaming, in both hardware manufacturing and application development. Furthermore, the physical interaction reconstruction system of this research offers a more intuitive and detailed viewpoint for understanding the physical process of human-computer interaction. Future research based on this concept could extend the screen dimensions to laptops, desktops, or even projectors, or scale the sizes down to wearable devices and head-mounted devices, to build a deeper understanding of the relation between visual stimulation and user feedback across the entire spectrum of displays.