1 Introduction

As the number and diversity of web user interfaces (WUIs) increase, there is a growing demand for their qualitative analysis and for prediction of users’ subjective impressions. Given the shortening website update cycles and tightening IT budgets, the development of respective automated tools is deemed necessary to aid web engineers in their work [1]. Many of the tools and methods focus on evaluating usability as quality-in-use and require real or staged user interactions. However, it is also known that first but long-lasting impressions of a website are formed in users after a very short visual exposure, 50 ms or even less [2]. These purely visual subjective impressions significantly affect the subsequent user experience and usability.

Arguably, the most studied dimensions of WUI visual perception are visual complexity (VC) and aesthetics, sometimes supplemented by consistency/regularity/order. The traditional “static” analysis of WUIs that is based on code is understandably at a disadvantage here, as it lacks knowledge about a web page’s appearance until the page is actually rendered in a browser. At the same time, the visual WUI analysis that is gaining in popularity [3] mostly works by decomposing the webpage screenshot, thus being computationally expensive and so far suffering from lower accuracy. Meanwhile, image analysis uses some quantitative indexes (metrics) that apply to whole images and are reasonably inexpensive and accurate: byte size of files in various formats (most often, JPEG), Subband Entropy, etc. [4].

In our paper, we study the applicability of metrics based on compression (the JPEG and PNG algorithms) and on Shannon’s information entropy as predictors of web interface users’ impressions. In Sect. 2, we describe the three metrics, provide an overview of related works, and describe our experimental study. In Sect. 3, we use statistical methods to analyze the data. In the final section, we discuss the results, provide conclusions and outline directions for further research.

2 Methods and Related Work

2.1 Information Entropy

Entropy reflects the degree of randomness, disorder in a system. In our study we are using Shannon’s information entropy (H), which is a measure of unpredictability and uncertainty of information in an event:

$$ H = - K\sum\limits_{i} {p_{i} } \log_{2} p_{i} . $$
(1)

K is a positive constant used to select the units of measure; \( p_{i} \) is the probability that the system is in state i. Entropy filtering transforms the original image on the basis of entropy, where the local neighborhood is described by a multidimensional array of zeros and ones [5]. Figure 1 shows a screenshot of a web page before and after the entropy filtering. One can note that with this filter, interface elements like text, buttons, input fields, drawings, etc. become highlighted, whereas the content of images rather grows dim. This suggests that entropy-based image processing can offer a rather original approach to WUI analysis, as it is capable of negating graphical content.

Fig. 1. Screenshot of a web page before (left) and after (right) entropy filtering.
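As an illustration of Eq. (1), the following minimal sketch computes the entropy of a grayscale image from its histogram. It assumes K = 1 (so the result is in bits) and takes each probability as the relative frequency of a gray level among the pixels, which is how histogram-based functions such as MATLAB’s entropy(I) are documented to work. The function name `shannon_entropy` and the flat-list image representation are illustrative assumptions, not part of the original study.

```python
import math
from collections import Counter


def shannon_entropy(pixels, k=1.0):
    """Shannon entropy (Eq. 1) of a sequence of gray levels.

    `pixels` is a flat sequence of integer gray levels (e.g. 0..255).
    With k = 1 the result is expressed in bits.
    """
    counts = Counter(pixels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("the image contains no pixels")
    h = 0.0
    for count in counts.values():
        p = count / total          # relative frequency of this gray level
        h -= p * math.log2(p)      # accumulate -p * log2(p)
    return k * h


# Hypothetical toy "images" given as flat lists of gray levels.
flat_region = [128] * 100            # a single gray level
two_tone = [0] * 50 + [255] * 50     # two equally frequent levels
```

A uniformly colored region yields zero entropy, while two equally frequent gray levels yield one bit. Since the measure depends only on the histogram, it ignores where the pixels are located.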

However, the results of studies in human visual perception suggest that it is not well explained with information entropy (information-theoretic complexity), particularly since it does not consider spatial structures. Through Algorithmic Information Theory, the use of compression algorithms that can approximate Kolmogorov algorithmic complexity was justified to gauge human perception, which is mostly top-down, i.e. focused on higher-order images and structures.

2.2 Compression Algorithms-Based Metrics

JPEG Algorithm.

This lossy compression algorithm (specified e.g. in the ISO/IEC 10918-1:1994 standard) has gained extreme popularity, especially for photographic images. The compression quality can be adjusted, with the setting of 100 corresponding to nearly lossless conversion. A popular implementation of the algorithm splits the image into blocks of 8 × 8 pixels and performs the discrete cosine transform. The subsequent quantization corresponds to particulars of human vision, which is not very sensitive to the strength of high-frequency brightness variations. Finally, Huffman coding is used to further compress the image data. JPEG compression-based metrics are widely used to study complexity, aesthetics and other attributes of image perception [6], but their applicability to web UIs has not been convincingly demonstrated so far.
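The block transform step can be sketched as follows. This is a plain, unoptimized orthonormal 2-D DCT-II of a single 8 × 8 block. The level shift, quantization and Huffman coding stages of a real JPEG encoder are omitted, and the function name `dct2_block` is an illustrative assumption.

```python
import math

N = 8  # JPEG block size


def dct2_block(block):
    """Orthonormal 2-D DCT-II of an N x N block (list of N lists of N numbers)."""
    def scale(u):
        # Normalization factor: sqrt(1/N) for the DC term, sqrt(2/N) otherwise.
        return math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N)

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = scale(u) * scale(v) * s
    return coeffs


# Hypothetical flat block: every pixel has brightness 100.
flat_block = [[100] * N for _ in range(N)]
flat_coeffs = dct2_block(flat_block)
```

For a flat block, all the energy ends up in the single low-frequency (DC) coefficient and the remaining coefficients are zero up to rounding. This is why smooth regions cost few bytes after quantization, whereas detailed regions produce many non-zero high-frequency coefficients and larger files.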

PNG Algorithm.

This algorithm (see the ISO/IEC 15948:2004 standard) supports lossless data compression for palette-based and truecolor images (commonly, 24-bit RGB). In palette-based images, pixels are represented as index numbers, while the palette is a separate table. The compression is performed using the Deflate algorithm, which is a combination of LZ77 and Huffman coding. The format is widely used for images that contain text, line art, graphics, etc., for which it provides a better compression ratio than JPEG. The PNG-based metric was found to be somewhat predictive of VC for both pictures [7] and abstract patterns [8], although the correlation with visual complexity was lower than for the JPEG-based metric.
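The idea behind a Deflate-based size metric can be sketched with Python’s standard zlib module, which implements Deflate. This is only a proxy under stated assumptions: a zlib stream is not a PNG file (PNG additionally applies per-scanline filtering and adds chunk headers), so the byte counts below are not comparable to real PNG file sizes. The function name `deflate_size` and the toy pixel buffers are illustrative.

```python
import random
import zlib


def deflate_size(raw_bytes, level=9):
    """Length in bytes of the Deflate-compressed (zlib) form of `raw_bytes`."""
    return len(zlib.compress(raw_bytes, level))


# Hypothetical raw grayscale buffers of equal length.
flat = bytes([200]) * 10_000                                # uniform background
rng = random.Random(0)                                      # fixed seed for repeatability
noisy = bytes(rng.randrange(256) for _ in range(10_000))    # unstructured noise

flat_size = deflate_size(flat)
noisy_size = deflate_size(noisy)
```

The uniform buffer compresses to a small fraction of its raw size, while the noise is essentially incompressible, which is the property that makes compressed size usable as a rough complexity indicator.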

2.3 The Experimental Survey

Our experimental study was performed to check the following hypotheses:

  1. H1. Metrics based on compression algorithms (that are already known to work for images) are predictive of web UIs’ visual perception.

  2. H2. The metric based on information entropy can further improve the predictive power.

Material.

The material in our study was homepages of higher education organizations (universities and colleges), since we sought to eliminate the effect of different website domains. With a dedicated Python script crawling through URLs that we took from various catalogues, DBPedia, etc., we collected 10639 screenshots of the homepages. Then we hand-picked 497 screenshots from the pool, using the following criteria:

  • University or college corporate website with reasonably robust functionality;

  • Not overly famous university (to maintain neutrality in the website evaluations);

  • Website content in English and reasonably diverse (i.e. no photos-only websites);

  • Reasonable diversity in website designs (colors, page layouts, etc.).

The screenshots were made of full web pages, as they were rendered, and not just of the part above the fold or of a fixed size. This was needed to test the entropy metric for images of different sizes. Also, we sought a more novel exploration of WUI metrics, since the crop-to-size approach is already widely used (cf. AIM Interface Metrics).

Design.

The experiment used a within-subject design. The independent variables were:

  • The size of the homepage screenshot file in PNG-24 format, in MB: PNG_size;

  • File size for the same screenshot compressed in JPEG-100 format, in MB: JPEG_size;

  • Entropy value obtained for the .png file through MATLAB’s entropy(I) function [5]: M_Entropy.

Since we discovered the lack of standard questionnaires for assessing subjective complexity of websites (unlike for usability, aesthetics, satisfaction, etc.), the dependent variables in our study were simply represented by 3 subjective evaluation Likert scales, each ranging from 1 (lowest degree of the characteristic) to 7 (the highest degree):

  • How visually complex the WUI appears: Complex;

  • How aesthetically pleasant the WUI appears: Aesthetic;

  • How orderly the WUI appears: Orderly.

Participants.

In total, there were 70 participants (43 females, 27 males) in the survey, whose age ranged from 18 to 29 (mean 20.86, SD = 1.75). They were students of Novosibirsk State Technical University (NSTU) and specialists working in the IT industry. The subjects took part in the experiment voluntarily and no random selection was performed. All the participants had normal or corrected-to-normal vision and reasonable experience with websites.

There were another 10 participants, each of whom provided fewer than 10 evaluations. Since their engagement and scrupulosity seemed doubtful, their evaluations were discarded.

Procedure.

The participants were provided with a link to the online questionnaire that we specially developed for this study. While they used varying screen resolutions, the screenshots’ pixel dimensions were uniform for each participant. In the survey, the screenshots were randomly selected from the pool of 497 (with priority given to the ones that had a lower number of evaluations at the moment of selection) and presented to participants successively. The completeness of evaluation, i.e. rating on all 3 scales, was mandatory and controlled by the software. The default number of screenshots to be evaluated in each session was set to 50. However, participants were allowed to run a second session (up to another 50 evaluations) if they wished.
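One possible reading of the selection rule is sketched below: among the screenshots that a participant has not yet seen, choose uniformly at random from those with the fewest evaluations so far. The source does not specify the exact rule, so this strict "least-evaluated first" policy, the function name `pick_next` and the data structures are all assumptions made for illustration.

```python
import random


def pick_next(eval_counts, already_seen, rng=random):
    """Pick the next screenshot for a participant.

    eval_counts  -- dict mapping screenshot id to number of evaluations so far
    already_seen -- set of screenshot ids this participant has already rated
    rng          -- object with a `choice` method (the `random` module by default)
    Returns a screenshot id, or None when nothing is left to show.
    """
    candidates = {s: n for s, n in eval_counts.items() if s not in already_seen}
    if not candidates:
        return None
    fewest = min(candidates.values())
    least_rated = [s for s, n in candidates.items() if n == fewest]
    return rng.choice(least_rated)   # random choice among the least-evaluated


# Hypothetical pool of three screenshots.
counts = {"site_a": 2, "site_b": 0, "site_c": 0}
first_pick = pick_next(counts, set(), random.Random(1))
```

A policy of this kind keeps the number of evaluations per screenshot nearly uniform, which agrees with the narrow range of evaluations per screenshot reported in the results.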

3 Results

Statistical analysis was performed with SPSS software. We must warn the reader that, for the sake of robustness of the analysis, some methods more suitable for interval measurement scales were applied to our ordinal dependent variables.

3.1 Descriptive Statistics

In total, the valid participants provided 4235 full evaluations, each covering all 3 scales. Thus, each website screenshot was evaluated by 8–10 participants (mean 8.52, SD = 0.56), the average number of full evaluations being 60.05 per participant. Due to technical issues, 4 screenshots were discarded, so 493 remained valid (99.2%). We show descriptive statistics on the image dimensions and the variables in Table 1. The websites that got the highest and lowest average Complex evaluations are presented in Fig. 2.

Table 1. The descriptive statistics for the variables in the study.

Fig. 2. The website screenshots with the highest (left) and lowest (right) Complex values.

The Shapiro-Wilk tests suggested that the normality hypothesis had to be rejected for Orderly (p = 0.002), but not for Complex (p = 0.622) and Aesthetic (p = 0.085).

3.2 Correlation Analysis

The total image size (width × height) was highly correlated with JPEG_size (r = 0.871, p < 0.001) and PNG_size (r = 0.812, p < 0.001), but not with M_Entropy (r = 0.043, p = 0.340). In Table 2, we present Pearson’s and Kendall’s (tau-b, non-parametric statistic for ordinal scales) correlations for the main independent and dependent variables. The strongest correlations for each of the dependent variables are highlighted in bold.

Table 2. Correlations between the variables in the study.
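For reference, the two coefficients can be computed as in the following minimal sketch. It is a direct pairwise implementation intended for small samples, not the SPSS procedure used in the study, and the function names are illustrative. The tau-b variant corrects for ties, which are frequent with 7-point scales.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def kendall_tau_b(x, y):
    """Kendall's tau-b, which adjusts the denominator for tied pairs."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue                 # tied on both variables: not counted
        if dx == 0:
            ties_x += 1              # tied on x only
        elif dy == 0:
            ties_y += 1              # tied on y only
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    return (concordant - discordant) / math.sqrt((untied + ties_x) * (untied + ties_y))


# Hypothetical ratings with ties, as is typical for Likert-type scales.
ratings_a = [1, 2, 2, 3]
ratings_b = [1, 2, 3, 3]
tau = kendall_tau_b(ratings_a, ratings_b)
```

For the toy ratings above there are four concordant pairs, none discordant, and one pair tied on each variable, giving tau-b = 4 / 5 = 0.8.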

We note the lack of a significant correlation between Complex and Aesthetic, and the expected low negative correlation between Complex and Orderly. Aesthetic was highly correlated with Orderly, which suggests the prevalence of the “classic” aesthetic dimension in the target users’ perception. Also as expected, M_Entropy had significant positive correlations with JPEG_size and PNG_size. Its correlation with Orderly is remarkable though, since the frequency-based entropy(I) function does not consider the spatial allocation of the image elements. Of the three independent variables, JPEG_size had the strongest correlation with Complex, which is in line with the existing works on VC. So, this factor will be used as the baseline in our regression analysis.

3.3 Regression Analysis

The baseline regression model for Complex with the JPEG_size factor had a rather low \( R^{2} = 0.05 \), but was significant (\( F_{1,491} = 25.65 \), p < 0.001). The baseline models for Aesthetic (\( R^{2} = 0.103 \), \( F_{1,491} = 56.19 \), p < 0.001) and Orderly (\( R^{2} = 0.034 \), \( F_{1,491} = 17.38 \), p < 0.001) with the same factor were also significant:

$$ Complex = 3.316 + 0.133 \times JPEG\_size , $$
(2)
$$ Aesthetic = 3.609 + 0.254 \times JPEG\_size , $$
(3)
$$ Orderly = 4.218 + 0.109 \times JPEG\_size . $$
(4)
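The kind of single-factor model behind Eqs. (2)–(4) can be sketched as an ordinary least-squares fit, together with the usual relation between the coefficient of determination and the F statistic. This is a generic illustration rather than the SPSS procedure used in the study; the function names and the toy data are assumptions.

```python
def fit_simple_ols(x, y):
    """Fit y = intercept + slope * x by least squares; return (intercept, slope, r2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return intercept, slope, 1.0 - ss_res / ss_tot


def f_statistic(r2, n, k=1):
    """F statistic of a regression with k factors fitted on n observations."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


# Hypothetical, perfectly linear toy data: y = 1 + 2x.
intercept, slope, r2 = fit_simple_ols([0, 1, 2, 3], [1, 3, 5, 7])

# Plugging in the rounded R^2 reported for Complex with n = 493 screenshots.
f_complex = f_statistic(0.05, 493)
```

With the rounded value 0.05 and n = 493, the formula gives F of about 25.8, close to the reported 25.65; the small gap is what one would expect from rounding of the coefficient of determination.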

In the extended regression models (Table 3), all the 3 factors were significant at \( \upalpha = 0.05 \):

Table 3. Summary of the regression models with the three factors.
$$ \begin{aligned} & Complex = 3.504 + 0.504 \times JPEG\_size - 0.316 \times PNG\_size \\ &\qquad\qquad\qquad\quad - \,0.063 \times M\_Entropy \\ \end{aligned} , $$
(5)
$$ \begin{aligned} & Aesthetic = 2.731 - 0.373 \times JPEG\_size + 0.503 \times PNG\_size \\ & \qquad\qquad\qquad\quad + \,0.229 \times M\_Entropy \\ \end{aligned} , $$
(6)
$$ \begin{aligned} & Orderly = 3.541 - 0.188 \times JPEG\_size + 0.225 \times PNG\_size \\ & \qquad\qquad\qquad\quad + \,0.166 \times M\_Entropy \\ \end{aligned} . $$
(7)
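To show how Eqs. (5)–(7) would be applied in an automated tool, the sketch below evaluates the three models for given metric values. The coefficients are taken from the equations above; the function name `predict_impressions` and the input values are hypothetical, and file sizes are assumed to be in MB as in the variable definitions.

```python
# (intercept, JPEG_size, PNG_size, M_Entropy) coefficients from Eqs. (5)-(7).
MODELS = {
    "Complex":   (3.504,  0.504, -0.316, -0.063),
    "Aesthetic": (2.731, -0.373,  0.503,  0.229),
    "Orderly":   (3.541, -0.188,  0.225,  0.166),
}


def predict_impressions(jpeg_size_mb, png_size_mb, m_entropy):
    """Return the predicted 1-7 scale values for one screenshot."""
    return {
        name: b0 + b1 * jpeg_size_mb + b2 * png_size_mb + b3 * m_entropy
        for name, (b0, b1, b2, b3) in MODELS.items()
    }


# Hypothetical screenshot: 1 MB as JPEG-100, 1 MB as PNG-24, entropy of 1.
example = predict_impressions(1.0, 1.0, 1.0)
```

Since the models explain a modest share of the variance, such predictions are better treated as rough indications than as precise scores, and inputs far outside the range of the studied screenshots amount to extrapolation.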

To evaluate the quality of the regression models that had different numbers of factors (k), we used the Akaike Information Criterion (AIC). The AIC values for the considered models are presented in Table 4. The minimal AIC values (highlighted in bold) were found for the models with the three factors, which suggests that the “information loss” in them is lower and therefore they should be preferred over the other models.

Table 4. AIC values for the considered regression models.
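The source does not state which exact form of the AIC was used. A common form for least-squares models, valid up to an additive constant that cancels when models fitted on the same data are compared, is n · ln(RSS / n) + 2p, where RSS is the residual sum of squares and p is the number of estimated parameters. The sketch below assumes this form; the function name and the numbers are hypothetical.

```python
import math


def aic_least_squares(rss, n, n_params):
    """AIC of a least-squares model, up to an additive constant.

    rss      -- residual sum of squares of the fitted model
    n        -- number of observations
    n_params -- number of estimated parameters (assumed counting convention)
    """
    return n * math.log(rss / n) + 2 * n_params


# Hypothetical comparison on the same 493 observations: the extra factors
# are worth keeping only if they reduce the RSS enough to offset the penalty.
aic_baseline = aic_least_squares(rss=500.0, n=493, n_params=2)
aic_extended = aic_least_squares(rss=450.0, n=493, n_params=4)
```

In this toy comparison the extended model has the lower AIC, since the reduction in RSS outweighs the penalty of 2 per additional parameter.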

4 Discussion and Conclusions

In our work we studied whether certain compression- and entropy-based metrics for web page screenshots can be used as predictors of users’ impressions formed on a purely visual basis. Particularly, we proposed the use of a straightforward Shannon information entropy metric, which was found to be uncorrelated with image size (in pixels), unlike most other existing metrics. Unexpectedly, we also found that higher entropy actually decreased perceived complexity (Eq. 5) and increased perceived orderliness (Eq. 7). The former finding is consistent with one of our previous works, where we considered a smaller sample of different websites with different evaluators [3].

In the extended regression models, we were able to considerably improve the \( R^{2} \) in comparison to the baseline models: Complex (+110%), Aesthetic (+141%), Orderly (+274%). The adjusted \( R^{2} \) values and the AIC values also suggest that the three-factor models should be preferred over the others. Notably, the effect of JPEG_size was always opposite in sign to the effects of the two other factors, so the latter two seem to be an important supplement to the baseline factor. The M_Entropy factor had the lowest contributions (Beta coefficients), but was still significant and improved the models.

Contrary to many existing works (e.g. [1]), we found no correlation between Complex and Aesthetic. At the same time, Aesthetic was highly correlated with Orderly, which may suggest the prevalence of the “classical” dimension in the aesthetic perception of the university websites by the target user group. We might assume that more refined dimensions of the overall aesthetic impression would have been correlated with complexity, as was the case in [9].

Overall, the results of our study support the conclusion that the information entropy obtained via a purely image-processing method can be a feasible metric in analyzing WUIs. So, developers of automated analysis tools for web engineering could consider including the three metrics. The limitations of our work include the relatively low \( R^{2} \) values of the models, as well as the low fidelity of the scales that described users’ subjective impressions. In our future work we plan to integrate more metrics in our WUI Measurement Platform [3] and to combine visual and code-based analyses of web user interfaces.