Entropy and Compression Based Analysis of Web User Interfaces

Boychuk, Egor; Bakaev, Maxim

doi:10.1007/978-3-030-19274-7_19

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11496)

Included in the following conference series:

International Conference on Web Engineering

Abstract

In our paper we explore whether users' visual perception of web user interfaces (WUIs) can be predicted by certain quantitative characteristics of WUI screenshots. The considered metrics are JPEG file size, PNG file size, and the information entropy value calculated with MATLAB's frequency-based entropy(I) function. We ran a survey with 70 subjects who provided subjective evaluations of complexity, aesthetics and orderliness for 497 website homepages. The results suggest that all three metrics were significant, and the proposed regression models were considerably better than the respective baseline models that only used the popular JPEG-based metric. Remarkably, the entropy metric had significant positive correlations with the aesthetic and orderliness evaluations, but not with the size of the image. We believe our findings might be used in the development of automated WUI analysis tools to aid web engineers in their work.



1 Introduction

As the number and diversity of web user interfaces (WUIs) increase, there is a growing demand for their qualitative analysis and for prediction of users' subjective impressions. Given the shortening website update cycles and tightening IT budgets, the development of respective automated tools is deemed necessary to aid web engineers in their work [1]. Many of the tools and methods focus on evaluating usability as quality-in-use and require real or staged user interactions. However, it is also known that first but long-lasting impressions of a website are formed after a very short visual exposure, 50 ms or even less [2]. These purely visual subjective impressions significantly affect the subsequent user experience and usability.

Arguably, the most studied dimensions of WUI visual perception are visual complexity (VC) and aesthetics, sometimes supplemented by consistency/regularity/order. The traditional "static" analysis of WUIs that is based on code is understandably disadvantageous here, as it lacks knowledge about a web page's appearance until the page is actually rendered in a browser. At the same time, the visual WUI analysis that is gaining in popularity [3] mostly works by decomposing the webpage screenshot, thus being computationally expensive and so far suffering from lower accuracy. Meanwhile, image analysis uses some quantitative indexes (metrics) that apply to whole images and are reasonably inexpensive and accurate: byte size of files in various formats (most often, JPEG), Subband Entropy, etc. [4].

In our paper, we study the applicability of metrics based on compression (the JPEG and PNG algorithms) and on Shannon's information entropy as predictors of web interface users' impressions. In Sect. 2, we describe the three metrics, provide an overview of related works, and describe our experimental study. In Sect. 3, we use statistical methods to analyze the data. In the final section, we discuss the results, provide conclusions and outline directions for further research.

2 Methods and Related Work

2.1 Information Entropy

Entropy reflects the degree of randomness or disorder in a system. In our study we use Shannon's information entropy (H), which is a measure of the unpredictability and uncertainty of information in an event:

$$ H = - K\sum\limits_{i} {p_{i} } \log_{2} p_{i} . $$

(1)

K is the positive constant used to select the units of measure; p_i is the probability that the system is in state i. Entropy filtering transforms the original image on the basis of entropy, where the local neighborhood is described by a multidimensional array of zeros and ones [5]. Figure 1 shows a screenshot of a web page before and after the entropy filtering. One can note that with this filter, interface elements like text, buttons, input fields, drawings, etc. become highlighted, whereas the content of the images rather grows dim. This suggests that entropy-based image processing can be rather original with respect to WUI analysis, as it is capable of negating graphical content.
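Eq. (1) can be illustrated with a minimal sketch, assuming the screenshot has already been reduced to a flat list of 8-bit grayscale intensities and that p_i is estimated as the relative frequency of each intensity value. The function name `shannon_entropy` and the sample data are hypothetical; this is not MATLAB's implementation of entropy(I), only the same frequency-based idea.

```python
import math
from collections import Counter


def shannon_entropy(pixels, k=1.0):
    """Return H = -K * sum(p_i * log2(p_i)) over the observed intensity values.

    `pixels` is assumed to be a flat sequence of grayscale intensities;
    p_i is estimated as the relative frequency of intensity i.
    """
    total = len(pixels)
    if total == 0:
        raise ValueError("pixels must not be empty")
    h = 0.0
    for count in Counter(pixels).values():
        p = count / total
        h -= p * math.log2(p)
    return k * h


# Hypothetical toy "images" given as flat intensity lists.
flat = [128] * 16                  # a single gray level
two_level = [0] * 8 + [255] * 8    # two equally frequent gray levels

print(shannon_entropy(flat))       # 0.0
print(shannon_entropy(two_level))  # 1.0
```

Since only the frequencies of the values enter the sum, reordering the pixels leaves the result unchanged, which is why such a metric cannot capture spatial structure.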

However, the results of studies in human visual perception suggest that it is not well explained with information entropy (information-theoretic complexity), particularly since it does not consider spatial structures. Through Algorithmic Information Theory, the use of compression algorithms that can approximate Kolmogorov algorithmic complexity was justified to gauge human perception, which is mostly top-down, i.e. focused on higher-order images and structures.

2.2 Compression Algorithms-Based Metrics

JPEG Algorithm.

This lossy compression algorithm (specified e.g. in the ISO/IEC 10918-1:1994 standard) has gained extreme popularity, especially for photographic images. The compression quality can be adjusted, with a quality setting of 100 corresponding to nearly lossless conversion. A popular implementation of the algorithm splits the image into blocks of 8×8 pixels and applies the discrete cosine transform to each block. The subsequent quantization exploits a particular of human vision, which is not very sensitive to the strength of high-frequency brightness variations. Finally, Huffman coding is used to further compress the image data. JPEG compression-based metrics are widely used to study complexity, aesthetics and other attributes of image perception [6], but their applicability to web UIs has not been convincingly demonstrated so far.
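The block transform and quantization steps described above can be sketched as follows. This is a simplified illustration, assuming the orthonormal 2D DCT-II on an 8×8 block and a single uniform quantization step (the `step` parameter is hypothetical; real JPEG encoders use per-frequency quantization tables and further stages that are omitted here).

```python
import math

N = 8  # JPEG block size


def dct_2d(block):
    """Return the 2D DCT-II coefficients of an 8x8 block (list of 8 lists of 8 numbers)."""
    def c(u):
        return 1 / math.sqrt(2) if u == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = 0.25 * c(u) * c(v) * s
    return out


def quantize(coeffs, step=16):
    """Divide every coefficient by one uniform step and round (a simplification)."""
    return [[round(value / step) for value in row] for row in coeffs]


# A perfectly flat block: all its energy ends up in the single DC coefficient.
flat_block = [[10] * N for _ in range(N)]
quantized = quantize(dct_2d(flat_block))
nonzero = sum(1 for row in quantized for value in row if value != 0)
print(nonzero)  # 1
```

Blocks with little high-frequency variation leave few non-zero coefficients after quantization, which is what makes the subsequent entropy coding effective and the resulting file small.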

PNG Algorithm.

This algorithm (see the ISO/IEC 15948:2004 standard) supports lossless data compression for palette-based images as well as grayscale and truecolor ones (commonly, 24-bit RGB). In palette-based images, pixels are represented as index numbers, while the palette is a separate table. The compression is performed using the Deflate algorithm, which is a combination of LZ77 and Huffman coding. The format is widely used for images that contain text, line art, graphics, etc., for which it provides a better compression ratio than JPEG. The PNG-based metric was found to be somewhat predictive of VC for both pictures [7] and abstract patterns [8], although its correlation with visual complexity was lower than for the JPEG-based metric.
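To give an intuition for why a Deflate-based file size can act as a complexity metric, the sketch below compresses two hypothetical byte sequences with Python's standard `zlib` module, which implements Deflate. It is only an approximation of what a PNG encoder does: PNG additionally applies per-row filters before compression and adds chunk headers, and the sample data here are not real screenshots.

```python
import random
import zlib


def deflate_size(data: bytes, level: int = 9) -> int:
    """Return the number of bytes in the zlib (Deflate) stream for `data`."""
    return len(zlib.compress(data, level))


rng = random.Random(0)  # fixed seed so the example is repeatable

# 10,000 "pixels" of a single value versus 10,000 random values.
flat = bytes([200]) * 10_000
noisy = bytes(rng.randrange(256) for _ in range(10_000))

print(deflate_size(flat) < deflate_size(noisy))  # True
```

The uniform sequence is full of repetitions that LZ77 can reference, so it shrinks to a tiny fraction of its original length, while the random sequence offers almost no redundancy to exploit.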

2.3 The Experimental Survey

Our experimental study was performed to check the following hypotheses:

H1.
Metrics based on compression algorithms (that are already known to work for images) are predictive of web UIs' visual perception.
H2.
The metric based on information entropy can further improve the predictive power.

Material.

The material in our study was homepages of higher educational organizations (universities and colleges), since we sought to eliminate the effect of different website domains. With a dedicated Python script crawling through URLs that we took from various catalogues, DBPedia, etc., we collected 10639 screenshots of the homepages. Then we hand-picked 497 screenshots from the pool, using the following criteria:

University or college corporate website with reasonably robust functionality;
Not overly famous university (to maintain neutrality in the website evaluations);
Website content in English and reasonably diverse (i.e. no photos-only websites);
Reasonable diversity in website designs (colors, page layouts, etc.).

The screenshots were made for full web pages, as they were rendered, not just of the part above the fold or of a fixed size. This was needed to test the entropy metric for images of different sizes. Also, we sought a more novel exploration of WUI metrics, since the crop-to-size approach is already widely used (cf. AIM Interface Metrics).

Design.

The experiment used within-subject design. The independent variables were:

The size of the homepage screenshot file in PNG-24 format, in MB: PNG_size;
File size for the same screenshot compressed in JPEG-100 format, in MB: JPEG_size;
Entropy value obtained for the .png file through MATLAB’s entropy(I) function [5]: M_Entropy.

Since we discovered the lack of standard questionnaires for assessing subjective complexity of websites (unlike for usability, aesthetics, satisfaction, etc.), the dependent variables in our study were simply represented by 3 subjective evaluation Likert scales, each ranging from 1 (lowest degree of the characteristic) to 7 (the highest degree):

How visually complex the WUI appears: Complex;
How aesthetically pleasant the WUI appears: Aesthetic;
How orderly the WUI appears: Orderly.

Participants.

In total, there were 70 participants (43 females, 27 males) in the survey, whose age ranged from 18 to 29 (mean 20.86, SD = 1.75). They were students of Novosibirsk State Technical University (NSTU) and specialists working in the IT industry. The subjects took part in the experiment voluntarily and no random selection was performed. All the participants had normal or corrected-to-normal vision and reasonable experience with websites.

There were another 10 participants, each of whom provided fewer than 10 evaluations. Since their engagement and scrupulosity seemed doubtful, their evaluations were discarded.

Procedure.

The participants were provided with a link to the online questionnaire that we specially developed for this study. While they used varying screen resolutions, the screenshots' pixel dimensions were uniform for each participant. In the survey, the screenshots were randomly selected from the pool of 497 (with priority given to the ones that had a lower number of evaluations at the moment of selection) and presented to participants successively. The completeness of evaluation, i.e. rating on all 3 scales, was mandatory and controlled by the software. The default number of screenshots to be evaluated in each session was set to 50. However, participants were allowed to run a second session (up to another 50 evaluations) if they felt like it.
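One plausible reading of the selection rule (random choice with priority for the least-evaluated screenshots) is sketched below. The function name, the data structures, and the assumption that a participant is never shown the same screenshot twice are all hypothetical; the questionnaire software may have implemented the priority differently.

```python
import random


def pick_next_screenshot(eval_counts, already_seen, rng=random):
    """Pick a random screenshot among the unseen ones with the fewest evaluations.

    eval_counts: dict mapping screenshot id -> number of evaluations so far.
    already_seen: set of ids this participant has evaluated (assumed to be excluded).
    Returns None when the participant has seen every screenshot.
    """
    candidates = {sid: n for sid, n in eval_counts.items() if sid not in already_seen}
    if not candidates:
        return None
    fewest = min(candidates.values())
    least_evaluated = [sid for sid, n in candidates.items() if n == fewest]
    return rng.choice(least_evaluated)


# Hypothetical state: "b" and "c" are the least evaluated, but "b" was already shown.
counts = {"a": 3, "b": 1, "c": 1}
print(pick_next_screenshot(counts, already_seen={"b"}))  # c
```

A rule of this kind keeps the number of evaluations per screenshot nearly balanced, which agrees with the narrow range of evaluations per screenshot reported in the results.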

3 Results

Statistical analysis was performed with SPSS software. We must warn the reader that for the sake of the analysis robustness, some methods more suitable for interval measurement scales were applied with our ordinal dependent variables.

3.1 Descriptive Statistics

In total, the valid participants provided 4235 full evaluations, each on all 3 scales. Thus, each website screenshot was evaluated by 8–10 participants (mean 8.52, SD = 0.56), the average number of full evaluations being 60.5 per participant. Due to technical issues, 4 screenshots were discarded, so 493 remained valid (99.2%). We show descriptive statistics on the image dimensions and the variables in Table 1. The websites that got the highest and lowest average Complex evaluations are presented in Fig. 2.

Table 1. The descriptive statistics for the variables in the study.


The Shapiro-Wilk tests suggested that the normality hypothesis had to be rejected for Orderly (p = 0.002), but not for Complex (p = 0.622) and Aesthetic (p = 0.085).

3.2 Correlation Analysis

The total image size (width * height) was highly correlated with JPEG_size (r = 0.871, p < 0.001) and PNG_size (r = 0.812, p < 0.001), but not with M_Entropy (r = 0.043, p = 0.340). In Table 2, we present Pearson’s and Kendall’s (tau-b, non-parametric statistic for ordinal scales) correlations for the main independent and dependent variables. The strongest correlations for each of the dependent variables are highlighted in bold.
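For readers who wish to reproduce this kind of analysis without a statistics package, the two coefficients can be sketched in plain Python. The implementation below is a straightforward O(n²) version of Kendall's tau-b (which corrects for ties, as are common with averaged ordinal ratings); the function names and the sample data are hypothetical and are not values from our dataset.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def kendall_tau_b(xs, ys):
    """Kendall's tau-b, counting concordant, discordant and tied pairs."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    n = len(xs)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = xs[i] - xs[j], ys[i] - ys[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: excluded from both factors
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denom = math.sqrt((untied + tied_x_only) * (untied + tied_y_only))
    return (concordant - discordant) / denom


# Hypothetical file sizes (MB) and mean ratings for four screenshots.
sizes = [1.0, 2.0, 3.0, 4.0]
ratings = [2, 3, 3, 5]
print(round(pearson_r(sizes, ratings), 3), round(kendall_tau_b(sizes, ratings), 3))
```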

Table 2. Correlations between the variables in the study.


We note the lack of a significant correlation between Complex and Aesthetic, and the expected low negative correlation between Complex and Orderly. Aesthetic was highly correlated with Orderly, which suggests the prevalence of the "classic" aesthetic dimension in the target users' perception. Also as expected, M_Entropy had significant positive correlations with JPEG_size and PNG_size. Its correlation with Orderly is remarkable though, since the frequency-based entropy(I) function does not consider the spatial allocation of the image elements. Of the three independent variables, JPEG_size had the strongest correlation with Complex, which is in line with the existing works on VC. So, this factor is used as the baseline in our regression analysis.

3.3 Regression Analysis

The baseline regression model for Complex with the JPEG_size factor had rather low R² = 0.05, but was significant (F_1,491 = 25.65, p < 0.001). The baseline models for Aesthetic (R² = 0.103, F_1,491 = 56.19, p < 0.001) and Orderly (R² = 0.034, F_1,491 = 17.38, p < 0.001) with the same factor were also significant:

$$ Complex = 3.316 + 0.133 \times JPEG\_size . $$

(2)

$$ Aesthetic = 3.609 + 0.254 \times JPEG\_size , $$

(3)

$$ Orderly = 4.218 + 0.109 \times JPEG\_size . $$

(4)
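The reported R² and F values of the baseline models are linked by the standard identity for linear regression, F = (R²/k) / ((1 − R²)/df_resid), where k is the number of predictors. A small sketch of the inverse relation follows; the function name is hypothetical, and the inputs are the F statistics and degrees of freedom quoted above.

```python
def r_squared_from_f(f_stat, df_model, df_resid):
    """Recover R^2 from an overall F statistic and its degrees of freedom."""
    return (f_stat * df_model) / (f_stat * df_model + df_resid)


# F statistics of the three baseline models, each with (1, 491) degrees of freedom.
for name, f_stat in [("Complex", 25.65), ("Aesthetic", 56.19), ("Orderly", 17.38)]:
    print(name, round(r_squared_from_f(f_stat, 1, 491), 3))
# Complex 0.05
# Aesthetic 0.103
# Orderly 0.034
```

The recovered values agree with the reported R² figures to the given precision.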

In the extended regression models (Table 3), all the 3 factors were significant at $\alpha$ = 0.05:

Table 3. Summary of the regression models with the three factors.


$$ \begin{aligned} & Complex = 3.504 + 0.504 \times JPEG\_size - 0.316 \times PNG\_size \\ &\qquad\qquad\qquad\quad - \,0.063 \times M\_Entropy \\ \end{aligned} , $$

(5)

$$ \begin{aligned} & Aesthetic = 2.731 - 0.373 \times JPEG\_size + 0.503 \times PNG\_size \\ & \qquad\qquad\qquad\quad + \,0.229 \times M\_Entropy \\ \end{aligned} , $$

(6)

$$ \begin{aligned} & Orderly = 3.541 - 0.188 \times JPEG\_size + 0.225 \times PNG\_size \\ & \qquad\qquad\qquad\quad + \,0.166 \times M\_Entropy \\ \end{aligned} . $$

(7)
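For illustration, the fitted models (5)–(7) can be applied to the metrics of a new screenshot as in the sketch below. The coefficients are copied from the equations above; the function name and the input values in the example are hypothetical, and given the low R² of the models the outputs should be read as rough tendencies rather than precise predictions.

```python
# Coefficients from Eqs. (5)-(7): (intercept, JPEG_size, PNG_size, M_Entropy)
MODELS = {
    "Complex":   (3.504,  0.504, -0.316, -0.063),
    "Aesthetic": (2.731, -0.373,  0.503,  0.229),
    "Orderly":   (3.541, -0.188,  0.225,  0.166),
}


def predict_impressions(jpeg_size_mb, png_size_mb, m_entropy):
    """Return the predicted 1-7 scale values for the three subjective scales."""
    return {
        scale: b0 + b_jpeg * jpeg_size_mb + b_png * png_size_mb + b_ent * m_entropy
        for scale, (b0, b_jpeg, b_png, b_ent) in MODELS.items()
    }


# Hypothetical screenshot: 2.0 MB as JPEG-100, 1.5 MB as PNG-24, entropy 5.0.
for scale, value in predict_impressions(2.0, 1.5, 5.0).items():
    print(scale, round(value, 2))
```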

To evaluate the quality of the regression models that had different numbers of factors (k), we used the Akaike Information Criterion (AIC). The AIC values for the considered models are presented in Table 4. The minimal AIC values (highlighted in bold) were found for the models with the three factors, which suggests that the "information loss" in them is lower and therefore they should be preferred over the other models.
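As a reminder of how the criterion trades off fit against model size, a common least-squares form is AIC = n·ln(RSS/n) + 2p, where RSS is the residual sum of squares and p is the number of estimated parameters. The sketch below uses this form as an assumption (statistical packages may add constants or count parameters differently), and the RSS values in the example are hypothetical rather than taken from Table 4.

```python
import math


def aic_least_squares(rss, n, num_params):
    """AIC for a least-squares model, up to an additive constant.

    rss: residual sum of squares; n: number of observations;
    num_params: number of estimated parameters (here: predictors plus intercept).
    """
    return n * math.log(rss / n) + 2 * num_params


# Hypothetical comparison on n = 493 screenshots: a one-factor and a three-factor model.
baseline = aic_least_squares(rss=400.0, n=493, num_params=2)
extended = aic_least_squares(rss=380.0, n=493, num_params=4)
print(extended < baseline)  # True
```

Only differences between AIC values computed on the same data are meaningful, so the additive constant does not affect which model is preferred.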

Table 4. AIC values for the considered regression models.


4 Discussion and Conclusions

In our work we studied whether certain compression- and entropy-based metrics for web page screenshots can be used as predictors of users' impressions formed on a purely visual basis. In particular, we proposed the use of a straightforward Shannon information entropy metric, which was found not to be correlated with image size (in pixels), unlike most other existing metrics. Unexpectedly, we also found that higher entropy actually decreased perceived complexity (5) and increased perceived orderliness (7). The former finding is consistent with one of our previous works, where we considered a smaller sample of different websites with different evaluators [3].

In the extended regression models, we were able to considerably improve the R² in comparison to the baseline models: Complex (+110%), Aesthetic (+141%), Orderly (+274%). The adjusted R²s and the AIC values also suggest that the three-factor models should be preferred over the others. Notably, the effects of the JPEG_size and of the two other factors were always the opposite, so the two latter seem to be an important supplement of the baseline factor. The M_Entropy factor had the lowest contributions (Beta coefficients), but still was significant and improved the models.

Contrary to many existing works (e.g. [1]), we found no correlation between Complex and Aesthetic. At the same time, Aesthetic was highly correlated with Orderly, which may suggest the prevalence of the “classical” dimension in the aesthetic perception for the target user group with the university websites. We might assume that more refined dimensions of the overall aesthetic impression would have been correlated with complexity, as it was the case in [9].

Overall, the results of our study support the conclusion that the information entropy obtained via a purely image-processing method can be a feasible metric in analyzing WUIs. So, developers of automated analysis tools for web engineering could consider including the three metrics. The limitations of our work include the relatively meager R²s in the models, as well as the low fidelity of the scales that described users' subjective impressions. In our future work we plan to integrate more metrics into our WUI Measurement Platform [3] and to combine visual and code-based analyses of web user interfaces.

References

1. Reinecke, K., et al.: Predicting users' first impressions of website aesthetics with a quantification of perceived visual complexity and colorfulness. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 2049–2058 (2013)
2. Tuch, A.N., et al.: The role of visual complexity and prototypicality regarding first impression of websites: working towards understanding aesthetic judgments. Int. J. Hum Comput Stud. 70(11), 794–811 (2012)
3. Bakaev, M., Heil, S., Khvorostov, V., Gaedke, M.: Auto-extraction and integration of metrics for web user interfaces. J. Web Eng. 17(6&7), 561–590 (2019)
4. Rosenholtz, R., Li, Y., Nakano, L.: Measuring visual clutter. J. Vis. 7(2), 1–22 (2007)
5. Gonzalez, R.C., Woods, R.E., Eddins, S.L.: Digital Image Processing Using MATLAB. Pearson Education, Upper Saddle River (2004)
6. Chikhman, V., et al.: Complexity of images: experimental and computational estimates compared. Perception 41(6), 631–647 (2012)
7. Marin, M.M., Leder, H.: Examining complexity across domains: relating subjective and objective measures of affective environmental scenes, paintings and music. PLoS ONE 8(8), e72412 (2013)
8. Gartus, A., Leder, H.: Predicting perceived visual complexity of abstract patterns using computational measures. PLoS ONE 12(11), e0185276 (2017)
9. Michailidou, E., Harper, S., Bechhofer, S.: Visual complexity and aesthetic perception of web pages. In: Proceedings of the 26th Annual ACM International Conference on Design of Communication, pp. 215–224 (2008)


Acknowledgment

This work was supported by Novosibirsk State Technical University, project No. TP-EI-1_17. We also thank Sebastian Heil from TU Chemnitz (Germany) and Vladimir Khvorostov from NSTU for aiding in the data collection.

Author information

Authors and Affiliations

Novosibirsk State Technical University, Novosibirsk, Russia
Egor Boychuk & Maxim Bakaev


Corresponding author

Correspondence to Maxim Bakaev.

Editor information

Editors and Affiliations

Novosibirsk State Technical University, Novosibirsk, Russia
Maxim Bakaev
Erasmus University Rotterdam, Rotterdam, The Netherlands
Flavius Frasincar
Korea Advanced Institute of Science and Technology, Daejeon, Korea (Republic of)
In-Young Ko


Cite this paper

Boychuk, E., Bakaev, M. (2019). Entropy and Compression Based Analysis of Web User Interfaces. In: Bakaev, M., Frasincar, F., Ko, IY. (eds) Web Engineering. ICWE 2019. Lecture Notes in Computer Science(), vol 11496. Springer, Cham. https://doi.org/10.1007/978-3-030-19274-7_19


DOI: https://doi.org/10.1007/978-3-030-19274-7_19
Published: 26 April 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-19273-0
Online ISBN: 978-3-030-19274-7
eBook Packages: Computer Science, Computer Science (R0)
