1 Introduction

The growth of multimedia applications such as information and entertainment systems is leading the industry to seek new solutions for image storage capacity and transmission bandwidth requirements, especially for smartphones and other mobile technologies. Over recent years, many methods for image compression have been proposed, aiming to reduce image file size efficiently without compromising image quality or changing image format. The compression methods proposed so far in the literature are often based on image quality metrics such as Multi-scale Structural Similarity (MS-SSIM), Mean Square Error (MSE), Peak Signal-to-Noise Ratio (PSNR), and Structural Similarity (SSIM) [1]. Objective measures such as these evaluate the difference between a distorted picture and a reference picture; they are therefore fast, easily repeatable, and do not require extensive human resources. However, they do not always reliably predict the subjective ratings provided by human evaluators [1, 2]. Subjective quality evaluation therefore remains a key process in image and video compression, because low perceived quality contributes directly to a poor user experience [3, 4].

This paper presents the subjective evaluation of a new compression plug-in for current compression formats. The plug-in has been developed by an engineering company called Cogisen (www.cogisen.com). It follows an adaptive compression process that evaluates the visual saliency of the image and provides a quality perception model able to quickly calculate the amount of compression at which a user will perceive a reduction in image quality. With this quality perception threshold, the compression adapts differently to each image. The compression model proposed by Cogisen was created with a deep learning platform, using algorithms able to capture processes that are highly nonlinear, sparse, and have a high noise-to-signal ratio. Since the Cogisen filter has no effect on compatibility with other main compression models, images produced with the plug-in can be made fully compliant with all formats. Moreover, the solution has a minimal impact on mobile processor usage and already allows for an adaptive JPEG solution, JPEG being the most widely used image compression standard. The Cogisen compression plug-in does not use any mathematical correlation function between the aforementioned compression quality metrics for its quality perception model. For this reason, the subjective evaluation and validation of the Cogisen quality perception model was fundamental.

This work describes the evaluation of the subjective quality perception of pictures compressed separately by the Facebook Mobile application and by the Cogisen plug-in integrated into the Facebook Mobile compression settings.

2 Methodology

The main aim of this work was to assess the user experience, in terms of perceived quality, of color images compressed by Cogisen’s compression plug-in. The assessment procedure was first validated using a subjective quality dataset (Phase I) and then applied to evaluate the experimental stimuli (Phase II).

Phase I. Implementation and validation of a web-based image quality assessment method. In the first phase of this work, we focused on the design process and the validation of a web-based image quality evaluation tool using a Single Stimulus method. We used an existing image database provided with previously validated subjective quality scores. The subjective data obtained with the new tool was compared to subjective data provided by the reference database.

Phase II. Subjective evaluation of pictures compressed by the Cogisen plug-in. The second phase of this work consists of the subjective evaluation of two kinds of compressed color images: (1) pictures compressed by the Facebook Mobile application and (2) pictures compressed by the Cogisen plug-in integrated into the Facebook Mobile compression settings. Both groups of experimental images were compared against the quality scores assigned to the corresponding high quality reference pictures. Three compression amounts were used in three different testing sessions: 15 % compression, 30 % compression, and 45 % compression.

3 Phase I. Implementation and Validation of a Web-Based Image Quality Assessment Method

This section describes the design process and the validation of a web-based image quality evaluation tool. We selected a public test image library, the LIVE Image Quality Assessment database [5]. In order to build an image quality evaluation tool based on the SurveyGizmo platform, we used the same methodology that was used to obtain the image quality scores that accompany the chosen database. The LIVE subjective scores were then compared to the subjective scores obtained with the new testing procedure. The purpose of validation testing was to ensure that the chosen method of test administration via a crowdsourcing web platform (SurveyGizmo) produced reliable subjective image quality scores, consistent with the reference scores obtained with traditional methods of recruiting and test administration, as for the LIVE image dataset.

3.1 Material: Source Database

As the reference image set, we selected the LIVE Image Quality database because (1) it offers one of the largest subjective image quality databases in the literature; and (2) it has been evaluated in a subjective quality assessment study using a Single Stimulus method, which is the most natural image quality assessment method for home viewing conditions [6]. The LIVE database is based on twenty-nine high-resolution color images, which were processed with different distortion types, including JPEG compression [6]. Since in the LIVE test each subject evaluated both the distorted and reference images in each session, the authors calculated a quality difference score for all distorted images and all subjects. The LIVE experiments were conducted using a web-based interface showing the image to be ranked, with a Java scale-and-slider applet for assigning a quality score. The subjective quality DMOS values obtained during the assessment of the LIVE database are publicly available.

In order to create a test model for Cogisen’s web-based subjective tests, efforts were made to replicate the LIVE test methodology. A selection of both reference high quality and JPEG distorted images was taken from the LIVE database and used to set up a web-based image quality evaluation test.

3.2 Method

In the Single Stimulus (SS) method, a single image or sequence of images is presented and the observer assigns a score to the presentation. This method is generally used as an alternative to the Double Stimulus (DS) method, which asks observers to assess two simultaneously shown versions of each test picture. The Single Stimulus Continuous Quality Scale (SSCQS) method [7] was adopted instead of a DS method because the former replicates single stimulus home viewing conditions.

The SSCQS allows continuous measurement of the subjective quality of images, with subjects viewing the material once, without a source reference. The technique presents one picture at a time to the viewer, and the test pictures or sequences are presented only once in the test session. An example of a high quality image is presented once at the beginning of the test so that users know what a high quality image looks like and are able to frame their expectations. The reference images are randomly shown during the test as a control condition. At the beginning of the first session, some stabilization sequences (also called “dummy” sequences) are introduced to stabilize the users’ opinion. The sequence presentations are randomized to ensure that the same picture is not presented twice in succession. Observers evaluate the quality of each image using a grading scale as the presentation of each trial ends.

3.3 Procedure

At the beginning of the test, a preliminary questionnaire asks for participants’ age, gender, visual acuity, contrast sensitivity, color vision, general health conditions, and prior experience with video display systems or devices. Participants are also asked to check the physical dimensions of their display and to set it to maximum brightness. Other information, such as operating system, browser and country, is collected directly by the web service. If participants meet the requirements (i.e., no vision impairments, a desktop display at least 13 inches wide, maximum brightness on), a new page with test instructions is shown before the test begins. The image quality evaluation test is implemented through a web-based platform called SurveyGizmo. The test has been translated into both Italian and English to accommodate the participants’ native language.

The presentation of each image lasts 7 s. After that, a quality scale field is shown for at least 3 s. The quality scale consists of integers in the range 1–100; it is marked numerically and divided into three equal portions labeled with the adjectives “Bad”, “Fair”, and “Excellent”. Subjects are asked to report their quality assessment by dragging the slider on the quality scale. The position of the slider is automatically reset after each evaluation.

Following ITU-R BT.500-8 [7], the test consists of 3 different trial sequences: a training sequence (4 trials), a stabilization (also called “dummy”) sequence (5 trials) and a testing sequence (25 trials). The training sequence is presented only once per subject, at the very beginning of the test. The stabilization sequence is presented immediately before the test session without any noticeable interruption for the subjects. The stabilization phase consists of pictures covering the full quality range, to help observers stabilize their opinion. In accordance with the ITU-R standard, the data collected during the stabilization phase are not taken into account in the results of the test, and the stabilization pictures do not appear during the test session.

The trial position in each sequence was randomized to avoid showing the same picture one after the other. The whole session is designed to last no longer than 15 min to avoid errors due to participants’ fatigue and loss of attention.
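The randomization constraint above (never show the same source picture in two consecutive trials) can be sketched by rejection sampling: shuffle the trial list and accept an ordering only if no picture repeats in adjacent positions. This is a minimal illustrative sketch, not the actual SurveyGizmo implementation; the `picture`/`version` trial fields are hypothetical.

```python
import random

def build_sequence(trials, max_tries=1000):
    """Shuffle trials so that the same source picture never appears
    in two consecutive positions (rejection sampling)."""
    for _ in range(max_tries):
        order = random.sample(trials, len(trials))
        if all(a["picture"] != b["picture"] for a, b in zip(order, order[1:])):
            return order
    raise RuntimeError("no valid ordering found")

# Hypothetical trials: each picture has a reference and a compressed version.
trials = [{"picture": p, "version": v} for p in "abc" for v in ("ref", "jpeg")]
seq = build_sequence(trials)
assert all(x["picture"] != y["picture"] for x, y in zip(seq, seq[1:]))
```

Rejection sampling is adequate here because valid orderings are plentiful for a few dozen trials; a constructive shuffle would only be needed for much more constrained sequences.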

3.4 Subjects

The subjective test was carried out in the participants’ home conditions via the SurveyGizmo web platform. In total, 43 volunteers (44.2 % males, mean age = 35 years old, Italian speakers = 39, English speakers = 4) participated in the study: 23 non-expert viewers (53.3 % males, mean age = 35.4) and 20 expert users (34.7 % males, mean age = 34.5). The validation test was completed between June 17, 2015 and June 20, 2015. A post-screening of the subjective test scores was conducted prior to the data analysis: it was first checked that each viewer met the preliminary requirements. The data from 19 participants were discarded from the subjective data set. The final screened subjective data set included scores from a total of 24 viewers.

3.5 Results

Opinion Scores.

Mean opinion score (MOS): Opinion scores were integers in the range 1–100. The mean opinion scores (MOS) were calculated for each subject (MOS = 53.3). The raw opinion scores were converted to difference mean opinion score (DMOS):

$$ d_{ij} = r_{i\,\mathrm{ref}(j)} - r_{ij} $$

where $r_{ij}$ is the raw score for the i-th subject and j-th image, and $r_{i\,\mathrm{ref}(j)}$ denotes the raw quality score assigned by the i-th subject to the reference image corresponding to the j-th distorted image [5, p. 4]. The Difference Mean Opinion Scores (DMOS) were obtained by calculating the difference between the MOS of the reference images and the MOS of the related compressed images (DMOS = 18.82).
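As an illustration, the per-subject difference scores and their averages can be computed as below. The data layout and names are hypothetical toy values; the actual LIVE processing also includes score conversions not shown here.

```python
from statistics import mean

def dmos(raw, ref_of):
    """Difference scores d_ij = r_i,ref(j) - r_ij per subject and image,
    then averaged over subjects to give one DMOS per distorted image.
    raw: {subject: {image: score}}; ref_of: {distorted image: its reference}."""
    diffs = {}  # distorted image -> list of per-subject difference scores
    for subj, scores in raw.items():
        for img, r_ij in scores.items():
            if img in ref_of:                   # distorted images only
                d = scores[ref_of[img]] - r_ij  # r_i,ref(j) - r_ij
                diffs.setdefault(img, []).append(d)
    return {img: mean(ds) for img, ds in diffs.items()}

# Hypothetical scores: two subjects rate one reference and its JPEG version.
raw = {
    "s1": {"ref_a": 90, "jpeg_a": 70},
    "s2": {"ref_a": 80, "jpeg_a": 65},
}
print(dmos(raw, {"jpeg_a": "ref_a"}))  # → {'jpeg_a': 17.5}
```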

Scale Assessment.

The Cronbach’s alpha of opinion scores was calculated in order to evaluate the internal consistency measure on the 34 trials composing the whole experimental session. An index of reliability Alpha = 0.919 (Cronbach’s Alpha Based on Standardized Items = 0.924) was obtained.
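Cronbach’s alpha for a subjects-by-trials score matrix follows the standard formula α = k/(k−1) · (1 − Σ item variances / variance of total scores). A minimal sketch, using population variances and hypothetical toy ratings (not the study’s data):

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """scores: one list per subject, one rating per trial (item)."""
    k = len(scores[0])                 # number of items (trials)
    items = list(zip(*scores))         # transpose: item -> ratings across subjects
    item_var = sum(pvariance(it) for it in items)
    total_var = pvariance([sum(s) for s in scores])
    return k / (k - 1) * (1 - item_var / total_var)

# Perfectly consistent toy ratings (items differ by a constant) give alpha = 1.0.
print(cronbach_alpha([[1, 2], [2, 3], [3, 4]]))  # → 1.0
```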

Comparison Between Subjective Scores.

The Pearson linear correlation was calculated between the LIVE DMOS and the test DMOS to measure prediction accuracy. Results show a coefficient R = 0.541, p = 0.005.
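For reference, the Pearson coefficient used here is the standard sample correlation; a self-contained version on toy data (not the study’s scores):

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson linear correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # → 1.0 (perfectly linear)
```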

Comparison within Subjects.

The effect of the participants’ expertise on their performance was investigated using the One-way Analysis of Variance (ANOVA). Results show no effect of expertise on difference mean opinion scores (F(1,23) = 0.398; p > 0.05).
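The one-way ANOVA F statistic compares between-group and within-group mean squares; a minimal sketch with two hypothetical groups (the study’s actual expert/non-expert data are not reproduced here):

```python
from statistics import mean

def one_way_anova_f(*groups):
    """F = between-group mean square / within-group mean square."""
    all_vals = [v for g in groups for v in g]
    grand = mean(all_vals)
    k, n = len(groups), len(all_vals)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((v - mean(g)) ** 2 for v in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

print(one_way_anova_f([1, 2, 3], [2, 3, 4]))  # → 1.5
```

The F value is then compared against the F distribution with (k−1, n−k) degrees of freedom to obtain the p-value reported in the text.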

3.6 Discussion

The analysis of the internal consistency of the proposed subjective evaluation procedure indicates an excellent internal consistency within the evaluation test. Moreover, the subjective quality scores given by 24 subjects to 25 images correlate significantly with the corresponding difference mean opinion scores provided by the LIVE database, with no significant difference between expert and non-expert participants. Therefore, since the subjective image quality assessment tool that was developed is internally consistent, it can be used as a baseline model for evaluating the perceived quality of images compressed by the Cogisen method.

4 Phase II. Subjective Evaluation of Pictures Compressed by the Cogisen Plug-in

This section describes subjective tests aimed at evaluating the participants’ quality perception of two sets of compressed images: (i) pictures compressed by Facebook Mobile; and (ii) pictures compressed by the Cogisen plug-in integrated into the Facebook Mobile compression settings. Three tests – Test 1, Test 2 and Test 3 – were conducted to evaluate, respectively, a 15 %, a 30 %, and a 45 % gain of the Cogisen plug-in over the Facebook Mobile compression amount. Participants were recruited through the Prolific Academic platform, a crowdsourcing platform for psychological research. Participants who completed the whole test were rewarded with a £1.50 payment.

4.1 Material

The images used for all three tests were obtained from high quality pictures selected from the Colourlab Image Database: Image Quality (CIDIQ) [8]. The CIDIQ contains 23 images with varying attributes, including hue, saturation, lightness and contrast. The resolution of all images is 800 × 800 pixels.

Fourteen reference stimuli were selected from a group of 23 high-quality pictures. Each picture was compressed by Facebook Mobile and then additionally compressed by the Cogisen plug-in. Each testing session consisted of 8 high-quality reference images, 8 Facebook Mobile compressed pictures and 8 Cogisen compressed pictures. Each test consisted of a total of 37 trials: 4 trials for the training sequence, 5 trials for the dummy sequence, 24 images for the testing session, and 4 attention checks, i.e., low quality compressed pictures each placed twice into the test to check the participants’ attention level. Participants who assigned significantly different ratings to the two presentations of the same attention-check picture were excluded from the data analysis. The sequence presentation was randomized to avoid clustering of the same pictures.
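The attention-check screening described above can be sketched as a simple consistency rule on the repeated low-quality pictures. The tolerance threshold and data layout below are hypothetical; the paper does not specify the exact exclusion criterion.

```python
def passes_attention_checks(ratings, check_pairs, max_gap=20):
    """ratings: {trial_id: score on the 1-100 scale}.
    check_pairs: (id_a, id_b) for the two presentations of the same
    low-quality picture; a large rating gap flags inattention."""
    return all(abs(ratings[a] - ratings[b]) <= max_gap for a, b in check_pairs)

# One consistent pair, one wildly inconsistent pair -> participant excluded.
ratings = {"chk1_a": 15, "chk1_b": 20, "chk2_a": 10, "chk2_b": 70}
print(passes_attention_checks(ratings, [("chk1_a", "chk1_b"),
                                        ("chk2_a", "chk2_b")]))  # → False
```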

4.2 Procedure

Participants are asked to answer a preliminary questionnaire and follow setting requirements as described in Sect. 3.3. The three tests consisted of assigning a quality rating to pictures by dragging the slider on a 1–100 quality scale, using the same methodology that was designed and validated in Phase I.

4.3 Subjects

Test 1: 15 % Gain Over Facebook Mobile.

In Test 1, 29 volunteers (mean age = 36.6 years old, 53.3 % males, Italian speakers = 26, English speakers = 3) took part in the subjective tests: 16 non-expert viewers (mean age = 33.3, 38.8 % males) and 13 expert users (mean age = 34.6, 69.2 % males). All the tests were completed in a single session between June 30, 2015 and July 6, 2015. The post-screening of the subjective test scores consisted of determining whether the participants met the preliminary requirements (no vision impairments, only personal computers, maximum brightness on). Viewers who did not correctly pass the training session were discarded. Nine participants’ data sets were deleted from the subjective database. The final screened subjective database included the scores provided by a total of 20 subjects.

Test 2: 30 % Gain Over Facebook Mobile.

In Test 2, 37 subjects (mean age = 27 years old, 64.8 % male, 100 % English speakers) took part in the tests: 31 non-expert viewers (mean age = 27.2, 64.5 % male) and 5 expert users (mean age = 26.4, 80 % male). All the tests were completed in a single session on October 7, 2015. Four subjects were excluded before data analysis because they did not meet the preliminary requirements; the screened subjective database thus included the scores provided by a total of 33 subjects (mean age = 26.5 years old, 69.7 % male; 48.4 % indoors with natural light, 51.6 % indoors with artificial light). Since 2 further outliers were excluded after a descriptive analysis, the results refer to 31 subjects’ data.

Test 3: 45 % Gain Over Facebook Mobile.

Thirty-five subjects (mean age = 28.5 years old, 45.7 % males, 100 % English speakers) took part in the tests: 32 non-expert viewers (mean age = 28.2, 46.8 % males) and 3 expert users (mean age = 26.7, 33.3 % males). All the tests were completed in a single session on October 27, 2015. The screened subjective database included the scores provided by a total of 31 subjects (mean age = 29 years old, 41.9 % males; 45.2 % indoors with natural light, 54.8 % indoors with artificial light). Since 4 outliers were excluded after the descriptive analysis, the results of Test 3 refer to 27 subjects’ data.

4.4 Results

Opinion Scores.

The mean opinion scores (MOS) were calculated for each subject. The Difference Mean Opinion Scores (DMOS) were obtained by calculating the difference between the MOS of reference images and the MOS of the related processed images (Table 1).

Table 1. Subjects’ mean opinion scores and difference mean opinion scores

Cogisen Pictures Compared to Facebook Mobile Pictures: Difference Mean Opinion Scores.

The Pearson linear correlation was calculated between the DMOS assigned to Cogisen compressed stimuli and the DMOS assigned to Facebook Mobile compressed stimuli. Results show high correlation coefficients, meaning that the ratings of the Cogisen pictures are strongly correlated with those of the Facebook Mobile pictures (Test 1: R = 0.944, p < 0.001; Test 2: R = 0.943, p < 0.001; Test 3: R = 0.845, p < 0.001).

Within Subjects Comparisons.

For each test, the effects on the participants’ performance of (i) compression level, (ii) expertise, (iii) lighting condition, and (iv) the position of trials within the testing sequence were investigated.

Test 1: 15 % gain over Facebook Mobile

  • i. Compression level effect. The repeated measures ANOVA shows no significant difference between the DMOS assigned to Cogisen compressed stimuli and the DMOS assigned to Facebook Mobile compressed stimuli (F(1,19) = 3.551; p > 0.05).

  • ii. Expertise effect. The one-way ANOVA shows no effect of expertise on difference mean opinion scores (F(1,19) = 0.238; p > 0.05).

  • iii. Lighting condition effect. The one-way ANOVA shows no significant difference in the DMOS assigned in four different lighting conditions (indoors with natural lights, indoors with artificial lights, outdoors with natural lights, outdoors with artificial lights), F(2,17) = 0.712; p > 0.05.

Test 2: 30 % gain over Facebook Mobile

  • i. Compression level effect. The repeated measures ANOVA shows no significant difference in the DMOS assigned to Cogisen compressed stimuli compared to the DMOS assigned to Facebook Mobile compressed stimuli (F(1, 30) = 0.067; p > 0.05).

  • ii. Expertise effect. The one-way ANOVA shows no effect of expertise on difference mean opinion scores (F(1,30) = 0.699; p > 0.05).

  • iii. Lighting condition effect. The one-way ANOVA shows no significant difference in the DMOS assigned in two different lighting conditions (indoors with natural lights, indoors with artificial lights), (F(1,30) = 1.36; p > 0.05).

  • iv. Position effect. In Tests 2 and 3, it was investigated whether the participants’ performance was influenced by the position of the pictures within the testing sequence. Multiple linear regression analysis showed that the position of the stimuli in the test did not predict the subjects’ answers (R2 = 0.151, F(1,23) = 3.89, p > 0.05; β = −0.388, p > 0.05). Therefore, no differences were found between the answers given in the first half of the study and those given in the second half.

Test 3: 45 % gain over Facebook Mobile

  • i. Compression level effect. The repeated measures ANOVA shows no significant difference in the DMOS assigned to Cogisen pictures compared to the DMOS assigned to Facebook Mobile pictures (Wilks’ lambda, F(1, 26) = 2.476; p > 0.05).

  • ii. Expertise effect. The one-way ANOVA shows no effect of expertise on difference mean opinion scores (F(1,26) = 1.321; p > 0.05).

  • iii. Lighting condition effect. The one-way ANOVA shows no significant difference in the DMOS assigned in two different lighting conditions (indoors with natural lights, indoors with artificial lights), (F(1,26) = 0.011; p > 0.05).

  • iv. Position effect. The role of trial position within the test was investigated following the same methodology as in Test 2. Multiple linear regression analysis showed that the position of the stimuli in the test did not predict the subjects’ answers (R2 = 0.039, F(1,23) = 0.885, p > 0.05; β = −0.197, p > 0.05).
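The position-effect analyses in Tests 2 and 3 regress opinion scores on trial position; the slope and R² of such a single-predictor fit can be sketched as follows, on toy data rather than the study’s scores:

```python
from statistics import mean

def ols_fit(x, y):
    """Ordinary least squares fit y = a + b*x; returns (slope, R^2)."""
    mx, my = mean(x), mean(y)
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    intercept = my - slope * mx
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return slope, 1 - ss_res / ss_tot

# A perfectly predictive toy case: position fully determines the score.
# In the study, by contrast, R^2 was small and nonsignificant.
print(ols_fit([1, 2, 3, 4], [2, 4, 6, 8]))  # → (2.0, 1.0)
```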

5 Discussion

No significant difference was found between the opinion scores assigned to Cogisen pictures compared to those assigned to Facebook Mobile pictures, meaning that the difference mean opinion scores assigned to Cogisen pictures were not significantly higher than those assigned to JPEG stimuli. No significant effects of expertise, lighting conditions and stimuli position were found on the image quality assessment performance. Table 2 shows a summary of main results.

Table 2. Summary of the main results obtained in Test 1, 2 and 3

6 Conclusion

In this work, the subjective quality perception of pictures compressed by a new plug-in developed by Cogisen, which can be integrated into the compression settings of mobile applications such as Facebook Mobile, was evaluated. Three different groups of viewers were asked to assess the quality of both images compressed by the Cogisen plug-in and pictures compressed with the Facebook Mobile application. Since subjective evaluation does not refer to absolute values, participants were also asked to assess the quality of high quality reference pictures randomly shown during the test. The experimental design followed the Single Stimulus Continuous Quality Scale (SSCQS) method. The tests were administered by means of a web platform that was validated by comparing its results with subjective data obtained with a traditional method. The findings highlight that web-based solutions can be as reliable as traditional methods for both recruiting participants and administering tests, thus improving the efficiency and reducing the cost of data collection. The results of this study show that the Cogisen compression plug-in can be applied to JPEG compressed images with no significant impact on perceived quality, up to a gain of 45 % file size reduction compared to Facebook Mobile. This means that the Cogisen quality perception model allows for immediate compression gains, substantially reducing transmission bandwidth and storage requirements for mobile systems while avoiding a poor user experience. The next step is to evaluate the Cogisen plug-in for video compression as well, applying the methods provided by International Recommendations for Subjective Video Quality Assessment [7]. The question of how users subjectively perceive and evaluate the quality of compressed videos is a priority for further investigation.