Elsevier

Digital Signal Processing

Volume 91, August 2019, Pages 11-20
Free-energy principle inspired visual quality assessment: An overview

https://doi.org/10.1016/j.dsp.2019.02.017

Abstract

The free energy principle was proposed several years ago as a unified justification for some brain theories. It explains well the processes of human perception, cognition, action, and learning. The principle suggests that human perception and understanding of a given scene can be modeled as an active inference process in which the brain tries to explain the scene using an internal generative model. The discrepancy between the given image or view and the part of it that the best internal generative model can explain is upper bounded by the free energy of the inference process. It was then conjectured that the perceptual quality of the input image is closely related to the free energy value of this process. Following this framework, dozens of visual quality assessment techniques have been proposed in the last few years, and many have achieved state-of-the-art performance. In this paper, we first give an overview of the free energy principle and then review the free energy principle inspired visual quality assessment metrics, comparing them in terms of algorithm design and performance.

Introduction

Vision is the most important information source for human beings. The human visual system (HVS) allows us to perceive the outside world through light-induced retinal stimulation. However, the retinal stimulation is by no means what we finally see, since our brain performs a series of psychological inferences on the inputs. Hermann von Helmholtz pointed out in the 1860s that “vision is an outcome of inferences, it is a process of making assumptions and drawing conclusions from (partial) sensory data” [1], [2]. It is now widely believed by researchers in psychology, cognitive science and neuroscience that the physiological and psychological mechanisms of perception in the human brain are intrinsic interactions between the retinal stimuli and human vision. In the field of perceptual quality assessment, it was realized that an image quality metric designed with a purely signal-processing approach, i.e. without consideration of these intrinsic interactions, is fundamentally limited in how well it resembles the working mechanism of the HVS [3].

In order to model the active inference process in the HVS, we resort to the free energy principle [4], [5], which was proposed as a unification of several brain theories about human perception, learning and action. The second law of thermodynamics asserts that an isolated system tends toward a disordered state and that its entropy increases over time until it reaches a maximum value at the equilibrium state. However, all biological agents try to maintain their internal states at a low entropy level, as a prerequisite of being alive. The free energy principle suggests that living agents achieve this goal by minimizing the free energy of the process, which turns out to be an upper bound of the total “surprise” encountered in different environments [4].
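The upper-bound relation between free energy and surprise can be checked numerically on a toy case. The sketch below is only an illustration under stated assumptions: it uses the standard variational form of free energy over a discrete parameter set, and the names `joint`, `uniform`, `surprise` and `free_energy`, as well as the probability values, are hypothetical and not taken from the paper.

```python
import math

def surprise(joint):
    """-log P(I): negative log of the evidence, summing the joint over theta."""
    return -math.log(sum(joint))

def free_energy(joint, q):
    """F(Q) = -sum_theta Q(theta) * log(P(I, theta) / Q(theta))."""
    return -sum(qi * math.log(pi / qi) for pi, qi in zip(joint, q) if qi > 0)

# Hypothetical joint probabilities P(I, theta) for one fixed input I and
# three candidate parameter values theta (illustrative numbers only).
joint = [0.10, 0.05, 0.02]
evidence = sum(joint)
posterior = [p / evidence for p in joint]   # exact posterior P(theta | I)
uniform = [1.0 / 3] * 3                     # a cruder approximate posterior

s = surprise(joint)
f_uniform = free_energy(joint, uniform)      # strictly above the surprise
f_posterior = free_energy(joint, posterior)  # equals the surprise
```

In this toy setting the free energy never falls below the surprise, and the gap closes when the approximate posterior matches the exact one.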

The free energy principle points out that free energy can be evaluated by a biological agent using its internal generative model and external sensory inputs. For visual perception, the brain predicts the scenes it encounters in a constructive manner using the internal generative model. The generative model can be decomposed into a likelihood multiplied by a prior, and visual perception is therefore a process of inverting the likelihood to obtain the posterior probabilities of the given scene. Since a brain cannot possibly hold a universal generative model for all visual scenes, there will always be a discrepancy between the external visual input and the part of it that the generative model can explain. The psychovisual quality of a scene can then be defined by the agreement between the scene and its version predicted by the internal generative model that best describes the scene. Studies show that the primary visual system processes visual inputs in a multi-channel, multi-resolution manner [6], and many successful quality assessment algorithms are based on HVS-plausible decompositions of visual signals [7]. It was argued that free energy principle inspired visual quality metrics differ fundamentally from these decomposition-based methods and work in a “synthesis” way with visual signal inputs [3].
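The decomposition of the generative model into a likelihood and a prior, and its inversion into a posterior, can be written out as follows. The symbols are borrowed from the notation used later in the paper (G for the generative model, θ for its parameter vector, I for the image); the second line is simply Bayes' rule and is given here as a reading aid rather than as the authors' own derivation.

```latex
\begin{align}
P(I, \theta \mid G) &= P(I \mid \theta, G)\, P(\theta \mid G), \\
P(\theta \mid I, G) &= \frac{P(I \mid \theta, G)\, P(\theta \mid G)}{P(I \mid G)},
\qquad P(I \mid G) = \int P(I, \theta \mid G)\, d\theta .
\end{align}
```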

A key task in the design of a free energy principle inspired visual quality metric is the construction of the internal generative model. A good generative model should be able to approximate any given visual input with high precision, mimicking the working mechanism of the human brain. Following the pioneering work in [3], many algorithms have used a linear autoregressive (AR) model as an approximation of the optimal internal generative model. For example, the AR model has been used as the internal generative model in full-reference (FR) quality metrics [8], [9], reduced-reference (RR) metrics [10], [11], no-reference (NR) metrics [12], [13] and comparative quality assessment [14]. However, under most circumstances the AR model used in those quality metrics is not “optimal” in the statistical sense, because the model order and its 2D shape are fixed. A more favorable AR model [15] can be designed using the minimum description length principle [16]. Dictionary learning and sparse representation were proposed as alternatives for improved accuracy [17], [18], [19].
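To make the idea of a fixed-order, fixed-shape AR model concrete, the following is a minimal sketch, not the method of any cited metric. It assumes a single global set of coefficients, a four-pixel causal neighbourhood (`OFFSETS`), and a small ridge term for numerical stability; the function names `solve` and `ar_predict` are hypothetical. The image is a list of rows of numbers.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

# Fixed causal neighbourhood as (row offset, column offset): an assumed shape.
OFFSETS = [(0, -1), (-1, 0), (-1, -1), (-1, 1)]

def ar_predict(img, ridge=1e-6):
    """Fit one global AR model by least squares; return (prediction, residual)."""
    h, w = len(img), len(img[0])
    k = len(OFFSETS)
    # Normal equations (A^T A + ridge * I) c = A^T b, accumulated pixel by pixel.
    ata = [[ridge if i == j else 0.0 for j in range(k)] for i in range(k)]
    atb = [0.0] * k
    rows = []
    for y in range(1, h):
        for x in range(1, w - 1):
            nb = [img[y + dy][x + dx] for dy, dx in OFFSETS]
            rows.append((y, x, nb))
            for i in range(k):
                atb[i] += nb[i] * img[y][x]
                for j in range(k):
                    ata[i][j] += nb[i] * nb[j]
    coef = solve(ata, atb)
    pred = [list(map(float, row)) for row in img]   # border pixels are copied
    for y, x, nb in rows:
        pred[y][x] = sum(c * v for c, v in zip(coef, nb))
    resid = [[img[y][x] - pred[y][x] for x in range(w)] for y in range(h)]
    return pred, resid
```

Because the neighbourhood and order are hard-coded here, the sketch shares the limitation noted above: nothing in it adapts the model order or shape to the image content.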

The image predicted by the internal generative model is generally considered the “ordered” portion of the scene, in the sense that an “order” is inherently or explicitly depicted by the model. Meanwhile, the discrepancy between the model explainable part and the input, i.e. the model prediction residual, is deemed the “disordered” portion of the scene. From the viewpoint of dividing the image into ordered and disordered parts, the idea of computing the model prediction and discrepancy can also be realized without using an explicit model. For instance, a bilateral filter [20] can be used to separate an image into two layers with plain (ordered) and detailed (disordered) information [21], [22], [23], [24].
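The model-free separation into a plain layer and a detail layer can be sketched with a brute-force bilateral filter. This is an illustrative sketch only: the function name `bilateral_split` and the default values of `radius`, `sigma_s` and `sigma_r` are assumptions, not settings from the cited metrics, and the image is a list of rows of numbers.

```python
import math

def bilateral_split(img, radius=2, sigma_s=2.0, sigma_r=10.0):
    """Return (base, detail): the bilateral-filtered layer and the remainder."""
    h, w = len(img), len(img[0])
    base = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            num = den = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        d_space = dy * dy + dx * dx
                        d_range = (img[yy][xx] - img[y][x]) ** 2
                        # Weight falls off with spatial and intensity distance.
                        wgt = math.exp(-d_space / (2 * sigma_s ** 2)
                                       - d_range / (2 * sigma_r ** 2))
                        num += wgt * img[yy][xx]
                        den += wgt
            base[y][x] = num / den
    # The detail layer is whatever the smoothed base layer does not explain.
    detail = [[img[y][x] - base[y][x] for x in range(w)] for y in range(h)]
    return base, detail
```

Here `base` plays the role of the ordered portion and `detail` the disordered portion; strong edges stay in the base layer because pixels across an edge receive almost no weight.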

The rest of the paper is organized as follows. In Section 2, we review the free energy principle and relate it to psychovisual quality assessment. The free energy principle inspired visual quality assessment algorithms are surveyed and compared in Section 3. Finally, Section 4 concludes the paper with some discussions.

Section snippets

The free energy principle [3]

As introduced in Section 1, the internal generative model is of crucial importance to the implementation of the free energy principle. We define the parametric internal generative model as G, which explains perceived scenes by adjusting the vector θ of parameters. Given a visual stimulus (an image) I, the ‘surprise’ can be computed by integrating the joint distribution P(I,θ|G) over the space of model parameters θ: $-\log P(I|G) = -\log \int P(I,\theta|G)\,d\theta$. Since the joint distribution of the parameter and

Free energy inspired visual quality measures

We first give an overview of the free energy inspired visual quality measures. The algorithms are categorized in Fig. 2 according to the application domains, i.e. general purpose, distortion specific and application specific. Table 1 further gives details of the algorithms including the choice of internal generative model and the key ideas.

Conclusion and discussions

This survey gives an overview of the free energy principle inspired visual quality metrics. The free energy brain theory was briefly introduced, and the idea of applying it to visual quality assessment was reviewed. The free energy principle inspired visual quality assessment algorithms for general purpose tasks, i.e., full-reference, reduced-reference and no-reference visual quality assessment, and for distortion/application-specific tasks were then listed and compared in terms of basic ideas and

Conflict of interest statement

The authors stated no conflict of interest.

Acknowledgement

The authors acknowledge the support of the National Science Foundation of China under Grants 61831015, 61527804 and 61671301.

References (63)

  • D. Hubel et al., Receptive fields and functional architecture in the cat's visual cortex, J. Neurosci. (1962)
  • J. Wu et al., Perceptual quality metric with internal generative mechanism, IEEE Trans. Image Process. (2013)
  • N. Liu et al., Free energy adjusted peak signal to noise ratio (fea-psnr) for image quality assessment, Sensing and Imaging (2017)
  • J. Wu et al., Reduced-reference image quality assessment with visual information fidelity, IEEE Trans. Multimed. (2013)
  • W. Zhu et al., Multi-channel decomposition in tandem with free-energy principle for reduced-reference image quality assessment, IEEE Trans. Multimed. (2019)
  • K. Gu et al., No-reference image quality assessment metric by combining free energy theory and structural degradation model
  • K. Gu et al., Using free energy principle for blind image quality assessment, IEEE Trans. Multimed. (2015)
  • G. Zhai et al., Comparative image quality assessment using free energy minimization
  • X. Wu et al., Adaptive sequential prediction of multidimensional signals with applications to lossless image coding, IEEE Trans. Image Process. (2011)
  • M.H. Hansen et al., Model selection and the principle of minimum description length, J. Am. Stat. Assoc. (2001)
  • Y. Liu et al., Reduced-reference image quality assessment in free-energy principle and sparse representation, IEEE Trans. Multimed. (2018)
  • Y. Liu et al., Perceptual image quality assessment combining free-energy principle and sparse representation
  • Z. Han et al., A reduced-reference quality assessment scheme for blurred images
  • C. Tomasi et al., Bilateral filtering for gray and color images
  • K. Gu et al., Evaluating quality of screen content images via structural variation analysis, IEEE Trans. Vis. Comput. Graph. (2018)
  • K. Gu et al., No-reference quality assessment of screen content pictures, IEEE Trans. Image Process. (2017)
  • V. Jakhetiya et al., A prediction backed model for quality assessment of screen content and 3-d synthesized images, IEEE Trans. Ind. Inform. (2018)
  • Z. Che et al., Reduced-reference quality metric for screen content image
  • G.E. Hinton et al., Keeping the neural networks simple by minimizing the description length of the weights
  • D.J. MacKay, Ensemble learning and evidence maximization
  • W. Penny et al., Bayesian methods for autoregressive models

    Guangtao Zhai received the B.E. and M.E. degrees from Shandong University, Shandong, China, in 2001 and 2004, respectively, and the Ph.D. degree from Shanghai Jiao Tong University, Shanghai, China, in 2009. From 2006 to 2007, he was a Student Intern with the Institute for Infocomm Research, Singapore. From 2007 to 2008, he was a Visiting Student with the School of Computer Engineering, Nanyang Technological University, Singapore. From 2008 to 2009, he was a Visiting Student with the Department of Electrical and Computer Engineering, McMaster University, Hamilton, ON, Canada, where he was a Post-Doctoral Fellow from 2010 to 2012. From 2012 to 2013, he was a Humboldt Research Fellow with the Institute of Multimedia Communication and Signal Processing, Friedrich Alexander University of Erlangen Nuremberg, Erlangen, Germany. Since 2012, he has been with the Institute of Image Communication and Information Processing, Shanghai Jiao Tong University, where he is currently a Professor. His research interests include multimedia signal processing and perceptual signal processing. He was a recipient of the Award of National Excellent Ph.D. Thesis from the Ministry of Education of China in 2012 and the Best Paper Award of IEEE TRANSACTIONS ON MULTIMEDIA in 2018. His students received Best Student Paper Awards at the PCS 2015 and the IEEE ICME 2016. He and his students received the Grand Prize of the ICME 2018 Grand Challenge on Saliency 360!

    Xiongkuo Min received the B.E. degree from Wuhan University, Wuhan, China, in 2013, and the Ph.D. degree from Shanghai Jiao Tong University, Shanghai, China, in 2018. From 2016 to 2017, he was a Visiting Student with the Department of Electrical and Computer Engineering, University of Waterloo, Canada. He is currently a Post-Doctoral Fellow with Shanghai Jiao Tong University. His research interests include image quality assessment, visual attention modeling, and perceptual signal processing. He received the Best Student Paper Award from the IEEE ICME 2016.

    Ning Liu received the Ph.D. degree from Shanghai Jiao Tong University in 2010 and is now an associate professor in the School of Electronics, Information and Electrical Engineering, Shanghai Jiao Tong University.