Perceptual visual security assessment by fusing local and global feature similarity

https://doi.org/10.1016/j.compeleceng.2021.107071

Abstract

Selective encryption is an effective technique for multimedia big data encryption. A perceptual metric that takes the human visual system (HVS) into account is necessary for evaluating the visual security degree of selectively encrypted images. Existing metrics are mainly based on the similarity of structural information, which is represented by local spatial contrast features. However, visual security is concerned not only with the distortion of structural information but also with the leakage of important visual information, and local features cannot accurately express this leakage. This paper presents a perceptual visual security assessment metric that fuses local and global feature similarity. Considering the HVS response, the proposed metric measures three aspects: the distortion of structural information, the leakage of important visual information, and the changes in frequency components. To measure the distortion of structural information, local pattern similarity is calculated based on the normalized Hamming distance between local binary pattern (LBP) binary codes. The similarity of the global LBP histogram is computed to evaluate the leakage of important visual information. A low-pass weighted discrete cosine transform (DCT) frequency similarity is presented to detect the changes in various frequency components. Experimental results demonstrate that the proposed metric achieves significantly higher performance and stronger robustness than state-of-the-art metrics.

Introduction

Perceptual encryption [1], also known as selective encryption, is an effective technique for multimedia big data encryption. Because of the huge volume of data in multimedia applications [2], [3], [4], [5], [6], [7], [8], encryption is a time-consuming operation [9], [10], [11]. To meet the real-time requirements of multimedia processing and transmission applications, only a portion of the multimedia data is selected and encrypted based on the characteristics of the visual content. This method is called perceptual (or selective) encryption. It can simultaneously maintain the confidentiality of multimedia content and reduce the computational overhead. Therefore, many researchers have devoted themselves to the perceptual encryption of images and videos in the past decade [12], [13], [14], [15], [16].

However, the required security degree varies significantly across application scenarios. For instance, lightly encrypted preview versions are provided in entertainment applications, whereas strong encryption is mandatory in some sensitive applications. Scalable selective encryption methods [17], [18], [19], [20] were proposed to encrypt multimedia data with tunable security degrees. Thus, an objective metric is necessary for evaluating the security degree, and it can also be used to optimize encryption algorithms and their parameters. Since the human visual system (HVS) is the ultimate recipient of encrypted images, subjective evaluation is the most accurate and reliable way of assessing visual security. However, subjective tests conducted by human reviewers are laborious and even impracticable for real-time and automatic applications. Therefore, a perceptual visual security assessment (PVSA) metric that takes the HVS into account is essential for evaluating the visual security of selectively encrypted images.

Many efforts have been made to investigate PVSA metrics. Initially, since few metrics were dedicated to PVSA, well-known image quality assessment (IQA) metrics, such as peak signal-to-noise ratio (PSNR), structural similarity (SSIM) [21], and visual information fidelity (VIF) [22], were adopted to evaluate the visual security degree of selectively encrypted images [23], [24], [25]. PSNR [26] is based on the mean square error (MSE) of signals and is widely used in image quality assessment. However, this MSE-based metric does not take the HVS characteristics into account. In consideration of the HVS, SSIM [21] measures the similarities of contrast, structure, and luminance, and it has also been employed to evaluate visual security [23], [24], [25]. Moreover, VIF [22], which quantifies the loss of visual information, has been used to evaluate the visual security degree. However, these IQA metrics were originally designed for evaluating image quality, and the HVS response to image quality is not exactly consistent with its response to visual security. For example, images with poor visual quality do not necessarily have low content leakage scores [27]. Thus, these IQA metrics generally demonstrate unsatisfactory performance in the evaluation of visual security.
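
As a point of reference for the MSE-based metric mentioned above, the following is a minimal sketch of PSNR for grayscale images. The function name `psnr`, the nested-list image representation, and the default peak value of 255 (8-bit images) are illustrative assumptions and are not taken from the cited works.

```python
import math


def psnr(reference, distorted, max_value=255.0):
    """PSNR in dB between two equally sized grayscale images (nested lists)."""
    squared_error = 0.0
    count = 0
    for row_ref, row_dist in zip(reference, distorted):
        for ref, dist in zip(row_ref, row_dist):
            squared_error += (ref - dist) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images: no error to measure
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Because the score depends only on pixel-wise squared differences, it carries no notion of which content remains recognizable, which is consistent with the limitation described above.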

Therefore, several PVSA metrics have been proposed specifically for evaluating perceptual visual security. In [28], the edge similarity score (ESS) and luminance similarity score (LSS) are computed to evaluate encrypted videos. In [29], the similarity of the pixel neighborhood is calculated to evaluate the visual security of cipher-images. In [30], the similarities of luminance and local gradient are computed to evaluate encrypted videos. In [31], a local-entropy-based metric is proposed to evaluate the visual security of cipher-images. In [32], a visual security metric named VSI-canny is designed based on weighted edge similarity and texture similarity. As a variant of VSI-canny, NMVSI [33] adds wavelet-based frequency information. In [34], image visual security is evaluated by measuring image naturalness, structure, and texture. In [35], an image visual importance pooling strategy is proposed to weight the similarities of gradient magnitude and texture features.

These PVSA metrics are designed mainly based on the similarity of structural information, which is represented by local contrast features such as edge, texture, and gradient. In these metrics, an encrypted image that has a high spatial structural similarity with the original image is considered to have a low security degree. However, visual security is mainly concerned with the leakage of important visual information. These local features are not effective enough to represent the important visual information, so structural similarity based on local features cannot accurately measure visual security. More specifically, some encryption schemes, such as Arnold's cat map [36] and chaos-based pseudorandom generators [37], seriously disorganize the spatial correspondence between the encrypted image and the plain image. In that case, the encrypted images have low structural similarity with their plain images, yet the important visual information is still recognizable. Therefore, for these encryption schemes, structural similarity based on local features cannot accurately express the visual security degree. Moreover, many encryption schemes operate in the frequency domain, and the HVS is sensitive to changes in various frequency components. Local features cannot cover a wide range of frequencies, which may bias the visual security assessment.
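
To make the loss of spatial correspondence concrete, here is a small sketch of Arnold's cat map applied to a toy image. It assumes the common form of the map, (x, y) -> ((x + y) mod N, (x + 2y) mod N); the helper names and the 8x8 test image are hypothetical. The intensity histogram is used only as a simple stand-in for a position-independent statistic. It is not the LBP histogram used by the proposed metric.

```python
def arnold_cat_map(image, iterations=1):
    """Scramble a square image (nested lists) with Arnold's cat map.

    The pixel at (x, y) moves to ((x + y) mod N, (x + 2y) mod N), which is a
    permutation of pixel positions because the map matrix has determinant 1.
    """
    n = len(image)
    if any(len(row) != n for row in image):
        raise ValueError("Arnold's cat map requires a square image")
    current = [row[:] for row in image]
    for _ in range(iterations):
        scrambled = [[0] * n for _ in range(n)]
        for x in range(n):
            for y in range(n):
                scrambled[(x + y) % n][(x + 2 * y) % n] = current[x][y]
        current = scrambled
    return current


def intensity_histogram(image, levels=256):
    """Count how often each gray level occurs, regardless of position."""
    hist = [0] * levels
    for row in image:
        for value in row:
            hist[value] += 1
    return hist


def matching_pixel_ratio(image_a, image_b):
    """Fraction of positions where both images hold the same value."""
    pairs = [(a, b) for row_a, row_b in zip(image_a, image_b)
             for a, b in zip(row_a, row_b)]
    return sum(a == b for a, b in pairs) / len(pairs)


# Toy 8x8 "plain image" with 64 distinct gray levels.
plain = [[16 * x + y for y in range(8)] for x in range(8)]
cipher = arnold_cat_map(plain)
```

In this toy case only the pixel at the origin stays in place, so the position-wise agreement between `plain` and `cipher` drops to 1/64 while the intensity histogram is unchanged. A metric that compares only co-located local features sees almost nothing in common between the two images, even though no pixel value has been altered.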

Motivated by the aforementioned problems, in this paper we propose a PVSA metric that fuses local and global feature similarity. The local feature similarity is used to evaluate structural similarity, and the global feature similarity is adopted to measure the leakage of important visual information. The features are mainly computed from the multi-scale local binary pattern (MLBP) feature. The local binary pattern (LBP) is a powerful texture feature that encodes the local occurrences of various patterns in the neighborhood of each pixel [38], [39]. We refer to the pattern of each pixel as the local pattern. The LBP codes are commonly accumulated into a histogram for texture classification [40], [41] and face recognition [42]. We refer to the LBP histogram as the global feature. Moreover, to detect changes over a wide range of frequencies, the frequency similarity is computed from low-pass weighted discrete cosine transform (DCT) frequency components. Note that both the LBP histogram similarity and the frequency similarity are robust to damage to the spatial correspondence.
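
The two LBP-derived features described above can be sketched as follows. This is a basic single-scale, 8-neighbor LBP at radius 1, chosen for brevity; the multi-scale variant (MLBP) used by the proposed metric is not reproduced here, and the neighbor ordering, the `>=` thresholding convention, and the function names are assumptions.

```python
# Offsets of the 8 neighbors at radius 1, visited in a fixed clockwise order.
NEIGHBOR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                    (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_codes(image):
    """Return the 8-bit LBP code of every interior pixel (the local patterns)."""
    height, width = len(image), len(image[0])
    codes = []
    for r in range(1, height - 1):
        row_codes = []
        for c in range(1, width - 1):
            center = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(NEIGHBOR_OFFSETS):
                # A neighbor at least as bright as the center sets its bit.
                if image[r + dr][c + dc] >= center:
                    code |= 1 << bit
            row_codes.append(code)
        codes.append(row_codes)
    return codes


def lbp_histogram(codes, bins=256):
    """Normalized histogram of LBP codes (the global feature)."""
    hist = [0.0] * bins
    total = 0
    for row in codes:
        for code in row:
            hist[code] += 1.0
            total += 1
    return [h / total for h in hist] if total else hist
```

The code map keeps one pattern per pixel position, so it can be compared position by position, whereas the histogram discards positions and only records how often each pattern occurs.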

The proposed metric comprises three aspects: the local pattern similarity, the global LBP histogram similarity, and the frequency similarity. Firstly, the local pattern similarity is calculated from the normalized Hamming distance between the binary LBP codes. The local pattern conveys the spatial structural information, which is highly consistent with HVS perception. Secondly, the global histogram similarity is computed as the similarity of the LBP histograms and is used to measure the leakage of important visual information. Thirdly, to effectively detect changes in various frequency components, a low-pass weighted DCT frequency similarity is also adopted to evaluate the visual security degree. Experiments are conducted on two typical encrypted image databases. The results show that the proposed metric achieves significantly higher performance and stronger robustness than state-of-the-art metrics.
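
The three aspects can be illustrated with the rough sketch below. Only the use of a normalized Hamming distance between LBP codes is stated in this excerpt; the histogram intersection, the Gaussian low-pass weight with parameter `sigma`, the SSIM-style per-coefficient ratio with stabilizer `c`, and all function names are assumptions made for illustration. How the three scores are fused into one value is not given in this excerpt and is therefore left out.

```python
import math


def local_pattern_similarity(codes_a, codes_b, bits=8):
    """One minus the mean normalized Hamming distance between LBP code maps."""
    total_distance = 0
    count = 0
    for row_a, row_b in zip(codes_a, codes_b):
        for a, b in zip(row_a, row_b):
            total_distance += bin(a ^ b).count("1")  # differing bits
            count += 1
    return 1.0 - total_distance / (bits * count)


def histogram_similarity(hist_a, hist_b):
    """Histogram intersection of two normalized histograms (assumed measure)."""
    return sum(min(a, b) for a, b in zip(hist_a, hist_b))


def dct2(block):
    """Orthonormal 2-D DCT-II of a square block (direct O(N^4) evaluation)."""
    n = len(block)

    def alpha(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            coeffs[u][v] = alpha(u) * alpha(v) * s
    return coeffs


def lowpass_dct_similarity(block_a, block_b, sigma=2.0, c=1e-3):
    """Weighted mean of per-coefficient similarities, favoring low frequencies."""
    dct_a, dct_b = dct2(block_a), dct2(block_b)
    n = len(dct_a)
    weighted_sum = 0.0
    weight_total = 0.0
    for u in range(n):
        for v in range(n):
            # Assumed Gaussian low-pass weight: largest at the DC term (0, 0).
            w = math.exp(-(u * u + v * v) / (2.0 * sigma * sigma))
            a, b = dct_a[u][v], dct_b[u][v]
            sim = (2.0 * a * b + c) / (a * a + b * b + c)
            weighted_sum += w * sim
            weight_total += w
    return weighted_sum / weight_total
```

The first function needs co-located codes, so it reflects structural distortion at each position. The other two operate on position-independent or frequency-domain summaries, which is why they are less affected when an encryption scheme shuffles pixel locations.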

The rest of the paper is organized as follows. The motivation of the proposed PVSA metric is provided in Section 2. The details of the proposed metric are given in Section 3. Experiments are provided in Section 4 to validate the efficiency of the proposed method. Finally, this paper is concluded in Section 5.

Section snippets

Motivation

The basic idea of PVSA is to yield a score of an encrypted image that is highly consistent with the perception of HVS. Therefore, the key to this work is to select representative features that can effectively characterize the HVS perception of visual security.

It is reported that, when viewing an image, HVS perception is closely related to the structural information.

Hence the similarity of structural information is widely used in the PVSA metrics [28], [29], [30]. Local spatial contrast

Proposed PVSA metric

In this section, we describe the proposed PVSA metric. Firstly, we introduce the overall framework of the proposed metric. Secondly, we present the MLBP-based similarity, including multi-scale LBP representation, global histogram similarity, and local pattern similarity. Thirdly, we propose a low-pass DCT frequency similarity. Finally, we give the overall measurement.

Experiments

In this section, the experimental results are presented and discussed. Firstly, we introduce the experimental conditions, including databases, evaluation indicators, and parameter settings. Secondly, we present the performance of the proposed metric and compare it with the state-of-the-art metrics. Thirdly, we discuss the performance of different types of encryptions. Finally, we analyze the contribution of each aspect included in the proposed metric.

Conclusion

This paper proposes a PVSA metric that fuses local and global feature similarity. It argues that a PVSA metric should be concerned not only with the distortion of structural information but also with the leakage of important visual information and the changes in various frequency components. In the proposed metric, the distortion of structural information is evaluated based on the LBP binary codes, and the leakage of important visual information is measured based on the global LBP histogram.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper, which is entitled:

Perceptual Visual Security Assessment by Fusing Local and Global Feature Similarity

Jian Xiong received the Ph.D. degree in signal and information processing from the University of Electronic Science and Technology of China, Chengdu, China, in 2015. Currently, he is an assistant professor with the College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing, China. His current research interests include image and video coding, point cloud compression, and computer vision.

References (55)

  • J. Yang et al. Color-guided depth recovery from RGB-D data using an adaptive autoregressive model. IEEE Trans Image Process (2014)
  • J. Yang et al. CDnet: CNN-based cloud detection for remote sensing imagery. IEEE Trans Geosci Remote Sens (2019)
  • Z. Chen et al. Construction of a hierarchical feature enhancement network and its application in fault recognition. IEEE Trans Ind Inf (2020)
  • H. Lu et al. Chinese image captioning via fuzzy attention-based DenseNet-BiLSTM. ACM Trans Multimed Comput Commun Appl (2020)
  • Z. Shahid et al. Visual protection of HEVC video by selective encryption of CABAC binstrings. IEEE Trans Multimed (2014)
  • W. Wen et al. A novel selective image encryption method based on saliency detection.
  • A.I. Sallam et al. HEVC selective encryption using RC6 block cipher technique. IEEE Trans Multimed (2018)
  • H. Yin et al. CASM: a content-aware protocol for secure video multicast. IEEE Trans Multimed (2006)
  • F. Peng et al. An ROI privacy protection scheme for H.264 video based on FMO and chaos. IEEE Trans Inf Forensics Secur (2013)
  • Z. Shahid et al. Selective and scalable encryption of enhancement layers for dyadic scalable H.264/AVC by scrambling of scan patterns.
  • Y. Wang et al. A tunable selective encryption scheme for H.264/AVC.
  • Y. Wang et al. A tunable encryption scheme and analysis of fast selective encryption for CAVLC and CABAC in H.264/AVC. IEEE Trans Circuits Syst Video Technol (2013)
  • F. Peng et al. A tunable selective encryption scheme for H.265/HEVC based on chroma IPM and coefficient scrambling. IEEE Trans Circuits Syst Video Technol (2020)
  • Z. Wang et al. Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process (2004)
  • H.R. Sheikh et al. Image information and visual quality. IEEE Trans Image Process (2006)
  • Y. Mao et al. A joint signal processing and cryptographic approach to multimedia encryption. IEEE Trans Image Process (2006)
  • S. Lian et al. Secure advanced video coding based on selective encryption algorithms. IEEE Trans Consum Electron (2006)

    Xinzhong Zhu works at China Aerospace Science and Technology Corporation. He is devoted to research on satellite data processing and to product development planning and organization. Currently, Zhu serves as the deputy chief engineer of Shanghai Aerospace Eighth Academy.

    Jie Yuan, senior engineer, graduated from Nanjing University of Science and Technology with a master's degree in control theory and control engineering in 2013. He works at China Aerospace Science and Technology Corporation, where he is devoted to research and product development related to satellite image data processing. Currently, Yuan is the head of the image processing technology department.

    Ran Shi joined The Chinese University of Hong Kong as a Research Assistant in 2012, and obtained his Ph.D. in Electronic Engineering in 2017. Currently, he is an assistant professor at the School of Computer Science and Engineering, Nanjing University of Science and Technology. His research interests include object segmentation, visual quality evaluation, interactive segmentation and salient object detection.

    Hao Gao is a Professor at the College of Automation, College of Artificial Intelligence, Nanjing University of Posts and Telecommunications, Nanjing, China. His current research areas are artificial intelligence and computer vision, and he has published more than 50 international journal and conference papers. He has served as an editorial member or referee for many international journals.

    This work was supported in part by the National Natural Science Foundation of China (No. 61701258, No. 61931012, No. 61801219, and No. 61906098).

    This paper is for regular issues of CAEE. Reviews processed and approved for publication by the co-Editor-in-Chief Huimin Lu.
