Perceptual visual security assessment by fusing local and global feature similarity
Introduction
Perceptual encryption [1], also known as selective encryption, is an effective technique for encrypting multimedia big data. Because multimedia applications generate huge amounts of data [2], [3], [4], [5], [6], [7], [8], encrypting all of it is time-consuming [9], [10], [11]. To meet the real-time requirements of multimedia processing and transmission applications, only a portion of the multimedia data, selected according to the characteristics of the visual content, is encrypted. This approach, called perceptual or selective encryption, simultaneously preserves the confidentiality of multimedia content and reduces computational overhead. Consequently, many researchers have studied perceptual encryption of images and videos over the past decade [12], [13], [14], [15], [16].
However, the required degree of security varies significantly across application scenarios. For instance, entertainment applications provide lightly encrypted versions as previews, whereas sensitive applications mandate strong encryption. Scalable selective encryption methods [17], [18], [19], [20] have been proposed to encrypt multimedia data with tunable security degrees. An objective metric is therefore needed to evaluate the security degree; such a metric can also be used to optimize encryption algorithms and their parameters. Moreover, since the human visual system (HVS) is the ultimate recipient of encrypted images, subjective evaluation is the most accurate and reliable way to assess visual security. However, subjective tests conducted by human reviewers are laborious and impractical for real-time and automatic applications. Therefore, a perceptual visual security assessment (PVSA) metric that takes the HVS into account is essential for evaluating the visual security of selectively encrypted images.
Many efforts have been made to develop PVSA metrics. Initially, since few metrics were dedicated to PVSA, well-known image quality assessment (IQA) metrics, such as peak signal-to-noise ratio (PSNR), structural similarity (SSIM) [21], and visual information fidelity (VIF) [22], were adopted to evaluate the visual security of selectively encrypted images [23], [24], [25]. PSNR [26] is derived from the mean square error (MSE) between signals and is widely used in image quality assessment. However, MSE-based metrics do not take HVS characteristics into account. To reflect the HVS, SSIM [21] measures similarities in luminance, contrast, and structure; it has also been employed to evaluate visual security [23], [24], [25]. Likewise, VIF [22], which quantifies the loss of visual information, has been used to evaluate the visual security degree. However, these IQA metrics were originally designed to evaluate image quality, and the HVS response to image quality is not fully consistent with its response to visual security. For example, an image with poor visual quality does not necessarily have a low content-leakage score [27]. Consequently, these IQA metrics generally perform unsatisfactorily when evaluating visual security.
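To make concrete why PSNR ignores the HVS, note that it is simply a log-scaled MSE: every pixel error counts equally, regardless of where it occurs or what it looks like. The following is a minimal pure-Python sketch, assuming 8-bit grayscale images stored as equally sized lists of rows (peak value 255); the function names are illustrative and are not taken from the paper.

```python
import math


def mse(ref, test):
    """Mean squared error between two equally sized grayscale images."""
    if len(ref) != len(test) or any(len(r) != len(t) for r, t in zip(ref, test)):
        raise ValueError("images must have the same size")
    total, count = 0.0, 0
    for row_ref, row_test in zip(ref, test):
        for a, b in zip(row_ref, row_test):
            total += (a - b) ** 2  # every pixel error is weighted equally
            count += 1
    return total / count


def psnr(ref, test, peak=255.0):
    """PSNR in dB, computed from the MSE; infinite for identical images."""
    err = mse(ref, test)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / err)


plain = [[10, 20], [30, 40]]
cipher = [[12, 18], [33, 36]]
print(mse(plain, cipher))             # 8.25
print(round(psnr(plain, cipher), 2))  # 38.97
```

Because the score depends only on the error magnitudes, moving the same errors to visually critical or visually irrelevant positions yields exactly the same PSNR.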
Therefore, several PVSA metrics have been proposed specifically for evaluating perceptual visual security. In [28], an edge similarity score (ESS) and a luminance similarity score (LSS) are computed to evaluate encrypted videos. In [29], the similarity of pixel neighborhoods is calculated to evaluate the visual security of cipher-images. In [30], the similarities of luminance and local gradient are computed to evaluate encrypted videos. In [31], a local-entropy-based metric is proposed to evaluate the visual security of cipher-images. In [32], a visual security metric named VSI-canny is designed based on weighted edge similarity and texture similarity. NMVSI [33], a variant of VSI-canny, adds wavelet-based frequency information. In [34], image visual security is evaluated by measuring image naturalness, structure, and texture. In [35], an image visual importance pooling strategy is proposed to weight the similarities of gradient magnitude and texture features.
These PVSA metrics are mainly based on the similarity of structural information, which is represented by local contrast features such as edges, texture, and gradients. In these metrics, an encrypted image with high spatial structural similarity to the original image is considered to have a low security degree. However, visual security is mainly concerned with the leakage of important visual information. Local features are not effective enough at representing this information, so structural similarity based on local features cannot accurately measure visual security. More specifically, some encryption schemes, such as Arnold's cat map [36] and chaotic pseudorandom generators [37], severely disrupt the spatial correspondence between the encrypted image and the plain image. In that case, the encrypted images have low structural similarity with their plain images, even though the important visual information is still recognizable. For these encryptions, therefore, structural similarity based on local features cannot accurately express the visual security degree. Moreover, many encryption schemes operate in the frequency domain, and the HVS is sensitive to changes in various frequency components. Local features cannot cover a wide range of frequencies, so relying on them alone may bias the visual security assessment.
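The loss of spatial correspondence caused by Arnold's cat map [36] is easy to reproduce. The sketch below assumes a square grayscale image stored as a list of rows and uses the common form of the map, which sends pixel (x, y) to ((x + y) mod N, (x + 2y) mod N). It only relocates pixels: the multiset of intensities is unchanged, yet almost every pixel leaves its original position, so any pixel-aligned comparison with the plain image breaks down. The function names are illustrative.

```python
def arnold_cat_map(img, iterations=1):
    """Scramble a square N x N image with Arnold's cat map.

    Pixel (x, y) moves to ((x + y) mod N, (x + 2y) mod N). The map matrix
    [[1, 1], [1, 2]] has determinant 1, so the map is a bijection and every
    output position is written exactly once per iteration.
    """
    n = len(img)
    if any(len(row) != n for row in img):
        raise ValueError("Arnold's cat map needs a square image")
    current = [row[:] for row in img]
    for _ in range(iterations):
        scrambled = [[None] * n for _ in range(n)]
        for x in range(n):
            for y in range(n):
                scrambled[(x + y) % n][(x + 2 * y) % n] = current[x][y]
        current = scrambled
    return current


def cat_map_period(n):
    """Smallest k > 0 for which k iterations restore every N x N image."""
    a, b, c, d = 1, 0, 0, 1  # running matrix power, starting at the identity
    k = 0
    while True:
        # Multiply the running power by [[1, 1], [1, 2]] modulo n.
        a, b, c, d = (a + b) % n, (a + 2 * b) % n, (c + d) % n, (c + 2 * d) % n
        k += 1
        if (a, b, c, d) == (1 % n, 0, 0, 1 % n):
            return k


plain = [[5 * x + y for y in range(5)] for x in range(5)]
cipher = arnold_cat_map(plain)
moved = sum(cipher[x][y] != plain[x][y] for x in range(5) for y in range(5))
print(moved)  # 24: only pixel (0, 0) stays in place
print(sorted(v for r in cipher for v in r) == sorted(v for r in plain for v in r))  # True
print(arnold_cat_map(plain, cat_map_period(5)) == plain)  # True
```

Because the map only permutes pixels, global intensity statistics survive the scrambling untouched, and because it is periodic, iterating it the right number of times restores the plain image.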
Motivated by these problems, in this paper we propose a PVSA metric that fuses local and global feature similarity. The local feature similarity is used to evaluate structural similarity, and the global feature similarity is adopted to measure the leakage of important visual information. Both are computed mainly from multi-scale local binary pattern (MLBP) features. The local binary pattern (LBP) is a powerful texture feature that encodes the local occurrences of various patterns in the neighborhood of each pixel [38], [39]. We refer to the pattern of each pixel as the local pattern. The LBP codes are then accumulated into a histogram, as is done in texture classification [40], [41] and face recognition [42]; we refer to this LBP histogram as the global feature. Moreover, to detect changes over a wide range of frequencies, a frequency similarity is computed from low-pass weighted discrete cosine transform (DCT) frequency components. Note that both the LBP histogram similarity and the frequency similarity are robust to damage to spatial correspondence.
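As a concrete illustration of the local pattern and the global feature, the sketch below computes basic 8-neighbor LBP codes and their normalized histogram, and concatenates histograms over several radii as a simple multi-scale variant. It assumes a grayscale image stored as a list of rows and, as a simplification, samples the eight neighbors at integer offsets on a square ring of the given radius instead of interpolating on a circle; the paper's exact MLBP sampling scheme is not given in this excerpt, and the function names are hypothetical.

```python
def lbp_codes(img, radius=1):
    """8-bit LBP code for every pixel at least `radius` away from the border.

    Bit k is set when neighbor k is >= the center pixel. Neighbors are
    visited clockwise from the top-left corner of a square ring (assumption).
    """
    r = radius
    offsets = [(-r, -r), (-r, 0), (-r, r), (0, r),
               (r, r), (r, 0), (r, -r), (0, -r)]
    height, width = len(img), len(img[0])
    codes = []
    for i in range(r, height - r):
        row = []
        for j in range(r, width - r):
            center = img[i][j]
            code = 0
            for bit, (di, dj) in enumerate(offsets):
                if img[i + di][j + dj] >= center:
                    code |= 1 << bit
            row.append(code)
        codes.append(row)
    return codes


def lbp_histogram(codes, bins=256):
    """Normalized histogram of LBP codes (the 'global feature')."""
    counts = [0] * bins
    for row in codes:
        for code in row:
            counts[code] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * bins


def multiscale_lbp_histogram(img, radii=(1, 2, 3)):
    """Concatenate per-radius LBP histograms (a simple stand-in for MLBP)."""
    feature = []
    for r in radii:
        feature.extend(lbp_histogram(lbp_codes(img, r)))
    return feature


patch = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(lbp_codes(patch))  # [[120]]: right, bottom-right, bottom, bottom-left >= 5
```

The per-pixel codes keep their spatial positions (local patterns), whereas the histogram discards positions and keeps only how often each pattern occurs (global feature).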
The proposed metric comprises three aspects: local pattern similarity, global LBP histogram similarity, and frequency similarity. First, the local pattern similarity is calculated from the normalized Hamming distance between the binary LBP codes. The local pattern conveys spatial structural information, which is highly consistent with HVS perception. Second, the global histogram similarity is computed as the similarity of the LBP histograms and is used to measure the leakage of important visual information. Third, to effectively detect changes in various frequency components, a low-pass weighted DCT frequency similarity is also adopted to evaluate the visual security degree. Experiments are conducted on two typical encrypted image databases. The results show that the proposed metric achieves significantly higher performance and stronger robustness than state-of-the-art metrics.
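The first two aspects can be sketched as follows. Here the local pattern similarity is taken as 1 minus the normalized Hamming distance, i.e., the fraction of LBP bits that agree between spatially aligned codes. The excerpt does not give the formula for the global histogram similarity, so histogram intersection is used as a stand-in (an assumption). Codes are assumed to be 8-bit integers stored as lists of rows, and the function names are hypothetical.

```python
def local_pattern_similarity(codes_a, codes_b, bits=8):
    """1 - normalized Hamming distance between spatially aligned LBP codes."""
    if len(codes_a) != len(codes_b) or any(
            len(ra) != len(rb) for ra, rb in zip(codes_a, codes_b)):
        raise ValueError("code maps must have the same size")
    differing, count = 0, 0
    for row_a, row_b in zip(codes_a, codes_b):
        for a, b in zip(row_a, row_b):
            differing += bin(a ^ b).count("1")  # number of differing bits
            count += 1
    if count == 0:
        raise ValueError("empty code maps")
    return 1.0 - differing / (bits * count)


def code_histogram(codes, bins=256):
    """Normalized histogram of the codes in a code map."""
    counts = [0] * bins
    for row in codes:
        for code in row:
            counts[code] += 1
    total = sum(counts)
    return [c / total for c in counts]


def histogram_similarity(hist_a, hist_b):
    """Histogram intersection of two normalized histograms, in [0, 1] (stand-in)."""
    return sum(min(a, b) for a, b in zip(hist_a, hist_b))


# Swapping two pixels' codes destroys local agreement but not the histogram.
plain_codes = [[1, 254]]
cipher_codes = [[254, 1]]
print(local_pattern_similarity(plain_codes, cipher_codes))  # 0.0
print(histogram_similarity(code_histogram(plain_codes),
                           code_histogram(cipher_codes)))   # 1.0
```

The toy example mirrors the argument above: when an encryption only rearranges content, a position-aligned measure reports no similarity, while the global histogram still shows that the same patterns are present.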
The rest of the paper is organized as follows. Section 2 presents the motivation for the proposed PVSA metric. Section 3 details the proposed metric. Section 4 provides experiments that validate the effectiveness of the proposed method. Finally, Section 5 concludes the paper.
Motivation
The basic idea of PVSA is to assign an encrypted image a score that is highly consistent with HVS perception. Therefore, the key to this work is selecting representative features that effectively characterize HVS perception of visual security.
It has been reported that, when viewing an image, HVS perception is closely related to structural information. Hence, the similarity of structural information is widely used in PVSA metrics [28], [29], [30]. Local spatial contrast
Proposed PVSA metric
In this section, we describe the proposed PVSA metric. First, we introduce its overall framework. Second, we present the MLBP-based similarity, including the multi-scale LBP representation, global histogram similarity, and local pattern similarity. Third, we propose a low-pass DCT frequency similarity. Finally, we present the overall measure.
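The excerpt names a low-pass DCT frequency similarity but does not give its formula. One plausible sketch, under stated assumptions: compute an orthonormal 2-D DCT-II of each image (a direct formula, adequate for small blocks), compare coefficient magnitudes with an SSIM-style ratio (2|A||B| + c) / (A² + B² + c), and average the ratios with a hypothetical low-pass weight w(u, v) = 1 / (1 + u + v) so that low frequencies dominate. The weight, the constant c, and the function names are illustrative assumptions, not the authors' definitions.

```python
import math


def dct2(block):
    """Orthonormal 2-D DCT-II of an n x m block (direct formula, small blocks only)."""
    n, m = len(block), len(block[0])
    out = [[0.0] * m for _ in range(n)]
    for u in range(n):
        alpha_u = math.sqrt((1 if u == 0 else 2) / n)
        for v in range(m):
            alpha_v = math.sqrt((1 if v == 0 else 2) / m)
            total = 0.0
            for x in range(n):
                cos_x = math.cos(math.pi * (2 * x + 1) * u / (2 * n))
                for y in range(m):
                    cos_y = math.cos(math.pi * (2 * y + 1) * v / (2 * m))
                    total += block[x][y] * cos_x * cos_y
            out[u][v] = alpha_u * alpha_v * total
    return out


def lowpass_dct_similarity(img_a, img_b, c=1e-3):
    """Low-pass weighted similarity of DCT magnitudes, in (0, 1] (hypothetical form)."""
    if len(img_a) != len(img_b) or any(
            len(ra) != len(rb) for ra, rb in zip(img_a, img_b)):
        raise ValueError("images must have the same size")
    coeffs_a, coeffs_b = dct2(img_a), dct2(img_b)
    weighted_sum, weight_total = 0.0, 0.0
    for u, (row_a, row_b) in enumerate(zip(coeffs_a, coeffs_b)):
        for v, (a, b) in enumerate(zip(row_a, row_b)):
            weight = 1.0 / (1 + u + v)  # assumed low-pass weight: favors small (u, v)
            ratio = (2 * abs(a) * abs(b) + c) / (a * a + b * b + c)
            weighted_sum += weight * ratio
            weight_total += weight
    return weighted_sum / weight_total


ramp = [[10 * (i + j) for j in range(4)] for i in range(4)]
mirrored = [row[::-1] for row in ramp]
print(round(lowpass_dct_similarity(ramp, ramp), 6))      # 1.0
print(round(lowpass_dct_similarity(ramp, mirrored), 6))  # 1.0
```

In this sketch, a horizontal mirror flips the sign of odd-order coefficients without changing their magnitudes, so the mirrored image scores 1.0; this is one way a magnitude-based frequency comparison tolerates damage to spatial correspondence that would break a pixel-aligned comparison. Whether the authors use blocks, magnitudes, or this kind of weighting is not stated in this excerpt.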
Experiments
In this section, the experimental results are presented and discussed. First, we introduce the experimental conditions, including databases, evaluation indicators, and parameter settings. Second, we present the performance of the proposed metric and compare it with state-of-the-art metrics. Third, we discuss its performance on different types of encryption. Finally, we analyze the contribution of each aspect of the proposed metric.
Conclusion
This paper proposes a PVSA metric that fuses local and global feature similarity. It argues that a PVSA metric should account not only for the distortion of structural information but also for the leakage of important visual information and changes in various frequency components. In the proposed metric, the distortion of structural information is evaluated based on the binary LBP codes, and the leakage of important visual information is measured based on the global LBP histogram.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper, entitled:
Perceptual Visual Security Assessment by Fusing Local and Global Feature Similarity
Jian Xiong received the Ph.D. degree in signal and information processing from the University of Electronic Science and Technology of China, Chengdu, China, in 2015. He is currently an assistant professor with the College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing, China. His current research interests include image and video coding, point cloud compression, and computer vision.
References (55)
- Underwater image dehazing using joint trilateral filter. Comput Electr Eng (2014).
- Numerical and experimental study on the maneuverability of an active propeller control based wave glider. Appl Ocean Res (2020).
- A novel image encryption scheme based on a linear hyperbolic chaotic system of partial differential equations. Signal Process Image Commun (2013).
- Reversible data hiding in encrypted images using cross division and additive homomorphism. Signal Process Image Commun (2015).
- A visually secure image encryption scheme based on compressive sensing. Signal Process (2017).
- A comparative study of texture measures with classification based on featured distributions. Pattern Recognit (1996).
- Efficient selective encryption for JPEG 2000 images using private initial table. Pattern Recognit (2006).
- On the design of partial encryption scheme for multimedia content. Math Comput Model (2013).
- Multimedia security handbook (2004).
- Brain intelligence: go beyond artificial intelligence. Mob Netw Appl (2018).
- Color-guided depth recovery from RGB-D data using an adaptive autoregressive model. IEEE Trans Image Process.
- CDnet: CNN-based cloud detection for remote sensing imagery. IEEE Trans Geosci Remote Sens.
- Construction of a hierarchical feature enhancement network and its application in fault recognition. IEEE Trans Ind Inf.
- Chinese image captioning via fuzzy attention-based DenseNet-BiLSTM. ACM Trans Multimed Comput Commun Appl.
- Visual protection of HEVC video by selective encryption of CABAC binstrings. IEEE Trans Multimed.
- A novel selective image encryption method based on saliency detection.
- HEVC selective encryption using RC6 block cipher technique. IEEE Trans Multimed.
- CASM: a content-aware protocol for secure video multicast. IEEE Trans Multimed.
- An ROI privacy protection scheme for H.264 video based on FMO and chaos. IEEE Trans Inf Forensics Secur.
- Selective and scalable encryption of enhancement layers for dyadic scalable H.264/AVC by scrambling of scan patterns.
- A tunable selective encryption scheme for H.264/AVC.
- A tunable encryption scheme and analysis of fast selective encryption for CAVLC and CABAC in H.264/AVC. IEEE Trans Circuits Syst Video Technol.
- A tunable selective encryption scheme for H.265/HEVC based on chroma IPM and coefficient scrambling. IEEE Trans Circuits Syst Video Technol.
- Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process.
- Image information and visual quality. IEEE Trans Image Process.
- A joint signal processing and cryptographic approach to multimedia encryption. IEEE Trans Image Process.
- Secure advanced video coding based on selective encryption algorithms. IEEE Trans Consum Electron.
Xinzhong Zhu works at China Aerospace Science and Technology Corporation, where he is engaged in research related to satellite data processing and in the planning and organization of product development. He currently serves as the deputy chief engineer of Shanghai Aerospace Eighth Academy.
Jie Yuan, a senior engineer, received his master's degree in control theory and control engineering from Nanjing University of Science and Technology in 2013. He currently works at China Aerospace Science and Technology Corporation, where he is engaged in research and product development related to satellite image data processing, and he heads the image processing technology department.
Ran Shi joined The Chinese University of Hong Kong as a Research Assistant in 2012, and obtained his Ph.D. in Electronic Engineering in 2017. Currently, he is an assistant professor at the School of Computer Science and Engineering, Nanjing University of Science and Technology. His research interests include object segmentation, visual quality evaluation, interactive segmentation and salient object detection.
Hao Gao is a Professor with the College of Automation and College of Artificial Intelligence, Nanjing University of Posts and Telecommunications, Nanjing, China. His current research areas are artificial intelligence and computer vision, and he has published more than 50 international journal and conference papers. He has served as an editorial board member and referee for many international journals.
This work was supported in part by the National Natural Science Foundation of China (No. 61701258, No. 61931012, No. 61801219, and No. 61906098).
This paper is for regular issues of CAEE. Reviews were processed, and the paper was approved for publication, by co-Editor-in-Chief Huimin Lu.