Privacy-preserving human action recognition as a remote cloud service using RGB-D sensors and deep CNN
Introduction
Advances in deep learning have enabled expert systems to perform human action recognition with high precision (Ronao & Cho, 2016). Human action recognition has been studied for decades and remains a popular topic because of its broad real-world applications, such as video retrieval, visual surveillance, human-computer interaction, and robotics for human behavior characterization (Mabrouk & Zagrouba, 2018). However, maintaining an expert system for human action recognition requires rich hardware and software resources along with a team of specialists. This restricts the real-world utility of such systems, as mid-level organizations cannot maintain an in-house system for long. Recently, cloud service providers have addressed this problem by introducing pay-as-you-go models; examples include Azure Machine Learning and IBM Watson Machine Learning. These services relieve users of infrastructure maintenance by letting them outsource their data to deep learning based facilities in the cloud. However, the massive data collection required for deep learning raises privacy issues.
Personal and highly sensitive user data, such as photos and video recordings, are stored indefinitely by the companies that collect them. Images and video recordings often contain accidentally captured sensitive items, including faces, license plates, and computer screens, which leads to privacy loss. For example, an organization that wants to apply cloud-based deep learning techniques to identify suspicious actions in its critical areas is prevented by privacy concerns from sharing its surveillance data, and thus from benefiting from large-scale deep learning. Emerging privacy-paradox studies (Ooi, Hew, & Lin, 2018; Hew, Tan, Lin, & Ooi, 2017) have also shown ambiguous user behavior regarding privacy. In this situation, privacy and confidentiality restrictions significantly reduce the use of the facilities offered by cloud service providers. To overcome this problem, strong cryptographic techniques can be used to secure user data before sending it to the cloud. This solution works well for secure storage. However, it introduces the challenge of processing data in encrypted form, known as the Encrypted Domain (ED). We address these problems and propose a novel approach for secure outsourcing of user data to a cloud-based expert system.
The proposed approach ensures data privacy and enables users to access a deep learning model for human action recognition over the cloud. It applies to a wide range of situations in which surveillance data contains sensitive information. These include secure and automatic identification of suspicious human actions in parking lots, Intensive Care Units (ICUs), and other sensitive places, as well as secure monitoring of human actions at traffic signals, at bank counters, and in Active and Assisted Living (AAL) systems for smart homes.
In the context of multimedia data, various methods exist to assure image privacy, including mosaicing, blurring, scrambling, and encryption. Each method is suited to situation-specific solutions. For example, YouTube introduced automatic face blurring in 2012, which preserves identity by blurring human faces in video content, for instance to protect the identity of activists taking part in a protest march. However, this approach does not fulfill our privacy goal, as multiple video frames with similar visual information may enable an adversary to obtain significant information from non-protected parts, such as background objects. The next method, image mosaicing (also known as pixelation), secures image information by creating large pixel-like patches. However, it carries a risk, as recent attack models have concentrated on re-identifying identity-related information from obfuscated image parts (McPherson, Shokri, & Shmatikov, 2016). The remaining two methods, image scrambling and encryption, focus on securing full image information with standard cryptographic methods. However, they distort pixel information to such an extent that data processing becomes infeasible.
In this paper, our primary objective is to design a novel approach for full image obfuscation such that automatic human action recognition can be performed over the cloud without revealing any identity-related information. We achieve this by transforming the secret image to an extremely low resolution, e.g. from 224 × 224 to 14 × 14, using a position-based superpixel transformation. The proposed transformation is designed to condense a group of pixels into a single composite value combined with random noise. As a result, the identity-related spatial information of the underlying image is greatly reduced, making it challenging for an adversary to relate two obfuscated images. This removes the chance of a re-identification attack. Compared to existing image obfuscation methods, the proposed approach improves image security while achieving higher recognition accuracy in the trade-off between security and recognition accuracy (%).
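The idea of condensing a block of pixels into one noisy value can be illustrated with a minimal sketch. The exact form of the proposed transformation is not given in this section, so everything below is an assumption made for illustration: the function name `superpixel_obfuscate`, the use of a plain block mean as the composite value, Gaussian noise with standard deviation `noise_std`, and a grayscale image stored as nested lists of values in [0, 1]. It uses only the Python standard library.

```python
import random


def superpixel_obfuscate(image, out_size=14, noise_std=0.05, rng=None):
    """Collapse each pixel block into one noisy value (illustrative assumption)."""
    if rng is None:
        rng = random.Random()
    height, width = len(image), len(image[0])
    if height % out_size or width % out_size:
        raise ValueError("image sides must be multiples of out_size")
    block_h, block_w = height // out_size, width // out_size
    result = []
    for by in range(out_size):
        row = []
        for bx in range(out_size):
            # Average every pixel whose position falls inside this block.
            total = 0.0
            for y in range(by * block_h, (by + 1) * block_h):
                for x in range(bx * block_w, (bx + 1) * block_w):
                    total += image[y][x]
            mean = total / (block_h * block_w)
            # Fresh Gaussian noise per block; clamp to the valid range [0, 1].
            noisy = mean + rng.gauss(0.0, noise_std)
            row.append(min(1.0, max(0.0, noisy)))
        result.append(row)
    return result


# Hypothetical 224 x 224 "secret" image filled with random intensities.
rng = random.Random(0)
secret = [[rng.random() for _ in range(224)] for _ in range(224)]
first = superpixel_obfuscate(secret, rng=rng)
second = superpixel_obfuscate(secret, rng=rng)
```

Because new noise is drawn on every call, `first` and `second` are both 14 × 14 but differ from each other even though they come from the same input, which is the property the paragraph above relies on. Averaging discards the per-pixel values, so the original image cannot be recovered exactly from the output.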
Recently, a few works (Dai, Saghafi, Wu, Konrad, & Ishwar, 2015; Ryoo, Kim, & Yang, 2018; Ryoo, Rothrock, Fleming, & Yang, 2017; Chou et al., 2018) have used extremely low resolution images to achieve privacy. Their main idea is to resize the original high resolution image to an extremely low resolution so that sensitive information, such as a person's face or a vehicle's number plate, is obfuscated. The authors used the resized training samples to train a deep learning model for secure recognition. However, these works have the following drawbacks:
1. Intrinsic information is lost due to the extremely low resolution, which constrains what the model can learn because only RGB (Red-Green-Blue) images are considered.
2. There is a privacy risk, as the resulting low resolution images are generated in an analogous manner that falls into the category of full image mosaicing. This allows the cloud service provider to identify similar images, even if they are claimed to be secure. McPherson et al. (2016) described this as a potential risk in their article entitled Defeating Image Obfuscation with Deep Learning, demonstrating how deep learning models can be used to re-identify sensitive information. The authors reported high re-identification accuracies on four standard datasets, namely MNIST, CIFAR-10, AT&T, and FaceScrub.
We address the first drawback by utilizing RGB and depth data along with a four channel deep Convolutional Neural Network (CNN). Considering several advantages of depth data as compared to using RGB solely, we use depth data as the second modality to overcome the data restriction problem.
To counter the re-identification attack discussed as the second drawback, we propose a non-invertible position-based superpixel transformation for image obfuscation. McPherson et al. (2016) demonstrated that neural networks can automatically discover relevant features and learn to exploit correlations in the obfuscated image. This problem is aggravated by the low resolution images used in state-of-the-art schemes, as they are generated in an analogous manner. In contrast to the existing privacy-preserving schemes of Dai, Saghafi, Wu, Konrad, & Ishwar (2015), Ryoo, Kim, & Yang (2018), Ryoo, Rothrock, Fleming, & Yang (2017), and Chou et al. (2018), which used simple image resizing for obfuscation, the proposed transformation incorporates random noise. As a result, an adversary cannot tell that two encrypted images were generated from the same secret image. To summarize, the major contributions of this paper are as follows:
- Compared to the state-of-the-art works of Dai, Saghafi, Wu, Konrad, & Ishwar (2015), Ryoo, Kim, & Yang (2018), Ryoo, Rothrock, Fleming, & Yang (2017), and Chou et al. (2018), the proposed transformation incorporates additional noise. This improves security, the robustness of which is validated using several statistical and differential tests.
- Unlike previous schemes that used only RGB data, we use depth maps as a second modality alongside RGB. For a video of t frames, only four images, namely one Motion History Image (MHI) and three Depth Motion Maps (DMMs), are secured by transforming them to extremely low resolution. As a result, the data overhead is significantly reduced.
- A four channel deep CNN is used, with one channel each for the MHI and the three DMMs. The respective outputs are then fused, resulting in more accurate recognition.
- The proposed approach outperforms other image obfuscation methods in the trade-off between security and recognition accuracy (%).
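To illustrate how a video of t frames can be summarized by a single template, the following sketch computes a basic motion history image in the classic temporal-template style. It is not taken from the paper: the function name `motion_history_image`, the frame-difference motion test, the `threshold` value, and the linear decay of one step per frame are all assumptions chosen for illustration, and frames are grayscale nested lists of values in [0, 1].

```python
def motion_history_image(frames, threshold=0.1):
    """Collapse a list of grayscale frames into one motion history image."""
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    tau = len(frames) - 1  # number of frame-to-frame transitions
    height, width = len(frames[0]), len(frames[0][0])
    mhi = [[0.0] * width for _ in range(height)]
    for prev, curr in zip(frames, frames[1:]):
        for y in range(height):
            for x in range(width):
                if abs(curr[y][x] - prev[y][x]) > threshold:
                    # Motion detected: stamp the pixel with the newest value.
                    mhi[y][x] = float(tau)
                else:
                    # No motion: let older motion fade by one step.
                    mhi[y][x] = max(0.0, mhi[y][x] - 1.0)
    # Scale to [0, 1] so the most recent motion is brightest.
    return [[value / tau for value in row] for row in mhi]


# Synthetic 5-frame clip: one bright pixel moving right along row 3.
frames = []
for t in range(5):
    frame = [[0.0] * 8 for _ in range(8)]
    frame[3][t] = 1.0
    frames.append(frame)

mhi = motion_history_image(frames)
```

In the resulting template, pixels along row 3 get brighter toward the most recent position of the moving dot, while pixels that never changed stay at zero. If each of the four templates is assumed to be single channel, four 14 × 14 templates amount to 4 × 196 = 784 values no matter how large t is.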
We review the existing privacy-preserving schemes in Section 2. A brief overview of the proposed approach is provided in Section 3. Section 4 presents a detailed description of the proposed transformation, followed by the complete methodology and model description in Section 5. Recognition results and security analysis are presented in Sections 6 and 7, respectively. The data efficiency achieved by the proposed approach is analyzed in Section 8. Section 9 provides a comprehensive discussion of all results reported in the paper, along with future directions. Finally, the paper is concluded in Section 10.
Section snippets
Related work
In this section, we first provide a brief overview of existing schemes that support secure data processing and then move towards the recent developments for privacy-preserving human action recognition.
System overview
The proposed approach is designed to run in a cloud environment, and the operations required for its functioning are classified according to the treat model and cloud server modalities. Any user-accessible device located in the actual field, where the multimedia is temporarily stored and secured for further transmission to the cloud server, falls into the category of the treat model. The remaining tasks, that is, secure storage and recognition, are performed at the cloud server.
The cloud server needs to deploy a
Position-based superpixel transformation for image obfuscation
To secure image information, we propose a position-based superpixel transformation f. Superpixel segmentation is the process of clustering connected pixels with similar features in an image so that an abstract image can be obtained. It is used in various applications such as medical image segmentation (Kitrungrotsakul, Han, & Chen, 2015), image retrieval (Haas, Donner, Burner, Holzer, & Langs, 2011; Stutz, Hermans, & Leibe, 2018), dataset annotation (Liu et al., 2013), etc. However, unlike
Privacy-preserving human action recognition as a service
The two primary objectives of the proposed approach are: (i) obfuscating image information at the treat model, and (ii) performing secure human action recognition at the cloud server. The functionality is organized into two phases, which are described as follows:
Recognition results
In this section, we provide network details and validate the effectiveness of the proposed approach on the standard UTD-MHAD dataset.
Security analysis
Motivated by the recent work of McPherson et al. (2016), which demonstrated the risk of re-identification, we propose to obfuscate full image information using a position-based superpixel transformation. A scheme is considered secure if no adversary can break it with probability significantly greater than random guessing. We achieve higher obfuscation by adding random noise during the position-based superpixel transformation, which results in an extremely low resolution
Data overhead
The existing cloud-based paradigm requires the active involvement of a communication network for data transmission. Therefore, the data overhead should be very low so that a real-time system can be supported. The state-of-the-art privacy-preserving human action recognition schemes used extremely low resolution images to achieve obfuscation. This also reduces the data expansion caused by large multimedia dimensions. For example, a colored video with 500 frames each of size 224 × 224
Discussion
In this paper, we have focused on the development of a novel approach for secure human action recognition using a position-based superpixel transformation. The primary objective of the proposed approach is to obfuscate image information by resizing it to extremely low resolution with added noise. This prevents re-identification attacks. A four stream deep CNN is used for recognition, where each stream is based on a pretrained MobileNet. After a series of experiments, we found a
Conclusion
In this paper, a privacy-preserving human action recognition approach has been proposed. We have emphasized the feasibility of performing effective human action recognition with higher security and minimal storage overhead. Initially, recent vulnerabilities related to image obfuscation are discussed along with existing work and their problems. Next, image obfuscation is formulated by computing the MHI and DMMs of the underlying video sequences and transforming them to extremely low resolution with
Declaration of Competing Interest
All authors declare that they have no conflict of interest regarding the publication of this manuscript.
CRediT authorship contribution statement
Amitesh Singh Rajput: Conceptualization, Methodology, Writing - original draft. Balasubramanian Raman: Supervision, Validation, Resources. Javed Imran: Methodology, Writing - review & editing, Validation.
Acknowledgments
We would like to thank the editor and external reviewers for their thoughtful and detailed comments on our paper. We would also like to thank the Information Security Education and Awareness (ISEA) Project (phase II), MeitY, Government of India, for the necessary support.
References (42)
- et al. Protecting the privacy of humans in video sequences using a computer vision-based de-identification pipeline. Expert Systems with Applications (2017)
- et al. Combining CNN streams of RGB-D and skeletal data for human activity recognition. Pattern Recognition Letters (2018)
- et al. Upper approximation based privacy preserving in online social networks. Expert Systems with Applications (2017)
- et al. A privacy-aware feature selection method for solving the personalization–privacy paradox in mobile wellness healthcare services. Expert Systems with Applications (2015)
- et al. Collusion-resistant and privacy-preserving P2P multimedia distribution based on recombined fingerprinting. Expert Systems with Applications (2017)
- et al. A survey on using domain and contextual knowledge for human activity recognition in video streams. Expert Systems with Applications (2016)
- et al. Privacy-preserving collaborative recommendations based on random perturbations. Expert Systems with Applications (2017)
- et al. Human activity recognition with smartphone sensors using deep learning neural networks. Expert Systems with Applications (2016)
- et al. Secure data deduplication using secret sharing schemes over cloud. Future Generation Computer Systems (2018)
- et al. Superpixels: An evaluation of the state-of-the-art. Computer Vision and Image Understanding (2018)
- Semantic human activity recognition: a literature review. Pattern Recognition
- The recognition of human movement using temporal templates. IEEE Transactions on Pattern Analysis and Machine Intelligence
- A simple public-key cryptosystem with a double trapdoor decryption mechanism and its applications. International conference on the theory and application of cryptology and information security
- UTD-MHAD: A multimodal dataset for human action recognition utilizing a depth camera and a wearable inertial sensor. International conference on image processing (ICIP)
- SCA-CNN: Spatial and channel-wise attention in convolutional networks for image captioning. Conference on computer vision and pattern recognition (CVPR)
- Privacy-preserving action recognition for smart hospitals using low-resolution depth images
- Towards privacy-preserving recognition of human activities. International conference on image processing (ICIP)
- ImageNet: A large-scale hierarchical image database. Conference on computer vision and pattern recognition (CVPR)
- Information fusion for human action recognition via biset/multiset globality locality preserving canonical correlation analysis. IEEE Transactions on Image Processing
- Superpixel-based interest points for effective bags of visual words medical image retrieval. MICCAI international workshop on medical content-based retrieval for clinical decision support
- Deep residual learning for image recognition. Conference on computer vision and pattern recognition (CVPR)
Cited by (22)
- A two-stage deep generative adversarial quality enhancement network for real-world 3D CT images. Expert Systems with Applications (2022)
- Human activity recognition using temporal convolutional neural network architecture. Expert Systems with Applications (2022)
  Citation excerpt: One drawback of HAR recognition using RGB-D is the lack of privacy in obtaining the information. For this reason, Rajput et al. (2020) developed a deep CNN method in the cloud with the priority of preserving privacy. In their method, they obfuscated the information by using Motion History Images (MHI) and three Depth Motion Maps (DMM).
- Complex Network-based features extraction in RGB-D human action recognition. Journal of Visual Communication and Image Representation (2022)
  Citation excerpt: Boissiere, et al. [52] proposed a modular network combining skeleton and infrared data and a pre-trained 2D convolutional neural network (CNN) expressed as a pose module to extract features from skeleton data. Rajput, et al. [53] proposed a novel cloud-based approach to securely recognize human activities. They considered color and depth data, and secured them using position based super pixel transformation.
- Secure and Privacy-Preserving Human Interaction Recognition of Pervasive Healthcare Monitoring. IEEE Transactions on Network Science and Engineering (2023)
- Exploiting Security Issues in Human Activity Recognition Systems (HARSs). Information (Switzerland) (2023)