Abstract
The rapid development of multimedia tools has changed the digital world drastically. Consequently, several new technologies like virtual reality, 3D gaming, and VFX (Visual Effects) have emerged from the concept of computer graphics. These technologies have created a revolution in the entertainment world. However, photorealistic computer generated images can also play damaging roles in several ways. This paper proposes a deep learning based technique to differentiate computer generated images from photographic images. The idea of transfer learning is applied: the weights of the pre-trained deep convolutional neural network DenseNet-201 are used to extract features, on which an SVM is trained to classify computer generated images and photographic images. The experimental results on the DSTok dataset show that the proposed technique outperforms other existing techniques.
Keywords
- Digital forensics
- DenseNet-201
- Deep convolutional neural network
- Computer generated images
- SVM
- Photographic image
- Transfer learning
1 Introduction
In today’s era of digital technology, digital images have become a primary carrier of information. At the same time, an unprecedented involvement of digital images in misinformation and fake news can be seen on social media platforms [33]. When an image carrying false or misleading information goes viral on social media, it can disrupt social harmony. Moreover, political parties frequently use social media for election campaigning. A study conducted by Machado et al. [16] observed that more fake posts are shared on these platforms during such campaigns; it revealed that 13.1% of WhatsApp posts were fake during the Brazilian presidential elections. Recently, countless fake posts related to the novel coronavirus and the COVID-19 (coronavirus disease 2019) pandemic were shared globally on social media [11]. There are various ways to tamper with an image; the most common types of forgery are copy-move forgery [17] and image splicing forgery [18]. In the last decades, several methods [1, 19, 20] were developed to detect these forgeries.
The invention of Computer Generated (CG) imagery has enabled various new technologies such as virtual reality, 3D gaming, and VFX (Visual Effects), which are widely used in the film industry, education, and medicine. Although CG images have many beneficial applications, CG images created with malicious intent can cause serious problems. The problem becomes worse when a CG image is highly photorealistic, as human eyes cannot differentiate between the CG image and an actual photographic (PG) image. GAN (Generative Adversarial Network) tools can generate CG images with high photorealism; a good collection of GAN-generated CG images is available at www.thisartworkdoesnotexist.com. CG detection has therefore become an open area for research. In the past, a good number of techniques were presented to distinguish CG and PG images. Recently, Meena and Tyagi [21] surveyed the existing methods developed to distinguish CG images from PG images. Their survey discussed various methods from the literature and grouped them into four classes: acquisition process based, visual feature based, statistical feature based, and deep learning based.
Methods based on deep Convolutional Neural Networks (CNNs) have achieved unprecedented success in image classification, and a series of deep CNNs were proposed in the last few years to solve different challenging problems. This paper proposes a deep learning based technique to discriminate between CG and PG images. The contributions of this paper are twofold: first, a fully automated model based on the deep CNN DenseNet-201 and transfer learning is proposed to avoid the laborious task of designing hand-crafted features; second, to the best of our knowledge, this is the first time the DenseNet-201 network has been used for this task. The proposed technique shows comparatively better detection accuracy and lower time complexity.
2 Related Works
The recent survey paper [21] discussed a total of 52 state-of-the-art techniques available in the literature; therefore, only a brief summary of the related works is presented in this section. The existing techniques to identify PG and CG images can be categorized as traditional (hand-crafted feature based) and deep learning based. Hand-crafted feature based techniques have two basic steps: feature extraction and classification. In the past, authors have explored various feature extraction mechanisms and classifiers to improve the detection accuracy of their methods. Conversely, in deep learning based techniques, the image features are learned by a neural network, and feature extraction and classification are generally performed jointly by the CNN.
Lyu et al. [15] designed a statistical model based on first-order and higher-order wavelet statistics to identify CG and PG images. Two supervised machine learning methods, linear discriminant analysis and the Support Vector Machine (SVM), were used for classification. A low CG detection rate (71%) was the main drawback of this method. Wu et al. [35] put forward a technique based on histogram features, which achieved a good detection accuracy of up to 95.3%; however, it was evaluated on a comparatively small image dataset. Fan et al. [6] employed the contourlet transform to discriminate between CG and PG images; their method follows a statistical model similar to [15], but uses the contourlet transform in place of the wavelet transform. The authors suggested that the HSV color model can improve detection accuracy. Wang et al. [34] designed a technique based on the color quaternion wavelet transform. Recently, Meena and Tyagi [22] developed an approach to detect CG and PG images based on the Tetrolet transform and a neuro-fuzzy classifier.
Cui et al. [4] proposed a deep learning based approach to distinguish CG and PG images; it first applies high-pass filters to pre-process all the images in the dataset and then trains the model on the pre-processed images. He et al. [9] combined a CNN and a recurrent neural network to detect CG and PG images; the detection accuracy of this method was 93.87% on an image dataset comprising 6,800 CG and 6,800 PG images. A CNN based framework to classify CG and PG images was introduced by He [8], in which two different networks, VGG-19 and ResNet-50, were explored. Rezende et al. [30] proposed a deep learning based model using the ResNet-50 network as a feature extractor; for classification, several classifiers such as softmax, SVM, and k-nearest-neighbor were investigated. Meanwhile, Quan et al. [29] developed a CNN model to classify CG and PG images, trained from scratch on the Columbia image dataset [25]. Recently, Ni et al. [26] presented a comprehensive survey of deep learning based CG detection methods.
3 The Proposed Technique
The overview of the proposed technique is presented in Fig. 1. The following subsections will describe the major steps of the proposed technique.
3.1 Pre-processing
Generally, image datasets comprise images of various pixel resolutions. However, the proposed technique operates on images of size 224 × 224 pixels; the only reason for selecting this size is that DenseNet-201 is trained on images of size 224 × 224 pixels. Therefore, as pre-processing, we resize all the images in the DSTok dataset [32] before training the network. Similar to [30], the mean RGB value computed over the ImageNet dataset is subtracted pixel-wise from each image in the DSTok dataset.
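The two pre-processing steps above can be sketched as follows. The mean RGB constants shown are the commonly used ImageNet values; this is an assumption, as the paper does not list the exact constants:

```python
import numpy as np
from PIL import Image

# Commonly used ImageNet mean RGB values (an assumption; the paper does
# not state the exact constants it subtracts).
IMAGENET_MEAN_RGB = np.array([123.68, 116.779, 103.939], dtype=np.float32)

def preprocess(path):
    """Resize an image to 224 x 224 pixels and subtract the ImageNet
    mean RGB value pixel-wise."""
    img = Image.open(path).convert("RGB").resize((224, 224))
    x = np.asarray(img, dtype=np.float32)
    return x - IMAGENET_MEAN_RGB  # broadcast over the H x W grid
```

Each DSTok image would be passed through `preprocess` before being fed to the network.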
3.2 DenseNet-201 Network
Several deep CNNs are available online with pre-trained models; popular examples include AlexNet (2012), VGG-16 (2014), VGG-19 (2014), ResNet-50 (2015), Inception-v3 (2015), Xception (2016), DenseNet-121 (2017) [10], and DenseNet-201 (2017) [10]. These networks have found a large number of applications in areas like data science, image processing, computer vision, and digital image forensics [21]. More specifically, VGG-19 and ResNet-50 have been used for CG detection in [8] and [30] respectively. Recently, Cui et al. [4] also developed a CNN model for CG and PG image classification; they performed many experiments with varying numbers of hidden layers and suggested that detection accuracy can be improved by using a CNN with more layers. Therefore, in this paper, we solve the problem of CG detection by utilizing the recently proposed very deep CNN DenseNet-201.
The complete layer-wise architecture of the DenseNet-201 is shown as a gray rectangular box in the left part of Fig. 1. From the architecture, it can be observed that this network comprises a total of 201 layers, organized into four dense blocks and three transition layers. Each dense block is a combination of a different number of convolutional layers, whereas each transition layer contains a 1 × 1 convolutional layer followed by a 2 × 2 average pooling layer. The second-to-last layer is a 7 × 7 global average pooling layer. Finally, the last layer is a fully connected softmax layer with 1000 neurons, as this network was originally designed to classify images into 1000 categories.
3.3 Transfer Learning
Two main challenges arise when training a deep CNN based technique from scratch. First, an enormous amount of data is required to train the model effectively; if the model is trained on a small dataset, it may overfit. Second, training the model on a very large dataset requires huge computational power and time: it may require multiple high-end Graphics Processing Units (GPUs) with a large amount of physical memory, and even then training may take several hours or days. To overcome these two limitations, the concept of transfer learning has gained much attention in the last few years. In transfer learning, the parameters of a pre-trained neural network (source network), trained for one particular task, are transferred to a new neural network (target network) designed to solve a similar task.
During transfer learning, there is always scope for adjusting how many layers' parameters are reused. The proposed technique uses the very deep CNN DenseNet-201, which comprises a total of 201 layers. Since it is impractical to train such a deep CNN from scratch, we reuse the weights of the first 200 layers of the pre-trained DenseNet-201. The transferred part of DenseNet-201 is denoted by the red dotted box in Fig. 1. Note that the DenseNet-201 network was trained on the very large ImageNet dataset [5], which comprises over 1.28 million images for object classification into 1000 classes.
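A minimal sketch of this transfer-learning step, assuming the Keras implementation of DenseNet-201: the 1000-way softmax head is dropped, the remaining layers are frozen, and the global-average-pooled output (a 1920-dimensional vector per image) serves as the transferred feature representation:

```python
import numpy as np
from tensorflow.keras.applications import DenseNet201

def build_feature_extractor(weights="imagenet"):
    """DenseNet-201 without its 1000-way softmax head; the global average
    pooling output (1920-d) is the transferred feature vector."""
    base = DenseNet201(weights=weights, include_top=False,
                       pooling="avg", input_shape=(224, 224, 3))
    base.trainable = False  # freeze the transferred ImageNet weights
    return base

# Usage sketch: images is an (N, 224, 224, 3) array of pre-processed images,
# and feats would be an (N, 1920) array fed to the SVM classifier.
# feats = build_feature_extractor().predict(images)
```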
3.4 Classifier
The problem of distinguishing CG and PG images is a binary classification problem; therefore, we replace the last layer of DenseNet-201, a fully connected softmax layer with 1000 neurons, with a non-linear binary SVM classifier. Although several variants of the SVM exist, we use a non-linear binary SVM with the Radial Basis Function (RBF) kernel. An RBF-kernel SVM has two hyperparameters, cost (C) and gamma (γ), that must be set appropriately to manage the trade-off between bias and variance: large values of C and γ fit the training data more closely, lowering bias but risking higher variance (overfitting), while small values do the reverse. The optimal values of these two parameters can be found using the grid-search method. Based on our experiments, we set C = 10.0 and γ = 0.001.
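The classifier stage can be sketched with scikit-learn, assuming the same RBF kernel and a grid search over C and γ; the grid values shown are illustrative, chosen so that they bracket the paper's selected C = 10.0 and γ = 0.001:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

def fit_svm(features, labels):
    """Grid-search an RBF-kernel SVM over C and gamma, then return the
    best estimator refit on the full training data."""
    grid = GridSearchCV(
        SVC(kernel="rbf"),
        param_grid={"C": [0.1, 1.0, 10.0, 100.0],
                    "gamma": [1e-4, 1e-3, 1e-2]},
        cv=3)
    grid.fit(features, labels)
    return grid.best_estimator_
```

In the full pipeline, `features` would be the 1920-dimensional DenseNet-201 vectors and `labels` the CG/PG class of each image.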
4 Experimental Results
4.1 Datasets
For evaluating and analyzing the performance of any technique, the availability of an image dataset plays a crucial role. There are very few image datasets available to assess the effectiveness of methods developed to discriminate between CG and PG images. Ng et al. created the Columbia image dataset [25] in 2004 to evaluate their CG detection method. Most of the early works were evaluated only on this dataset, because no other image dataset was available until 2013. This dataset has two main drawbacks: first, it contains relatively few images (800 CG and 800 PG); second, its CG images are less photorealistic. As deep learning based methods need more image data to train the model effectively, the Columbia dataset was less suitable for evaluating the proposed technique. For this reason, the proposed approach is assessed on the well-designed image dataset created by Tokuda et al. [32], commonly referred to in the literature as the 'DSTok' dataset.
The DSTok dataset contains 4,850 CG images and 4,850 PG images. The computer graphics images in this dataset have higher photorealism than those in the Columbia dataset. The CG images were collected from various sources such as gaming websites and screenshots of recent 3D computer games. All the images are stored in JPEG format, and their file sizes vary from 12 KB to 1.8 MB. Figure 2 shows some example images from the DSTok dataset; the top row illustrates CG images, whereas the bottom row shows PG images.
4.2 Validation Protocol and Evaluation Metrics
Due to hardware limitations, it is impractical to evaluate the proposed technique on the DSTok images at their actual sizes. Thus, all the images in the DSTok dataset are resized to 224 × 224 pixels for all the experiments. A 5-fold cross-validation approach, similar to [32] and [30], is used to analyze the proposed technique. Note that the authors in [32] and [30] also used resized images of size 224 × 224 pixels. All 9,700 images (4,850 CG and 4,850 PG) are partitioned into five folds of equal size; hence, each fold comprises 1,940 images. In each cross-validation step, four folds are used to train the model and the remaining fold is used to test it.
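The fold construction can be sketched with scikit-learn. Shuffling and stratification are assumptions here (the paper only states that the folds are of equal size); stratification keeps each 1,940-image fold balanced between CG and PG:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Labels for the 9,700 DSTok images: 0 = CG, 1 = PG.
labels = np.array([0] * 4850 + [1] * 4850)

# Five equal folds; shuffle/stratify are assumptions, not stated in the paper.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
folds = list(skf.split(np.zeros(len(labels)), labels))

for train_idx, test_idx in folds:
    # 4 folds (7,760 images) train the model, 1 fold (1,940 images) tests it.
    assert len(train_idx) == 7760 and len(test_idx) == 1940
```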
The proposed technique is assessed using three metrics [3, 9]: True Positive Rate (TPR), True Negative Rate (TNR), and detection accuracy (Acc), defined in Eqs. 1–3:

TPR = TP / (TP + FN)  (1)

TNR = TN / (TN + FP)  (2)

Acc = (TPR + TNR) / 2  (3)

Here TP and FN are the numbers of CG images classified correctly and incorrectly, and TN and FP are the corresponding counts for PG images. The TPR represents the detection rate of computer generated images, and the TNR represents the detection rate of photographic images, while the detection accuracy is the simple mean of TPR and TNR.
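These three metrics can be computed directly from the confusion-matrix counts:

```python
def tpr_tnr_acc(tp, fn, tn, fp):
    """TPR: detection rate of CG images; TNR: detection rate of PG images;
    Acc: the simple mean of TPR and TNR."""
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    return tpr, tnr, (tpr + tnr) / 2.0

# e.g. 90 of 100 CG images and 80 of 100 PG images detected correctly
# gives TPR = 0.90, TNR = 0.80, Acc = 0.85
```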
The ROC (Receiver Operating Characteristic) curve provides important visual information about a binary classification model; it plots the true-positive rate against the false-positive rate. The AUC (Area Under the Curve) value is also used as an evaluation metric to quantify the effectiveness of the binary classification model.
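The ROC curve and AUC can be obtained from the classifier's decision scores with scikit-learn; the labels and scores used in the usage comment below are illustrative, not from the paper:

```python
from sklearn.metrics import roc_curve, auc

def roc_auc(labels, scores):
    """Compute the ROC points (FPR, TPR) and the area under the curve
    from true binary labels and classifier decision scores."""
    fpr, tpr, _ = roc_curve(labels, scores)
    return fpr, tpr, auc(fpr, tpr)

# Usage sketch: a perfectly separating scorer yields AUC = 1.0, e.g.
# roc_auc([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])
```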
4.3 Implementation Details
The proposed technique has been implemented using the Python deep learning library Keras v2.2.4 with Python v3.6.10, with TensorFlow-GPU v1.13.1 as the backend. A computer system with 16 GB RAM and an NVIDIA Quadro RTX 4000 GPU is used for all the experiments.
4.4 Results of the Proposed Technique
The detection accuracy and training time of the proposed technique are reported in Table 1. The average detection accuracy is 94.12%, and the average training time of the model is 835.30 s when trained on the DSTok dataset. As there are a total of 9,700 images in the DSTok dataset, the average time to process an image of size 224 × 224 pixels is only 0.0861 s; therefore, the proposed technique can distinguish CG images from PG images in real time. The ROC curve shown in Fig. 3 also reflects the encouraging performance of our method, as the obtained AUC value is 0.9486 with a very small standard deviation of ±0.0181; the small standard deviation indicates the stability of our approach. Moreover, the learning curve of the proposed technique is shown in Fig. 4, from which it can be observed that the cross-validation score increases with the size of the training dataset. Hence, the accuracy of the proposed technique could be further enhanced if the training dataset were larger.
4.5 Comparison and Analysis of Results
The proposed technique distinguishes between CG and PG images with a detection accuracy of 94.12%. Comparative results against existing techniques are reported in Table 2. A total of 16 techniques were considered for this comparison: two of them, [8] and [30], are based on deep learning, whereas the remaining 14 are based on hand-crafted features. As the validation protocol and experimental setup of the proposed technique are exactly the same as in [32], we obtained the results of all 14 hand-crafted feature based techniques from [32], whereas the results of [8] and [30] were taken from their respective original articles.
The rows in Table 2 are sorted by Acc in increasing order. The TPR and TNR values for the technique proposed by Rezende et al. [30] were not provided in that paper, so only its detection accuracy is reported; it shows the second-best detection accuracy among all the referenced techniques. The detection accuracy of the proposed technique is greater than that of all the reported techniques. The proposed technique obtains TPR and TNR values of 93.6% and 94.6% respectively; these simultaneously high values indicate that the technique behaves in a balanced manner, predicting both classes accurately. Additionally, the proposed technique can be used for real-time classification of CG and PG images.
5 Conclusion
The challenge of differentiating between computer generated images and photographic images is growing with the development of multimedia tools, and the techniques proposed so far are becoming less effective at addressing it. This paper has introduced a deep learning based technique to address this problem: the very deep convolutional neural network DenseNet-201 is used as a feature extractor, and a support vector machine is then applied as the classifier. The proposed technique achieved a detection accuracy of 94.12% on the DSTok dataset, which is higher than the detection accuracies of the existing techniques in the literature.
Additionally, the proposed technique can be used for real-time applications, as it processes an image of size 224 × 224 pixels in 0.0861 s. In the future, the detection accuracy of the proposed technique could be improved further by training the model on a larger dataset. Furthermore, the proposed technique could be extended to classify computer generated and photographic images even when the images have been post-processed by operations such as noise addition, image blurring, and contrast enhancement.
References
Ansari, M.D., Ghrera, S.P., Tyagi, V.: Pixel-based image forgery detection: a review. IETE J. Educ. 55(1), 40–46 (2014)
Candès, E.J., Donoho, D.L.: Curvelets: a surprisingly effective nonadaptive representation of objects with edges. Curves Surf. Fitting C(2), 1–10 (2000)
Chen, W., Shi, Y.Q.: Identifying computer graphics using HSV color model and statistical moments of characteristic functions. In: IEEE International Conference on Multimedia, pp. 1123–1126 (2007)
Cui, Q., McIntosh, S., Sun, H.: Identifying materials of photographic images and photorealistic computer generated graphics based on deep CNNs. Comput. Mater. Continua 55(2), 229–241 (2018)
Deng, J., Dong, W., Socher, R., Li, L., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255 (2009)
Fan, S., Wang, R., Zhang, Y., Guo, K.: Classifying computer generated graphics and natural images based on image contour information. J. Inf. Comput. Sci. 10(2010), 2877–2895 (2012)
Haralick, R.M., Shanmugam, K., Dinstein, I.: Textural features for image classification. IEEE Trans. Syst. Man Cybern. 3(6), 610–621 (1973)
He, M.: Distinguish computer generated and digital images: a CNN solution. Concurr. Comput. Pract. Exp. 4788, 1–10 (2018)
He, P., Jiang, X., Sun, T., Member, S., Li, H.: Computer graphics identification combining convolutional and recurrent neural network. IEEE Signal Process. Lett. 25(9), 1369–1373 (2018)
Huang, G., Liu, Z., van der Maaten, L.: Densely connected convolutional networks. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4700–4708 (2017)
McDonald, J.: Social Media Posts Spread Bogus Coronavirus Conspiracy Theory (2020). https://www.factcheck.org/2020/01/social-media-posts-spread-bogus-coronavirus-conspiracy-theory/. Accessed 24 Feb 2020
Kutyniok, G., Lim, W.-Q.: Compactly supported shearlets are optimally sparse. J. Approx. Theory 11, 1564–1589 (2011)
Li, W., Zhang, T., Zheng, E., Ping, X.: Identifying photorealistic computer graphics using second-order difference statistics. In: International Conference on Fuzzy Systems and Knowledge Discovery, pp. 2316–2319 (2010)
Liebovitch, L.S., Toth, T.: A fast algorithm to determine fractal dimensions by box counting. Phys. Lett. A 141(8), 386–390 (1989)
Lyu, S., Farid, H.: How realistic is photorealistic? IEEE Trans. Signal Process. 53(2), 845–850 (2005)
Machado, C., Kira, B., Howard, P.N.: A study of misinformation in WhatsApp groups with a focus on the Brazilian presidential elections. In: WWW 2019: Companion Proceedings of the 2019 World Wide Web Conference, pp. 1013–1019 (2019)
Meena, K.B., Tyagi, V.: A copy-move image forgery detection technique based on Gaussian-Hermite moments. Multimed. Tools Appl. 78, 33505–33526 (2019)
Meena, K.B., Tyagi, V.: Image forgery detection: survey and future directions. In: Data, Engineering and Applications, pp. 163–195 (2019)
Meena, K.B., Tyagi, V.: A copy-move image forgery detection technique based on tetrolet transform. J. Inf. Secur. Appl. 52, 102481–102490 (2020)
Meena, K.B., Tyagi, V.: A hybrid copy-move image forgery detection technique based on Fourier-Mellin and scale invariant feature transforms. Multimed. Tools Appl. 79(11), 8197–8212 (2020)
Meena, K.B., Tyagi, V.: Methods to distinguish photorealistic computer generated images from photographic images: a review. In: Singh, M., Gupta, P.K., Tyagi, V., Flusser, J., Ören, T., Kashyap, R. (eds.) ICACDS 2019. CCIS, vol. 1045, pp. 64–82. Springer, Singapore (2019). https://doi.org/10.1007/978-981-13-9939-8_7
Meena, K.B., Tyagi, V.: A novel method to distinguish photorealistic computer generated images from photographic images. In: 2019 Fifth International Conference on Image Information Processing (ICIIP), pp. 385–390 (2019)
Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 886–893 (2005)
Ng, T., Chang, S.: Distinguishing between natural photography and photorealistic computer graphics. IEEE Signal Process. Mag. 26(2), 49–58 (2009)
Ng, T., Chang, S., Hsu, J., Pepeljugoski, M.: Columbia Photographic Images and Photorealistic Computer Graphics Dataset. ADVENT Technical Report #205-2004-5, Columbia University (2005)
Ni, X., Chen, L., Yuan, L., Wu, G., Yao, Y.E.: An evaluation of deep learning-based computer generated image detection approaches. IEEE Access 7, 130830–130840 (2019)
Ojala, T., Pietikäinen, M., Mäenpää, T.: A generalized local binary pattern operator for multiresolution gray scale and rotation invariant texture classification. In: International Conference on Advances in Pattern Recognition (ICAPR), Brazil, pp. 399–408 (2001)
Popescu, A.C., Farid, H.: Exposing digital forgeries in color filter array interpolated images. IEEE Trans. Signal Process. 53(10), 3948–3959 (2005)
Quan, W., Wang, K., Yan, D.M., Zhang, X.: Distinguishing between natural and computer-generated images using convolutional neural networks. IEEE Trans. Inf. Forensics Secur. 13(11), 2772–2787 (2018)
De Rezende, E.R.S., Ruppert, G.C.S., Archer, C.T.I.R.: Exposing computer generated images by using deep convolutional neural networks. In: 30th SIBGRAPI Conference on Graphics, Patterns and Images, pp. 71–78 (2017)
Schwartz, W.R., da Silva, R.D., Davis, L.S., Pedrini, H.: A novel feature descriptor based on the shearlet transform. In: IEEE International Conference on Image Processing (ICIP), Belgium, pp. 1053–1056 (2011)
Tokuda, E., Pedrini, H., Rocha, A.: Computer generated images vs. digital photographs: a synergetic feature and classifier combination approach. J. Vis. Commun. Image Represent. 24(8), 1276–1292 (2013)
Tyagi, V.: Understanding Digital Image Processing. CRC Press, Boca Raton (2018)
Wang, J., Li, T., Luo, X., Shi, Y., Liu, R., Jha, S.K.: Identifying computer generated images based on quaternion central moments in color quaternion. IEEE Trans. Circ. Syst. Video Technol. 29(9), 2775–2785 (2018)
Wu, R., Li, X., Bin, Y.: Identifying computer generated graphics via histogram features. In: 18th IEEE International Conference on Image Processing, pp. 1973–1976 (2011)
© 2020 Springer Nature Singapore Pte Ltd.
Meena, K.B., Tyagi, V. (2020). A Deep Learning Based Method to Discriminate Between Photorealistic Computer Generated Images and Photographic Images. In: Singh, M., Gupta, P., Tyagi, V., Flusser, J., Ören, T., Valentino, G. (eds) Advances in Computing and Data Sciences. ICACDS 2020. Communications in Computer and Information Science, vol 1244. Springer, Singapore. https://doi.org/10.1007/978-981-15-6634-9_20