Abstract
Recently, interest has grown in using tattoos as a biometric feature for person identification. Previous works addressed tattoo identification with handcrafted features such as SIFT. However, deep learning methods have outperformed such features in many computer vision tasks. Given that there is little research on tattoo identification using deep learning, we assess several publicly available CNN models, pre-trained on large generic image databases, for the task of tattoo identification. We believe that, since tattoos mostly depict objects of the real world, their semantic and visual features might be related to those learned from a generic image database with real objects. Our experiments show that these models can outperform previous approaches without even being fine-tuned for tattoo identification. This allows developing tattoo identification applications with minimal implementation cost. Besides, given the difficulty of accessing public tattoo databases, we created two tattoo datasets and released one of them into the public domain.
1 Introduction
In recent years, several forensic techniques have been developed to identify victims and criminals in forensic scenarios [7]. These systems are mainly based on biometric traits such as the face, fingerprints and iris. However, there are many situations in which primary biometric traits like these are not available, and it is therefore necessary to resort to other types of information [11]. So-called “soft biometric traits” are physiological or behavioral characteristics that provide some identifying information about an individual [2], but lack the distinctiveness and permanence to sufficiently differentiate any two individuals [11]. Eye color, gender, ethnicity, skin color, height, weight, hair color, scars, birthmarks and tattoos are examples of soft biometric traits. Several techniques have been proposed to automatically identify or verify a person's identity based on soft biometric traits [2, 8]. In particular, person identification and retrieval systems based on tattoos have gained much interest in recent years [4, 16]. This is due to several reasons, including the increasing prevalence of tattoos among the population [11]. Furthermore, tattoos provide additional information about the person: affiliation to groups or gangs, religious beliefs, years in prison, etc. They have been used to assist law enforcement authorities in investigations leading to the identification of offenders and of victims of natural disasters and accidents [11].
This paper focuses on the identification of individuals based on tattoos. Most existing works on this topic [9, 11] propose to use handcrafted features to describe the images and then evaluate the similarity between two images by comparing their features. However, in recent years deep learning methods have shown better results than such handcrafted features in similar computer vision tasks [3]. For this reason, some works [3, 4] have proposed the use of deep neural networks for tattoo identification.
Due to the great variability that usually exists between tattoos of different individuals, the authors of [3] explored the idea of fine-tuning deep networks trained on large generic databases such as ImageNet [17] for tattoo image classification with two classes: Tattoo and Not-Tattoo. ImageNet contains many varied images of a large number of object types, a diversity similar to that found among tattoos. It is therefore reasonable to expect that a neural network trained to discriminate between images of the same class and of different classes can also do so for tattoo images. This way, it is not necessary to train a network from scratch or to gather the large volume of data that such training requires. Nevertheless, for the task of tattoo matching, the authors of [3] used a Siamese network, which has the disadvantage that inference must be performed every time two images are compared. This incurs a high computational cost in an identification scenario, since the image to be identified must be compared against all images in a database. In contrast, if a network is used to extract features from the database images, those features can be stored and reused for future comparisons. In this way, the network is executed only once for each requested identification. In this paper we adopt this strategy as a more efficient way to perform tattoo identification.
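The efficiency argument can be sketched as follows. This is a minimal illustration, not the paper's implementation: `extract_features` is a hypothetical stand-in for a CNN forward pass, and the "images" are toy arrays.

```python
import numpy as np

def extract_features(image):
    # Hypothetical stand-in for a forward pass through a pre-trained CNN;
    # it deterministically maps image content to a fixed-length descriptor.
    rng = np.random.default_rng(abs(hash(image.tobytes())) % (2 ** 32))
    return rng.standard_normal(1280)

# Gallery features are computed ONCE and stored.
gallery = [np.full((8, 8), v, dtype=np.uint8) for v in (10, 20, 30)]
gallery_feats = np.stack([extract_features(img) for img in gallery])

# At identification time only the query needs a network forward pass;
# matching is a cheap vectorized distance computation over stored features.
query = np.full((8, 8), 20, dtype=np.uint8)
q = extract_features(query)
dists = np.linalg.norm(gallery_feats - q, axis=1)  # Euclidean distances
best = int(np.argmin(dists))                       # identified gallery index
```

With a Siamese network, by contrast, the network itself would have to be run once per gallery image for every query.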
On the other hand, recent papers [1] have shown that the top layers of a large convolutional neural network (CNN) provide high-level descriptors of the visual content of an image. They proved that these descriptors can be used for tasks other than the one the network was trained for. This allows using them effectively without the need for large training databases, which is the case for tattoos, where few public databases exist. Based on these ideas, the main contribution of our work is a study assessing the use of several deep neural networks trained on generic databases such as ImageNet for tattoo identification. We extract features from intermediate layers of these networks and use them as descriptors of the tattoo images. The difference with previous works [3, 4] is that we show it is possible to achieve competitive results without training or even fine-tuning the networks.
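The idea of reading a descriptor from an intermediate layer rather than from the classification output can be illustrated with a deliberately tiny, hypothetical two-layer network (random weights, not a real pre-trained model):

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((64, 128))  # hypothetical hidden-layer weights
W2 = rng.standard_normal((128, 10))  # hypothetical classification head

def forward(x):
    h = np.maximum(x @ W1, 0.0)      # intermediate (penultimate) activations
    logits = h @ W2                  # classification output, discarded here
    return h, logits

x = rng.standard_normal(64)          # stand-in for a preprocessed image
descriptor, _ = forward(x)
descriptor = descriptor / np.linalg.norm(descriptor)  # L2-normalize for matching
```

The descriptor, not the class scores, is what gets stored and compared, which is why the network's original 1000-class (or 21841-class) output layer is irrelevant to the retrieval task.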
The remainder of this paper is structured as follows. We briefly review related literature in Sect. 2. In Sect. 3, the details of the proposed deep networks for tattoo identification are provided. We show some experimental results in Sect. 4. Finally, we conclude this work in Sect. 5.
2 Related Works
The early practice of tattoo image retrieval relied on keyword- or metadata-based matching [4]. However, keyword-based tattoo image retrieval has several limitations in practice: (i) labels are insufficient for describing all the visual information of a tattoo; (ii) multiple keywords may be needed to adequately describe a tattoo image; and (iii) human annotation is subjective, and different subjects can assign dramatically different labels to the same tattoo image [4].
Due to these problems, interest has grown in developing content-based image retrieval (CBIR) techniques to improve the efficiency and accuracy of tattoo search [9, 14]. CBIR aims to extract features such as edges, color and texture that reflect the content of an image, and to use them to identify images with high visual similarity [4]. The scale-invariant feature transform (SIFT) [13] and its variants have been the most widely used methods of this kind for tattoo identification [9, 14].
Recently, most research has focused on deep learning methods due to their success in many computer vision tasks [3]. In particular, AlexNet [10], which won the 2012 ImageNet challenge, has been successfully used for tattoo vs. non-tattoo classification in [3, 22]. Other works [21] focus on tattoo localization using Faster R-CNN. However, little research exists on tattoo image identification. To the best of our knowledge, only [3] and [4] have studied tattoo identification using deep learning. In [3], a Siamese network pre-trained on ImageNet and fine-tuned for tattoo identification is used. As mentioned in the Introduction, using Siamese networks for matching is less efficient because network inference is required for every comparison. This is not suitable for an identification scenario where a query image must be matched against thousands of images in an operational database.
In [4], a network based on Faster R-CNN was used to jointly learn tattoo detection and a compact representation of the tattoo. The features returned by this network are binarized to make the search efficient. This work obtains results comparable to other state-of-the-art methods and generalizes well to other retrieval tasks. However, the network was trained with hundreds of thousands of images, which are not available to everyone, nor are the computational resources needed to do so. Moreover, a pre-trained model of this network is not publicly available.
3 Generic Neural Networks for Tattoo Identification
The proposal of this paper is to use features from intermediate layers of neural networks trained on large generic databases to describe the content of a tattoo image. These features are then matched against the features of a tattoo gallery in order to identify the tattoo.
To evaluate this proposal, we selected several neural networks that have obtained good results in the ImageNet classification challenge. It is worth noting that none of these networks was fine-tuned with tattoo images as was done in [3]; instead, we used their publicly available models trained on ImageNet. We believe that, since tattoos mostly depict objects of the real world, their semantic and visual features might be related to those learned from a generic image database with real objects.
-
MobileNetV1 [5]: a network designed for mobile and embedded devices thanks to its low complexity. Trained on ImageNet, it obtained 70.6% classification accuracy.
-
MobileNetV2 [18]: an improved version of MobileNetV1. It obtains 74.7% accuracy, improving on its predecessor at a similar processing cost.
-
Inception21k [6]: trained on ImageNet, but with 21841 classes instead of the 1000 generally used. It obtains 68.3% accuracy.
-
Resnet50_CVGJ [19]: a variant of the ResNet50 network with batch normalization [6], trained on ImageNet.
-
VGG_CVGJ [19]: a variant of the VGG19 network [20] with batch normalization [6], trained on ImageNet.
We also evaluated some networks that have been proposed for image retrieval. These are designed to return a feature vector as a descriptor of the image's visual content instead of a classification output.
-
DeepBit [12]: a network trained in an unsupervised manner to build high-level compressed features. During training, constraints were imposed so that the features met three requirements: invariance to image rotation, high entropy, and high standard deviation.
-
SSDH [23]: a network that constructs hash functions as a latent layer in a deep network, designed so that classification and retrieval are unified in a single learning model.
Both networks were designed so that their features can be binarized, allowing more efficient matching using the Hamming distance. Both define a function to do so; however, we do not binarize the features because we obtained better results with the real values.
Table 1 shows the layer used for each network, as well as the dimension of the output feature vector. For DeepBit and SSDH we used the last layer, while for the other networks we report the layers that achieved the best results.
All networks have a fixed input of 224 \(\times \) 224 pixels, except SSDH, whose input is 227 \(\times \) 227. The Euclidean distance was used to match the features, except for DeepBit, for which the cosine distance gave much better results.
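The three matching variants mentioned above (Euclidean distance, cosine distance, and Hamming distance over binarized features) can be sketched on toy 4-dimensional vectors; the threshold-at-zero binarization is an assumption for illustration, not the exact scheme of DeepBit or SSDH:

```python
import numpy as np

def euclidean(a, b):
    return np.linalg.norm(a - b)

def cosine_distance(a, b):
    # 1 - cosine similarity; smaller means more similar.
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def hamming(a, b, threshold=0.0):
    # Binarize the real-valued features and count differing bits;
    # in our experiments, real-valued matching worked better.
    return int(np.count_nonzero((a > threshold) != (b > threshold)))

a = np.array([0.9, -0.2, 0.4, 0.1])
b = np.array([0.8, -0.1, 0.5, -0.3])
```

The Hamming variant permits very fast bitwise comparisons, but it discards magnitude information, which is one plausible reason the real-valued distances performed better here.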
4 Experimental Evaluation
In this section we evaluate and compare the selected networks on the task of tattoo identification. Different tattoo databases have been used in the literature [15, 16], but most of them are not public or are difficult to access. In particular, most works have used the Tatt-C database [16] for experimentation, training and comparison with other methods, but it is no longer public due to legal issues. The authors of [4] created a large dataset named WebTattoo by combining other tattoo datasets to which they have access. However, this dataset is not public yet; it should be released soon. Therefore, in order to validate our proposal, we created our own datasets.
The experiments were run on a PC with an Intel Core i7-4470 CPU and 8 GB of RAM.
4.1 Proposed Databases and Evaluation Protocol
Two databases of tattoo images were created in order to evaluate the proposal. The first one, BIVTatt (see Note 1), was collected by the authors of this paper and contains 210 images belonging to 159 individuals (some individuals have only one image). The second, PinTatt, is composed of 454 images downloaded from Pinterest belonging to 160 individuals. Images from BIVTatt have higher resolution than those from PinTatt, and their content is sharper. All images are cropped around the tattoo, so there is little background information. Figure 1 shows some sample images from both databases. For each image, 20 new images were generated by applying transformations: two illumination intensities, two diffusions, four affine transformations, four aspect-ratio transformations, four rotations and four color changes, as described in [11]. In this way, the databases were augmented to a total of 4410 images for BIVTatt and 9534 for PinTatt. Figure 2 shows examples of transformations of an image from the BIVTatt dataset.
For the identification experiments on each database, the probe set consisted of the transformed images, while the gallery was composed of the original images. For every probe image, the original image from which it was generated (by some transformation) was excluded from the comparison. Thus, we simulate a real forensic identification scenario where different images of the same tattoo may be available, but never exactly the same image. The final BIVTatt probe set consists of 1540 images and its gallery of 209 images; for PinTatt the figures are 9080 and 453, respectively.
To evaluate the performance of the compared methods, we employ the cumulative match characteristic (CMC) curve. Each point of the CMC curve is the fraction of probe images that were correctly matched with any of their mates in the gallery within a given rank.
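The CMC computation can be sketched from a probe-by-gallery distance matrix. This is a simplified illustration assuming each probe has exactly one correct mate in the gallery (any excluded originating images are assumed already removed from the gallery columns):

```python
import numpy as np

def cmc(dist, probe_mate, max_rank):
    """Cumulative match characteristic. dist is (n_probes, n_gallery);
    probe_mate[i] is the gallery index of the correct mate of probe i."""
    n_probes = dist.shape[0]
    order = np.argsort(dist, axis=1)  # gallery indices, best match first
    # 0-based rank at which each probe's correct mate appears.
    ranks = np.array([int(np.where(order[i] == probe_mate[i])[0][0])
                      for i in range(n_probes)])
    # CMC at rank r = fraction of probes whose mate appears within the top r.
    return np.array([(ranks < r).mean() for r in range(1, max_rank + 1)])

# Toy example: 3 probes, 4 gallery images.
dist = np.array([[0.1, 0.9, 0.8, 0.7],   # mate (gallery 0) is the best match
                 [0.5, 0.2, 0.4, 0.3],   # mate (gallery 1) is the best match
                 [0.6, 0.1, 0.2, 0.3]])  # mate (gallery 0) is ranked last
probe_mate = np.array([0, 1, 0])
curve = cmc(dist, probe_mate, max_rank=4)
```

By construction the curve is non-decreasing in the rank, which is why CMC plots always rise toward 1.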
4.2 Experimental Results
In addition to the experiments with the networks analyzed in Sect. 3, all tests were also carried out using SIFT, in order to compare our proposal with a generic approach used in previous works. The matching method used for SIFT was a knn-based matcher with k = 2, implemented with the Fast Library for Approximate Nearest Neighbors (FLANN) from the OpenCV library. Figures 3 and 4 show the CMC curves for the BIVTatt and PinTatt databases, respectively. Tables 2 and 3 show the identification rates at different rank values for BIVTatt and PinTatt, respectively.
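The usual purpose of a knn matcher with k = 2 is Lowe's ratio test, which keeps a keypoint match only when the nearest descriptor is clearly closer than the second nearest. The sketch below mirrors this logic with a brute-force NumPy search on toy descriptors; the 0.75 ratio is a commonly used value, assumed here for illustration rather than taken from the paper:

```python
import numpy as np

def ratio_test_matches(desc_q, desc_g, ratio=0.75):
    """Brute-force equivalent of a knn matcher with k = 2 plus Lowe's
    ratio test over query/gallery SIFT-like descriptors."""
    matches = []
    for i, d in enumerate(desc_q):
        dists = np.linalg.norm(desc_g - d, axis=1)
        j1, j2 = np.argsort(dists)[:2]       # two nearest gallery descriptors
        if dists[j1] < ratio * dists[j2]:    # unambiguous nearest neighbor
            matches.append((i, int(j1)))
    return matches

# Toy descriptors: query 0 has an unambiguous match, query 1 does not.
desc_g = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
desc_q = np.array([[0.05, 0.0],   # close to gallery descriptor 0 only
                   [0.5, 0.5]])   # equidistant, so the ratio test rejects it
good = ratio_test_matches(desc_q, desc_g)
```

Note that this per-keypoint search must be repeated for every gallery image, which is what makes SIFT matching costly at identification time, as discussed below.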
As can be seen, the best overall performance was achieved by MobileNetV2 on both datasets. The two image retrieval networks, DeepBit and SSDH, obtained poor results, while the image classification networks obtained good results in general. All of the latter outperformed SIFT, except at Rank-1 on the BIVTatt dataset. In the case of Resnet50_CVGJ, it only exceeds SIFT starting from Rank-20 on this dataset.
It is also necessary to evaluate the efficiency of these networks to know whether it is feasible to use them in real applications, and in which scenarios. We measured the time each method takes to extract the features of an image and the time required to match that image against all images in the gallery; the sum of both is the identification time for a query tattoo image. Figure 5 shows both times, averaged over 100 images from the BIVTatt dataset. Similar results were observed on the PinTatt dataset.
As Fig. 5 shows, SIFT is the fastest method for feature extraction but is too slow in the matching step. In an identification scenario, matching efficiency is critical because the matching algorithm must be executed for each image in the gallery. We believe this makes SIFT a questionable choice for such scenarios. On the other hand, MobileNetV2 offers the best accuracy/efficiency trade-off, which makes it a good option for tattoo identification. These experiments show that CNNs trained for image classification on generic databases such as ImageNet can be transferred to the tattoo domain with good results.
5 Conclusions
In this article, we studied the use of intermediate features of deep neural networks trained on generic databases as descriptors of tattoo images. Unlike previous works, which used transfer learning to adapt the network to the tattoo identification context, we used the original pre-trained network models. The identification experiments showed that this approach can obtain better results, in both efficiency and accuracy, than previously adopted handcrafted solutions such as SIFT. In addition, the implementation cost is minimal, since many CNNs similar to those used in this work are publicly available. In future research, we will consider the use of tattoo detection and segmentation methods to remove the image background.
Notes
1. The BIVTatt dataset is available at https://github.com/mnicolas94/BIVTatt-Dataset.
References
Babenko, A., Slesarev, A., Chigorin, A., Lempitsky, V.: Neural codes for image retrieval. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8689, pp. 584–599. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10590-1_38
Dantcheva, A., Elia, P., Ross, A.: What else does your biometric data reveal? A survey on soft biometrics. IEEE Trans. Inf. Forensics Secur. 11(3), 441–467 (2016)
Di, X., Patel, V.M.: Deep learning for tattoo recognition. In: Bhanu, B., Kumar, A. (eds.) Deep Learning for Biometrics. ACVPR, pp. 241–256. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-61657-5_10
Han, H., Li, J., Jain, A.K., Chen, X.: Tattoo image search at scale: joint detection and compact representation learning. IEEE Trans. Pattern Anal. Mach. Intell. 41, 2333–2348 (2019)
Howard, A.G., et al.: Mobilenets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017)
Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167 (2015)
Jain, A., Nandakumar, K., Ross, A.: 50 years of biometric research: accomplishments, challenges, and opportunities. Pattern Recogn. Lett. 79, 80–105 (2016)
Jain, A.K., Park, U.: Facial marks: soft biometric for face recognition. In: 2009 16th IEEE International Conference on Image Processing (ICIP), pp. 37–40. IEEE (2009)
Kim, J., Parra, A., Yue, J., Li, H., Delp, E.J.: Robust local and global shape context for tattoo image matching. In: 2015 IEEE International Conference on Image Processing (ICIP), pp. 2194–2198. IEEE (2015)
Krizhevsky, A., Sutskever, I., Hinton, G.: Imagenet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)
Lee, J.E., Jain, A.K., Jin, R.: Scars, marks and tattoos (SMT): soft biometric for suspect and victim identification. In: Biometrics Symposium, BSYM 2008, pp. 1–8. IEEE (2008)
Lin, K., Lu, J., Chen, C.S., Zhou, J.: Learning compact binary descriptors with unsupervised deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1183–1192 (2016)
Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vision 60(2), 91–110 (2004)
Manger, D.: Large-scale tattoo image retrieval. In: 2012 Ninth Conference on Computer and Robot Vision (CRV), pp. 454–459. IEEE (2012)
Martin, M., Dawson, J., Bourlai, T.: Large scale data collection of tattoo-based biometric data from social-media websites. In: 2017 European Intelligence and Security Informatics Conference (EISIC), pp. 135–138. IEEE (2017)
Ngan, M., Grother, P.: Tattoo recognition technology-challenge (tatt-c): an open tattoo database for developing tattoo recognition research. In: 2015 IEEE International Conference on Identity, Security and Behavior Analysis (ISBA), pp. 1–6. IEEE (2015)
Russakovsky, O., et al.: Imagenet large scale visual recognition challenge. Int. J. Comput. Vision 115(3), 211–252 (2015)
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.C.: MobileNetV2: inverted residuals and linear bottlenecks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4510–4520 (2018)
Simon, M., Rodner, E., Denzler, J.: Imagenet pre-trained models with batch normalization. arXiv preprint arXiv:1612.01452 (2016)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
Sun, Z.H., Baumes, J., Tunison, P., Turek, M., Hoogs, A.: Tattoo detection and localization using region-based deep learning. In: 2016 23rd International Conference on Pattern Recognition (ICPR), pp. 3055–3060. IEEE (2016)
Xu, Q., Ghosh, S., Xu, X., Huang, Y., Kong, A.W.K.: Tattoo detection based on CNN and remarks on the NIST database. In: 2016 International Conference on Biometrics (ICB), pp. 1–7. IEEE (2016)
Yang, H.F., Lin, K., Chen, C.S.: Supervised learning of semantics-preserving hash via deep convolutional neural networks. IEEE Trans. Pattern Anal. Mach. Intell. 40(2), 437–451 (2017)
© 2019 Springer Nature Switzerland AG
Nicolás-Díaz, M., Morales-González, A., Méndez-Vázquez, H. (2019). Deep Generic Features for Tattoo Identification. In: Nyström, I., Hernández Heredia, Y., Milián Núñez, V. (eds) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2019. Lecture Notes in Computer Science(), vol 11896. Springer, Cham. https://doi.org/10.1007/978-3-030-33904-3_25