Deep feature extraction for document forgery detection with convolutional autoencoders☆
Introduction
Analyzing ink is an essential aspect of detecting forgery in handwritten questioned documents, such as forged cheques, wills, or altered signatures and records. Although inks of the same color (e.g., blue, black, or red) may look alike, differences in their chemical composition make each ink distinctive. These subtle properties cannot be identified by the naked eye, but experimental analysis can reveal them, and its outcomes help determine whether a questioned document has been altered or modified.
Microscopic analysis of a questioned document is an instructive first step: the analyst may detect subtle variations in ink color that the naked eye cannot [1]. Such variations are clues to alterations, obliterations, or overwriting in the questioned document. Methods for ink analysis are classified as destructive or non-destructive [2]. Common destructive techniques include chromatography and electrophoresis. The primary non-destructive methods used in ink analysis include Fourier transform infrared (FTIR) spectroscopy, Raman spectroscopy, the video spectral comparator (VSC), multispectral imaging, and hyperspectral imaging [3,4]. Non-destructive methods are preferred over destructive ones because they leave the questioned document intact.
Hyperspectral imaging is an evolving technique that combines imaging and spectroscopy to capture both spatial and spectral information about an object. It is also known as imaging spectrometry or imaging spectroscopy [5]. The term "hyperspectral imaging" originated in work on identifying surface materials in remote sensing images [5].
The human eye has only three types of color receptors, sensitive to red, green, and blue light. Hyperspectral imaging instead captures images at many wavelengths (from a few tens to several hundred bands), producing a stack of images [6]. Compared with a three-channel RGB image or the few spectral bands of a multispectral image, hyperspectral images retain far more detailed spectral information. Their spectral bands are narrow (10–20 nm) and contiguous, and these two properties make hyperspectral images well suited for analysis. A conventional image holds two-dimensional spatial data; appending a third, spectral dimension forms a three-dimensional data cube [5,7]. The data cube facilitates the analysis of hyperspectral images, as shown in Fig. 1.
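To make the data cube concrete, the following minimal NumPy sketch (an illustration, not code from this paper) builds a synthetic cube of shape (height, width, bands), reads one pixel's spectrum and one single-band image, and flattens the cube so that each row is one pixel spectrum, the form typically used for pixel-wise spectral classification. The cube size, the 33-band count, and the helper name `cube_to_spectra` are assumptions chosen for illustration.

```python
import numpy as np

def cube_to_spectra(cube: np.ndarray) -> np.ndarray:
    """Flatten an (H, W, B) hyperspectral cube into an (H*W, B) matrix.

    Row i holds the spectrum of the pixel at (i // W, i % W).
    """
    if cube.ndim != 3:
        raise ValueError("expected a cube of shape (height, width, bands)")
    height, width, bands = cube.shape
    return cube.reshape(height * width, bands)

# Hypothetical cube: 4 x 5 pixels, 33 spectral bands (illustrative sizes only).
rng = np.random.default_rng(0)
cube = rng.random((4, 5, 33))

# Spectrum of one pixel: all bands at spatial position (row=2, col=3).
pixel_spectrum = cube[2, 3, :]

# One image per band: the 2-D spatial slice at band index 10.
band_image = cube[:, :, 10]

spectra = cube_to_spectra(cube)
print(pixel_spectrum.shape, band_image.shape, spectra.shape)  # prints (33,) (4, 5) (20, 33)
```

Reshaping (rather than copying pixel by pixel) keeps the row order aligned with raster order, so a per-pixel prediction vector can later be reshaped back to (H, W) to form a label map.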
In this paper, we examine the capability of an unsupervised deep learning approach that uses spectral features to detect ink mismatch in hyperspectral document images. The main contributions of this paper are as follows:
- (1)
We introduce an unsupervised deep learning approach for ink mismatch detection in hyperspectral document images; to the best of our knowledge, such an approach has not previously been explored in this domain.
- (2)
We propose a convolutional autoencoder that extracts deep features from hyperspectral document images; these features are then classified with logistic regression, and the combined approach (CAE-LR) yields reliable classification.
- (3)
The proposed approach is compared against three machine learning algorithms (with variants of each), a CNN, and five state-of-the-art methods from the literature. For a fair comparison, the ink-mixing proportions are adopted from previous work [8,9,13,15,18]. The results are analyzed both for unequal mixing of two inks in varying proportions and for equal mixing with a varying number of inks.
- (4)
The proposed approach efficiently handles the spectral compactness of inks at different spectral bands, which otherwise tends to reduce accuracy. It outperforms all the compared methods for both blue and black inks, improving average accuracy by up to 4.74% for black inks and 1.3% for blue inks.
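The CAE architecture is not specified in this section, so the sketch below is a hypothetical minimal stand-in written in NumPy: a single 1-D convolutional layer with ReLU acts as the encoder along the spectral axis, a matching transposed convolution acts as the decoder, and plain gradient descent minimizes the mean-squared reconstruction error. After training, the flattened encoder activations play the role of the deep features that CAE-LR passes to logistic regression. The class name `ConvAutoencoder1D`, the filter count, kernel width, learning rate, and the synthetic "ink" spectra are all assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

class ConvAutoencoder1D:
    """Minimal 1-D convolutional autoencoder for pixel spectra (illustrative)."""

    def __init__(self, n_filters=4, kernel=5, seed=0):
        rng = np.random.default_rng(seed)
        self.k = kernel
        self.We = rng.normal(0.0, 0.3, (n_filters, kernel))  # encoder filters
        self.be = np.zeros(n_filters)                        # encoder biases
        self.Wd = rng.normal(0.0, 0.3, (n_filters, kernel))  # decoder (transposed-conv) filters
        self.bd = 0.0                                        # decoder bias

    def encode(self, X):
        # windows[n, t, j] = X[n, t + j]: "valid" convolution along the band axis.
        windows = sliding_window_view(X, self.k, axis=1)
        z = np.einsum("ntj,fj->nft", windows, self.We) + self.be[None, :, None]
        return windows, z, np.maximum(z, 0.0)  # ReLU feature maps, shape (N, F, L)

    def decode(self, h, n_bands):
        # Transposed convolution: spread each activation back over k bands.
        length = h.shape[2]
        out = np.full((h.shape[0], n_bands), self.bd)
        for j in range(self.k):
            out[:, j:j + length] += np.einsum("nft,f->nt", h, self.Wd[:, j])
        return out

    def train_step(self, X, lr=0.02):
        """One gradient-descent step on the MSE loss; returns the pre-update loss."""
        windows, z, h = self.encode(X)
        X_hat = self.decode(h, X.shape[1])
        loss = np.mean((X_hat - X) ** 2)
        # Backpropagate the mean-squared reconstruction error by hand.
        d_out = 2.0 * (X_hat - X) / X.size
        d_win = sliding_window_view(d_out, self.k, axis=1)  # d_win[n,t,j] = d_out[n,t+j]
        grad_Wd = np.einsum("nft,ntj->fj", h, d_win)
        grad_bd = d_out.sum()
        d_z = np.einsum("fj,ntj->nft", self.Wd, d_win) * (z > 0)  # through the ReLU
        grad_We = np.einsum("nft,ntj->fj", d_z, windows)
        grad_be = d_z.sum(axis=(0, 2))
        self.We -= lr * grad_We
        self.be -= lr * grad_be
        self.Wd -= lr * grad_Wd
        self.bd -= lr * grad_bd
        return loss

    def features(self, X):
        """Flattened encoder activations, used as the deep features."""
        return self.encode(X)[2].reshape(len(X), -1)

# Hypothetical smooth spectra for two inks: reflectance dips at different bands.
rng = np.random.default_rng(1)
bands = np.arange(33)

def spectra_for(center, n):
    base = 0.8 - 0.5 * np.exp(-((bands - center) ** 2) / 40.0)
    return base + 0.02 * rng.normal(size=(n, bands.size))

X = np.vstack([spectra_for(12, 50), spectra_for(18, 50)])
cae = ConvAutoencoder1D()
losses = [cae.train_step(X) for _ in range(300)]
feats = cae.features(X)
print(f"loss {losses[0]:.4f} -> {losses[-1]:.4f}; feature shape {feats.shape}")
```

A practical implementation would more likely use a deep learning framework with several convolutional layers, pooling, and an adaptive optimizer; the hand-written gradients here only keep the example dependency-free.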
The rest of the paper is organized as follows. Section 2 presents the related work. Section 3 discusses existing work. Section 4 describes the experimental setup. Section 5 presents the experiments and results. Section 6 summarizes the major findings of the proposed approach CAE-LR. Section 7 concludes the paper and outlines future work.
Related work
Abbas et al. [8] demonstrated hyperspectral unmixing using the Minimum Volume Enclosing Simplex (MVES) algorithm. The outputs of the dimensionality reduction algorithms maximum noise fraction (MNF) and principal component analysis (PCA), together with Hysime for estimating the number of endmembers, served as inputs to MVES. Since Hysime overestimates the number of endmembers, the surplus endmembers were discarded manually. Khan et al. [9] used spectral information to form a data cube of size 6 × 6 as input to a CNN. Six subjects' spectral
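As background for the PCA step mentioned in the related work, the following NumPy sketch shows PCA-based dimensionality reduction of pixel spectra via the singular value decomposition. It is an illustrative assumption, not the pipeline of [8]: MNF, Hysime, and MVES are not reproduced, and the function name `pca_reduce` and the synthetic data are hypothetical.

```python
import numpy as np

def pca_reduce(spectra: np.ndarray, n_components: int):
    """Project (N, B) spectra onto their top principal components.

    Returns the (N, n_components) scores and the explained-variance ratio
    of each retained component.
    """
    centered = spectra - spectra.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:n_components].T
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]

# Hypothetical data: 200 spectra with 33 bands that mostly vary along 2 directions.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 33))
spectra = latent @ mixing + 0.01 * rng.normal(size=(200, 33))

scores, ratio = pca_reduce(spectra, n_components=3)
print(scores.shape, np.round(ratio, 3))
```

Because the synthetic spectra are generated from two latent factors, almost all variance falls in the first two components, which is the situation in which PCA compresses the spectral dimension with little loss.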
Discussion on existing work
The main findings from the literature reviewed above are as follows:
- •
Supervised and unsupervised machine learning approaches, similarity measures, unmixing, and supervised deep learning approaches based on spectral features have been applied to detect ink mismatch in hyperspectral document images.
- •
The major limitation of previous work is that it requires prior knowledge of the number of inks used in the forged document, which restricts its use in real-life scenarios.
Experimental description
Unsupervised deep learning approaches have not yet been explored for ink mismatch detection in hyperspectral document images. Moreover, overcoming the spectral compactness of inks using spectral features remains a primary concern. These gaps provide an opportunity for further research. The workflow of the proposed work is shown in Fig. 2, and each step is explained in detail. All steps are performed separately for blue and black inks.
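The classification stage of CAE-LR applies logistic regression to the extracted features. The solver and settings are not given in this section; the sketch below assumes a binary case (two inks), batch gradient descent on the mean log-loss with a small L2 penalty, and synthetic standardized features. The names `fit_logistic` and `predict` and all hyperparameters are hypothetical. Under the per-color workflow, such a classifier would be trained once for blue inks and once for black inks, and mixtures of more than two inks would call for a multinomial (softmax) variant.

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, epochs=500, l2=1e-3):
    """Binary logistic regression trained by batch gradient descent.

    X: (N, D) feature matrix; y: (N,) labels in {0, 1}.
    Returns the weight vector w (D,) and the bias b.
    """
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(y = 1)
        grad_w = X.T @ (p - y) / n + l2 * w     # gradient of mean log-loss + L2
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(X, w, b):
    """Label 1 where the linear score is positive, i.e. P(y = 1) > 0.5."""
    return (X @ w + b > 0).astype(int)

# Hypothetical deep features for two inks, standardized before fitting.
rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(-1.0, 1.0, (100, 8)), rng.normal(1.0, 1.0, (100, 8))])
labels = np.repeat([0, 1], 100)
feats = (feats - feats.mean(axis=0)) / feats.std(axis=0)

w, b = fit_logistic(feats, labels)
train_acc = np.mean(predict(feats, w, b) == labels)
print(f"training accuracy: {train_acc:.3f}")
```

Standardizing the features first keeps a single learning rate reasonable across all dimensions; in a real evaluation, accuracy would be measured on held-out pixels rather than the training set.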
Experiments and results
This section describes the evaluation metrics used for validation and presents the results of comparing CAE-LR with three machine learning algorithms and their variants, a CNN, and five state-of-the-art approaches.
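The specific metrics are not listed in this snippet. Since the contributions report average accuracy, the sketch below assumes pixel-level accuracy (correctly labeled pixels divided by all labeled pixels), averaged over experiments such as different mixing proportions. The function names, the experiment keys, and the toy predictions are hypothetical.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of pixels whose predicted ink label matches the ground truth."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    if y_true.shape != y_pred.shape:
        raise ValueError("label arrays must have the same shape")
    return float(np.mean(y_true == y_pred))

def average_accuracy(results):
    """Mean accuracy over experiments, e.g. one per mixing proportion.

    results: mapping from an experiment name to a (y_true, y_pred) pair.
    """
    return float(np.mean([accuracy(t, p) for t, p in results.values()]))

# Hypothetical predictions for two mixing settings (labels 0 and 1 = two inks).
results = {
    "1:1": ([0, 0, 1, 1], [0, 1, 1, 1]),                          # 3 of 4 correct
    "1:3": ([0, 1, 1, 1, 0, 1, 1, 1], [0, 1, 1, 1, 0, 1, 1, 1]),  # all correct
}
print(accuracy(*results["1:1"]), average_accuracy(results))  # prints 0.75 0.875
```

Averaging per-experiment accuracies weights each mixing setting equally, regardless of how many pixels it contains; pooling all pixels first would instead weight larger experiments more heavily.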
Major findings of the proposed approach CAE-LR
The significant findings of the proposed approach CAE-LR are as follows:
- •
CAE-LR outperforms supervised machine learning approaches (Figs. 11 and 12) and unsupervised machine learning approaches (Abbas et al. [8], Khan et al. [13], Luo et al. [15], and Khan et al. [18]).
- •
CAE-LR attains better results than the supervised deep learning approaches (CNN and Khan et al. [9]), which demonstrates the efficacy of the proposed unsupervised deep learning approach.
- •
For black inks, the proposed approach CAE-LR
Conclusion and future prospects
Designing an unsupervised deep learning approach to detect ink mismatch in hyperspectral document images is challenging. To the best of our knowledge, this work is the first to propose such an approach, CAE-LR, which extracts features using a convolutional autoencoder and classifies the extracted features using logistic regression. Further, ink mismatch detection is illustrated for equal and unequal mixing of various numbers of inks (2∼5) using spectral
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Garima Jaiswal is currently pursuing her PhD in the Department of IT at Indira Gandhi Delhi Technical University for Women, Delhi. She has 9 years of experience in teaching and research. Her areas of interest include machine learning, database management systems, and data structures. She has published more than 15 research papers in SCI/SCIE/SCOPUS and other journals and conferences.
References (31)
- et al. Raman spectroscopy for forensic analysis of inks in questioned documents. Forensic Sci Int (2013).
- et al. Hyperspectral document image processing: applications, challenges and future prospects. Pattern Recognit (2019).
- et al. Automatic ink mismatch detection for forensic document analysis. Pattern Recognit (2015).
- et al. Hyperspectral imaging of gel pen inks: an emerging tool in document analysis. Sci Justice (2014).
- et al. An analysis of image forgery detection techniques. Stat Optim Inf Comput (2019).
- Scientific investigation of copies, fakes and forgeries (2009).
- et al. Applications of non-destructive testing (NDT) in vehicle forgery examinations. J Forensic Sci (1994).
- et al. Critical insights into modern hyperspectral image applications through deep learning. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery (2021).
- et al. Efficient ink mismatch detection using supervised approach.
- et al. Towards automated ink mismatch detection in hyperspectral document images.
- Deep learning for automated forgery detection in hyperspectral document images. J Electron Imaging.
- A spatio-spectral hybrid convolutional architecture for hyperspectral document authentication.
- Comparison of ink classification capabilities of classic hyperspectral similarity features.
- Ink classification using convolutional neural network. NISK J.
- Hyperspectral imaging for ink mismatch detection.
Cited by (14)
- An efficient technique for detecting document forgery in hyperspectral document images. 2023, Alexandria Engineering Journal.
- Integration of hyperspectral imaging and autoencoders: Benefits, applications, hyperparameter tunning and challenges. 2023, Computer Science Review.
- Diagnosis of brain diseases in fusion of neuroimaging modalities using deep learning: A review. 2023, Information Fusion.
  Citation excerpt: Without convolutional layers, AEs cannot learn spatial patterns in data. However, the applications of CNN-AE are not limited to this, as some forms of them (VAEs) can be used to generate data, and, with a few changes, they are widely used for image segmentation [156] and forgery detection [157]. AEs are among the most preferred choices for unsupervised representation learning.
- Combination of hyperspectral imaging and machine learning models for fast characterization and classification of municipal solid waste. 2023, Resources, Conservation and Recycling.
  Citation excerpt: In recent years, with the application and development of spectral technology and data mining technology, various spectral technologies have been widely used in the identification and characterization of various biomass and solid wastes (Tao et al., 2020; Yan et al., 2021). Compared with traditional ultimate analysis and proximate analysis methods, spectral technology has advantages in fast speed, convenient procedure, and nondestructive feature, which is suitable for consecutive online analysis and testing (Jaiswal et al., 2022). The feasibility of combining spectroscopy with machine learning for elemental composition and heating value prediction of waste through two previous spectroscopy technologies has been validated previously (Tao et al., 2020; Yan et al., 2021).
- DFD-SS: Document Forgery Detection using Spectral – Spatial Features for Hyperspectral Images. 2022, Journal of Visual Communication and Image Representation.
  Citation excerpt: In the present study, ink mismatch detection for hyperspectral document images is highlighted. Jaiswal et al. [7] extracted the spectral features from the hyperspectral document images to detect the forgery. Their approach yielded good results for blue inks.
- Forged document detection and writer identification through unsupervised deep learning approach. 2024, Multimedia Tools and Applications.
Arun Sharma is currently working as Professor and Head of the Department of AI and Data Sciences at Indira Gandhi Delhi Technical University for Women, Delhi. His areas of interest include software engineering, soft computing, and Big Data. He has published more than 60 papers in international SCI/SCIE/SCOPUS and other journals and conferences.
Sumit Kumar Yadav is Asst. Director (S) in the Income Tax Department, Delhi. He has around 9 years of experience in administration, teaching, and research. Before joining the Income Tax Department, he worked as an Assistant Professor at Delhi Government University. He has published more than 30 research papers in reputed international conferences and journals.
- ☆
This paper is for regular issues of CAEE. Reviews were processed by Associate Editor Dr. Ali Dehghantanha and recommended for publication.