Deep feature extraction for document forgery detection with convolutional autoencoders

https://doi.org/10.1016/j.compeleceng.2022.107770

Abstract

Context

Document forgery has been a significant problem for ages because of the pervasive use of paper-based documents. Classical destructive approaches to this problem, such as chromatography and electrophoresis, are unsuitable because they damage the document under analysis. Hyperspectral imaging is a non-destructive approach that identifies the distinguishing features of an image under investigation through their unique spectral signatures. It captures multiple narrow-band images across the electromagnetic spectrum, which is difficult with conventional imaging. Deep learning approaches for hyperspectral images have attained state-of-the-art results on many complex and challenging problems. Supervised classification of hyperspectral images is tedious because obtaining image labels and labeling the training data is time-consuming and expensive. In this paper, an unsupervised approach for classification of hyperspectral document images is proposed.

Objective

To propose an unsupervised deep learning approach for ink mismatch detection in hyperspectral document images using spectral features.

Approach

The proposed CAE-LR approach uses a Convolutional Autoencoder (CAE) for feature extraction and applies Logistic Regression (LR) to the extracted features for ink mismatch detection.

Results

We evaluated the performance of CAE-LR on the UWA writing ink hyperspectral images dataset for blue and black inks. To detect ink mismatch, inks of similar color but different types (2∼5) were artificially mixed in varying proportions. The results are compared with three machine learning algorithms and variants of each, a CNN, and five state-of-the-art methods from the literature. Experimental results show that CAE-LR outperforms all the above-mentioned approaches and achieves state-of-the-art results, which demonstrates the efficacy of the unsupervised deep learning approach for ink mismatch detection in hyperspectral document images.

Introduction

Analyzing ink is an essential aspect of detecting forgery in handwritten questioned documents, including forged cheques, wills, or altered signatures and records. All inks of the same color, such as blue, black, or red, may look similar, but their chemical composition makes them distinct. The naked eye cannot identify these subtle properties. Experimental analysis can reveal them, and the outcomes help determine whether a questioned document has been altered or modified.

Initially, microscopic analysis of a questioned document can be instructive. The analyst may detect faint modifications in ink color that the naked eye cannot identify [1]. Such modifications are a clue to changes, alterations, obliterations, or overwriting made in the questioned document. The methods used to analyze inks are classified as destructive and non-destructive [2]. Promising destructive practices comprise chromatography and electrophoresis. The primary non-destructive methods used in ink analysis include Fourier transform infrared (FTIR) spectroscopy, Raman spectroscopy, the video spectral comparator (VSC), multispectral imaging, and hyperspectral imaging [3,4]. Non-destructive methods are preferred over destructive methods because they leave the questioned document intact.

Hyperspectral imaging is an evolving technique that blends imaging and spectroscopy to capture spatial and spectral information about an object. It is also known as imaging spectrometry or imaging spectroscopy [5]. The term "hyperspectral imaging" originated from work on recognizing surface materials in remote sensing images [5].

The human eye possesses only three types of color receptors, sensitive to red, green, and blue. Hyperspectral imaging captures images at different wavelengths (ranging from a few tens to several hundred bands), resulting in numerous images [6]. Compared to a three-channel RGB image or to the few spectral bands of a multispectral image, hyperspectral images contain more detailed information. Continuity and narrowness (10–20 nm) of the spectral bands are the two main characteristics that distinguish hyperspectral images for analysis. An image captures two-dimensional data; appending the third, spectral, dimension forms a three-dimensional data cube [5,7]. The data cube assists in analyzing hyperspectral images, as shown in Fig. 1.
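The data cube described above can be illustrated with a minimal sketch. It assumes a cube indexed as rows × columns × bands and stored as nested Python lists; the sizes, the values, and the helper names `make_cube`, `pixel_spectrum`, and `band_image` are hypothetical and are not taken from the paper or the dataset.

```python
# Toy hyperspectral cube: rows x cols x bands, stored as nested lists.
# All sizes and values are illustrative assumptions, not dataset values.
ROWS, COLS, BANDS = 2, 3, 4

def make_cube(rows, cols, bands):
    """Build a cube whose value encodes its (row, col, band) position."""
    return [[[100 * r + 10 * c + b for b in range(bands)]
             for c in range(cols)]
            for r in range(rows)]

def pixel_spectrum(cube, r, c):
    """Spectral signature of one pixel: its value in every band."""
    return list(cube[r][c])

def band_image(cube, b):
    """Two-dimensional image captured at a single spectral band."""
    return [[pixel[b] for pixel in row] for row in cube]

cube = make_cube(ROWS, COLS, BANDS)
spectrum = pixel_spectrum(cube, 1, 2)   # all bands at row 1, col 2
image = band_image(cube, 3)             # all pixels at band 3
```

Slicing along the third axis gives the spectrum of one pixel, while fixing a band gives an ordinary two-dimensional image. An array library would normally replace the nested lists; plain lists are used here only to keep the sketch free of dependencies.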

In this paper, we examine the capacity of an unsupervised deep learning approach that uses spectral features for ink mismatch detection in hyperspectral document images. The main contributions of this paper are as follows:

  • (1)

    We introduce an unsupervised deep learning approach for ink mismatch detection in hyperspectral document images, which, to the best of our knowledge, has not been explored in this domain.

  • (2)

    We propose a convolutional autoencoder that captures deep features from hyperspectral document images, which are then classified with logistic regression (CAE-LR) for reliable classification.

  • (3)

    The proposed approach is compared against three machine learning algorithms and variants of each, a CNN, and five state-of-the-art methods from the literature. For a fair comparison, the ink mixing proportions are adopted from previous work [8,9,13,15,18]. The results are analyzed for unequal mixing of two inks in varying proportions and for equal mixing of inks with a varying number of inks.

  • (4)

    The proposed approach efficiently handles the spectral compactness of inks at different spectral bands, which tends to reduce accuracy. It outperforms all the compared methods for blue and black inks and improves average accuracy by up to 4.74% for black inks and 1.3% for blue inks.
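The equal and unequal ink mixing mentioned in contribution (3) can be sketched as follows. This is only an illustration under stated assumptions: mixing is interpreted here as drawing pixels from each ink in a given ratio, and the function `mix_inks`, the ink names, and the spectra are hypothetical. The snippets shown here do not specify the actual mixing procedure, which follows the cited previous work.

```python
import random

def mix_inks(pixels_by_ink, proportions, total, seed=0):
    """Draw `total` pixels so each ink contributes its share.

    pixels_by_ink: dict ink_name -> list of spectra (hypothetical data).
    proportions:   dict ink_name -> relative weight, e.g. {"a": 1, "b": 4}.
    Returns a shuffled list of (spectrum, ink_name) pairs.
    """
    rng = random.Random(seed)
    weight_sum = sum(proportions.values())
    mixed = []
    for ink, weight in proportions.items():
        # Number of pixels this ink contributes to the mixture.
        count = round(total * weight / weight_sum)
        chosen = rng.sample(pixels_by_ink[ink], count)
        mixed.extend((spectrum, ink) for spectrum in chosen)
    rng.shuffle(mixed)
    return mixed

# Two made-up inks with 3-band spectra, 50 pixels each (assumed values).
inks = {
    "ink_a": [[0.20, 0.40, 0.60] for _ in range(50)],
    "ink_b": [[0.25, 0.35, 0.65] for _ in range(50)],
}
unequal = mix_inks(inks, {"ink_a": 1, "ink_b": 4}, total=50)  # 1:4 mix
equal = mix_inks(inks, {"ink_a": 1, "ink_b": 1}, total=50)    # 1:1 mix
```

Extending the `proportions` dictionary to three, four, or five inks with equal weights would correspond to the equal-mixing case with a varying number of inks.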

The rest of the paper is organized as follows. Section 2 reviews the related work. Section 3 discusses the existing work. The experimental description is given in Section 4. The experiments and results are presented in Section 5. Section 6 presents the major findings of the proposed approach, CAE-LR. The conclusion and future work are given in Section 7.

Section snippets

Related work

Hyperspectral unmixing using MVES (the Minimum Volume Enclosing Simplex algorithm) was illustrated in [8]. The dimensionality reduction algorithms Maximum Noise Fraction (MNF) and Principal Component Analysis (PCA), together with the endmember extraction algorithm Hysime, provided the input to MVES. Since the Hysime algorithm overestimates the number of endmembers, manual discarding was applied. Khan et al. [9] utilized spectral information to build a 6,6 sized data cube as input to a CNN. Six subjects' spectral

Discussion on existing work

The main findings from work elaborated in the literature are as follows:

  • Supervised and unsupervised machine learning approaches, similarity measures, unmixing and supervised deep learning approaches using spectral features have been implemented to detect ink mismatch in hyperspectral document images.

  • The major limitation of the previous work is that it demands preliminary information about the number of inks used in the document forgery, which restricts its usage in real-life scenarios.

Experimental description

Unsupervised deep learning approaches have not yet been explored for ink mismatch detection in hyperspectral document images. Moreover, overcoming spectral compactness using spectral features is a primary concern. This gap provides an opportunity for researchers to explore. The workflow of the proposed work is shown in Fig. 2. Each step is explained further in detail. All the steps are performed separately for blue and black inks.
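The classification stage of CAE-LR applies logistic regression to features extracted by the autoencoder. The following is a minimal sketch of binary logistic regression fitted by batch gradient descent. The two-dimensional feature vectors are hypothetical stand-ins for encoder outputs, and the labels, learning rate, and epoch count are assumptions made for illustration. The autoencoder itself is not shown, and this sketch is not the authors' implementation.

```python
import math

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic_regression(features, labels, lr=0.5, epochs=500):
    """Fit binary logistic regression with batch gradient descent."""
    n, d = len(features), len(features[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(features, labels):
            # Prediction error for this sample under the current model.
            error = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) - y
            for j in range(d):
                grad_w[j] += error * x[j]
            grad_b += error
        # Average the gradients and take one descent step.
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def predict(weights, bias, x):
    """Return 1 if the predicted probability of class 1 is at least 0.5."""
    score = sum(w * v for w, v in zip(weights, x)) + bias
    return 1 if sigmoid(score) >= 0.5 else 0

# Hypothetical 2-D feature vectors standing in for encoder outputs.
features = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.25],
            [0.9, 0.8], [0.8, 0.9], [0.85, 0.75]]
labels = [0, 0, 0, 1, 1, 1]  # assumed ink identities, for illustration only
weights, bias = train_logistic_regression(features, labels)
predictions = [predict(weights, bias, x) for x in features]
```

With more than two inks, a multinomial or one-vs-rest variant would be needed; the binary case is shown only because it is the shortest correct instance of the technique.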

Experiments and results

This section describes the evaluation metrics used for validation. It then presents the results obtained by comparing CAE-LR with three machine learning algorithms and their variants, a CNN, and five state-of-the-art approaches.

Major findings of the proposed approach CAE-LR

The significant findings of the proposed approach CAE-LR are as follows:

  • CAE-LR outperforms the results attained by supervised (Figs. 11 and 12) and unsupervised machine learning approaches (Abbas et al. [8], Khan et al. [13], Luo et al. [15], and Khan et al. [18]).

  • CAE-LR attained better results than supervised deep learning approaches (CNN and Khan et al. [9]), which demonstrates the efficacy of the proposed unsupervised deep learning approach.

  • For black inks, the proposed approach CAE-LR

Conclusion and future prospects

Designing an unsupervised deep learning approach for ink mismatch detection in hyperspectral document images is challenging. To the best of our knowledge, this is the first time an unsupervised deep learning approach, CAE-LR, has been proposed that combines feature extraction using a convolutional autoencoder with classification of the extracted features using logistic regression. Further, the ink mismatch detection is illustrated by equal and unequal mixing on various ink types (2∼5) using spectral

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Garima Jaiswal is currently pursuing her PhD in the Department of IT at Indira Gandhi Delhi Technical University for Women, Delhi. She has 9 years of experience in teaching and research. Her areas of interest include machine learning, database management systems, and data structures. She has published more than 15 research papers in SCI/SCIE/SCOPUS and other journals and conferences.

References (31)

  • M.J. Khan et al.

    Deep learning for automated forgery detection in hyperspectral document images

    J Electron Imaging

    (2018)
  • M.J. Khan et al.

    A spatio-spectral hybrid convolutional architecture for hyperspectral document authentication

  • B.M. Devassy et al.

    Comparison of ink classification capabilities of classic hyperspectral similarity features

  • B.M. Devassy et al.

    Ink classification using convolutional neural network

    NISK J

    (2019)
  • Z. Khan et al.

    Hyperspectral imaging for ink mismatch detection


Arun Sharma is currently working as Professor and Head of the Department – AI and Data Sciences at Indira Gandhi Delhi Technical University for Women, Delhi. His areas of interest include software engineering, soft computing, and Big Data. He has published more than 60 papers in international SCI/SCIE/SCOPUS and other journals and conferences.

Sumit Kumar Yadav is Asst. Director (S) in the Income Tax Department, Delhi. He has around 9 years of experience in administration, teaching, and research. Before joining the Income Tax Department, he worked as an Assistant Professor at Delhi Government University. He has published more than 30 research papers in international conferences and journals of repute.

    This paper is for regular issues of CAEE. Reviews were processed by Associate Editor Dr. Ali Dehghantanha and recommended for publication.
