Elsevier

Signal Processing

Volume 87, Issue 7, July 2007, Pages 1754-1771

Text luminance modulation for hardcopy watermarking

https://doi.org/10.1016/j.sigpro.2007.01.029

Abstract

This paper improves a recently proposed hardcopy watermarking method by introducing new approaches to decode the embedded information. The method, termed text luminance modulation (TLM), embeds hidden data in office-like documents and is robust to the print–scan (PS) channel. The hidden data is embedded by slightly modulating the luminance of characters and symbols. This change can be made imperceptible to the human eye while remaining detectable with a scanner. In previous works, the hidden data was retrieved by using the average luminance level or the halftone pattern as a detection metric. In this paper, however, the detection process combines different metrics into a single metric, significantly improving the detection performance. This paper also proposes a new PS model that accounts for characteristics induced by halftoning in the printing process, which allows the variance of a character to be used as a new detection metric. Performance analyses validate the proposed PS model. Experiments illustrate the precision of the analyses and the applicability of the method.

Introduction

Digital watermarking provides an effective alternative for content authentication of images, audio and video signals [1], [2]. In authentication applications, the objective of the watermark is to ensure that the original signal has not been tampered with and that it originates from a trustworthy source.

An important class of media is that of text documents. While natural images offer rich gray-scale or even color content suitable for modification, binary text provides no such diversified host. The problem becomes even more challenging if the watermarked document is intended to be printed and to remain watermarked. In this scenario, watermark detection in the printed document is usually carried out with the help of a flatbed scanner, which digitizes the document so that a possible watermark can be detected. Watermarking techniques designed to survive the print–scan (PS) channel fall into the category of hardcopy watermarking [3], [4], [5].

This paper analyzes and improves the TLM method, which embeds hidden data in printed office-like documents with low perceptual impact and robustness to the PS channel. Besides serving simply as an additional side message, the hidden code can also be used to authenticate the information of sensitive parts of a document (names, dates, or values, for example), by hiding this information over the whole document.

It is acknowledged that the bulk of documents in the office-like class are composed of black text on a white background, and are referred to as bi-level or binary documents. Using TLM, information is embedded by introducing gray tones into a binary text image, respecting a perceptual transparency requirement. Although these luminance modifications do not significantly affect the perceived text quality, they can be detected by a scanner, and can be decoded to retrieve the embedded message. This approach was originally proposed in [6], and has been further discussed in [7], where the authors also suggest the use of halftone modulation, and in [8], where preliminary theoretical error rates are presented. Based on the TLM approach, the main contributions of this paper are:

(i) A novel analytical PS channel model is proposed. This new model includes most characteristics that influence the performance of the system and still allows mathematically tractable analyses. The model is derived from a study of the underlying physical process of the channel.

(ii) As a consequence of modifying the luminances when using TLM, the variances of the characters also change due to the halftoning process. This work employs the character sample variance as a detection metric, in addition to the average luminance of a character, previously proposed in [7], [8].

(iii) Moreover, this work combines the detection metrics into a single metric, instead of using them separately. The metrics are combined using the Bayes classifier, which yields the minimum average classification error for normally distributed patterns. This procedure does not affect the original embedding process and significantly improves the detection performance, as indicated by the analyses and the experiments. Using this strategy, it is possible to improve performance by including other detection metrics not discussed in this paper.

(iv) Analyses to determine the error probability of the method are presented, where the proposed channel model is considered. The analyses and the applicability of TLM are validated by experiments.
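Contribution (ii) rests on the observation that changing a character's luminance also changes its variance once the character is halftoned. The following is a minimal sketch of that effect, not the paper's PS model: it assumes random dithering (each pixel receives ink with probability equal to the gray level) as a stand-in for the printer's halftone, ignores scanner blur and noise, and uses hypothetical gray levels. Under this assumption the per-pixel variance is g(1-g), so solid black (g = 1) has zero variance and lighter characters have more.

```python
import random
from typing import List


def random_dither(gray: float, n_pixels: int, rng: random.Random) -> List[int]:
    """Render a flat gray level as binary dots (1 = ink) with P(ink) = gray.

    Random dithering is an assumption standing in for the printer's halftone.
    """
    return [1 if rng.random() < gray else 0 for _ in range(n_pixels)]


def sample_variance(values: List[float]) -> float:
    """Unbiased sample variance of a list of pixel values."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
for gray in (1.0, 0.9, 0.75):  # hypothetical character luminances (1 = black)
    dots = random_dither(gray, 10_000, rng)
    print(f"gray={gray:.2f}  variance={sample_variance(dots):.4f}  "
          f"g(1-g)={gray * (1 - gray):.4f}")
```

A real printer and scanner would change the measured values, but the dependence of variance on the embedded gray level is what makes the sample variance usable as a detection metric.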

This paper is organized as follows: Section 2 discusses some related methods, indicating their advantages and drawbacks, followed by a brief description of the halftoning process. Section 3 proposes an analytical PS model. Section 4 describes TLM and presents a practical implementation, which provides satisfactory robustness to the PS distortions. Section 5 proposes detection alternatives, analyzing their resulting error rates. Section 6 proposes to combine the detection metrics into a single metric, achieving a significantly better detection performance. Finally, several experimental results are presented in Section 7, with conclusions in Section 8.

Section snippets

Existing methods

In agreement with [7], [9], an extensive survey of the literature reveals that a rather small number of works have been developed for text watermarking, when compared to image, audio and video watermarking.

A landmark paper on text watermarking was published by Brassil et al. [10]. In their work, the authors describe and compare several mechanisms to watermark documents and several other mechanisms for decoding the marks, which are remarkably robust to the PS channel. One method is

Proposed PS channel

This section discusses the PS channel, providing a basis for the study of hardcopy watermarking methods.

Text alteration of text

Using TLM, information is inserted in a document by altering its luminance through an embedding function $E(\cdot)$ that inserts a watermark $w$ into $c$, where $c$ is a binary image of size $M \times N$ representing a text document. Working in the range $c \in \{0,1\}$ and $w \in [0,1]$, where 0 represents white and 1 represents black, the general embedding function is given by

$$s(m,n) = w(m,n)\,c(m,n),$$

where $s$ is the gray-level watermarked version of $c$, before the PS process.
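The pixel-wise product s(m,n) = w(m,n)c(m,n) can be sketched as follows. This is an illustrative sketch only: the function name `embed`, the tiny 2×4 image and the gain value 0.75 are hypothetical, and the way the paper assigns gains to characters is not reproduced here.

```python
from typing import List

Image = List[List[float]]


def embed(c: Image, w: Image) -> Image:
    """Apply s(m, n) = w(m, n) * c(m, n) pixel by pixel."""
    if len(c) != len(w) or any(len(rc) != len(rw) for rc, rw in zip(c, w)):
        raise ValueError("c and w must have the same M x N shape")
    return [[wv * cv for cv, wv in zip(rc, rw)] for rc, rw in zip(c, w)]


# Hypothetical binary text image: 1 = black (character), 0 = white (background).
c = [[0, 1, 1, 0],
     [0, 1, 1, 0]]

# Hypothetical gain map: 1.0 keeps a pixel as it is, 0.75 lightens it.
w = [[1.0, 1.0, 0.75, 0.75],
     [1.0, 1.0, 0.75, 0.75]]

s = embed(c, w)
print(s)
```

Because the gain multiplies the host, background pixels with c = 0 stay at 0 whatever value w takes, and only black pixels are moved toward gray.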

Notice that in (17) the white background is left unchanged

Detection by the sample mean

The simplest detection metric to determine the symbol embedded in an element $c_i$ is the average luminance of the element, given by (19). It is known from detection theory [29] that this detection statistic is the Neyman–Pearson (NP) detector (which minimizes the error probability) when detecting a change in the mean level considering Gaussian noise, which is the framework of the application.
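As a rough illustration of detection by the sample mean, the sketch below averages the scanned pixels of one character and picks the nearer of two expected luminance levels. The two-symbol alphabet, the expected levels, and the nearest-level rule (which corresponds to a midpoint threshold under equal priors and equal noise variance) are assumptions made for the example, not values or rules taken from the paper.

```python
from typing import Sequence


def mean_metric(pixels: Sequence[float]) -> float:
    """Average luminance of the pixels belonging to one character."""
    return sum(pixels) / len(pixels)


def detect_bit(pixels: Sequence[float], level0: float, level1: float) -> int:
    """Return the symbol whose expected mean luminance is closest.

    level0 and level1 are the hypothetical expected means for symbols 0 and 1.
    """
    d = mean_metric(pixels)
    return 0 if abs(d - level0) <= abs(d - level1) else 1


# Hypothetical scanned pixels of two characters (1 = black, 0 = white).
dark_char = [0.98, 1.00, 0.97, 0.99]
light_char = [0.74, 0.78, 0.71, 0.77]

print(detect_bit(dark_char, level0=1.0, level1=0.75))
print(detect_bit(light_char, level0=1.0, level1=0.75))
```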

By mapping the $(m,n)$ coordinates to a one-dimensional notation, the detection metric $d_{M_i}$ for element $i$ is

Combining different metrics

This section proposes and discusses improvements by using additional detection metrics in the system, and by combining the results of these metrics into a new decision criterion. This approach falls into a multicriteria classification problem, where each element $c_i$ must be classified as belonging to one among $S$ classes by determining an estimate $\hat{\omega}$, $\hat{\omega} \in \Omega$.
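One way to picture the combination of metrics is a Gaussian Bayes classifier over a two-dimensional feature vector (sample mean, sample variance) per character. The sketch below is a simplified illustration under stated assumptions: the features are treated as independent within each class (a diagonal covariance, which the paper does not necessarily assume), and all class statistics are hypothetical numbers rather than measured ones.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# Per-class parameters: one (expected value, variance) pair per feature.
ClassParams = Dict[int, List[Tuple[float, float]]]


def features(pixels: Sequence[float]) -> Tuple[float, float]:
    """Sample mean and unbiased sample variance of one character's pixels."""
    n = len(pixels)
    mean = sum(pixels) / n
    var = sum((p - mean) ** 2 for p in pixels) / (n - 1)
    return mean, var


def log_gauss(x: float, mu: float, sigma2: float) -> float:
    """Log of the univariate normal density N(x; mu, sigma2)."""
    return -0.5 * (math.log(2 * math.pi * sigma2) + (x - mu) ** 2 / sigma2)


def bayes_classify(x: Sequence[float], classes: ClassParams,
                   priors: Optional[Dict[int, float]] = None) -> int:
    """Pick the class with the largest log posterior (independent features)."""
    def score(label: int) -> float:
        prior = 1 / len(classes) if priors is None else priors[label]
        return math.log(prior) + sum(
            log_gauss(xi, mu, s2) for xi, (mu, s2) in zip(x, classes[label]))
    return max(classes, key=score)


# Hypothetical statistics of (mean metric, variance metric) for two symbols.
classes: ClassParams = {
    0: [(0.95, 0.02 ** 2), (0.010, 0.004 ** 2)],
    1: [(0.80, 0.02 ** 2), (0.030, 0.004 ** 2)],
}

# The mean is halfway between the classes, so the variance metric decides.
print(bayes_classify((0.875, 0.029), classes))
```

Further metrics can be appended to the feature vector and to each class's parameter list without changing the classifier itself.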

The mean luminance and the sample variance are the optimum detectors for detecting a DC level change and a variance change in Gaussian noise

Experiments

The purpose of this section is to illustrate through Monte Carlo simulations the applicability of TLM and the reduced error rate when using the Bayes classifier, as well as to validate the analyses of Section 5 and the proposed PS channel model.

In contrast to the previous sections of the paper, this section maps the image luminance scale $[0,1]$ to the $[0,255]$ scale, where 255 represents black and 0 represents white. Therefore, $s \in [0,255]$, $c \in \{0,255\}$ and $w \in [0,1]$. Recall that $w=1$ represents no

Conclusions

This paper improves and analyzes the detection of a novel method to transmit hidden information in text documents. The method is a hardcopy watermarking system which modulates the luminance of characters to embed information, and it can be applied to documents with any kind of characters and symbols, as well as different text alignments and spacing. It is important to notice that TLM can be combined with other text watermarking techniques discussed in Section 2. A new channel model for the

References (34)

  • I.J. Cox, M.L. Miller, J.A. Bloom, Digital Watermarking, Morgan Kaufmann,...
  • M. Wu, B. Liu, Data hiding in binary image for authentication and annotation, IEEE Trans. Multimedia 6 (4) (August...
  • K. Solanki, U. Madhow, B.S. Manjunath, S. Chandrasekaran, Modeling the print-scan process for resilient data hiding,...
  • C.-Y. Ling, Public watermarking surviving general scaling and cropping: an application for print-and-scan process,...
  • A.M. Alattar, O.M. Alattar, Watermarking electronic text documents containing justified paragraphs and irregular line...
  • A.K. Bhattacharjya, H. Ancin, Data embedding in text for a copier system, Proceedings IEEE International Conference on...
  • R. Villán, S. Voloshynovskiy, O. Koval, J. Vila, E. Topak, F. Deguillaume, Y. Rytsar, T. Pun, Text data-hiding for...
  • P.V. Borges, J. Mayer, Document watermarking via character luminance modulation, Proceedings of the IEEE International...
  • Y.-W. Kim, K.-A. Moon, I.-S. Oh, A text watermarking algorithm based on word classification and inter-word space...
  • J.T. Brassil et al., Copyright protection for the electronic distribution of text documents, Proc. IEEE (July 1999)
  • D. Huang et al., Interword distance changes represented by sine waves for watermarking text images, IEEE Trans. Circuits and Systems Video Technol. (December 2001)
  • H. Yang, A.C. Kot, Text document authentication by integrating inter character and word spaces watermarking,...
  • T. Amano, A feature calibration method for watermarking of document images, IEEE Proceedings of the Fifth International...
  • 〈www.textmark.com〉, Compris Intelligence,...
  • S. Low et al., Capacity of text marking channel, IEEE Signal Process. Lett. (December 2000)
  • F. Deguillaume, Y. Rytsar, S. Voloshynovskiy, T. Pun, Data-hiding based text document security and automatic...
  • S. Voloshynovskiy, O. Koval, R. Villán, E. Topak, J.E. Vila-Forcén, F. Deguillaume, Y. Rytsar, T. Pun,...

This work was supported by CNPq, Proc. No. 550658/02-5 and 552164/01-1.
