Text luminance modulation for hardcopy watermarking☆
Introduction
Digital watermarking provides an effective alternative for content authentication of images, audio and video signals [1], [2]. In authentication applications, the objective of the watermark is to ensure that the original signal has not been tampered with and that it originated from a trustworthy source.
An important class of media is that of text documents. While natural images offer a rich gray scale or even color content suitable for modification, binary text does not provide such a highly diversified host. The problem becomes even more challenging if the watermarked document is intended to be printed and to remain watermarked. In this scenario, watermark detection in the printed document is usually carried out with a flatbed scanner, which digitizes the document so that a possible watermark can be detected. Watermarking techniques designed to survive the print–scan (PS) channel fall into the category of hardcopy watermarking [3], [4], [5].
This paper analyzes and improves the text luminance modulation (TLM) method, which embeds hidden data in printed office-like documents while presenting low perceptual impact and robustness to the PS channel. Besides serving simply as an additional side message, the hidden code can also be used to authenticate the information of sensitive parts of a document (names, dates, or values, for example), by hiding this information over the whole document.
The bulk of documents in the office-like class consist of black text on a white background and are referred to as bi-level or binary documents. Using TLM, information is embedded by introducing gray tones into a binary text image, respecting a perceptual transparency requirement. Although these luminance modifications do not significantly affect the perceived text quality, they can be detected by a scanner and decoded to retrieve the embedded message. This approach was originally proposed in [6], and has been further discussed in [7], where the authors also suggest the use of halftone modulation, and in [8], where preliminary theoretical error rates are presented. Based on the TLM approach, the main contributions of this paper are:
(i) A novel analytical PS channel model is proposed. This new model includes most characteristics that influence the performance of the system while remaining mathematically tractable in the analyses. The proposed model is derived from a study of the underlying physical process of the channel.
(ii) As a consequence of modifying the luminances when using TLM, the variances of the characters also change due to the halftoning process. This work employs the character sample variance as a detection metric, in addition to the average luminance of a character, previously proposed in [7], [8].
(iii) Moreover, this work combines the detection metrics into a single metric, instead of using them separately. The metrics are combined using the Bayes classifier, which yields the minimum average classification error for normally distributed patterns. This procedure does not affect the original embedding process and significantly improves the detection performance, as indicated by the analyses and the experiments. Using this strategy, it is possible to improve performance by including other detection metrics not discussed in this paper.
(iv) Analyses to determine the error probability of the method are presented, where the proposed channel model is considered. The analyses and the applicability of TLM are validated by experiments.
This paper is organized as follows: Section 2 discusses some related methods, indicating their advantages and drawbacks, followed by a brief description of the halftoning process. Section 3 proposes an analytical PS model. Section 4 describes TLM and presents a practical implementation, which provides satisfactory robustness to the PS distortions. Section 5 proposes detection alternatives, analyzing their resulting error rates. Section 6 proposes to combine the detection metrics into a single metric, achieving a significantly better detection performance. Finally, several experimental results are presented in Section 7, with conclusions in Section 8.
Section snippets
Existing methods
In agreement with [7], [9], an extensive survey of the literature reveals that relatively few works address text watermarking, compared to image, audio and video watermarking.
A landmark paper on text watermarking was published by Brassil et al. [10]. In their work, the authors describe and compare several mechanisms to watermark documents and several other mechanisms for decoding the marks, which are remarkably robust to the PS channel. One method is
Proposed PS channel
This section discusses the PS channel, providing a basis for the study of hardcopy watermarking methods.
Text luminance modulation
Using TLM, information is inserted in a document by altering its luminance through an embedding function that inserts a watermark w into c, where c is a binary image representing a text document. Working on a luminance scale where 0 represents white and 1 represents black, the general embedding function is given by (17), where s is the gray-level watermarked version of c, before the PS process.
Notice that in (17) the white background is left unchanged
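As an illustration of the embedding idea, the following minimal Python sketch assigns one gray level to all text pixels of a character and leaves the white background at 0. The exact form of the embedding function is not reproduced in this snippet, so the per-character gain, the function name `embed_tlm`, and the `element_map` and `gains` arguments are assumptions made only for this example.

```python
def embed_tlm(c, element_map, gains):
    """Sketch of luminance embedding (assumed form, not the paper's exact function).

    c           -- 2D list of 0/1 pixels (1 = black text, 0 = white background)
    element_map -- 2D list of the same shape holding the index of the character
                   (element) each pixel belongs to, or -1 for background
    gains       -- hypothetical list with one luminance level in (0, 1] per element
    """
    s = []
    for row_c, row_e in zip(c, element_map):
        s_row = []
        for pixel, elem in zip(row_c, row_e):
            if pixel == 1 and elem >= 0:
                # Text pixel: replace black by the gray level chosen for its element.
                s_row.append(float(gains[elem]))
            else:
                # Background (0) and unassigned pixels are left unchanged.
                s_row.append(float(pixel))
        s.append(s_row)
    return s


# Two "characters": element 0 stays black (1.0), element 1 is lightened to 0.8.
c = [[0, 1, 1, 0, 1],
     [0, 1, 1, 0, 1]]
element_map = [[-1, 0, 0, -1, 1],
               [-1, 0, 0, -1, 1]]
s = embed_tlm(c, element_map, gains=[1.0, 0.8])
```

In this toy case the background pixels of `s` remain 0, which matches the remark that the white background is left unchanged.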
Detection by the sample mean
The simplest detection metric to determine the symbol embedded in an element is the average luminance of the element, given by (19). It is known from detection theory [29] that this detection statistic is the Neyman–Pearson (NP) detector (which minimizes the error probability) when detecting a change in the mean level considering Gaussian noise, which is the framework of the application.
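A minimal Python sketch of detection by the sample mean is given here for illustration. It assumes binary signaling with two known nominal luminance levels (`level0` and `level1`, both hypothetical parameters), equal priors, and equal noise variance for both symbols, so the decision reduces to picking the level closest to the measured average.

```python
from statistics import fmean


def detect_by_mean(element_pixels, level0, level1):
    """Return 0 or 1 according to which nominal level is closer to the
    average luminance of the scanned element (assumed binary signaling)."""
    d = fmean(element_pixels)  # average luminance of the element
    return 0 if abs(d - level0) <= abs(d - level1) else 1


# Scanned pixels of two elements, with 1 = black and 0 = white.
dark_element = [0.98, 1.00, 0.95, 0.97]
light_element = [0.82, 0.79, 0.85, 0.78]
bit_dark = detect_by_mean(dark_element, level0=1.0, level1=0.8)
bit_light = detect_by_mean(light_element, level0=1.0, level1=0.8)
```

With more than two gray levels, the same nearest-level rule could be applied over a list of levels, again under the equal-variance assumption.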
By mapping the coordinates to a one-dimensional notation, the detection metric for element i is
Combining different metrics
This section proposes and discusses improvements obtained by using additional detection metrics in the system, and by combining the results of these metrics into a new decision criterion. This approach is a multicriteria classification problem, where each element must be classified as belonging to one of S classes.
The mean luminance and the sample variance are the optimum detectors for detecting a DC level change and a variance change in Gaussian noise
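The combination of the two metrics can be sketched in Python as a Gaussian Bayes classifier over the feature pair (sample mean, sample variance). This is only an illustration under stated assumptions: each class is modeled by a two-dimensional normal density whose parameters are estimated from labeled training elements, and the helper names (`features`, `fit_gaussian`, `classify`) and the training values are hypothetical.

```python
import math
from statistics import fmean, variance


def features(element_pixels):
    """Detection metrics of one element: (sample mean, sample variance)."""
    return (fmean(element_pixels), variance(element_pixels))


def fit_gaussian(samples):
    """Estimate the mean vector and 2x2 covariance of (x, y) feature pairs."""
    n = len(samples)
    mx = fmean([x for x, _ in samples])
    my = fmean([y for _, y in samples])
    sxx = sum((x - mx) ** 2 for x, _ in samples) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in samples) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in samples) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def log_discriminant(f, mean, cov, prior):
    """Log of prior times Gaussian density, dropping the class-independent constant."""
    (mx, my), (sxx, sxy, syy) = mean, cov
    det = sxx * syy - sxy ** 2
    dx, dy = f[0] - mx, f[1] - my
    # Mahalanobis distance using the closed-form inverse of a 2x2 matrix.
    maha = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return -0.5 * maha - 0.5 * math.log(det) + math.log(prior)


def classify(f, models, priors):
    """Pick the class index with the largest discriminant value."""
    scores = [log_discriminant(f, m, c, p) for (m, c), p in zip(models, priors)]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical training features (mean, variance) for two symbols.
train0 = [(0.90, 0.010), (0.92, 0.012), (0.88, 0.009), (0.91, 0.008), (0.89, 0.011)]
train1 = [(0.80, 0.020), (0.82, 0.019), (0.78, 0.022), (0.81, 0.021), (0.79, 0.018)]
models = [fit_gaussian(train0), fit_gaussian(train1)]
decision = classify((0.89, 0.0105), models, priors=[0.5, 0.5])
```

Further detection metrics could be added by extending the feature vector, at the cost of a larger covariance matrix to estimate.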
Experiments
The purpose of this section is to illustrate through Monte Carlo simulations the applicability of TLM and the reduced error rate when using the Bayes classifier, as well as to validate the analyses of Section 5 and the proposed PS channel model.
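To make the idea of validating an error analysis by Monte Carlo simulation concrete, the Python sketch below compares a simulated error rate of the mean detector with the closed-form rate for a deliberately simplified channel. The channel here is plain additive white Gaussian noise on each pixel, which is an assumption for illustration only and not the PS channel model proposed in the paper; the levels, noise standard deviation, and element size are likewise hypothetical.

```python
import math
import random


def q_function(x):
    """Tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def theoretical_error(level0, level1, sigma, n_pixels):
    """Error rate of a midpoint-threshold mean detector with equal priors."""
    return q_function(abs(level1 - level0) * math.sqrt(n_pixels) / (2 * sigma))


def simulate_error(level0, level1, sigma, n_pixels, trials, seed=0):
    """Monte Carlo estimate of the same error rate (assumed AWGN pixels)."""
    rng = random.Random(seed)
    threshold = (level0 + level1) / 2
    errors = 0
    for _ in range(trials):
        bit = rng.randrange(2)
        level = level1 if bit else level0
        mean = sum(rng.gauss(level, sigma) for _ in range(n_pixels)) / n_pixels
        # Decide 1 when the mean falls on the same side of the threshold as level1.
        decided = 1 if (mean > threshold) == (level1 > level0) else 0
        errors += decided != bit
    return errors / trials


p_theory = theoretical_error(1.0, 0.8, sigma=0.3, n_pixels=9)
p_sim = simulate_error(1.0, 0.8, sigma=0.3, n_pixels=9, trials=20000)
```

Agreement between `p_sim` and `p_theory` only confirms the simplified analysis above; a realistic study would replace the noise generator with a model of the print and scan distortions.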
In contrast to the previous sections of the paper, this section maps the image luminance scale [0,1] to the [0,255] scale, where 255 represents black and 0 represents white. Therefore, , and . Recall that represents no
Conclusions
This paper improves and analyzes the detection of a novel method to transmit hidden information in text documents. The method is a hardcopy watermarking system which modulates the luminance of characters to embed information, and it can be applied to documents with any kind of characters and symbols, as well as different text alignments and spacing. It is important to note that TLM can be combined with other text watermarking techniques discussed in Section 2. A new channel model for the
References (34)
- I.J. Cox, M.L. Miller, J.A. Bloom, Digital Watermarking, Morgan Kaufmann,...
- M. Wu, B. Liu, Data hiding in binary image for authentication and annotation, IEEE Trans. Multimedia 6 (4) (August...
- K. Solanki, U. Madhow, B.S. Manjunath, S. Chandrasekaran, Modeling the print-scan process for resilient data hiding,...
- C.-Y. Ling, Public watermarking surviving general scaling and cropping: an application for print-and-scan process,...
- A.M. Alattar, O.M. Alattar, Watermarking electronic text documents containing justified paragraphs and irregular line...
- A.K. Bhattacharjya, H. Ancin, Data embedding in text for a copier system, Proceedings IEEE International Conference on...
- R. Víllan, S. Voloshynovskiy, O. Koval, J. Vila, E. Topak, F. Deguillaume, Y. Rytsar, T. Pun, Text data-hiding for...
- P.V. Borges, J. Mayer, Document watermarking via character luminance modulation, Proceedings of the IEEE International...
- Y.-W. Kim, K.-A. Moon, I.-S. Oh, A text watermarking algorithm based on word classification and inter-word space...
- Brassil et al., Copyright protection for the electronic distribution of text documents, Proc. IEEE (July 1999)
- Interword distance changes represented by sine waves for watermarking text images, IEEE Trans. Circuits and Systems Video Technol.
- Capacity of text marking channel, IEEE Signal Process. Lett.
☆ This work was supported by CNPq, Proc. No. 550658/02-5 and 552164/01-1.