DOI: 10.1145/2938503.2938510
research-article

Content-preserving Text Watermarking through Unicode Homoglyph Substitution

Published: 11 July 2016

Abstract

Digital watermarking has become crucially important for the authentication and copyright protection of digital content, as ever more data is generated and shared online every day through digital archives, blogs and social networks. Text watermarking is harder than watermarking other media: text cannot always be converted into an image, it carries far less data (e.g., social network posts), and changes to short texts can strongly affect their meaning or overall visual form. In this paper we propose a text watermarking technique based on homoglyph substitution for Latin characters. The proposed method efficiently embeds a password-based watermark in short texts while strictly preserving the content. In particular, it uses alternative Unicode symbols to ensure visual indistinguishability and length preservation, that is, content preservation. To evaluate our method, we use a real dataset of 1.8 million New York Times articles. The results show the effectiveness of our approach: on average, 101 characters are needed to embed a 64-bit password-based watermark.
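The abstract only outlines the embedding idea, so the following is a minimal Python sketch of homoglyph-substitution watermarking under assumptions that are not taken from the paper. The table is hypothetical and small: seven Latin lowercase letters mapped to visually similar Cyrillic code points. Each substitutable character carries one bit (Latin letter = 0, homoglyph = 1). The 64-bit watermark is derived from the password with BLAKE2b from Python's `hashlib` (`digest_size=8`), which stands in for whatever hash the authors actually use. The names `embed`, `extract`, `verify` and `restore` are illustrative.

```python
import hashlib

# Hypothetical homoglyph table (not the paper's): Latin letter -> Cyrillic look-alike.
HOMOGLYPHS = {
    "a": "\u0430",  # CYRILLIC SMALL LETTER A
    "c": "\u0441",  # CYRILLIC SMALL LETTER ES
    "e": "\u0435",  # CYRILLIC SMALL LETTER IE
    "o": "\u043e",  # CYRILLIC SMALL LETTER O
    "p": "\u0440",  # CYRILLIC SMALL LETTER ER
    "x": "\u0445",  # CYRILLIC SMALL LETTER HA
    "y": "\u0443",  # CYRILLIC SMALL LETTER U
}
REVERSE = {v: k for k, v in HOMOGLYPHS.items()}
WATERMARK_BITS = 64


def password_to_bits(password: str) -> list[int]:
    """Derive a 64-bit watermark from a password (BLAKE2b is an assumed stand-in)."""
    digest = hashlib.blake2b(password.encode("utf-8"), digest_size=8).digest()
    value = int.from_bytes(digest, "big")
    # Most significant bit first.
    return [(value >> (WATERMARK_BITS - 1 - i)) & 1 for i in range(WATERMARK_BITS)]


def embed(text: str, password: str) -> str:
    """Write one bit into each of the first 64 substitutable characters."""
    bits = password_to_bits(password)
    out, i = [], 0
    for ch in text:
        if ch in HOMOGLYPHS and i < len(bits):
            # Bit 1 swaps in the look-alike; bit 0 keeps the Latin letter.
            out.append(HOMOGLYPHS[ch] if bits[i] else ch)
            i += 1
        else:
            out.append(ch)
    if i < len(bits):
        raise ValueError(f"text too short: only {i} of {len(bits)} bits fit")
    return "".join(out)


def extract(text: str) -> list[int]:
    """Read bits back from the first 64 substitutable positions, in order."""
    bits = []
    for ch in text:
        if ch in HOMOGLYPHS:
            bits.append(0)
        elif ch in REVERSE:
            bits.append(1)
        if len(bits) == WATERMARK_BITS:
            break
    return bits


def verify(text: str, password: str) -> bool:
    """Check whether the text carries the watermark derived from `password`."""
    return extract(text) == password_to_bits(password)


def restore(text: str) -> str:
    """Map homoglyphs back to Latin (assumes the input had no such Cyrillic letters)."""
    return "".join(REVERSE.get(ch, ch) for ch in text)
```

Each substitution replaces one code point with exactly one other, so the marked text has the same length as the original, and `restore` recovers it, mirroring the content-preservation property described above. How much text a 64-bit mark needs depends on the table's coverage. This toy table needs at least 64 occurrences of its seven letters, so it is unlikely to match the 101-character average the authors report for their own symbol set.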


Information

Published In

ACM Other conferences
IDEAS '16: Proceedings of the 20th International Database Engineering & Applications Symposium
July 2016
420 pages
ISBN:9781450341189
DOI:10.1145/2938503
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

In-Cooperation

  • Keio University

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Copyright Protection
  2. Document Authentication
  3. Tampering Detection
  4. Text Watermark
  5. Unicode Characters

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

IDEAS '16

Acceptance Rates

Overall Acceptance Rate 74 of 210 submissions, 35%

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 102
  • Downloads (last 6 weeks): 11
Reflects downloads up to 20 Jan 2025

Cited By

  • (2024) Adaptive text watermark for large language models. Proceedings of the 41st International Conference on Machine Learning, pp. 30718-30737. DOI: 10.5555/3692070.3693308. Online publication date: 21-Jul-2024
  • (2024) A review of digital watermarking techniques: Current trends, challenges and opportunities. Web Intelligence, 22:4, pp. 523-553. DOI: 10.3233/WEB-230280. Online publication date: 15-Nov-2024
  • (2024) A Survey of Text Watermarking in the Era of Large Language Models. ACM Computing Surveys, 57:2, pp. 1-36. DOI: 10.1145/3691626. Online publication date: 3-Sep-2024
  • (2024) Watermarking technique for document images using discrete curvelet transform and discrete cosine transform. Multimedia Tools and Applications, 83:40, pp. 87647-87671. DOI: 10.1007/s11042-024-18770-3. Online publication date: 19-Mar-2024
  • (2023) Perspective Chapter: Text Watermark Analysis – Concept, Technique, and Applications. Information Security and Privacy in the Digital World - Some Selected Topics. DOI: 10.5772/intechopen.106914. Online publication date: 27-Sep-2023
  • (2023) Natural Language Watermarking via Paraphraser-based Lexical Substitution. Artificial Intelligence, article 103859. DOI: 10.1016/j.artint.2023.103859. Online publication date: Jan-2023
  • (2023) Tamper Detection Technique for Text Images based on Vowels and Unicode Zero Length Characters. Wireless Personal Communications, 132:4, pp. 2421-2436. DOI: 10.1007/s11277-023-10724-6. Online publication date: 10-Sep-2023
  • (2023) String Editing Under Pattern Constraints. New Trends in Computer Technologies and Applications, pp. 13-24. DOI: 10.1007/978-981-19-9582-8_2. Online publication date: 10-Feb-2023
  • (2022) Autoregressive Linguistic Steganography Based on BERT and Consistency Coding. Security and Communication Networks, 2022. DOI: 10.1155/2022/9092785. Online publication date: 1-Jan-2022
  • (2022) Efficient watermarking technique for protection and authentication of document images. Multimedia Tools and Applications, 81:16, pp. 22985-23005. DOI: 10.1007/s11042-022-12174-x. Online publication date: 8-Apr-2022