A new data hiding method via revision history records on collaborative writing platforms

Published: 14 February 2014

Abstract

This paper proposes a new data hiding method that embeds secret messages in collaboratively written articles with forged revision history records on collaborative writing platforms. The hidden message is camouflaged as a stego-document consisting of a stego-article and a revision history created through a simulated collaborative writing process. The revisions are forged using a database constructed by mining word sequences used in real cases from an English Wikipedia XML dump. Four characteristics of article revisions are identified and used to embed secret messages: the author of each revision, the number of corrected word sequences, the content of the corrected word sequences, and the word sequences that replace the corrected ones. Problems that arise when using these characteristics for data hiding are identified and solved, yielding an effective multiway method for hiding secret messages in the revision history. To make the revisions more realistic, the word sequences are encoded with Huffman coding based on word sequence frequencies collected from Wikipedia. Experimental results show the feasibility of the proposed method.
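The role of Huffman coding in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the candidate word sequences and their frequencies are invented for illustration, and the helper names `huffman_codes`, `embed`, and `extract` are hypothetical. The idea shown is that frequent word sequences receive short codes, so a random-looking secret bitstring tends to select the replacements that occur most often in real revisions.

```python
import heapq
from itertools import count


def huffman_codes(freqs):
    """Build a Huffman code {symbol: bitstring} from {symbol: frequency}."""
    if len(freqs) == 1:
        return {next(iter(freqs)): "0"}
    tie = count()  # unique tie-breaker so the dicts are never compared
    heap = [(f, next(tie), {s: ""}) for s, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f0, _, c0 = heapq.heappop(heap)
        f1, _, c1 = heapq.heappop(heap)
        # Prefix the two lightest subtrees with 0 and 1 and merge them.
        merged = {s: "0" + c for s, c in c0.items()}
        merged.update({s: "1" + c for s, c in c1.items()})
        heapq.heappush(heap, (f0 + f1, next(tie), merged))
    return heap[0][2]


def embed(bits, codes):
    """Choose the symbol whose code is a prefix of bits; return it and the rest."""
    for sym, code in codes.items():
        if bits.startswith(code):
            return sym, bits[len(code):]
    raise ValueError("too few bits left to select a symbol")


def extract(sym, codes):
    """Recover the bits that were hidden by choosing sym."""
    return codes[sym]


# Hypothetical frequencies of word sequences that replace a corrected one.
candidates = {"however": 50, "but": 25, "nevertheless": 20, "yet": 5}
codes = huffman_codes(candidates)

secret = "1100111"
chosen, rest = [], secret
while rest:
    sym, rest = embed(rest, codes)
    chosen.append(sym)

recovered = "".join(extract(s, codes) for s in chosen)
print(chosen, recovered == secret)
```

Because a Huffman code is prefix-free, at most one candidate matches the leading bits at each step, so the receiver can rebuild the bitstring from the chosen word sequences as long as both sides share the same frequency table. A real system would also need to handle a message whose final bits are too short to select a candidate, which this sketch only reports as an error.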

Supplementary Material

a20-lee-apndx.pdf (lee.zip)
Supplemental movie, appendix, image, and software files for "A new data hiding method via revision history records on collaborative writing platforms"

Cited By

  • (2021) Towards a new era of mass data collection: Assessing pandemic surveillance technologies to preserve user privacy. Technological Forecasting and Social Change 167 (120681). DOI: 10.1016/j.techfore.2021.120681. Online publication date: Jun-2021
  • (2020) A high-capacity performance-preserving blind technique for reversible information hiding via MIDI files using delta times. Multimedia Tools and Applications 79:25-26 (17281-17302). DOI: 10.1007/s11042-019-08526-9. Online publication date: 23-Jan-2020

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications  Volume 10, Issue 2
February 2014
142 pages
ISSN:1551-6857
EISSN:1551-6865
DOI:10.1145/2579228
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 February 2014
Accepted: 01 September 2013
Revised: 01 May 2013
Received: 01 February 2013
Published in TOMM Volume 10, Issue 2

Author Tags

  1. Data hiding
  2. Huffman coding
  3. Wikipedia mining
  4. collaborative writing
  5. revision history

Qualifiers

  • Research-article
  • Research
  • Refereed
