Abstract
A new data hiding method is proposed that uses collaboratively written articles with forged revision history records on collaborative writing platforms. The secret message is camouflaged as a stego-document consisting of a stego-article and a revision history created through a simulated collaborative writing process. The revisions are forged using a database constructed by mining word sequences used in real cases from an English Wikipedia XML dump. Four characteristics of article revisions are identified and used to embed secret messages: the author of each revision, the number of corrected word sequences, the content of the corrected word sequences, and the word sequences replacing the corrected ones. Problems that arise in using these characteristics for data hiding are identified and solved, resulting in an effective multiway method for hiding secret messages in the revision history. To create more realistic revisions, Huffman coding based on the word sequence frequencies collected from Wikipedia is applied to encode the word sequences. Experimental results show the feasibility of the proposed method.
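The abstract does not spell out how Huffman coding maps message bits to word sequences, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes a hypothetical frequency table of candidate replacement word sequences; the function names `huffman_code`, `embed_choice`, and `extract_bits` and the example phrases and counts are illustrative assumptions. More frequent word sequences receive shorter codewords, so they are selected more often when the message bits are random, which is one way frequency-based coding could make forged revisions look more natural.

```python
import heapq
from itertools import count


def huffman_code(freqs):
    """Build a Huffman code {symbol: bitstring} from a frequency table.

    Assumes at least two symbols. A counter breaks ties so that the heap
    never has to compare the dictionaries stored in its entries.
    """
    tiebreak = count()
    heap = [(f, next(tiebreak), {sym: ""}) for sym, f in sorted(freqs.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        f0, _, c0 = heapq.heappop(heap)
        f1, _, c1 = heapq.heappop(heap)
        # Prefix "0" to every codeword in one subtree and "1" in the other.
        merged = {s: "0" + b for s, b in c0.items()}
        merged.update({s: "1" + b for s, b in c1.items()})
        heapq.heappush(heap, (f0 + f1, next(tiebreak), merged))
    return heap[0][2]


def embed_choice(bits, code):
    """Select the candidate whose codeword is a prefix of the secret bits.

    Returns (chosen symbol, remaining bits). Because a Huffman code is
    prefix-free, at most one codeword can match.
    """
    for sym, word in code.items():
        if bits.startswith(word):
            return sym, bits[len(word):]
    raise ValueError("too few bits left to select a candidate")


def extract_bits(choice, code):
    """Recover the embedded bits from the observed replacement."""
    return code[choice]


# Hypothetical candidates for replacing one corrected word sequence,
# with made-up counts standing in for Wikipedia-derived frequencies.
candidates = {"to": 120, "in order to": 50, "so as to": 20, "for the purpose of": 10}
code = huffman_code(candidates)
choice, rest = embed_choice("001101000", code)
```

In this sketch the receiver needs the same frequency table to rebuild the code and read the bits back from each observed replacement; how the paper shares or derives that table is not stated in the abstract.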
Index Terms
- A new data hiding method via revision history records on collaborative writing platforms