Abstract
Reconstruction of Cross-Cut Shredded Text Documents (RCCSTD) plays an important role in both forensics and archeology. It is a special case of the square jigsaw puzzle problem and has attracted the attention of many researchers. Because existing RCCSTD solutions have low accuracy, especially in row splicing, this paper proposes a high-accuracy splicing solution that uses both a combination strategy and a divide-and-conquer strategy. In approaches based on swarm intelligence algorithms, the results and splicing accuracy are tied to the defined cost function and the number of fragments. Our approach instead uses a clustering algorithm to transform a single RCCSTD problem into several Reconstruction of Strip Shredded Text Documents (RSSTD) problems. The combination and divide-and-conquer strategies are designed to improve the splicing accuracy within a row and to keep the algorithm stable as the number of fragments in a row increases. Experiments were carried out on 10 text documents (5 Chinese and 5 English), which were shredded in ten patterns. Across all patterns, the accuracy was over 0.95 for the Chinese documents and over 0.85 for the English ones. Compared with another recently proposed solution, our approach gives both higher splicing accuracy and greater stability regardless of the number of fragments in a row.
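The divide-and-conquer idea of turning one RCCSTD problem into several RSSTD problems can be illustrated with a toy sketch. It is not the method of this paper: the binary fragment representation, the row-profile signature, the threshold clustering, the pixel-equality edge cost and the greedy ordering are all assumptions chosen for brevity, and every function name is hypothetical.

```python
from typing import List, Sequence, Tuple

Fragment = List[List[int]]  # assumed binary image: 1 = ink, 0 = background


def row_profile(frag: Fragment) -> Tuple[int, ...]:
    """Mark each pixel row that contains ink (a crude text-line signature)."""
    return tuple(1 if any(row) else 0 for row in frag)


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def cluster_into_rows(frags: List[Fragment], tol: int = 0) -> List[List[int]]:
    """Divide step: group fragment indices whose profiles differ from a
    group's first member by at most `tol` pixel rows."""
    groups: List[List[int]] = []
    for i, frag in enumerate(frags):
        profile = row_profile(frag)
        for group in groups:
            if hamming(profile, row_profile(frags[group[0]])) <= tol:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups


def edge_cost(left: Fragment, right: Fragment) -> int:
    """Mismatches between the right edge of `left` and the left edge of `right`."""
    return sum(lrow[-1] != rrow[0] for lrow, rrow in zip(left, right))


def order_strip(frags: List[Fragment], group: List[int]) -> List[int]:
    """Conquer step (toy RSSTD): start at the fragment with the emptiest
    left margin, then repeatedly append the cheapest right-hand neighbour."""
    start = min(group, key=lambda i: sum(row[0] for row in frags[i]))
    order = [start]
    remaining = set(group) - {start}
    while remaining:
        nxt = min(sorted(remaining),
                  key=lambda j: edge_cost(frags[order[-1]], frags[j]))
        order.append(nxt)
        remaining.remove(nxt)
    return order


# Two page rows, each cut into three 3x2 fragments (made-up data).
a0 = [[0, 1], [0, 0], [0, 1]]
a1 = [[1, 0], [0, 0], [1, 1]]
a2 = [[0, 1], [0, 0], [1, 0]]
b0 = [[0, 0], [0, 1], [0, 1]]
b1 = [[0, 0], [1, 0], [1, 1]]
b2 = [[0, 0], [0, 1], [1, 1]]

frags = [b1, a2, a0, b0, a1, b2]  # shuffled input
groups = cluster_into_rows(frags)
orders = [order_strip(frags, g) for g in groups]
print(groups, orders)
```

In this sketch, clustering first separates the fragments by page row, so each group can be ordered on its own as a strip-shredded problem. Real fragments would need a noise-tolerant signature and a more robust edge cost than exact pixel equality.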
Acknowledgements
This research was supported by the National Science Foundation of China (11172016, 11472022, 11772016). The authors would like to thank the reviewers of this paper for their constructive and thoughtful comments. The authors thank Editsprings (www.editsprings.com) for its linguistic assistance.
Cite this article
Chen, J., Ke, D., Wang, Z. et al. A high splicing accuracy solution to reconstruction of cross-cut shredded text document problem. Multimed Tools Appl 77, 19281–19300 (2018). https://doi.org/10.1007/s11042-017-5389-z