A high splicing accuracy solution to reconstruction of cross-cut shredded text document problem

Multimedia Tools and Applications

Abstract

Reconstruction of Cross-Cut Shredded Text Documents (RCCSTD) plays an important role in both forensics and archeology. It is a special case of the square jigsaw puzzle problem and has attracted the attention of many researchers. Because existing RCCSTD solutions have low accuracy, especially in row splicing, this paper proposes a high-accuracy splicing solution that uses both a combination strategy and a divide-and-conquer strategy. Other approaches are based on swarm intelligence algorithms, whose results and splicing accuracy depend on the defined cost function and the number of fragments. In contrast, our approach uses a clustering algorithm to transform a single RCCSTD problem into several Reconstruction of Strip Shredded Text Document (RSSTD) problems. The combination and divide-and-conquer strategies are designed to improve the splicing accuracy within a row and to keep the algorithm stable as the number of fragments in a row increases. Experiments were carried out on 10 text documents (5 Chinese and 5 English), which were shredded using ten cutting patterns. The accuracy measures were over 0.95 for the Chinese documents and over 0.85 for the English ones, across all patterns. Compared with another recently proposed solution, our approach gives both higher splicing accuracy and greater stability regardless of the number of fragments in a row.
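The divide step described in the abstract (cluster the fragments into rows, then treat each row as a separate strip-shredded problem) can be illustrated with a minimal Python sketch. It is not the method of the paper: the abstract does not specify the clustering features or the cost function, so everything below is an assumption made for illustration. Fragments are assumed to be small binary images (1 = ink, 0 = blank), the row feature is assumed to be the set of pixel rows that contain ink, the leftmost fragment of a row is assumed to have the least ink on its left edge (page margin), and greedy chaining is used as a simple stand-in for the row-splicing stage. All function names are hypothetical.

```python
from typing import List, Tuple

Fragment = List[List[int]]  # binary image: 1 = ink, 0 = blank


def row_profile(frag: Fragment) -> Tuple[int, ...]:
    """Assumed row feature: 1 for every pixel row that contains ink."""
    return tuple(1 if any(line) else 0 for line in frag)


def profile_distance(a: Tuple[int, ...], b: Tuple[int, ...]) -> int:
    """Number of pixel rows on which two profiles disagree."""
    return sum(x != y for x, y in zip(a, b))


def cluster_into_rows(frags: List[Fragment], threshold: int = 0) -> List[List[int]]:
    """Greedy clustering: a fragment joins the first cluster whose first
    member has a profile within `threshold`; otherwise it opens a new one."""
    clusters: List[List[int]] = []
    for i, frag in enumerate(frags):
        profile = row_profile(frag)
        for cluster in clusters:
            if profile_distance(profile, row_profile(frags[cluster[0]])) <= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def edge_cost(left: Fragment, right: Fragment) -> int:
    """Mismatch between the right edge of `left` and the left edge of `right`."""
    return sum(l[-1] != r[0] for l, r in zip(left, right))


def order_strip(frags: List[Fragment], indices: List[int]) -> List[int]:
    """Order one row (one strip-shredded sub-problem) by greedy chaining."""
    remaining = list(indices)
    # Assumption: the leftmost fragment has the least ink on its left edge.
    start = min(remaining, key=lambda i: sum(line[0] for line in frags[i]))
    order = [start]
    remaining.remove(start)
    while remaining:
        best = min(remaining, key=lambda i: edge_cost(frags[order[-1]], frags[i]))
        order.append(best)
        remaining.remove(best)
    return order


def reconstruct(frags: List[Fragment], threshold: int = 0) -> List[List[int]]:
    """Split the cross-cut problem into rows, then order each row."""
    return [order_strip(frags, row) for row in cluster_into_rows(frags, threshold)]
```

Calling `reconstruct(frags)` returns one list of fragment indices per detected row, each in left-to-right order. The rows come back in the order their clusters were created, not in page order, and the sketch makes no attempt to reproduce the combination strategy or the accuracy figures reported above.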

Acknowledgements

This research was supported by the National Science Foundation of China (11172016, 11472022, 11772016). The authors would like to thank the reviewers of this paper for their constructive and thoughtful comments. The authors thank Editsprings (www.editsprings.com) for its linguistic assistance.

Author information

Corresponding author

Correspondence to Youjun Liu.

About this article

Cite this article

Chen, J., Ke, D., Wang, Z. et al. A high splicing accuracy solution to reconstruction of cross-cut shredded text document problem. Multimed Tools Appl 77, 19281–19300 (2018). https://doi.org/10.1007/s11042-017-5389-z

  • DOI: https://doi.org/10.1007/s11042-017-5389-z
