An alternative clustering approach for reconstructing cross cut shredded text documents

Telecommunication Systems

Abstract

In this paper, we propose a clustering approach for reconstructing cross-cut shredded documents, a problem of practical importance in forensic science. Unlike other clustering approaches, which are applied as a preprocessing step before the actual reconstruction algorithm, our clustering approach is part of the reconstruction process itself. We define a new cost function that relies mainly on black pixels to measure the cost of pairing two shreds. The reconstruction algorithm creates multiple clusters, which grow as shreds are added according to the cost function. Adding a shred may merge two or more clusters into a larger one. We also propose a way to involve the user in the reconstruction process. We compare our approach with a recent proposal and conclude that it gives better solutions in less time.
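
The abstract does not spell out the cost function or the clustering rules, so the following Python sketch is only an illustration of the general idea, not the authors' method. It assumes each shred is a binary grid (1 = black pixel), scores a left/right pairing by the fraction of mismatching edge pixels among rows where at least one edge pixel is black, and grows horizontal chains only, accepting the cheapest compatible pairings first and merging clusters with a union-find structure. The names `pair_cost` and `greedy_clusters`, the worst-case cost of 1.0 for blank edges, and the rejection threshold are all assumptions made for this example.

```python
from itertools import permutations


def pair_cost(left, right):
    """Hypothetical cost of placing shred `left` directly left of `right`."""
    mismatches = 0
    black_rows = 0
    for row_l, row_r in zip(left, right):
        a, b = row_l[-1], row_r[0]  # facing edge pixels; 1 means black
        if a or b:  # only rows with a black pixel carry evidence
            black_rows += 1
            if a != b:
                mismatches += 1
    # Assumption: blank facing edges give no evidence, so use the worst cost.
    return mismatches / black_rows if black_rows else 1.0


def greedy_clusters(shreds):
    """Grow horizontal chains of shreds, cheapest pairings first."""
    parent = list(range(len(shreds)))  # union-find over cluster membership

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    right_of = {}     # shred index -> index of its right neighbour
    has_left = set()  # shreds that already have a left neighbour
    pairs = sorted(
        (pair_cost(shreds[i], shreds[j]), i, j)
        for i, j in permutations(range(len(shreds)), 2)
    )
    for cost, i, j in pairs:
        if cost >= 1.0:  # assumed threshold: reject total mismatches
            break
        if i in right_of or j in has_left or find(i) == find(j):
            continue  # slot already taken, or the pair would close a loop
        right_of[i] = j
        has_left.add(j)
        parent[find(j)] = find(i)  # joining the pair merges two clusters

    # Read each chain from its leftmost shred.
    chains = []
    for start in range(len(shreds)):
        if start in has_left:
            continue
        chain = [start]
        while chain[-1] in right_of:
            chain.append(right_of[chain[-1]])
        chains.append(chain)
    return chains
```

Calling `greedy_clusters` on a list of shreds returns lists of shred indices in left-to-right order, one list per cluster. A full cross-cut solver would also need vertical pairings and a way to accept user corrections, both of which this sketch leaves out.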

Author information

Corresponding author

Correspondence to Azzam Sleit.

About this article

Cite this article

Sleit, A., Massad, Y. & Musaddaq, M. An alternative clustering approach for reconstructing cross cut shredded text documents. Telecommun Syst 52, 1491–1501 (2013). https://doi.org/10.1007/s11235-011-9626-x

  • DOI: https://doi.org/10.1007/s11235-011-9626-x
