Detection of damaged files and measurement of similarity to originals using entropy graph characteristics

The Journal of Supercomputing

Abstract

An information entropy graph plots, as entropy values, the probability of each piece of information occurring in a dataset. Well-known filetypes exhibit different information entropy graph characteristics; hence, they can be detected and differentiated using these characteristics. In this paper, a method that detects damaged files using information entropy graphs is proposed. The proposed method extends conventional approaches, which rely on entropy values alone, so that filetypes presenting the same entropy values can still be differentiated. In the experiments, the information entropy graphs of well-known files showed patterns that are significant for analysis and detection. In addition, even when files had damaged header, footer, or body regions, the similarity of the graph pattern was preserved, even though the entropy values differed. The proposed method also allows the similarity between a damaged file and its original version to be quantified through graph pattern similarity tests.
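As a rough illustration of the idea, the sketch below computes a windowed Shannon entropy graph for a byte sequence and compares two graphs by shape. It is not the authors' implementation: the abstract does not state a window size or a similarity test, so the 256-byte non-overlapping window, the use of Pearson correlation as the pattern similarity measure, and the names `shannon_entropy`, `entropy_graph`, and `pattern_similarity` are all assumptions made for this example.

```python
import math
from collections import Counter


def shannon_entropy(block: bytes) -> float:
    """Shannon entropy of a byte block, in bits per byte (0.0 to 8.0)."""
    if not block:
        return 0.0
    n = len(block)
    return -sum((c / n) * math.log2(c / n) for c in Counter(block).values())


def entropy_graph(data: bytes, window: int = 256) -> list[float]:
    """Entropy of consecutive non-overlapping windows.

    The window size is an assumption; the abstract does not specify one.
    """
    return [shannon_entropy(data[i:i + window])
            for i in range(0, len(data), window)]


def pattern_similarity(a: list[float], b: list[float]) -> float:
    """Pearson correlation of two entropy graphs (assumed similarity measure).

    Graphs are truncated to the shorter length. Returns 0.0 when the
    correlation is undefined (fewer than two points or a flat graph).
    """
    n = min(len(a), len(b))
    if n < 2:
        return 0.0
    a, b = a[:n], b[:n]
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


# Synthetic "file": low-entropy header, high-entropy body, low-entropy footer.
original = b"\x00" * 256 + bytes(range(256)) * 2 + b"\x00" * 256
# Simulated header damage: the first window is overwritten with other bytes.
damaged = b"\x01\x02" * 128 + original[256:]

similarity = pattern_similarity(entropy_graph(original), entropy_graph(damaged))
```

In this toy case the header window's entropy value changes, yet the overall shape of the graph (low, high, high, low) is largely retained, so the correlation stays close to 1. That mirrors the abstract's observation that pattern similarity can survive damage even when entropy values differ.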

Acknowledgements

This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2017-2016-0-00304) supervised by the IITP (Institute for Information & communications Technology Promotion).

Author information

Corresponding author

Correspondence to Yoojae Won.

About this article

Cite this article

Cho, C., Chung, K. & Won, Y. Detection of damaged files and measurement of similarity to originals using entropy graph characteristics. J Supercomput 74, 6719–6728 (2018). https://doi.org/10.1007/s11227-017-2121-8

  • DOI: https://doi.org/10.1007/s11227-017-2121-8
