A Proposed Approach to Compound File Fragment Identification

  • Conference paper
Network and System Security (NSS 2015)

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 8792)

Abstract

One of the biggest challenges in file fragment classification is the low classification rate for compound files, also known as high-entropy files, which contain different types of data such as images and compressed text. Current file fragment classification methods may not work well on these compound files. In this paper we propose a novel approach that detects deflate-encoded data in compound file fragments and decompresses that data before applying a machine learning technique for classification. We apply the proposed method to classifying the Adobe Portable Document Format (PDF) file type. Experiments showed a high classification rate for the proposed method.
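
The abstract does not say how the deflate-encoded data is detected or which features are fed to the classifier, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that deflate data in a PDF fragment follows a `stream` keyword and carries a zlib wrapper (as PDF FlateDecode streams usually do), and it uses a normalised byte histogram as a stand-in feature vector. All function names are hypothetical.

```python
import zlib
from collections import Counter

STREAM_MARKER = b"stream"


def find_zlib_candidates(fragment):
    """Yield offsets just past a PDF 'stream' keyword and its end-of-line."""
    pos = fragment.find(STREAM_MARKER)
    while pos != -1:
        # Skip the 'stream' that is part of an 'endstream' keyword.
        if fragment[max(pos - 3, 0):pos] != b"end":
            start = pos + len(STREAM_MARKER)
            if fragment[start:start + 2] == b"\r\n":
                yield start + 2
            elif fragment[start:start + 1] == b"\n":
                yield start + 1
        pos = fragment.find(STREAM_MARKER, pos + 1)


def try_inflate(data):
    """Inflate zlib-wrapped deflate data; return b'' if it is not valid.

    A decompress object tolerates truncated input (it returns what it
    could decode) and ignores trailing bytes after the end of the stream.
    """
    decoder = zlib.decompressobj()
    try:
        return decoder.decompress(data)
    except zlib.error:
        return b""


def byte_histogram(data):
    """Return the relative frequency of each of the 256 byte values."""
    if not data:
        return [0.0] * 256
    counts = Counter(data)
    total = len(data)
    return [counts.get(value, 0) / total for value in range(256)]


def fragment_features(fragment):
    """Build features from inflated data when any is found.

    Assumption: falling back to the raw bytes is acceptable when no
    deflate stream can be decoded from the fragment.
    """
    inflated = b"".join(
        try_inflate(fragment[offset:])
        for offset in find_zlib_candidates(fragment)
    )
    return byte_histogram(inflated if inflated else fragment)
```

The resulting 256-element vector could then be passed to any classifier. A fragment that begins in the middle of a compressed stream is not handled by this sketch, since decoding cannot start without the beginning of the deflate data.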

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Nguyen, K., Tran, D., Ma, W., Sharma, D. (2014). A Proposed Approach to Compound File Fragment Identification. In: Au, M.H., Carminati, B., Kuo, CC.J. (eds) Network and System Security. NSS 2015. Lecture Notes in Computer Science, vol 8792. Springer, Cham. https://doi.org/10.1007/978-3-319-11698-3_38

  • DOI: https://doi.org/10.1007/978-3-319-11698-3_38

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11697-6

  • Online ISBN: 978-3-319-11698-3

  • eBook Packages: Computer Science, Computer Science (R0)
