Hybrid Feature Selection Method for Improving File Fragment Classification

  • Conference paper
Artificial Intelligence XXXVI (SGAI 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11927)

Abstract

Identifying the type of a file fragment in isolation from its context is an essential task in digital forensic analysis, and several methods exist for it. One common approach is to extract various types of features from file fragments and use them as inputs to classification algorithms. However, this approach suffers from the curse of dimensionality: the number of extracted features is so high that learning and classification become both inefficient and inaccurate. This paper proposes a hybrid method that addresses this issue by using filters and wrappers to significantly reduce the number of features while also improving the accuracy of file type classification. First, it combines three appropriate filters to remove a large number of irrelevant or less important features; then it applies wrappers to reduce the remaining features further to the most salient ones. The method was tested on a benchmark dataset, GovDocs, and the experimental results indicate that it not only reduced the number of features from 66,313 to between 11 and 32, but also improved classification accuracy compared with other methods that used all the features.
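The two-stage idea in the abstract (filters first, then wrappers) can be illustrated with a minimal sketch. The abstract does not name the three filters, the wrappers, or how the filter outputs are combined, so everything specific here is an assumption made for illustration: the three scoring functions (`fisher_score`, `variance_score`, `mean_gap_score`), the rank-sum combination, the greedy forward wrapper around a nearest-centroid classifier, and the synthetic data, which stands in for real fragment features and is not GovDocs.

```python
import random
from statistics import mean, pvariance

def by_class(column, labels):
    """Group one feature column's values by class label."""
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(label, []).append(value)
    return list(groups.values())

# Three illustrative filter scores (assumed; higher means more useful).
def fisher_score(column, labels):
    overall, groups = mean(column), by_class(column, labels)
    between = sum(len(g) * (mean(g) - overall) ** 2 for g in groups)
    within = sum(len(g) * pvariance(g) for g in groups)
    return between / (within + 1e-12)

def variance_score(column, labels):
    return pvariance(column)

def mean_gap_score(column, labels):
    means = [mean(g) for g in by_class(column, labels)]
    return max(means) - min(means)

def filter_stage(X, y, keep):
    """Combine the three filters by summing ranks; keep the best `keep` features."""
    n = len(X[0])
    columns = [[row[j] for row in X] for j in range(n)]
    rank_sum = [0] * n
    for score in (fisher_score, variance_score, mean_gap_score):
        scores = [score(col, y) for col in columns]
        for rank, j in enumerate(sorted(range(n), key=lambda j: -scores[j])):
            rank_sum[j] += rank
    return sorted(range(n), key=lambda j: rank_sum[j])[:keep]

def centroid_accuracy(X_tr, y_tr, X_va, y_va, features):
    """Validation accuracy of a nearest-centroid classifier on `features`."""
    centroids = {}
    for c in set(y_tr):
        rows = [x for x, label in zip(X_tr, y_tr) if label == c]
        centroids[c] = [mean(r[j] for r in rows) for j in features]
    correct = 0
    for x, label in zip(X_va, y_va):
        pred = min(centroids, key=lambda c: sum(
            (x[j] - m) ** 2 for j, m in zip(features, centroids[c])))
        correct += pred == label
    return correct / len(y_va)

def wrapper_stage(X_tr, y_tr, X_va, y_va, candidates):
    """Greedy forward selection: add the feature that most improves accuracy."""
    selected, best, improved = [], 0.0, True
    while improved:
        improved, pick = False, None
        for j in candidates:
            if j in selected:
                continue
            acc = centroid_accuracy(X_tr, y_tr, X_va, y_va, selected + [j])
            if acc > best:
                best, pick, improved = acc, j, True
        if improved:
            selected.append(pick)
    return selected, best

def make_data(rng, n_samples, n_features=40, n_informative=3):
    """Synthetic 3-class data: only the first `n_informative` features carry signal."""
    X, y = [], []
    for i in range(n_samples):
        label = i % 3
        X.append([rng.gauss(3.0 * label if j < n_informative else 0.0, 1.0)
                  for j in range(n_features)])
        y.append(label)
    return X, y

rng = random.Random(0)
X_train, y_train = make_data(rng, 180)
X_val, y_val = make_data(rng, 90)
candidates = filter_stage(X_train, y_train, keep=8)
selected, accuracy = wrapper_stage(X_train, y_train, X_val, y_val, candidates)
print(len(X_train[0]), "->", len(candidates), "->", len(selected),
      "features; validation accuracy", round(accuracy, 3))
```

The design point the sketch tries to convey is the division of labour: the filter stage is cheap because it scores each feature independently, so it can prune a very large feature set, while the wrapper stage is expensive because it retrains a classifier for every candidate subset, so it is only run on the small set the filters leave behind.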

Corresponding author

Correspondence to Wenjia Wang.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Algurashi, A., Wang, W. (2019). Hybrid Feature Selection Method for Improving File Fragment Classification. In: Bramer, M., Petridis, M. (eds) Artificial Intelligence XXXVI. SGAI 2019. Lecture Notes in Computer Science, vol 11927. Springer, Cham. https://doi.org/10.1007/978-3-030-34885-4_29

  • DOI: https://doi.org/10.1007/978-3-030-34885-4_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-34884-7

  • Online ISBN: 978-3-030-34885-4

  • eBook Packages: Computer Science, Computer Science (R0)
