Hybrid Feature Selection Method for Improving File Fragment Classification

Algurashi, Alia; Wang, Wenjia

doi:10.1007/978-3-030-34885-4_29


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11927)

Included in the following conference series:

International Conference on Innovative Techniques and Applications of Artificial Intelligence


Abstract

Identifying the type of a file fragment in isolation from its context is an essential task in digital forensic analysis, and several methods exist for it. One common approach is to extract various types of features from file fragments and use them as inputs to classification algorithms. However, this approach suffers from the curse of dimensionality: the number of extracted features is so large that learning and classification become both inefficient and inaccurate. This paper proposes a hybrid method that addresses this issue by using filters and wrappers to substantially reduce the number of features while also improving the accuracy of file type classification. First, it combines three appropriate filters to remove a large number of irrelevant or less important features; then it applies wrappers to reduce the remaining features further, down to the most salient ones. The method was tested on benchmark datasets drawn from GovDocs, and the experimental results indicated that it not only reduced the number of features from 66,313 to 11–32 but also improved classification accuracy compared with other methods that used all the features.
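The abstract describes a two-stage pipeline: cheap filters discard most features first, then a wrapper searches the shortlist using a classifier's accuracy as its criterion. The following Python sketch illustrates only that structure. The byte-frequency features, the three filters (variance, Fisher score, class-mean range), the rank-averaging used to combine them, and the greedy forward-selection wrapper built on a nearest-centroid classifier are all assumptions chosen for illustration; they are not the specific filters, wrappers, or features used in the paper, and the toy fragments are synthetic rather than GovDocs data.

```python
import statistics

# Sketch of a filter-then-wrapper pipeline. Byte-frequency features, the three
# filters and the nearest-centroid wrapper are illustrative assumptions only.

def byte_frequency(fragment):
    """Relative frequency of each of the 256 byte values in a fragment."""
    counts = [0] * 256
    for b in fragment:
        counts[b] += 1
    n = len(fragment) or 1
    return [c / n for c in counts]

def class_values(col, y, c):
    """Values of one feature column for the samples labelled c."""
    return [v for v, label in zip(col, y) if label == c]

def variance_scores(X, y):
    """Filter 1: feature variance (near-constant features score low)."""
    return [statistics.pvariance(col) for col in zip(*X)]

def fisher_scores(X, y):
    """Filter 2: Fisher score = between-class scatter / within-class scatter."""
    scores = []
    for col in zip(*X):
        overall, between, within = statistics.fmean(col), 0.0, 0.0
        for c in sorted(set(y)):
            vals = class_values(col, y, c)
            between += len(vals) * (statistics.fmean(vals) - overall) ** 2
            within += len(vals) * statistics.pvariance(vals)
        if within > 0:
            scores.append(between / within)
        else:  # every class is constant on this feature
            scores.append(float("inf") if between > 0 else 0.0)
    return scores

def mean_range_scores(X, y):
    """Filter 3: gap between the largest and smallest per-class mean."""
    scores = []
    for col in zip(*X):
        means = [statistics.fmean(class_values(col, y, c)) for c in sorted(set(y))]
        scores.append(max(means) - min(means))
    return scores

def filter_stage(X, y, top_k):
    """Combine the three filters by averaging each feature's rank; keep top_k."""
    n = len(X[0])
    avg_rank = [0.0] * n
    for score_fn in (variance_scores, fisher_scores, mean_range_scores):
        scores = score_fn(X, y)
        order = sorted(range(n), key=lambda f: -scores[f])  # best score first
        for position, f in enumerate(order):
            avg_rank[f] += position / 3
    return sorted(range(n), key=lambda f: avg_rank[f])[:top_k]

def loo_accuracy(X, y, features):
    """Wrapper criterion: leave-one-out accuracy of a nearest-centroid classifier."""
    correct = 0
    for i in range(len(X)):
        sums, counts = {}, {}
        for j, (row, label) in enumerate(zip(X, y)):
            if j == i:
                continue  # hold out sample i
            total = sums.setdefault(label, [0.0] * len(features))
            for k, f in enumerate(features):
                total[k] += row[f]
            counts[label] = counts.get(label, 0) + 1
        def sq_dist(label):
            return sum((X[i][f] - sums[label][k] / counts[label]) ** 2
                       for k, f in enumerate(features))
        correct += min(counts, key=sq_dist) == y[i]
    return correct / len(X)

def wrapper_stage(X, y, candidates, max_features):
    """Greedy forward selection: add the feature that most improves accuracy."""
    selected, best_acc = [], 0.0
    while len(selected) < max_features:
        best_f, best_new = None, best_acc
        for f in candidates:
            if f not in selected:
                acc = loo_accuracy(X, y, selected + [f])
                if acc > best_new:
                    best_f, best_new = f, acc
        if best_f is None:  # no remaining feature improves accuracy
            break
        selected.append(best_f)
        best_acc = best_new
    return selected, best_acc

# Toy usage: synthetic "text" and "binary" fragments (not GovDocs data).
text = [("hello world %d " % i).encode() * 8 for i in range(4)]
binary = [bytes((i * 37 + k) % 256 for k in range(128)) for i in range(4)]
X = [byte_frequency(f) for f in text + binary]
y = ["txt"] * 4 + ["bin"] * 4
kept = filter_stage(X, y, top_k=20)
chosen, acc = wrapper_stage(X, y, kept, max_features=5)
print(f"{len(X[0])} -> {len(kept)} -> {len(chosen)} features, LOO accuracy {acc}")
```

The point of the ordering shows up even at this scale: the wrapper retrains a classifier for every candidate subset, so it is only run on the filtered shortlist, and it stops as soon as no remaining feature raises leave-one-out accuracy.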



Author information

Authors and Affiliations

University of East Anglia, Norwich, UK
Alia Algurashi & Wenjia Wang


Corresponding author

Correspondence to Wenjia Wang.

Editor information

Editors and Affiliations

University of Portsmouth, Portsmouth, UK
Max Bramer
Middlesex University, London, UK
Miltos Petridis


About this paper

Cite this paper

Algurashi, A., Wang, W. (2019). Hybrid Feature Selection Method for Improving File Fragment Classification. In: Bramer, M., Petridis, M. (eds) Artificial Intelligence XXXVI. SGAI 2019. Lecture Notes in Computer Science, vol 11927. Springer, Cham. https://doi.org/10.1007/978-3-030-34885-4_29


DOI: https://doi.org/10.1007/978-3-030-34885-4_29
Published: 19 November 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-34884-7
Online ISBN: 978-3-030-34885-4
eBook Packages: Computer Science; Computer Science (R0)
