Abstract
Identifying the types of file fragments in isolation from their context is an essential task in digital forensic analysis, and several methods exist for it. One common approach is to extract various types of features from file fragments and use them as inputs to classification algorithms. However, this approach suffers from the curse of dimensionality: the number of extracted features is so high that learning and classification become both inefficient and inaccurate. This paper proposes a hybrid method that addresses this issue by using filters and wrappers to significantly reduce the number of features while also improving the accuracy of file type classification. The method first combines three appropriate filters to discard a large number of irrelevant or less important features, and then applies wrappers to reduce the remaining features further to the most salient ones. The method was tested on the GovDocs benchmark datasets, and the experimental results indicated that it not only reduced the number of features from 66,313 to 11–32, but also improved classification accuracy compared with other methods that used all the features.
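The two-stage idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the three filters, the wrappers, or the classifier, so a single Fisher-score filter, greedy forward selection, and a nearest-centroid classifier are used here as stand-ins. All function names and the synthetic data are hypothetical.

```python
import random
from statistics import mean, pvariance

def fisher_score(column, labels):
    """Filter score (assumed stand-in): between-class over within-class variance."""
    overall = mean(column)
    between = within = 0.0
    for c in set(labels):
        vals = [v for v, y in zip(column, labels) if y == c]
        between += len(vals) * (mean(vals) - overall) ** 2
        within += len(vals) * pvariance(vals)
    return between / (within + 1e-12)

def filter_stage(X, y, keep):
    """Rank every feature by the filter score and keep the top `keep` indices."""
    n_features = len(X[0])
    scores = [fisher_score([row[j] for row in X], y) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:keep]

def centroid_accuracy(X_train, y_train, X_val, y_val, features):
    """Validation accuracy of a nearest-centroid classifier using only `features`."""
    centroids = {}
    for c in set(y_train):
        rows = [r for r, lab in zip(X_train, y_train) if lab == c]
        centroids[c] = [mean(r[j] for r in rows) for j in features]
    correct = 0
    for row, label in zip(X_val, y_val):
        point = [row[j] for j in features]
        pred = min(centroids,
                   key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])))
        correct += pred == label
    return correct / len(y_val)

def wrapper_stage(X_train, y_train, X_val, y_val, candidates):
    """Greedy forward selection: add features while validation accuracy improves."""
    selected, best = [], 0.0
    while True:
        trials = [(centroid_accuracy(X_train, y_train, X_val, y_val, selected + [j]), j)
                  for j in candidates if j not in selected]
        if not trials:
            break
        acc, j = max(trials)
        if acc <= best:  # no candidate improves accuracy, so stop
            break
        selected.append(j)
        best = acc
    return selected, best

def make_data(n, n_features, rng):
    """Synthetic two-class data: only features 0 and 1 carry class information."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0, 1) for _ in range(n_features)]
        row[0] += 3 * label
        row[1] -= 3 * label
        X.append(row)
        y.append(label)
    return X, y

rng = random.Random(0)
X_train, y_train = make_data(200, 50, rng)
X_val, y_val = make_data(100, 50, rng)

candidates = filter_stage(X_train, y_train, keep=10)   # cheap pruning: 50 -> 10
selected, accuracy = wrapper_stage(X_train, y_train, X_val, y_val, candidates)
print(len(candidates), selected, round(accuracy, 3))
```

The filter stage scores each feature independently and is therefore cheap enough to run over tens of thousands of features. The wrapper stage retrains a classifier for every candidate subset, so it is only practical on the much smaller set that survives filtering.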
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Algurashi, A., Wang, W. (2019). Hybrid Feature Selection Method for Improving File Fragment Classification. In: Bramer, M., Petridis, M. (eds) Artificial Intelligence XXXVI. SGAI 2019. Lecture Notes in Computer Science, vol 11927. Springer, Cham. https://doi.org/10.1007/978-3-030-34885-4_29
DOI: https://doi.org/10.1007/978-3-030-34885-4_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-34884-7
Online ISBN: 978-3-030-34885-4
eBook Packages: Computer Science, Computer Science (R0)