DOI: 10.1145/1774088.1774431
Poster

Fast file-type identification

Published: 22 March 2010

Abstract

This paper proposes two techniques to reduce the classification time of content-based file-type identification. The first is a feature selection technique, which uses only a subset of frequently occurring byte patterns, both to build the representative model of each file type and to classify files. The second is a content sampling technique, which uses only a subset of a file's content to obtain its byte-frequency distribution. Our initial experiments show that the proposed approaches are promising even when simple 1-gram features are used for classification.
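The abstract combines three ideas: a 1-gram byte-frequency distribution (BFD) as the feature vector, feature selection restricted to frequently occurring byte values, and content sampling. The Python sketch below shows one way they might fit together. It is not the authors' implementation: the mean-BFD (centroid) model, the L1 (Manhattan) distance, the evenly spaced block sampling, and the parameters `k`, `n_blocks`, and `block_size` are all assumptions chosen for illustration.

```python
from collections import Counter


def byte_frequency(data: bytes) -> list[float]:
    """1-gram feature vector: relative frequency of each byte value 0..255."""
    counts = Counter(data)          # iterating over bytes yields ints 0..255
    total = len(data) or 1          # avoid division by zero for empty input
    return [counts[b] / total for b in range(256)]


def sample_content(data: bytes, n_blocks: int = 4, block_size: int = 512) -> bytes:
    """Assumed sampling scheme: n_blocks evenly spaced blocks of block_size bytes."""
    if len(data) <= n_blocks * block_size:
        return data                 # small file: use all of it
    if n_blocks == 1:
        return data[:block_size]
    step = (len(data) - block_size) // (n_blocks - 1)
    return b"".join(data[i * step:i * step + block_size] for i in range(n_blocks))


def top_features(centroid: list[float], k: int) -> list[int]:
    """Feature selection: the k byte values that occur most often in a type's model."""
    return sorted(range(256), key=lambda b: centroid[b], reverse=True)[:k]


def build_model(training_files: list[bytes], k: int = 32) -> tuple[list[float], list[int]]:
    """Model of one file type: mean BFD over its training files plus its top-k byte values."""
    bfds = [byte_frequency(f) for f in training_files]
    centroid = [sum(col) / len(bfds) for col in zip(*bfds)]
    return centroid, top_features(centroid, k)


def classify(data: bytes,
             models: dict[str, tuple[list[float], list[int]]],
             sample: bool = True) -> str:
    """Nearest-centroid rule: L1 distance computed only over each type's selected bytes."""
    bfd = byte_frequency(sample_content(data) if sample else data)

    def distance(model: tuple[list[float], list[int]]) -> float:
        centroid, features = model
        return sum(abs(bfd[b] - centroid[b]) for b in features)

    return min(models, key=lambda t: distance(models[t]))
```

In this sketch both techniques reduce per-file work: each distance compares `k` byte values instead of all 256, and sampling caps the bytes counted at `n_blocks * block_size` regardless of file size. Typical use would be `models = {"pdf": build_model(pdf_files), "jpg": build_model(jpg_files)}` followed by `classify(unknown_bytes, models)`, where `pdf_files` and `jpg_files` are hypothetical lists of training file contents.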

Information

Published In

SAC '10: Proceedings of the 2010 ACM Symposium on Applied Computing
March 2010
2712 pages
ISBN:9781605586397
DOI:10.1145/1774088
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. byte frequency distribution
  2. file type identification

Qualifiers

  • Poster

Conference

SAC'10
SAC'10: The 2010 ACM Symposium on Applied Computing
March 22 - 26, 2010
Sierre, Switzerland

Acceptance Rates

SAC '10 Paper Acceptance Rate 364 of 1,353 submissions, 27%;
Overall Acceptance Rate 1,650 of 6,669 submissions, 25%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 1
  • Downloads (last 6 weeks): 0
Reflects downloads up to 16 Feb 2025

Cited By

  • (2022) Content-Based Feature Extraction and Extreme Learning Machine for Optimizing File Cluster Types Identification. Advances in Information and Communication, pp. 314-325. DOI: 10.1007/978-3-030-98015-3_21. Online publication date: 12-Mar-2022
  • (2020) Hierarchy-Based File Fragment Classification. Machine Learning and Knowledge Extraction, 2(3), pp. 216-232. DOI: 10.3390/make2030012. Online publication date: 3-Aug-2020
  • (2019) MOOCRec 2 for Humanities - Learning Style Based MOOC Recommender and Search Engine. 2019 International Conference on Advancements in Computing (ICAC), pp. 470-475. DOI: 10.1109/ICAC49085.2019.9103397. Online publication date: Dec-2019
  • (2016) Data Type Classification: Hierarchical Class-to-Type Modeling. Advances in Digital Forensics XII, pp. 325-343. DOI: 10.1007/978-3-319-46279-0_17. Online publication date: 20-Sep-2016
  • (2014) Distributed autonomous Neuro-Gen Learning Engine for content-based document file type identification. 2014 International Conference on Cyber and IT Service Management (CITSM), pp. 63-68. DOI: 10.1109/CITSM.2014.7042177. Online publication date: Nov-2014
  • (2014) Taxonomy of Data Fragment Classification Techniques. Digital Forensics and Cyber Crime, pp. 67-85. DOI: 10.1007/978-3-319-14289-0_6. Online publication date: 23-Dec-2014
  • (2013) Classification and Recovery of Fragmented Multimedia Files using the File Carving Approach. International Journal of Mobile Computing and Multimedia Communications, 5(3), pp. 50-67. DOI: 10.4018/jmcmc.2013070104. Online publication date: 1-Jul-2013
  • (2013) Sceadan. IEEE Transactions on Information Forensics and Security, 8(9), pp. 1519-1530. DOI: 10.1109/TIFS.2013.2274728. Online publication date: 1-Sep-2013
  • (2013) A Comprehensive Literature Review of File Carving. Proceedings of the 2013 International Conference on Availability, Reliability and Security, pp. 475-484. DOI: 10.1109/ARES.2013.62. Online publication date: 2-Sep-2013
  • (2013) Feature-based Type Identification of File Fragments. Security and Communication Networks, 6(1), pp. 115-128. DOI: 10.1002/sec.553. Online publication date: 1-Jan-2013