DOI: 10.1145/2459976.2460004
research-article

Locating executable fragments with Concordia, a scalable, semantics-based architecture

Published: 08 January 2013

Abstract

The amount of digital evidence that forensic tools and analysts must process is growing rapidly, which makes automated analysis a critical activity and one where continuous improvement is crucial. Concordia is a platform for investigating code semantics. One of its functions is the identification of unknown code fragments; our ultimate goal is to elucidate the possible objectives and origin of this type of evidence. Here we provide a synopsis of a method that identifies and locates code fragments using n-gram and semantics-based features and a k-nearest-neighbors classifier. Our objective is to identify a set of candidate files that may contain the unknown fragment and to supply additional details that isolate it within this set. To accomplish this task, Concordia uses the MapReduce model to process a large set of invariants, giving forensic experts a more efficient and automated way to produce solid intelligence about a growing body of evidence.
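
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea: rank candidate files by n-gram overlap with an unknown fragment, then locate the best-matching window inside a candidate. It is not Concordia's implementation. The function names, the byte-level 4-grams, and the containment score are all assumptions chosen for illustration, and the semantics-based features and MapReduce processing are not modeled here.

```python
def ngrams(data: bytes, n: int = 4) -> set:
    """Return the set of distinct byte n-grams in data (empty if data is shorter than n)."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def containment(fragment_grams: set, file_grams: set) -> float:
    """Fraction of the fragment's n-grams that also occur in the file.

    Assumption: containment is used instead of a symmetric measure so that
    large files are not penalised for content unrelated to the fragment.
    """
    if not fragment_grams:
        return 0.0
    return len(fragment_grams & file_grams) / len(fragment_grams)


def knn_candidates(fragment: bytes, corpus: dict, k: int = 3, n: int = 4) -> list:
    """Return the k corpus files nearest to the fragment as (name, score) pairs."""
    frag = ngrams(fragment, n)
    scored = [(containment(frag, ngrams(blob, n)), name)
              for name, blob in corpus.items()]
    # Highest score first; ties are broken by file name for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(name, score) for score, name in scored[:k]]


def locate(fragment: bytes, blob: bytes, n: int = 4) -> int:
    """Return the offset of the fragment-sized window sharing the most n-grams."""
    frag = ngrams(fragment, n)
    width = len(fragment)
    best_offset, best_score = -1, -1
    for offset in range(max(len(blob) - width, 0) + 1):
        score = len(frag & ngrams(blob[offset:offset + width], n))
        if score > best_score:  # strict '>' keeps the earliest best window
            best_offset, best_score = offset, score
    return best_offset
```

Calling `knn_candidates(fragment, corpus)` with `corpus` as a mapping from file name to file bytes yields the candidate set, and `locate(fragment, corpus[name])` then narrows the search within one candidate. At scale, the per-file scoring in `knn_candidates` is the step that would naturally be distributed as independent map tasks.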

Cited By

  • (2014) Taxonomy of Data Fragment Classification Techniques. Digital Forensics and Cyber Crime, 10.1007/978-3-319-14289-0_6 (67-85). Online publication date: 23-Dec-2014

Published In

CSIIRW '13: Proceedings of the Eighth Annual Cyber Security and Information Intelligence Research Workshop
January 2013
282 pages
ISBN: 9781450316873
DOI: 10.1145/2459976

Sponsors

  • Los Alamos National Labs
  • Sandia National Laboratories
  • Department of Energy (DOE)
  • Oak Ridge National Laboratory
  • Lawrence Livermore National Laboratory
  • Lawrence Berkeley National Laboratory
  • Argonne National Lab
  • Idaho National Laboratory
  • Pacific Northwest National Laboratory
  • Nevada National Security Site

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

CSIIRW '13
CSIIRW '13: Cyber Security and Information Intelligence
January 8 - 10, 2013
Oak Ridge, Tennessee, USA
