Elsevier

Digital Investigation

Volume 4, Supplement, September 2007, Pages 105-113

Multi-resolution similarity hashing

https://doi.org/10.1016/j.diin.2007.06.011
Open access under a Creative Commons license

Abstract

Large-scale digital forensic investigations present at least two fundamental challenges. The first is meeting the computational demands of processing a large amount of data. The second is extracting useful information from the raw data automatically. Both problems can lead to long processing times that seriously hamper an investigation.

In this paper, we discuss a new approach to hashing, one of the basic operations invariably applied to raw data. The essential idea is to produce an efficient and scalable hashing scheme that can supplement traditional cryptographic hashing during the initial pass over the raw evidence. The goal is to retain enough information to allow binary data to be queried for similarity at various levels of granularity without any further pre-processing or indexing.
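
The abstract does not spell out the construction, so the following is only a minimal sketch of one way to make binary data queryable for similarity at several granularities in a single pass. It assumes content-defined chunking: a polynomial rolling hash over a small window chooses chunk boundaries, nested thresholds on the hash define finer and coarser levels, and each chunk is digested with MD5. The function names `multi_resolution_chunks` and `similarity`, the 7-byte window, the multiplier, and the level thresholds are hypothetical choices for illustration, not the authors' parameters.

```python
import hashlib
import random

# Illustrative parameters (assumptions, not taken from the paper).
WINDOW = 7            # bytes covered by the rolling hash
BASE = 0x01000193     # odd multiplier for the polynomial rolling hash
MASK32 = 0xFFFFFFFF   # keep the hash in 32 bits
LEVELS = (6, 9, 12)   # boundary bits per level: ~64 B, ~512 B, ~4 KiB chunks


def multi_resolution_chunks(data: bytes, levels=LEVELS):
    """Return {bits: [MD5 hex digest of each chunk]} for every resolution.

    A position ends a chunk at a level when the top `bits` bits of the
    rolling hash are all ones. The tests are nested, so every coarse
    boundary is also a fine one and coarse chunks are unions of fine ones.
    """
    top_weight = pow(BASE, WINDOW - 1, 1 << 32)  # weight of the oldest byte
    starts = {b: 0 for b in levels}
    digests = {b: [] for b in levels}
    h = 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * top_weight) & MASK32  # drop oldest byte
        h = (h * BASE + byte) & MASK32                        # append newest byte
        if i + 1 < WINDOW:
            continue                                          # window not yet full
        for b in levels:
            if h >> (32 - b) == (1 << b) - 1:                 # top b bits all ones
                digests[b].append(hashlib.md5(data[starts[b]:i + 1]).hexdigest())
                starts[b] = i + 1
    for b in levels:                                          # flush trailing chunk
        if starts[b] < len(data):
            digests[b].append(hashlib.md5(data[starts[b]:]).hexdigest())
    return digests


def similarity(a, b, level):
    """Jaccard similarity of two targets' chunk-digest sets at one resolution."""
    sa, sb = set(a[level]), set(b[level])
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


# Usage: a small insertion disturbs only the chunks around it.
rng = random.Random(7)
original = rng.randbytes(200_000)
edited = original[:100_000] + b"INSERTED" + original[100_000:]
unrelated = rng.randbytes(100_000)

h_orig = multi_resolution_chunks(original)
h_edit = multi_resolution_chunks(edited)
h_other = multi_resolution_chunks(unrelated)
for b in LEVELS:
    print(b, len(h_orig[b]),
          round(similarity(h_orig, h_edit, b), 3),
          round(similarity(h_orig, h_other, b), 3))
```

In this sketch, finer levels localize a small edit to a short chunk, while coarser levels produce far fewer digests to compare; a query can choose whichever resolution suits the sizes of the targets.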

The specific solution we propose, called a multi-resolution similarity hash (MRS hash), is a generalization of recent work in the area. Its main advantages are robust performance (raw speed comparable to that of a high-grade block-level cryptographic hash), scalability (the ability to compare targets that differ in size by orders of magnitude), and space efficiency (typically below 0.5% of the size of the target).
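
To make a sub-0.5% space budget concrete, here is a minimal sketch that assumes, hypothetically, that chunk digests are stored in fixed-size Bloom filters. Each 2048-bit (256-byte) filter holds up to 128 chunks, and the four 32-bit words of each chunk's MD5 digest select the bits to set. With 512-byte chunks that is 256 bytes per 64 KiB of data, or about 0.39%. The filter size, the capacity, the fixed-size stand-in chunker, and the chance-corrected overlap score are all illustrative assumptions; this is not presented as the MRS hash's actual encoding or comparison function.

```python
import hashlib
import random

M_BITS = 2048      # bits per Bloom filter (hypothetical; 256 bytes)
K = 4              # bits set per chunk, one per 32-bit word of its MD5 digest
PER_FILTER = 128   # chunks per filter before a new one is started (hypothetical)


def _popcount(x: int) -> int:
    return bin(x).count("1")


def bloom_digest(chunks):
    """Map an iterable of chunks to a list of fixed-size Bloom filters (ints)."""
    filters, current, count = [], 0, 0
    for chunk in chunks:
        d = hashlib.md5(chunk).digest()
        for j in range(K):
            pos = int.from_bytes(d[4 * j:4 * j + 4], "little") % M_BITS
            current |= 1 << pos
        count += 1
        if count == PER_FILTER:        # filter full: start the next one
            filters.append(current)
            current, count = 0, 0
    if count:
        filters.append(current)
    return filters


def filter_score(a: int, b: int) -> float:
    """Bit overlap of two filters, rescaled so chance overlap maps to about 0."""
    na, nb = _popcount(a), _popcount(b)
    if na == 0 or nb == 0:
        return 0.0
    expected = na * nb / M_BITS        # overlap expected between unrelated filters
    best = min(na, nb)                 # overlap if one filter is a subset
    if best <= expected:
        return 0.0
    return max(0.0, (_popcount(a & b) - expected) / (best - expected))


def digest_similarity(fa, fb) -> float:
    """Average, over the filters of fa, of the best-matching filter in fb."""
    if not fa or not fb:
        return 0.0
    return sum(max(filter_score(a, b) for b in fb) for a in fa) / len(fa)


rng = random.Random(1)
data = rng.randbytes(256 * 1024)
# Stand-in chunker: fixed 512-byte slices instead of content-defined chunks.
chunks = [data[i:i + 512] for i in range(0, len(data), 512)]
edited = list(chunks)
edited[100] = bytes(512)                                  # overwrite one chunk
other = [rng.randbytes(512) for _ in range(len(chunks))]  # unrelated data

fa = bloom_digest(chunks)
size = len(fa) * M_BITS // 8
print(f"{size} bytes of filters for {len(data)} bytes ({100 * size / len(data):.2f}%)")
# prints "1024 bytes of filters for 262144 bytes (0.39%)"
```

Because every filter has the same size in this sketch, the digest grows linearly with the number of chunks, and a small digest can still be matched filter by filter against a much larger one.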

Keywords

Hashing
Similarity hashing
Digital forensics
Multi-resolution hash
File correlation
Data correlation
