Elsevier

Digital Investigation

Volume 11, Issue 4, December 2014, Pages 273-294
Impacts of increasing volume of digital forensic data: A survey and future research challenges

https://doi.org/10.1016/j.diin.2014.09.002

Abstract

A major challenge to digital forensic analysis is the ongoing growth in the volume of data seized and presented for analysis. This is a result of the continuing development of storage technology, including increased storage capacity in consumer devices and cloud storage services, and an increase in the number of devices seized per case. This has led to increasing backlogs of evidence awaiting analysis, often of many months to years, affecting even the largest digital forensic laboratories. Over the preceding years, a variety of research has been undertaken in relation to the volume challenge. Proposed solutions include data mining, data reduction, increased processing power, distributed processing, artificial intelligence, and other innovative methods. This paper surveys the published research and the proposed solutions. It is concluded that there remains a need for further research with a focus on the real world applicability of a method or methods to address the digital forensic data volume challenge.

Introduction

The increase in the number and volume of digital devices seized and lodged with digital forensic laboratories for analysis has been an issue raised over many years. This growth has contributed to lengthy backlogs of work (Gogolin, 2010, Parsonage, 2009). A significant growth in the size of storage media, combined with the popularity of digital devices and the decrease in the price of these devices and storage media, has led to a major issue affecting the timely process of justice. There is a growing volume of data seized and presented for analysis, often now consisting of many terabytes of data for individual investigations. This has resulted from:

  • (a) An increase in the number of devices seized per case.

  • (b) The number of cases with digital evidence is increasing (anecdotal information indicates the last case observed without digital evidence was at least 3 years old).

  • (c) The size of data on each individual item is increasing.

The increasing number of cases and devices seized is further compounded by the growing size of storage devices (Garfinkel, 2010). Existing forensic software solutions have evolved from the first generation of tools and are now beginning to address scalability issues. However, a gap remains in relation to analysis of large and disparate datasets. Every year the volume of data increases faster than processors and forensic tools can manage (Roussev et al., 2013).

Processing times are growing as the amount of data to be analysed increases. In the last decade, there have been many calls for research to focus on the timely analysis of large datasets (Garfinkel, 2010, Richard and Roussev, 2006a, Wiles et al., 2007), including the application of data mining techniques to digital forensic data in an endeavour to address the issue of the growing volume of information (Beebe and Clark, 2005, Palmer, 2001).

Serious implications relating to increasing backlogs include: reduced sentences for convicted defendants due to the length of time waiting for results of digital forensic analysis, suspects committing suicide whilst waiting for analysis, and suspects denied access to family and children whilst waiting for analysis (Shaw and Browne, 2013). In addition, employment can be affected for suspects under investigation for lengthy periods of time, and ongoing difficulties can be experienced by suspects and innocent persons when computers and other devices are seized. For example, the child of a suspect may have school assignments saved on a seized computer, or the partner of a suspect may have all their taxation or business information saved on a laptop.

In this paper we study literature examining the digital forensic data volume issue, including the volume of data, the growth of media, and research challenges. We review publications focussing on data mining, data reduction, triage, intelligence analysis, and other proposed methodologies. We then summarise the findings, and future directions for research are outlined in the conclusion.

We located material published in the last 15 years (i.e. 1/1/1999–14/6/2014) by searching various academic databases, including IEEE Xplore, ACM Digital Library, Google Scholar, and ScienceDirect, using keywords such as “Digital Forensic Data Volume”, “Computer Forensic Volume Problem”, “Forensic Data Mining”, “Digital Forensic Triage”, “Forensic Data Reduction”, “Digital Intelligence”, “Digital Forensic Growth”, and “Digital Forensic Challenges”. In addition, we browsed all papers published in Digital Investigation: The International Journal of Digital Forensics & Incident Response, and The Journal of Digital Forensics, Security and Law. A summary of key papers and topics is given in Table 4 (see Discussion section).

Section snippets

Volume of data (1999–2009)

Digital forensics plays a crucial role in society across justice, security and privacy (Casey, 2014). Concerns regarding the increasing volume of data to be analysed in digital forensic examinations have been raised for many years. McKemmish (1999) stated that the rapid increase in the size of storage media is probably the single greatest challenge to digital forensic analysis. In 2001, Palmer published the results of the first Digital Forensic Research Workshop (DFRWS), which included a

Growth of media

In the Survey section we outlined the many digital forensic papers that discuss the problem of increasing volumes of data and devices; in this section we focus on the growth of media. Moore's Law is the observation of an average doubling of the number of transistors on an integrated circuit every 18–24 months, which assists in predicting development of computer technology (Wiles et al., 2007). Kryder (as quoted by Walter, 2005) made the observation that in the space of 15 years, the storage density

Data mining

Spafford (as cited in Palmer, 2001) listed data mining as a field of specialty which may assist in digital forensic analysis. Beebe (2009) also stated that the use of data mining techniques may be another solution to the volume challenge, and that data mining has the potential to locate trends and information that may otherwise go undetected by human observation. Beebe also raised a list of topics for further research, including:

  • a method for implementing subset collection,

  • how data mining

Discussion

As outlined, there has been much discussion regarding the data volume challenge, and many calls for research into the application of data mining and other techniques to address the problem. Nevertheless, there has been very little published work in relation to a method or framework to apply data mining techniques, or other methods, to reduce or analyse the large volume of real-world data. In addition, the value of extracting or using intelligence from digital forensic data has had minimal

Conclusion

Our literature survey identified that research gaps still remain in relation to the digital forensic data volume challenge. For example, there remains a need for research to be undertaken into data reduction techniques, data mining, intelligence analysis, and the use of open and closed source information. This should include research into the application of processes in a real world environment, and its acceptance in Courts and other tribunals.

Data Mining offers a potential solution to

Acknowledgements

The views and opinions expressed in this article are those of the authors alone and not the organisations with whom the authors are or have been associated or supported.

References (119)

  • E. Casey. Digital dust: evidence in every nook and cranny. Digit Investig (2010)
  • E. Casey. Growing societal impact of digital forensics and incident response. Digit Investig (2014)
  • E. Casey et al. Honing digital forensic processes. Digit Investig (2013)
  • M.M. Ferraro et al. Current issues confronting well-established computer-assisted child exploitation and computer crime task forces. Digit Investig (2004)
  • S. Garfinkel. Forensic feature extraction and cross-drive analysis. Digit Investig (2006)
  • S. Garfinkel. Digital forensics research: the next 10 years. Digit Investig (2010)
  • S. Garfinkel. Digital forensics XML and the DFXML toolset. Digit Investig (2012)
  • S. Garfinkel. Lessons learned writing digital forensics tools and managing a 30TB digital evidence corpus. Digit Investig (2012)
  • G. Gogolin. The digital crime tsunami. Digit Investig (2010)
  • F. Iqbal et al. Mining writeprints from anonymous e-mails for forensic investigation. Digit Investig (2010)
  • F. Iqbal et al. A novel approach of mining write-prints for authorship attribution in e-mail forensics. Digit Investig (2008)
  • B. Jones et al. The use of random sampling in investigations involving child abuse material. Digit Investig (2012)
  • E. Kenneally et al. Risk sensitive digital evidence collection. Digit Investig (2005)
  • M. Khan et al. A framework for post-event timeline reconstruction using neural networks. Digit Investig (2007)
  • M.B. Koopmans et al. Automated network triage. Digit Investig (2013)
  • C. LaVelle et al. FriendlyRoboCopy: a GUI to RoboCopy for computer forensic investigators. Digit Investig (2007)
  • J. Lee et al. High-speed search using Tarari content processor in digital forensics. Digit Investig (2008)
  • A. Marrington et al. CAT detect (computer activity timeline detection): a tool for detecting inconsistency in computer activity timelines. Digit Investig (2011)
  • F. Marturana et al. A machine learning-based triage methodology for automated categorization of digital media. Digit Investig (2013)
  • L. Marziale et al. Massive threading: using GPUs to increase the performance of digital forensics tools. Digit Investig (2007)
  • V. Mee et al. The Windows Registry as a forensic artefact: illustrating evidence collection for internet usage. Digit Investig (2006)
  • G.E. Noel et al. Applicability of latent Dirichlet allocation to multi-disk search. Digit Investig (2014)
  • N. Nykodym et al. Criminal profiling and insider cyber crime. Digit Investig (2005)
  • O. O'Connor. Deploying forensic tools via PXE. Digit Investig (2004)
  • J.S. Okolica et al. Using author topic to detect insider threats from email traffic. Digit Investig (2007)
  • J. Olsson et al. Computer forensic timeline visualization tool. Digit Investig (2009)
  • R.E. Overill et al. Triage template pipelines in digital forensic investigations. Digit Investig (2013)
  • M.M. Pollitt. Triage: a practical solution or admission of failure. Digit Investig (2013)
  • N. Pringle et al. Information assurance in a distributed forensic cluster. Digit Investig (2014)
  • D. Quick et al. Dropbox analysis: data remnants on user machines. Digit Investig (2013)
  • D. Quick et al. Data reduction and data mining framework for digital forensic evidence: storage, intelligence, review and archive. Trends Issues Crime Crim Justice (September 17, 2014)
  • A. Reyes et al. Digital forensics and analyzing data. Cyber crime investigations (2007)
  • O. Ribaux et al. Intelligence-led crime scene processing. Part I: forensic intelligence. Forensic Sci Int (2010)
  • O. Ribaux et al. The contribution of forensic science to crime analysis and investigation: forensic intelligence. Forensic Sci Int (2006)
  • M.K. Rogers. The future of computer forensics: a needs analysis survey. Comput Secur (2004)
  • V. Roussev et al. Content triage with similarity digests: the M57 case study. Digit Investig (2012)
  • V. Roussev et al. Real-time digital forensics and triage. Digit Investig (2013)
  • B. Schatz et al. A correlation method for establishing provenance of timestamps in digital evidence. Digit Investig (2006)
  • T. Abraham. Event sequence mining to develop profiles for computer forensic investigation purposes

  • ACC