Impacts of increasing volume of digital forensic data: A survey and future research challenges
Introduction
The increase in the number and volume of digital devices seized and lodged with digital forensic laboratories for analysis has been raised as an issue for many years. This growth has contributed to lengthy backlogs of work (Gogolin, 2010, Parsonage, 2009). Significant growth in the size of storage media, combined with the popularity of digital devices and the falling price of these devices and storage media, has led to a major issue affecting the timely process of justice. The volume of data seized and presented for analysis is growing, and individual investigations now often involve many terabytes of data. This has resulted from:
- (a) An increase in the number of devices seized per case.
- (b) The number of cases with digital evidence is increasing (anecdotal information indicates the last case observed without digital evidence was at least 3 years old).
- (c) The size of data on each individual item is increasing.
The increasing number of cases and devices seized is further compounded by the growing size of storage devices (Garfinkel, 2010). Existing forensic software solutions have evolved from the first generation of tools and are now beginning to address scalability issues. However, a gap remains in relation to the analysis of large and disparate datasets. Every year the volume of data grows faster than processors and forensic tools can manage (Roussev et al., 2013).
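To illustrate why these three factors compound rather than simply add, the following minimal sketch multiplies them together and projects the total forward. All starting values and growth rates are hypothetical, chosen only for illustration, and are not drawn from the surveyed literature.

```python
def total_volume_tb(cases, devices_per_case, tb_per_device):
    """Total data lodged for analysis, in terabytes."""
    return cases * devices_per_case * tb_per_device


def project(years, cases, devices_per_case, tb_per_device,
            case_growth, device_growth, size_growth):
    """Yield (year, volume_tb), with each factor growing at its own annual rate."""
    for year in range(years + 1):
        yield year, total_volume_tb(cases, devices_per_case, tb_per_device)
        cases *= 1 + case_growth
        devices_per_case *= 1 + device_growth
        tb_per_device *= 1 + size_growth


# Hypothetical inputs (assumptions for illustration only):
# 100 cases a year, 2 devices per case, 0.5 TB per device,
# growing by 10%, 10% and 30% a year respectively.
projection = list(project(years=5, cases=100, devices_per_case=2,
                          tb_per_device=0.5, case_growth=0.10,
                          device_growth=0.10, size_growth=0.30))

for year, volume in projection:
    print(f"year {year}: {volume:,.1f} TB")
```

Under these assumed rates the combined annual growth factor is 1.10 × 1.10 × 1.30 = 1.573, so the total volume grows by roughly 57% a year even though no single factor grows by more than 30%.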
Processing times are increasing as the amount of data to be analysed grows. In the last decade, there have been many calls for research to focus on the timely analysis of large datasets (Garfinkel, 2010, Richard and Roussev, 2006a, Wiles et al., 2007), including the application of data mining techniques to digital forensic data in an endeavour to address the growing volume of information (Beebe and Clark, 2005, Palmer, 2001).
Serious implications relating to increasing backlogs include: reduced sentences for convicted defendants due to the length of time waiting for results of digital forensic analysis, suspects committing suicide whilst waiting for analysis, and suspects denied access to family and children whilst waiting for analysis (Shaw and Browne, 2013). In addition, employment can be affected for suspects under investigation for lengthy periods of time, and ongoing difficulties can be experienced by suspects and innocent persons when computers and other devices are seized. For example, the child of a suspect may have school assignments saved on a seized computer, or the partner of a suspect may have all their taxation or business information saved on a laptop.
In this paper we study literature examining the digital forensic data volume issue, including the volume of data, the growth of media, and research challenges. We review publications focussing on data mining, data reduction, triage, intelligence analysis, and other proposed methodologies. We then summarise the findings, and future directions for research are outlined in the conclusion.
We located material published in the last 15 years (i.e. 1/1/1999–14/6/2014) by searching various academic databases, including IEEE Xplore, ACM Digital Library, Google Scholar, and ScienceDirect, using keywords such as “Digital Forensic Data Volume”, “Computer Forensic Volume Problem”, “Forensic Data Mining”, “Digital Forensic Triage”, “Forensic Data Reduction”, “Digital Intelligence”, “Digital Forensic Growth”, and “Digital Forensic Challenges”. In addition, we browsed all papers published in Digital Investigation: The International Journal of Digital Forensics & Incident Response, and The Journal of Digital Forensics, Security and Law. A summary of key papers and topics is listed in Table 4 (see Discussion section).
Section snippets
Volume of data (1999–2009)
Digital forensics plays a crucial role in society across justice, security and privacy (Casey, 2014). Concerns regarding the increasing volume of data to be analysed in digital forensic examinations have been raised for many years. McKemmish (1999) stated that the rapid increase in the size of storage media is probably the single greatest challenge to digital forensic analysis. In 2001, Palmer published the results of the first Digital Forensic Research Workshop (DFRWS), which included a
Growth of media
In the Survey section we outlined the many digital forensic papers that discuss the problem of increasing volumes of data and devices. In this section we focus on the growth of media. Moore's Law is the observation of an average doubling of the number of transistors on an integrated circuit every 18–24 months, which assists in predicting development of computer technology (Wiles et al., 2007). Kryder (as quoted by Walter, 2005) made the observation that in the space of 15 years, the storage density
Data mining
Spafford (as cited in Palmer, 2001) listed data mining as a field of specialty which may assist in digital forensic analysis. Beebe (2009) also stated that data mining techniques may be another solution to the volume challenge, and that data mining has the potential to locate trends and information that may otherwise go undetected by human observation. Beebe also raised a list of topics for further research, including:
- a method for implementing subset collection,
- how data mining
Discussion
As outlined, there has been much discussion regarding the data volume challenge, and many calls for research into the application of data mining and other techniques to address the problem. Nevertheless, there has been very little published work in relation to a method or framework to apply data mining techniques, or other methods, to reduce or analyse the large volume of real-world data. In addition, the value of extracting or using intelligence from digital forensic data has had minimal
Conclusion
Our literature survey identified that research gaps still remain in relation to the digital forensic data volume challenge. For example, there remains a need for research into data reduction techniques, data mining, intelligence analysis, and the use of open and closed source information. This should include research into the application of these processes in a real-world environment, and their acceptance in Courts and other tribunals.
Data Mining offers a potential solution to
Acknowledgements
The views and opinions expressed in this article are those of the authors alone and not the organisations with whom the authors are or have been associated or supported.
References (119)
- et al. Mining criminal networks from unstructured text documents. Digit Investig (2012)
- et al. XIRAF – XML-based indexing and querying for digital forensics. Digit Investig (2006)
- A second generation computer forensic analysis system. Digit Investig (2009)
- et al. Engineering an online computer forensic service. Digit Investig (2012)
- Time and date issues in forensic computing – a case study. Digit Investig (2004)
- et al. On the database lookup problem of approximate matching. Digit Investig (2014)
- et al. Automated evaluation of approximate matching algorithms on real data. Digit Investig (2014)
- et al. A brief study of time. Digit Investig (2007)
- et al. Automated digital evidence discovery and correlation. Digit Investig (2008)
- “Dawn raids” bring a new form in incident response. Digit Investig (2009)
- Digital dust: evidence in every nook and cranny. Digit Investig
- Growing societal impact of digital forensics and incident response. Digit Investig
- Honing digital forensic processes. Digit Investig
- Current issues confronting well-established computer-assisted child exploitation and computer crime task forces. Digit Investig
- Forensic feature extraction and cross-drive analysis. Digit Investig
- Digital forensics research: the next 10 years. Digit Investig
- Digital forensics XML and the DFXML toolset. Digit Investig
- Lessons learned writing digital forensics tools and managing a 30TB digital evidence corpus. Digit Investig
- The digital crime tsunami. Digit Investig
- Mining writeprints from anonymous e-mails for forensic investigation. Digit Investig
- A novel approach of mining write-prints for authorship attribution in e-mail forensics. Digit Investig
- The use of random sampling in investigations involving child abuse material. Digit Investig
- Risk sensitive digital evidence collection. Digit Investig
- A framework for post-event timeline reconstruction using neural networks. Digit Investig
- Automated network triage. Digit Investig
- FriendlyRoboCopy: a GUI to RoboCopy for computer forensic investigators. Digit Investig
- High-speed search using Tarari content processor in digital forensics. Digit Investig
- CAT detect (computer activity timeline detection): a tool for detecting inconsistency in computer activity timelines. Digit Investig
- A machine learning-based triage methodology for automated categorization of digital media. Digit Investig
- Massive threading: using GPUs to increase the performance of digital forensics tools. Digit Investig
- The Windows Registry as a forensic artefact: illustrating evidence collection for internet usage. Digit Investig
- Applicability of latent Dirichlet allocation to multi-disk search. Digit Investig
- Criminal profiling and insider cyber crime. Digit Investig
- Deploying forensic tools via PXE. Digit Investig
- Using author topic to detect insider threats from email traffic. Digit Investig
- Computer forensic timeline visualization tool. Digit Investig
- Triage template pipelines in digital forensic investigations. Digit Investig
- Triage: a practical solution or admission of failure. Digit Investig
- Information assurance in a distributed forensic cluster. Digit Investig
- Dropbox analysis: data remnants on user machines. Digit Investig
- Data reduction and data mining framework for digital forensic evidence: storage, intelligence, review and archive. Trends Issues Crime Crim Justice
- Digital forensics and analyzing data. Cyber crime investigations
- Intelligence-led crime scene processing. Part I: forensic intelligence. Forensic Sci Int
- The contribution of forensic science to crime analysis and investigation: forensic intelligence. Forensic Sci Int
- The future of computer forensics: a needs analysis survey. Comput Secur
- Content triage with similarity digests: the M57 case study. Digit Investig
- Real-time digital forensics and triage. Digit Investig
- A correlation method for establishing provenance of timestamps in digital evidence. Digit Investig
- Event sequence mining to develop profiles for computer forensic investigation purposes