Big forensic data reduction: digital forensic images and electronic evidence

Cluster Computing

Abstract

An issue that continues to impact digital forensics is the increasing volume of data and the growing number of devices. One proposed method for dealing with the problem of “big digital forensic data” (the volume, variety, and velocity of digital forensic data) is to reduce the volume of data at either the collection stage or the processing stage. We have developed a novel approach which significantly improves on current practice, and in this paper we outline our data volume reduction process, which focuses on imaging a selection of key files and data such as registry, documents, spreadsheets, email, internet history, communications, logs, pictures, videos, and other relevant file types. When applied to test cases, a hundredfold reduction of original media volume was observed. When applied to real-world cases of an Australian Law Enforcement Agency, the data volume was further reduced to a small percentage of the original media volume, whilst retaining key evidential files and data. The reduction process was applied to a range of real-world cases reviewed by experienced investigators and detectives, and this highlighted that evidential data was present in the data-reduced forensic subset files. A data reduction approach is applicable in a range of areas, including digital forensic triage, analysis, review, intelligence analysis, presentation, and archiving. In addition, the data reduction process outlined can be applied using common digital forensic hardware and software solutions available in appropriately equipped digital forensic labs, without requiring the purchase of additional software or hardware. The process can be applied to a wide variety of cases, such as terrorism and organised crime investigations. It is intended to provide a capability to rapidly process data and to gain an understanding of the information and/or locate key evidence or intelligence in a timely manner.
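The idea of imaging only a selection of key file types, and then comparing the subset size with the original media volume, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the extension lists, the registry hive names, and the function names `categorize`, `select_subset`, and `reduction_summary` are all assumptions made for illustration, and only some of the categories named in the abstract are covered.

```python
from pathlib import PurePosixPath
from typing import Iterable, List, Optional, Tuple

# Assumed mapping from some categories named in the abstract to file
# extensions. The extension lists are illustrative, not the authors'.
CATEGORY_EXTENSIONS = {
    "documents": {".doc", ".docx", ".pdf", ".txt"},
    "spreadsheets": {".xls", ".xlsx", ".csv"},
    "email": {".pst", ".ost", ".eml"},
    "logs": {".log", ".evtx"},
    "pictures": {".jpg", ".jpeg", ".png"},
    "videos": {".mp4", ".avi", ".mov"},
}

# Assumed file names of Windows registry hives, matched by name
# because most of them carry no extension.
REGISTRY_NAMES = {"ntuser.dat", "sam", "security", "software", "system"}


def categorize(path: str) -> Optional[str]:
    """Return the assumed category of a file, or None if it is not selected."""
    p = PurePosixPath(path.replace("\\", "/"))  # accept either separator
    if p.name.lower() in REGISTRY_NAMES:
        return "registry"
    suffix = p.suffix.lower()
    for category, extensions in CATEGORY_EXTENSIONS.items():
        if suffix in extensions:
            return category
    return None


def select_subset(
    listing: Iterable[Tuple[str, int]]
) -> List[Tuple[str, int, str]]:
    """Keep only (path, size) entries that fall into a selected category."""
    subset = []
    for path, size in listing:
        category = categorize(path)
        if category is not None:
            subset.append((path, size, category))
    return subset


def reduction_summary(
    media_bytes: int, subset: List[Tuple[str, int, str]]
) -> Tuple[int, float, float]:
    """Return (subset bytes, reduction factor, subset as % of media volume)."""
    if media_bytes <= 0:
        raise ValueError("media_bytes must be positive")
    subset_bytes = sum(size for _, size, _ in subset)
    factor = media_bytes / subset_bytes if subset_bytes else float("inf")
    percent = 100.0 * subset_bytes / media_bytes
    return subset_bytes, factor, percent
```

In this sketch, a subset totalling 10,000 bytes drawn from 1,000,000 bytes of media gives a reduction factor of 100 and a subset that is 1% of the original volume, which is the kind of ratio the abstract describes as a hundredfold reduction. A real workflow would read the file listing from a forensic tool rather than from an in-memory list.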



Acknowledgments

The views and opinions expressed in this article are those of the authors alone and not those of the organizations with which the authors are or have been associated, or by which they are or have been supported. The authors thank the anonymous reviewers and their colleagues at the Electronic Crime Section of the South Australia Police for their assistance and for providing constructive and generous feedback.

Author information

Corresponding author

Correspondence to Kim-Kwang Raymond Choo.

About this article

Cite this article

Quick, D., Choo, KK.R. Big forensic data reduction: digital forensic images and electronic evidence. Cluster Comput 19, 723–740 (2016). https://doi.org/10.1007/s10586-016-0553-1

  • DOI: https://doi.org/10.1007/s10586-016-0553-1
