Big Data Forensics: Hadoop Distributed File Systems as a Case Study

Abstract

Big Data has fast become one of the most widely adopted paradigms in computer science, and it poses an equally significant challenge for forensic investigators. The Hadoop Distributed File System (HDFS) is one of the most popular big data platforms on the market, providing extensive parallel processing and data analytics capabilities. However, HDFS is not without its risks: it has reportedly been targeted by cyber criminals as a means of stealing and exfiltrating confidential data. Using HDFS as a case study, we aim to detect remnants of malicious users’ activities within the HDFS environment. Our examination involves a thorough analysis of different areas of the HDFS environment, including a range of log files and disk images. Our experimental environment comprised four virtual machines, all running Ubuntu. This research provides a thorough understanding of the types of forensically relevant artefacts that are likely to be found during a forensic investigation of HDFS.

Acknowledgement

We would like to thank the editor and anonymous reviewers for their constructive comments. The views and opinions expressed in this article are those of the authors and not of the organisations with which the authors are or have been associated, or by which they are or have been supported.

Author information

Corresponding author

Correspondence to Ali Dehghantanha.

Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Asim, M., McKinnel, D.R., Dehghantanha, A., Parizi, R.M., Hammoudeh, M., Epiphaniou, G. (2019). Big Data Forensics: Hadoop Distributed File Systems as a Case Study. In: Dehghantanha, A., Choo, KK. (eds) Handbook of Big Data and IoT Security. Springer, Cham. https://doi.org/10.1007/978-3-030-10543-3_8

  • DOI: https://doi.org/10.1007/978-3-030-10543-3_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-10542-6

  • Online ISBN: 978-3-030-10543-3

  • eBook Packages: Computer Science, Computer Science (R0)
