Abstract
Big data has rapidly become one of the most widely adopted paradigms in computer science, and it poses an equally significant challenge for forensic investigators. The Hadoop Distributed File System (HDFS) is one of the most favoured big data platforms on the market, valued for its support for parallel processing and data analytics. However, HDFS is not without risk: it has reportedly been targeted by cyber criminals seeking to steal and exfiltrate confidential data. Using HDFS as a case study, we aim to detect remnants of malicious users’ activities within the HDFS environment. Our examination involves a thorough analysis of different areas of the HDFS environment, including a range of log files and disk images. Our experimental environment comprised four virtual machines, all running Ubuntu. This research provides a thorough understanding of the types of forensically relevant artefacts that are likely to be found during a forensic investigation of HDFS.
Acknowledgement
We would like to thank the editor and anonymous reviewers for their constructive comments. The views and opinions expressed in this article are those of the authors and not those of the organisations with which the authors are or have been associated, or by which they are or have been supported.
Copyright information
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Asim, M., McKinnel, D.R., Dehghantanha, A., Parizi, R.M., Hammoudeh, M., Epiphaniou, G. (2019). Big Data Forensics: Hadoop Distributed File Systems as a Case Study. In: Dehghantanha, A., Choo, KK. (eds) Handbook of Big Data and IoT Security. Springer, Cham. https://doi.org/10.1007/978-3-030-10543-3_8
DOI: https://doi.org/10.1007/978-3-030-10543-3_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-10542-6
Online ISBN: 978-3-030-10543-3
eBook Packages: Computer Science, Computer Science (R0)