Abstract
There is a growing demand for information exchange in the age of the Internet of Things. One common scenario involves transferring data from distributed devices in the field to central servers or cloud environments. However, little research has been done on the possibilities for forensic investigation of supporting infrastructure such as Apache Kafka, which plays a crucial role in modern big data architectures.
In this paper, we present our work on the forensic investigation of Apache Kafka. We use reverse-engineering methods to infer the data formats that Apache Kafka uses on the server side. Based on these results, we implement a new module that reads Apache Kafka log files and that an investigator can load into the open-source forensic platform “Autopsy”. We highlight possibilities and limitations regarding encryption and data retention in Apache Kafka and suggest storing sensitive data in a decentralized manner. As a result of these measures, applications become more resilient to attacks and can provide increased security, higher ethical standards, and more freedom for their users. This can be a unique selling point in future data-driven applications.
Acknowledgement
We would like to extend our heartfelt appreciation to all those who have supported us throughout this work.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Mager, T. (2023). Big Data Forensics on Apache Kafka. In: Muthukkumarasamy, V., Sudarsan, S.D., Shyamasundar, R.K. (eds) Information Systems Security. ICISS 2023. Lecture Notes in Computer Science, vol 14424. Springer, Cham. https://doi.org/10.1007/978-3-031-49099-6_3
DOI: https://doi.org/10.1007/978-3-031-49099-6_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-49098-9
Online ISBN: 978-3-031-49099-6
eBook Packages: Computer Science, Computer Science (R0)