Big Data Forensics on Apache Kafka

  • Conference paper

Information Systems Security (ICISS 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14424)

Abstract

There is a growing demand for information exchange in the age of the Internet of Things. One common scenario involves transferring data from distributed devices in the field to central servers or cloud environments. However, little research has been done on the possibilities for forensic investigation of supporting infrastructure such as Apache Kafka, which plays a crucial role in modern big data architectures.

In this paper, we present our work on the forensic investigation of Apache Kafka. We use reverse-engineering methods to infer the data formats that Apache Kafka uses on the server side. Based on these results, we implement a new module that reads Apache Kafka log files and that an investigator can load into the open-source forensic platform “Autopsy”. We highlight possibilities and limitations regarding encryption and data retention in Apache Kafka and, for sensitive data, suggest storing it in a decentralized manner. Such measures make applications more resilient to attacks and allow them to offer their users stronger security, higher ethical standards, and more freedom. This can become a unique selling point for future data-driven applications.
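The abstract reports that Kafka's server-side data formats were reverse-engineered in order to build an Autopsy module that reads Kafka log files. As a rough illustration of what such a reader has to handle, the Python sketch below decodes record batches in the v2 message format described in Apache Kafka's public documentation (magic byte 2, a big-endian fixed-size batch header, and zigzag-encoded varints inside each record). It is not the paper's module: the names `iter_batches`, `parse_record`, `Record` and `Batch` are hypothetical, and the sketch assumes uncompressed batches, skips CRC-32C verification, and stops at zero padding or a truncated tail instead of attempting recovery.

```python
import struct
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple

# Fixed part of a v2 record batch (assumed layout, big-endian, 61 bytes):
# baseOffset, batchLength, partitionLeaderEpoch, magic, crc, attributes,
# lastOffsetDelta, baseTimestamp, maxTimestamp, producerId, producerEpoch,
# baseSequence, recordCount
BATCH_HEADER = struct.Struct(">qiibIhiqqqhii")
LOG_OVERHEAD = 12  # baseOffset (8) + batchLength (4) are not counted in batchLength


@dataclass
class Record:  # hypothetical container for one decoded record
    offset: int
    timestamp: int
    key: Optional[bytes]
    value: Optional[bytes]
    headers: List[Tuple[str, Optional[bytes]]] = field(default_factory=list)


@dataclass
class Batch:  # hypothetical container for one decoded batch
    base_offset: int
    attributes: int
    records: List[Record]


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode one zigzag varint; return (value, position after it)."""
    raw, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        raw |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return (raw >> 1) ^ -(raw & 1), pos
        shift += 7


def read_bytes(buf: bytes, pos: int) -> Tuple[Optional[bytes], int]:
    """Read a varint length prefix and that many bytes; length -1 means null."""
    length, pos = read_varint(buf, pos)
    if length < 0:
        return None, pos
    return bytes(buf[pos:pos + length]), pos + length


def parse_record(buf: bytes, pos: int, base_offset: int, base_ts: int) -> Tuple[Record, int]:
    """Decode one record; offsets and timestamps are stored as deltas."""
    length, pos = read_varint(buf, pos)
    end = pos + length
    pos += 1  # per-record attributes byte (unused)
    ts_delta, pos = read_varint(buf, pos)
    offset_delta, pos = read_varint(buf, pos)
    key, pos = read_bytes(buf, pos)
    value, pos = read_bytes(buf, pos)
    header_count, pos = read_varint(buf, pos)
    headers: List[Tuple[str, Optional[bytes]]] = []
    for _ in range(header_count):
        hkey, pos = read_bytes(buf, pos)
        hval, pos = read_bytes(buf, pos)
        headers.append(((hkey or b"").decode("utf-8", "replace"), hval))
    return Record(base_offset + offset_delta, base_ts + ts_delta, key, value, headers), end


def iter_batches(data: bytes) -> Iterator[Batch]:
    """Yield every uncompressed v2 batch found in a raw segment file."""
    pos = 0
    while pos + BATCH_HEADER.size <= len(data):
        (base_offset, batch_len, _epoch, magic, _crc, attributes, _last_delta,
         base_ts, _max_ts, _pid, _pepoch, _base_seq, count) = BATCH_HEADER.unpack_from(data, pos)
        end = pos + LOG_OVERHEAD + batch_len
        if batch_len < BATCH_HEADER.size - LOG_OVERHEAD or end > len(data):
            break  # zero padding or a truncated tail: stop rather than guess
        if magic != 2:
            raise ValueError(f"unsupported magic byte {magic} at file position {pos}")
        if attributes & 0x07:
            raise NotImplementedError("compressed batch; decompress the record section first")
        rpos, records = pos + BATCH_HEADER.size, []
        for _ in range(count):
            record, rpos = parse_record(data, rpos, base_offset, base_ts)
            records.append(record)
        yield Batch(base_offset, attributes, records)
        pos = end
```

A typical call would read a partition's segment file (named after its base offset, for example `00000000000000000000.log`) into memory and pass the bytes to `iter_batches`. Decoding field by field with explicit bounds checks, rather than trusting `recordCount` blindly, is a deliberate choice for forensic use, where segment files may be partially overwritten or cut off.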

Acknowledgement

We would like to extend our heartfelt appreciation to all those who have supported us throughout this work.

Author information

Correspondence to Thomas Mager.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Mager, T. (2023). Big Data Forensics on Apache Kafka. In: Muthukkumarasamy, V., Sudarsan, S.D., Shyamasundar, R.K. (eds) Information Systems Security. ICISS 2023. Lecture Notes in Computer Science, vol 14424. Springer, Cham. https://doi.org/10.1007/978-3-031-49099-6_3

  • DOI: https://doi.org/10.1007/978-3-031-49099-6_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-49098-9

  • Online ISBN: 978-3-031-49099-6

  • eBook Packages: Computer Science, Computer Science (R0)
