Apache Kafka

Sax, Matthias J.

doi:10.1007/978-3-319-63962-8_196-1

Matthias J. Sax

Living reference work entry
First Online: 10 February 2018

Definitions

Apache Kafka (Apache Software Foundation 2017b; Kreps et al. 2011; Goodhope et al. 2012; Wang et al. 2015; Kleppmann and Kreps 2015) is a scalable, fault-tolerant, and highly available distributed streaming platform that can be used to store and process data streams.

Kafka consists of three main components:

the Kafka cluster,
the Connect framework (Connect API),
and the Streams programming library (Streams API).

The Kafka cluster stores data streams, which are sequences of messages/events continuously produced by applications and sequentially and incrementally consumed by other applications. The Connect API is used to ingest data into Kafka and export data streams to external systems like distributed file systems, databases, and others. For data stream processing, the Streams API allows developers to specify sophisticated stream processing pipelines that read input streams from the Kafka cluster and write results back to Kafka.
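The idea of a stored stream that is produced continuously and consumed sequentially and incrementally can be illustrated with a small in-memory sketch. It is a toy model only: `ToyLog`, `ToyConsumer`, and `run_pipeline` are hypothetical names invented for this illustration and are not part of any Kafka API, and the sketch leaves out persistence, distribution across machines, and fault tolerance. It only shows a reader that remembers its position in a stream and a processing step that reads one stream and writes its results to another.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ToyLog:
    """In-memory stand-in for one stored data stream (hypothetical, not a Kafka API)."""
    records: List[str] = field(default_factory=list)

    def append(self, message: str) -> int:
        """Append a message and return its position in the stream."""
        self.records.append(message)
        return len(self.records) - 1

    def read_from(self, position: int) -> List[str]:
        """Return every message stored at or after the given position."""
        return self.records[position:]


@dataclass
class ToyConsumer:
    """Reads a ToyLog sequentially and remembers how far it has read."""
    log: ToyLog
    position: int = 0

    def poll(self) -> List[str]:
        """Return only the messages appended since the previous poll."""
        batch = self.log.read_from(self.position)
        self.position += len(batch)
        return batch


def run_pipeline(source: ToyConsumer, sink: ToyLog,
                 transform: Callable[[str], str]) -> int:
    """Read new input messages, transform each one, and append the results
    to the sink stream. Returns the number of messages processed."""
    batch = source.poll()
    for message in batch:
        sink.append(transform(message))
    return len(batch)


if __name__ == "__main__":
    page_views = ToyLog()      # input stream written by a producing application
    upper_cased = ToyLog()     # output stream written by the processing step
    reader = ToyConsumer(page_views)

    page_views.append("page-a")
    page_views.append("page-b")
    run_pipeline(reader, upper_cased, str.upper)  # handles the two messages so far

    page_views.append("page-c")
    run_pipeline(reader, upper_cased, str.upper)  # handles only the new message

    print(upper_cased.records)
```

Because the consumer tracks its own position, the second call to `run_pipeline` sees only the message appended after the first call, which is the incremental consumption the paragraph describes.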

Kafka supports many different use cases...

References

Apache Software Foundation (2017a) Apache Hadoop project web page. https://hadoop.apache.org/
Apache Software Foundation (2017b) Apache Kafka project web page. https://kafka.apache.org/
Apache Software Foundation (2017c) Apache Samza project web page. https://samza.apache.org/
Apache Software Foundation (2017d) Apache ZooKeeper project web page. https://zookeeper.apache.org/
Facebook Inc (2017) RocksDB project web page. http://rocksdb.org/
Goodhope K, Koshy J, Kreps J, Narkhede N, Park R, Rao J, Ye VY (2012) Building LinkedIn’s real-time activity data pipeline. IEEE Data Eng Bull 35(2):33–45. http://sites.computer.org/debull/A12june/pipeline.pdf
Hunt P, Konar M, Junqueira FP, Reed B (2010) ZooKeeper: wait-free coordination for internet-scale systems. In: Proceedings of the 2010 USENIX conference on USENIX annual technical conference, USENIX ATC’10. USENIX Association, Berkeley, p 11. http://dl.acm.org/citation.cfm?id=1855840.1855851
Kleppmann M (2016) Making sense of stream processing, 1st edn. O’Reilly Media Inc., 183 pages
Kleppmann M (2017) Designing data-intensive applications. O’Reilly Media Inc., Sebastopol
Kleppmann M, Kreps J (2015) Kafka, Samza and the Unix philosophy of distributed data. IEEE Data Eng Bull 38(4):4–14. http://sites.computer.org/debull/A15dec/p4.pdf
Kreps J, Narkhede N, Rao J (2011) Kafka: a distributed messaging system for log processing. In: Proceedings of the NetDB, pp 1–7
Noghabi SA, Paramasivam K, Pan Y, Ramesh N, Bringhurst J, Gupta I, Campbell RH (2017) Samza: stateful scalable stream processing at LinkedIn. Proc VLDB Endow 10(12):1634–1645. https://doi.org/10.14778/3137765.3137770
Vavilapalli VK, Murthy AC, Douglas C, Agarwal S, Konar M, Evans R, Graves T, Lowe J, Shah H, Seth S, Saha B, Curino C, O’Malley O, Radia S, Reed B, Baldeschwieler E (2013) Apache Hadoop YARN: yet another resource negotiator. In: 4th ACM symposium on cloud computing (SoCC). https://doi.org/10.1145/2523616.2523633
Wang G, Koshy J, Subramanian S, Paramasivam K, Zadeh M, Narkhede N, Rao J, Kreps J, Stein J (2015) Building a replicated logging system with Apache Kafka. Proc VLDB Endow 8(12):1654–1655. http://www.vldb.org/pvldb/vol8/p1654-wang.pdf

Author information

Authors and Affiliations

Confluent Inc., Palo Alto, CA, USA
Matthias J. Sax

Corresponding author

Correspondence to Matthias J. Sax.

Editor information

Editors and Affiliations

School of Comp. Sci. and Engineering, University of New South Wales, Eveleigh, New South Wales, Australia
Sherif Sakr
Sch of Info Techno, Building J12, University of Sydney, Sydney, Australia
Albert Zomaya

Section Editor information

Politecnico di Milano http://home.deib.polimi.it/margara/
Alessandro Margara
Database Systems and Information Management Group, Technische Universität Berlin, Einsteinufer 17, 10587, Berlin, Germany
Tilmann Rabl

About this entry

Cite this entry

Sax, M.J. (2018). Apache Kafka. In: Sakr, S., Zomaya, A. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-63962-8_196-1

DOI: https://doi.org/10.1007/978-3-319-63962-8_196-1
Published: 10 February 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63962-8
Online ISBN: 978-3-319-63962-8
eBook Packages: Springer Reference Mathematics; Reference Module Computer Science and Engineering

Chapter history

Latest: Apache Kafka. Published: 10 March 2022. DOI: https://doi.org/10.1007/978-3-319-63962-8_196-2
Original: Apache Kafka. Published: 10 February 2018. DOI: https://doi.org/10.1007/978-3-319-63962-8_196-1