Apache Samza

Kleppmann, Martin

doi:10.1007/978-3-319-63962-8_197-2

Martin Kleppmann

Author information

Authors and Affiliations

University of Cambridge, Cambridge, UK
Martin Kleppmann

Corresponding author

Correspondence to Martin Kleppmann.

Editor information

Editors and Affiliations

School of Comp. Sci. and Engineering, University of New South Wales, Eveleigh, New South Wales, Australia
Sherif Sakr
Sch of Info Techno, Building J12, University of Sydney, Sydney, Australia
Albert Zomaya

Section Editor information

Politecnico di Milano http://home.deib.polimi.it/margara/
Alessandro Margara
Database Systems and Information Management Group, Technische Universität Berlin, Einsteinufer 17, 10587, Berlin, Germany
Tilmann Rabl

Cite this entry

Kleppmann, M. (2018). Apache Samza. In: Sakr, S., Zomaya, A. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-63962-8_197-2

DOI: https://doi.org/10.1007/978-3-319-63962-8_197-2
Published: 20 March 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63962-8
Online ISBN: 978-3-319-63962-8
eBook Packages: Springer Reference Mathematics; Reference Module Computer Science and Engineering

Chapter history

Latest: Apache Samza
Published: 20 March 2018
DOI: https://doi.org/10.1007/978-3-319-63962-8_197-2

Original: Samza
Published: 19 February 2018
DOI: https://doi.org/10.1007/978-3-319-63962-8_197-1