UMF-0504 Scalable Stream Processing: A Survey of Storm, Samza, Spark and Flink | Devoxx
Big Data & Machine Learning

Room C

Thursday, 10:50 AM to 11:40 AM

Batch-oriented systems have done the heavy lifting in data-intensive applications for decades, but they do not reflect the unbounded and continuous nature of data as it is produced in many real-world applications. Stream-oriented systems, on the other hand, process data as it arrives and are therefore often the more natural fit. A great number of stream processors have emerged in recent years, and all of them are advertised as highly available, fault-tolerant and horizontally scalable. But where do these systems differ, and which one is right for a given use case?

In this talk, we give an overview of the state of the art in stream processing for low-latency Big Data analytics and conduct a qualitative comparison of the most popular contenders, namely Storm and its abstraction layer Trident, Samza, Flink and Spark Streaming.

We first cover how stream processing frameworks differ from batch-oriented systems (e.g., Hadoop and Spark) and how they are typically employed (Lambda & Kappa Architecture). We then go into detail on each system and inspect their respective rationales, guarantees and trade-offs. As an illustrative example, we cover real-time machine learning use cases.

Stream Processing, Distributed Systems, learning algorithms, NoSQL, performance
Felix Gessert

Felix Gessert (28) is CEO and co-founder of Baqend. Baqend develops a cloud backend that helps programmers build instantly loading websites using a novel caching algorithm.

Felix received his master's degree in computer science from the University of Hamburg and founded Baqend in 2014 with fellow students. His PhD thesis addresses the technical foundations of Baqend. His major interests are scalable database systems, transactions, web technologies for cloud data management, and steaks.

Felix is passionate about leveraging and improving NoSQL systems for web applications. He frequently talks and writes about the related challenges and organizes a conference series on cloud databases.