SESSION

No shard left behind: Dynamic work rebalancing and Autoscaling in Apache Beam

The Apache Beam programming model is designed to support several advanced data processing features such as autoscaling and dynamic work rebalancing. In this talk, we will first explain how dynamic work rebalancing not only provides a general and robust solution to the problem of stragglers in traditional data processing pipelines, but also how it allows autoscaling to be truly effective. We will then present how dynamic work rebalancing works as implemented in the Google Cloud Dataflow runner and which path other Apache Beam runners link Apache Flink can follow to benefit from it.