Autoscaling Flink at Netflix - Timothy Farkas, Netflix (PDF, 49 pages)
Autoscaling
Timothy Farkas, Senior Software Engineer, Netflix

Problem Definition

Our Pain
Thousands of stateless single-source, single-sink Flink routers.
All operators are chained.
When lag for a router exceeds a threshold, we are paged.
Router pipeline: Kafka Consumer -> Project -> Filter -> Sink

Definitions
Workload: Events being produced to a Kafka topic. Two main knobs to turn: message rate and message size.
Lag: The time it would take for a router to process all the remaining unprocessed events that are buffered in its Kafka topic.
Healthy Router: A router is healthy if its lag is always under 5 minutes.
Autoscaling Solution: Adjust the number of nodes in the router dynamically based on the workload to keep the router healthy. Attempt to use the smallest number of nodes required to keep the pipeline healthy.

Solution Space
Claim: There is no perfect solution. Any autoscaling algorithm can be defeated by one or more workloads.
Proof: Take any autoscaling algorithm A. Provide A with a workload W that do