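Stream Computing in Practice: Taking On the Double 11 Real-Time Data Peak
Chen Tongjie, Data Technology and Products Department, Alibaba

Agenda
1. Introduction to stream computing at Alibaba
2. The challenges we face
3. How we do it

Introduction to stream computing at Alibaba
Double 11 live media dashboard
Unified data platform for merchants: 生意参谋

The current state of data at Alibaba
100M Events/s: one hundred million records per second
100B Events/day (the Chinese caption on the slide reads 万亿, that is, trillions of records per day)
PB every day, EB in total

The challenges we face: technical difficulties
Low Latency
Exactly-Once (high accuracy)
High Throughput
Strict SLA (strong guarantees)

How Alibaba does stream computing: the data pipeline
Diagram labels, in slide order: business systems; DRC; tailfile; tables; logs; ingestion; DataHub; Flink; DWD layer; DWS/ADS layer; Flink; HBase

Stream computing engine comparison
Storm, Flink, Spark Streaming

What is Blink?
Blink = Flink + Alibaba Group

Why Blink? - Stateful Processing
Classic Architecture: synchronous IO across the network (HBase, HDFS)
Blink with RocksDB StateBackend: all modifications are local (RocksDB); writes across the network are asynchronous

Why Blink? - Incremental Checkpoint
Before (timeline and storage):
CP-1: 1.sst, 2.sst, 3.sst, MF
CP-2: 2.sst, 3.sst, 4.sst, MF
CP-3: 2.sst, 3.sst, 5.sst, MF
Storage: 1.sst 2.sst 3.sst | 2.sst 3.sst 4.sst | 2.sst 3.sst 5.sst
After (timeline and storage): the same three checkpoints, CP-1 to CP-3
Result: Faster CP, Faster Recovery

Why Blink? - Asynchronous IO
Sync. IO: each call to the External Service waits for its response before the next record (a, b) is processed. Reduced Throughput.
Async. IO: requests for several records (a, b, c, d) are in flight to the External Service at the same time. Concurrent Processing, Increased Throughput.
Legend: Send Request, Receive Request, Wait, Wait for Response

And Many More
Pure streaming engine
Checkpoint mechanism
Flow control and backpressure
Real-time monitoring
Large-scale deployment

Aggregation component
1. Merging coarse and fine dimensions: cuts network transfer by more than 50%
2. Compact storage: metrics are stored by index, which halves state storage
3. High-performance sorting: the top component uses PriorityQueue + MapState, which greatly reduces the number of serializations and improves performance by about 10x
4. Batched writes: a mini-batch sink lowers the pressure on HBase
5. Multi-condition branch optimization: greatly reduces network transfer and state size

Stream computing development platform: 赤兔
Beam, TableAPI and SQL

Streaming Into Future
Portal, Stream Processing as a Service: a service platform
Beam, TableAPI & SQL: a unified semantic layer
Machine Learning in real time: real-time intelligence
Stream & Batch Unification: unifying real-time and offline processing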
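The Incremental Checkpoint slide lists three checkpoints (CP-1, CP-2, CP-3) that share most of their .sst files. The minimal sketch below illustrates the idea under stated assumptions: it assumes that an incremental checkpoint uploads only the files that are not already in storage, and it ignores the MF entries. The function name `plan_uploads` and the variable names are hypothetical, chosen for illustration; this is not Blink's implementation.

```python
def plan_uploads(checkpoints):
    """For each checkpoint, list the files not yet present in shared storage.

    Assumption: .sst files are immutable, so a file with the same name
    that was uploaded by an earlier checkpoint can be reused as-is.
    """
    stored = set()
    plan = {}
    for name, files in checkpoints:
        new_files = [f for f in files if f not in stored]
        plan[name] = new_files
        stored.update(new_files)
    return plan


# File sets copied from the slide (MF entries omitted).
CHECKPOINTS = [
    ("CP-1", ["1.sst", "2.sst", "3.sst"]),
    ("CP-2", ["2.sst", "3.sst", "4.sst"]),
    ("CP-3", ["2.sst", "3.sst", "5.sst"]),
]

# Full checkpoints upload every file every time.
full_uploads = sum(len(files) for _, files in CHECKPOINTS)

# Incremental checkpoints upload only files that are new since earlier ones.
plan = plan_uploads(CHECKPOINTS)
incremental_uploads = sum(len(files) for files in plan.values())

print(plan)
print(full_uploads, incremental_uploads)
```

With the slide's file sets, the sketch uploads 5 files instead of 9 across the three checkpoints, which is consistent with the "Faster CP" claim on the slide.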