《2019年腾讯实时流计算平台演进之路.pdf》由会员分享,可在线阅读,更多相关《2019年腾讯实时流计算平台演进之路.pdf(39页珍藏版)》请在三个皮匠报告上搜索。
1、腾讯实时流计算平台演进之路腾讯高级工程师目目录录 Flink在腾讯实时计算概况简介 Oceanus平台简介 针对Flink的扩展与优化 Q&AFlink在在腾腾讯讯的的演演进进历历程程2017年上Flink框架预研,跟Storm进行对比2017年下Flink内部版本定制开发,Storm业务迁移,Standalone集群模式运行2018年上Flink产品化,打造一体化的实时流计算平台Oceanus,Flink on Yarn2018年下实时流计算平台规模化接入腾讯内部业务(覆盖所有BG)与外部客户并上线公有云2019年上完善场景化服务,上线在线ML,同时发力Flink批处理,打造SuperSQL
2、Flink赋赋能能腾腾讯讯实实时时计计算算Flink在在腾腾讯讯实实时时计计算算的的规规模模集群总核数34万峰值算力2.1亿/s日均消息量近20万亿目前,腾讯内部除广告的在线训练业务外,原先运行在Storm上的实时流计算业务都已逐步迁移至Flink平台,广告业务的迁移计划预期也在今年下半年完成。目目录录 Flink在腾讯实时计算概况简介 Oceanus平台简介 针对Flink的扩展与优化 Q&AOceanus平平台台整整体体技技术术架架构构Oceanus-应应用用列列表表Oceanus-画画布布详详情情Oceanus-指指标标统统计计Oceanus-在在线线机机器器学学习习Oceanus-在在
3、线线机机器器学学习习目目录录 Flink在腾讯实时计算简介 Oceanus平台简介 针对Flink的扩展与优化 Q&AFlink Web UI重重构构Flink Web UI重重构构JobManager Failover优优化化ZooKeeperZooKeeperJobManagerJobManagerTaskTaskTaskManagerTaskTaskTaskTask1.Grant LeadershipTaskTaskTaskManagerTaskTaskTaskTaskTaskTaskTaskManagerTaskTaskTaskTask3.Report Task State4.Rech
4、eck running state2.Notify Leadership Changed3.Report Task State检检查查点点重重构构-当当前前机机制制StatefulTaskStatefulTaskTaskManagerStatefulTaskStatefulTaskTaskManagerStatefulTaskStatefulTaskTaskManagerCheckpointCoordinatorJobManagerTriggerCheckpointTriggerCheckpointTriggerCheckpointTriggerCheckpointCheckpointBarr
5、ierCheckpointBarrierCheckpointBarrierCheckpointBarrier检检查查点点重重构构-当当前前机机制制StatefulTaskStatefulTaskTaskManagerStatefulTaskStatefulTaskTaskManagerStatefulTaskStatefulTaskTaskManagerCheckpointCoordinatorJobManagerTriggerCheckpointTriggerCheckpointTriggerCheckpointTriggerCheckpointCheckpointBarrierCheckp
6、ointBarrierCheckpointBarrierCheckpointBarrier检检查查点点重重构构-当当前前机机制制StatefulTaskStatefulTaskTaskManagerStatefulTaskStatefulTaskTaskManagerStatefulTaskStatefulTaskTaskManagerCheckpointCoordinatorJobManagerTriggerCheckpointTriggerCheckpointTriggerCheckpointTriggerCheckpointCheckpointBarrierCheckpointBarri
7、erCheckpointBarrierCheckpointBarrierFail Job检检查查点点重重构构-当当前前机机制制StatefulTaskStatefulTaskTaskManagerStatefulTaskStatefulTaskTaskManagerStatefulTaskStatefulTaskTaskManagerCheckpointCoordinatorJobManagerTriggerCheckpointTriggerCheckpointTriggerCheckpointTriggerCheckpointCheckpointBarrierCheckpointBarrie
8、rCheckpointBarrierCheckpointBarrierFail Job引引入入CheckpointFailureManagerStatefulTaskStatefulTaskTaskManagerStatefulTaskStatefulTaskTaskManagerStatefulTaskStatefulTaskTaskManagerCheckpointCoordinatorJobManagerTriggerCheckpointTriggerCheckpointTriggerCheckpointTriggerCheckpointCheckpointBarrierCheckpoi
9、ntBarrierCheckpointBarrierCheckpointBarrier引引入入CheckpointFailureManagerStatefulTaskStatefulTaskTaskManagerStatefulTaskStatefulTaskTaskManagerStatefulTaskStatefulTaskTaskManagerCheckpointCoordinatorJobManagerTriggerCheckpointTriggerCheckpointTriggerCheckpointTriggerCheckpointCheckpointBarrierCheckpoi
10、ntBarrierCheckpointBarrierCheckpointBarrier类类重重构构:删删除除CheckpointTriggerResult统统一一检检查查点点异异常常类类CheckpointDeclineReason-CheckpointFailureReasonPendingCheckpoint#abortXXX引引入入CheckpointFailureManagerStatefulTaskStatefulTaskTaskManagerStatefulTaskStatefulTaskTaskManagerStatefulTaskStatefulTaskTaskManagerC
11、heckpointCoordinatorJobManagerTriggerCheckpointTriggerCheckpointTriggerCheckpointTriggerCheckpointCheckpointBarrierCheckpointBarrierCheckpointBarrierCheckpointBarrierCheckpointFailureManagerFailure counter类类重重构构:删删除除CheckpointTriggerResult统统一一检检查查点点异异常常类类CheckpointDeclineReason-CheckpointFailureReas
12、onPendingCheckpoint#abortXXX引引入入CheckpointFailureManagerStatefulTaskStatefulTaskTaskManagerStatefulTaskStatefulTaskTaskManagerStatefulTaskStatefulTaskTaskManagerCheckpointCoordinatorJobManagerTriggerCheckpointTriggerCheckpointTriggerCheckpointTriggerCheckpointCheckpointBarrierCheckpointBarrierCheckp
13、ointBarrierCheckpointBarrierReport Checkpoint FailureCheckpointFailureManagerFailure counter类类重重构构:删删除除CheckpointTriggerResult统统一一检检查查点点异异常常类类CheckpointDeclineReason-CheckpointFailureReasonPendingCheckpoint#abortXXX引引入入CheckpointFailureManagerStatefulTaskStatefulTaskTaskManagerStatefulTaskStatefulTa
14、skTaskManagerStatefulTaskStatefulTaskTaskManagerCheckpointCoordinatorJobManagerTriggerCheckpointTriggerCheckpointTriggerCheckpointTriggerCheckpointCheckpointBarrierCheckpointBarrierCheckpointBarrierCheckpointBarrierReport Checkpoint FailureCheckpointFailureManagerFailure counter类类重重构构:删删除除Checkpoint
15、TriggerResult统统一一检检查查点点异异常常类类CheckpointDeclineReason-CheckpointFailureReasonPendingCheckpoint#abortXXXEnhanced Windowevent streamwindowcurrentwatermark大于小于(丢弃)ttttevent streamwindowcurrentwatermark大于小于ttttEnhanced WindowINSERT INTO t_minute_topic_cntSELECT topic,sum(cnt)AS sort_cnt,fixedTime(ENHANCE
16、D_START(pkgTime,INTERVAL 60 SECOND),yyyyMMddHHmm)FROM tdsort_packcnt_flinkGROUP BYENHANCED(pkgTime,INTERVAL 60 SECOND),topic指标统计场景Increment WindowEventeeee.eeeeeeee5min.5min5minR(n)=R(n-1)+deltaSink R(n)R(n+1)=R(n)+deltaSink R(n+1)R(n+m)=R(n+m-1)+deltaSink R(n+m)24hPurge R(n+m)Increment WindowSELECT
17、 userId,SUM(units),INCREMENT_TIME(true)FROM Consumes GROUP BY INCREMENT(consumeTime,INTERVAL 1 DAY,INTERVAL 1 hour),userId一天中游戏用户钻石消耗小时粒度的增长趋势val input=env.addSource(new SourceFunctionTuple2String,Long().).assignTimestampsAndWatermarks(new AssignerWithPeriodicWatermarks(String,Long).).toTable(tEnv,a
18、,b,c.rowtime)val windowedTable=input.window(Increment over 1.day every 1.hour on c as w).groupBy(w,a).select(b.sum,a.count,incrementTime(true)SQL用用法法:Table API用用法法:LocalKeyBy缓缓解解数数据据倾倾斜斜455455Source-KeyBy-Count-Sink322133323Source-LocalKeyBy-Window-Count-KeyBy-Sum-Sink132455455LocalKeyBy VS KeyByAss
19、ign Watermark优优化化:下下游游算算子子检检测测IdleSourceSourceTaskTask 1 1SourceSourceTaskTask N NKafka partition1Kafka partitionNKeyByKeyBy/Window/WindowTaskTask 1 1KeyByKeyBy/Window/WindowTaskTask N N.activeidleassign watermarkSourceSourceTaskTask 1 1SourceSourceTaskTask N NKafka partition1Kafka partitionNMapMapT
20、askTask 1 1MapMapTaskTask N N.assign watermarkFilterFilterTaskTask 1 1FilterFilterTaskTask N N.activeidleKeybyKeyby/WindowWindowTaskTask 1 1KeybyKeyby/WindowWindowTaskTask N NNo data,No watermarkAssign Watermark优优化化:下下游游算算子子检检测测Idle日日志志重重构构:日日志志展展示示与与日日志志分分离离UserClassLoader重重构构与与日日志志分分离离ParentParent
21、 ClassloaderClassloaderorg.slf4j.Loggerorg.slf4j.Loggerorg.apache.log4jorg.apache.log4jUserUser Classloader1Classloader1Task ManagerUserUser Classloader2Classloader2Task for job2Task for job1Logger LOG=LoggerFactory.getLogger();Logger LOG=LoggerFactory.getLogger();-Dlog4j.configuration=-Dlog4j.file=
22、UserClassLoader重重构构与与日日志志分分离离ParentParent ClassloaderClassloaderorg.slf4j.Loggerorg.slf4j.Loggerorg.apache.log4jorg.apache.log4jUserUserClassloader1Classloader1org.slf4j.Loggerorg.slf4j.Loggerorg.apache.log4jorg.apache.log4jTask ManagerUserUserClassloader2Classloader2org.slf4j.Loggerorg.slf4j.Logger
23、org.apache.log4jorg.apache.log4jTask for job2Task for job1Logger LOG=LoggerFactory.getLogger();Logger LOG=LoggerFactory.getLogger();-Dlog4j.configuration=-Dlog4j.file=log4j.propertylogger.appender.xxx.File=userlogpath1dynamic.log4j.propertylogger.appender.xxx.File=userlogpath2dynamic.贡献情况:5名contribu
24、tor,1名committer;团队总commits数:近300(两人的commits数排名在贡献列表前25名内);已反馈给社区的特性:(1)Flink universal Kafka connector(2)15+Table/SQL functions(3)AsyncOperator timeout支持(4)Leader election ZK相关的优化(5)检查点相关处理逻辑重构(2/3)(6)Plugin/Loading System(PR review进行中)(7)Queryable state improvement(discussing)Flink社社区区参参与与情情况况目目录录 Flink在腾讯实时计算概况简介 Oceanus平台简介 针对Flink的扩展与优化 Q&A