祝佳俊_网易严选_datafun_20240309.pdf


Lakehouse Construction Practice at NetEase Yanxuan
祝佳俊, NetEase Yanxuan

Contents
01 Status & Problems
02 Practice & Results
03 Future Plans

01 Status and Problems

Current data architecture
2022: Apache Iceberg put into practice for unified batch and stream processing at NetEase Yanxuan.

[Figure: current data architecture. Raw data, near-real-time lake ingestion, near-real-time ODS, T+1d ODS, T+h DWD (hourly scheduling), T+1d DWD (daily scheduling).]

Existing problems
1. T+1 DWD production pipeline. The job that produces T+1 DWD data cannot start until the job that builds the T+1 ODS data has finished. The frequent IO and job scheduling delay data delivery.
2. Timeliness depends on scheduling frequency. The freshness of hour-level DWD data is bounded by how often the job is scheduled. Higher freshness means more frequent scheduling and more compute resources.
3. Different freshness levels need different jobs. Day-level and hour-level DWD data differ only in the table they query, yet separate jobs have to be developed and maintained. Any change to the processing logic must be made in several jobs, which is hard to maintain.

02 Practice

1. Enhance Iceberg Time Travel so that data can be queried as of an exact point in time.
2. Implement Iceberg materialized views to improve the freshness of DWD data.

Time Travel enhancement: querying data at an exact point in time

Raw data flows through real-time lake ingestion into the near-real-time ODS table. The T+1d ODS table is produced by a Spark Jar job scheduled in the early morning and stores the on-the-hour snapshot taken at 00:10. The T+1d DWD job reads from it.

Example timeline for one table (columns: id, name, time):

  initial row                                          (1, 张三, 00:00)
  00:08  update name=张 where id=1                     (1, 张, 00:08)
  00:09  insert name value(李四)                       (1, 张, 00:08), (2, 李四, 00:09)
  00:10  the point in time we want to query
  00:11  delete where id=1
         update name=李斯 where id=2                   (2, 李斯, 00:11)
  00:12  the query is issued

Querying the table directly at 00:12 returns:

  id  name  time
  2   李斯  00:11

What we expect, when asking at 00:12 for the data as of 00:10, is:

  id  name  time
  1   张    00:08
  2   李四  00:09

Native Iceberg Time Travel

  SELECT * FROM table TIMESTAMP AS OF '00:10:00'

Native Time Travel finds the most recent snapshot whose commit time is <= 00:10:00 and runs a merge-on-read (MOR) query on it. In the example that is snapshot_1, committed at 00:08:00, so the result is:

  id  name  time
  1   张    00:08

The row inserted at 00:09 is missing because it was only committed with snapshot_2:

  00:08  insert name value(张)                 commit snapshot_1
  00:09  insert name value(李四)
  00:10
  00:11  update name=李斯 where id=2
         delete where id=1                     commit snapshot_2
  00:12  the query is issued

Contents of snapshot_2 (files shown on the slide: d_file_a, d_file_b, e_file_a, p_file_a):

  data file:             (2, 李四, 00:09), (2, 李斯, 00:11)
  equality delete file:  id=1, time=00:11      (from delete where id=1)
  position delete file:  file=d_file_b, row=1

Enhanced Time Travel

  -- set the time travel mode to exact mode
  SET spark.sql.iceberg.time-travel-mode=exactly;
  -- set the time field
  SET spark.sql.iceberg.exact-read-combine-field=time;
  SELECT * FROM table TIMESTAMP AS OF '00:10:00'

1. Walk back through the snapshot history to find the snapshot with max(time) <= 00:10:00 (snapshot_1) and use it as the full snapshot.
2. Starting from the full snapshot, scan forward to find [...] min(time) [...]
3. [...] 00:10:00 data.
4. Run an MOR query over the full snapshot and the incremental snapshots after their data has been filtered.

Result:

  id  name  time
  1   张    00:08
  2   李四  00:09

Migrating a T+1 DWD job

Before, reading the offline T+1 ODS tables ($... are placeholders):

  INSERT OVERWRITE TABLE dw.dwd_xxxx_p partition(ds=current_date()-1)
  SELECT a.col_x + b.col_x AS new_col_x,
         nvl(a.col_y, b.col_y) AS new_col_y
         -- processing logic for the other fields
  FROM $ODS_T+1_offline_table_X AS a
  LEFT JOIN $ODS_T+1_offline_table_Y AS b
    ON a.id = b.id AND b.ds = current_date()-1
  WHERE a.ds = current_date()-1

After, reading the Iceberg near-real-time tables at an exact point in time:

  SET spark.sql.iceberg.time-travel-mode=exactly;
  SET spark.sql.iceberg.exact-read-combine-field=binlogtime;
  INSERT OVERWRITE TABLE dw.dwd_xxxx_p partition(ds=current_date()-1)
  SELECT a.col_x + b.col_x AS new_col_x,
         nvl(a.col_y, b.col_y) AS new_col_y
         -- processing logic for the other fields
  FROM $iceberg_near_real_time_table_X AS a TIMESTAMP AS OF concat(current_date(), ' 00:10:00')
  LEFT JOIN $iceberg_near_real_time_table_Y AS b TIMESTAMP AS OF concat(current_date(), ' 00:10:00')
    ON a.id = b.id

In general, a read of the offline T+1 data

  SELECT * FROM $ODS_T+1_offline_table WHERE ds = current_date()-1

becomes

  SET spark.sql.iceberg.time-travel-mode=exactly;
  SET spark.sql.iceberg.exact-read-combine-field=binlogtime;
  SELECT * FROM $corresponding_iceberg_near_real_time_table TIMESTAMP AS OF concat(current_date(), ' 00:10:00')

Effect on the three problems

[Figure: raw data, real-time lake ingestion, near-real-time ODS; hourly scheduling produces T+h DWD, daily scheduling produces T+1 DWD.]

1. T+1 DWD production pipeline (addressed)
   - Jobs query the Iceberg near-real-time table directly and no longer depend on the offline T+1 ODS table.
   - Two rounds of read/write IO and one job scheduling step are removed, so DWD data is produced faster.
   - Jobs migrate seamlessly: only the ODS source table has to be changed.
2. Timeliness depends on scheduling frequency (still open).
3. Different freshness levels need different jobs (still open).

Iceberg materialized views

With periodic scheduling, freshness depends on the scheduling frequency of the job. A materialized view:
- avoids repeated computation and improves performance;
- keeps the same freshness as its base tables.
The Iceberg community is still discussing materialized views and has not implemented the feature, so materialized views were implemented in-house on top of Iceberg.

Creating a materialized view

  CREATE OR REPLACE MATERIALIZED VIEW IF NOT EXISTS iceberg_catalog.db_name.view_name create_view_clauses AS query;

Example:

  create or replace materialized view iceberg_catalog.dwd.dwd_user_pay(
    user_id comment 'user id',
    user_name comment 'user name',
    amount comment 'total payment')
  comment 'total payment per user'
  tblproperties(key=value)
  as select user_id, user_name, sum(o.payment)
  from iceberg_catalog.ods.order o
  left join iceberg_catalog.ods.user s on o.user_id = s.id
  group by o.user_id;

Refreshing a materialized view

  REFRESH MATERIALIZED VIEW view_name

Example:

  REFRESH MATERIALIZED VIEW iceberg_catalog.dwd.dwd_user_pay

Querying a materialized view

  -- whether stale data is allowed; the default is false, meaning the latest data is always queried
  set spark.sql.iceberg.materialized.data.allow-stale=true;
  select * from view_name;

Example:

  set spark.sql.iceberg.materialized.data.allow-stale=true;
  select * from iceberg_catalog.dwd.dwd_user_pay;

Iceberg materialized views: metadata files

A materialized view is implemented as a view plus a storage table. Commits are native, versions are supported, and the design is consistent.

View metadata file, $PATH_TO_VIEW/metadata/xxxx.json:

  format-version: 1
  view-uuid: a790fbc8-19ee-4e7c-89db-59d336d96bd1
  location: "table_location"
  schemas:                          (table schema)
  current-version-id: 1
  versions:                         (version definitions)
    version-id: 1
    timestamp-ms: 57
    materialized: true
    materialized-view-metadata:     (materialized view metadata)
      format-version: 1
      storage-table-location: "location"   (where the storage table is stored)
      snapshot-id: 8093813134
    summary: operation: create
    default-catalog: iceberg_catalog
    default-namespace: ods
    representations:
      type: "sql"                   (SQL type)
      sql: "select * from db.table where id [...] 100"   (the query)
      dialect: spark                (engine that created the view)
  version-log:
    timestamp-ms: 57
    version-id: 1

Storage table metadata:

  format-version: 2
  table-uuid: 772279b5-5508-4cd1-b9a6-8d6f5516d459
  location: hdfs path
  current-snapshot-id: 28124414
  snapshots:
    sequence-number: 1
    snapshot-id: 28124414
    summary:
      operation: overwrite
      spark.app.id: "application-54"
    manifest-list: manifest_list_path
    schema-id: 0
  depend-on-tables:                 (the tables this materialized view depends on)
    table-name: iceberg_catalog.ods.order_p
    snapshot-id: 8075546886094118526

Iceberg materialized views: query flow

1. Start. If the queried object is not a materialized view, execute the query and return the result.
2. If it is a materialized view and allow-stale is true, fetch the storage table and replace the table in the query with the storage table.
3. Otherwise fetch the base tables and check whether the base table snapshot ids are the latest. If they are, replace the table with the storage table. If not, replace the table with the view's query DDL.
4. Execute the query and return the result. End.

Example, using the dwd_user_pay definition above with base tables iceberg.ods.order and iceberg.ods.user and storage table iceberg.dwd.st_uuid:

  set spark.sql.iceberg.materialized.data.allow-stale=true;
  select * from iceberg.dwd.dwd_user_pay;

Rewrite 1, read the storage table:

  select * from iceberg_catalog.dwd.st_uuid;

Rewrite 2, read through the view definition:

  select * from (
    select user_id, user_name, sum(o.payment)
    from iceberg.ods.order o
    left join iceberg.ods.user s on o.user_id = s.id
    group by o.user_id) t;

Iceberg materialized views: refresh flow

1. Start. If the target is not a materialized view, raise an error.
2. Build an insert statement from the view DDL.
3. Execute it and update the recorded base table snapshot ids.
4. Commit. End.

Example, with storage table iceberg.dw.st_uuid:

  refresh materialized view iceberg.dwd.dwd_user_pay;

is executed as

  insert overwrite table iceberg.dw.st_uuid
  select user_id, user_name, sum(o.payment)
  from iceberg.ods.order o
  left join iceberg.ods.user s on o.user_id = s.id
  group by o.user_id

Effect of materialized views

[Figure: raw data, real-time lake ingestion, near-real-time ODS; daily scheduling produces T+1 DWD; refreshing the view produces near-real-time DWD (materialized view).]

1. T+1 DWD production pipeline (addressed, as above).
2. Timeliness depends on scheduling frequency (addressed): hour-level freshness improves to minute-level freshness.
3. Different freshness levels need different jobs (still open).

Materialized views with Time Travel

The DWD materialized view definition (dwd_user_pay, above) and the T+1d DWD production logic below run the same query. The only difference is that the base tables must be read as of an exact point in time:

  insert overwrite table hive.dwd.dwd_user_pay_p partition(ds=current_date()-1)
  select user_id, user_name, sum(o.payment)
  from iceberg.ods.order o timestamp as of concat(current_date(), ' 00:10:00')
  left join iceberg.ods.user s timestamp as of concat(current_date(), ' 00:10:00')
    on o.user_id = s.id
  group by o.user_id;

Time Travel query on a materialized view:

  -- set the view query mode to exact mode
  SET spark.sql.iceberg.materialized.query.mode=exactly;
  select * from iceberg.dwd.dwd_user_pay timestamp as of concat(current_date(), ' 00:10:00')

Flow:
1. Start. If the queried object is not a materialized view, follow the normal query flow.
2. If the query mode is not exactly, follow the normal query flow.
3. Otherwise replace the Iceberg base tables in the view definition, which yields:

  select user_id, user_name, sum(o.payment)
  from iceberg.ods.order o timestamp as of concat(current_date(), ' 00:10:00')
  left join iceberg.ods.user s timestamp as of concat(current_date(), ' 00:10:00')
    on o.user_id = s.id
  group by o.user_id;

4. Execute the query and return the result. End.

The T+1d DWD production job therefore reduces to a Time Travel query on the materialized view followed by a save:

  SET spark.sql.iceberg.materialized.query.mode=exactly;
  insert overwrite table hive.dwd.dwd_user_pay_p partition(ds=current_date()-1)
  select * from iceberg.dwd.dwd_user_pay timestamp as of concat(current_date(), ' 00:10:00')

New data architecture

[Figure: raw data, near-real-time lake ingestion, near-real-time ODS, DWD materialized view (refresh view), T+1d DWD. Shown next to the previous architecture: raw data, near-real-time lake ingestion, near-real-time ODS, T+1d ODS (early-morning scheduling), T+h DWD (hourly scheduling), T+1d DWD (daily scheduling).]

- DWD data reaches minute-level freshness.
- The computation logic is fully unified.
- 100% of new models use the new architecture.
- 40%+ of existing models can be migrated.
- The number of jobs drops by 50%+.
- Total run time of offline DWD jobs drops by 30%+.

03 Future Plans

- Migrate 100% of existing DWD jobs.
- Use new Iceberg features: Branch / Puffin.
- Contribute back to the community.
- Agent of warehouse.

Thank you for watching.
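The difference between native Time Travel and the exact point-in-time mode can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not the authors' implementation: the `Row`, `Delete` and `Snapshot` classes, the 00:11 commit time of snapshot_2, and the rule that an incremental snapshot is any later snapshot containing at least one record stamped at or before the target time are all assumptions made for illustration. Only the example data and the idea of "full snapshot plus time-filtered incremental snapshots, merged on read" come from the slides.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Row:
    id: int
    name: str
    time: str  # value of the time field, "HH:MM"


@dataclass(frozen=True)
class Delete:
    id: int
    time: str


@dataclass
class Snapshot:
    name: str
    commit_time: str
    rows: list
    deletes: list = field(default_factory=list)


def merge_on_read(snapshots, cutoff=None):
    """Replay upserts and deletes in time order; drop records after cutoff."""
    events = []
    for snap in snapshots:
        events.extend(snap.rows)
        events.extend(snap.deletes)
    if cutoff is not None:
        events = [e for e in events if e.time <= cutoff]
    state = {}
    for event in sorted(events, key=lambda e: e.time):  # stable sort
        if isinstance(event, Delete):
            state.pop(event.id, None)
        else:
            state[event.id] = event
    return sorted(state.values(), key=lambda r: r.id)


def native_time_travel(snapshots, ts):
    """Native behaviour: only snapshots committed at or before ts are visible."""
    return merge_on_read([s for s in snapshots if s.commit_time <= ts])


def exact_time_travel(snapshots, ts):
    """Assumed exact mode: full snapshots plus time-filtered incremental ones."""
    def times(snap):
        return [e.time for e in snap.rows + snap.deletes]

    full = [s for s in snapshots if max(times(s)) <= ts]
    incremental = [s for s in snapshots
                   if s not in full and min(times(s)) <= ts]
    return merge_on_read(full + incremental, cutoff=ts)


# Example data from the slides; the commit time of snapshot_2 is assumed.
snapshots = [
    Snapshot("snapshot_1", "00:08", [Row(1, "张", "00:08")]),
    Snapshot("snapshot_2", "00:11",
             [Row(2, "李四", "00:09"), Row(2, "李斯", "00:11")],
             [Delete(1, "00:11")]),
]

native_result = native_time_travel(snapshots, "00:10")
exact_result = exact_time_travel(snapshots, "00:10")
latest_result = exact_time_travel(snapshots, "00:12")
```

With this data, `native_result` contains only the row committed in snapshot_1, `exact_result` additionally contains the 00:09 insert that was committed later, and `latest_result` matches a direct query at 00:12.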
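The materialized view query and refresh flows amount to a small routing decision, which the following sketch tries to make concrete. It is a hedged illustration, not the authors' code: `MaterializedView`, `rewrite_query`, `refresh` and the snapshot id values are hypothetical names and numbers, and the freshness check assumes that the snapshot ids recorded under depend-on-tables are compared with the current snapshot ids of the base tables.

```python
from dataclasses import dataclass


@dataclass
class MaterializedView:
    query_sql: str       # the view's defining query
    storage_table: str   # table holding the materialized result
    depend_on: dict      # base table -> snapshot id at the last refresh


def rewrite_query(target, views, current_snapshots, allow_stale=False):
    """Return the SQL that would actually run for `select * from target`."""
    view = views.get(target)
    if view is None:
        # Not a materialized view: run the query unchanged.
        return f"select * from {target}"
    fresh = all(current_snapshots.get(table) == snap_id
                for table, snap_id in view.depend_on.items())
    if allow_stale or fresh:
        return f"select * from {view.storage_table}"
    # Storage table is behind its base tables: fall back to the view query.
    return f"select * from ({view.query_sql}) t"


def refresh(target, views, current_snapshots):
    """Build the refresh statement and record the base table snapshot ids."""
    view = views.get(target)
    if view is None:
        raise ValueError(f"{target} is not a materialized view")
    statement = f"insert overwrite table {view.storage_table} {view.query_sql}"
    view.depend_on = {table: current_snapshots[table]
                      for table in view.depend_on}
    return statement


QUERY = ("select user_id, user_name, sum(o.payment) "
         "from iceberg.ods.order o "
         "left join iceberg.ods.user s on o.user_id = s.id "
         "group by o.user_id")

# Hypothetical snapshot ids: the order table has moved on since the refresh.
views = {
    "iceberg.dwd.dwd_user_pay": MaterializedView(
        QUERY, "iceberg.dwd.st_uuid",
        {"iceberg.ods.order": 1, "iceberg.ods.user": 7}),
}
current = {"iceberg.ods.order": 2, "iceberg.ods.user": 7}

stale_ok = rewrite_query("iceberg.dwd.dwd_user_pay", views, current, True)
stale_not_ok = rewrite_query("iceberg.dwd.dwd_user_pay", views, current)
refresh_sql = refresh("iceberg.dwd.dwd_user_pay", views, current)
after_refresh = rewrite_query("iceberg.dwd.dwd_user_pay", views, current)
```

Keeping the rewrite as a pure function of the recorded and current snapshot ids makes the two outcomes easy to see: the storage table is read whenever stale data is acceptable or nothing has changed, and the defining query is inlined otherwise.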
