Apache Doris in Xiaomi's Big Data Scenarios: Application Practice
Wei Zuo (魏祚), Database R&D Engineer at Xiaomi, Apache Doris PMC member
Doris Summit Asia 2023

Contents
1. History of OLAP selection at Xiaomi and current usage
2. Application practice of Apache Doris at Xiaomi
3. Optimization practice of Apache Doris at Xiaomi
4. Future plans for Doris at Xiaomi

1. History of OLAP selection at Xiaomi and current usage

History of OLAP selection at Xiaomi
In Xiaomi's A/B testing scenario, the vectorized Doris release (version 1.1.2) more than doubled overall query performance compared with the non-vectorized Doris 0.13. In some other scenarios, query performance improved by 35 times.

Advantages of Doris
- Materialized view / Rollup acceleration; rich index support; vectorized engine.
- Strong distributed capability; easy scale-out and scale-in; no dependency on external components.
- Standard SQL support.
- Active community; a commercial company leads community development.

Current usage of Doris at Xiaomi
Inside Xiaomi, Doris mainly serves BI dashboard and report analysis scenarios. It supports hundreds of internal businesses, several dozen of which are core businesses. There are dozens of clusters running on hundreds of machines.

Current usage of Doris at Xiaomi: largest single-cluster scale

2. Application practice of Apache Doris at Xiaomi

Doris in Xiaomi's BI platform
One of the most important uses of Doris at Xiaomi is as a data source for the BI platform. The BI platform supports multiple underlying data sources: MySQL, Doris, Hive, Iceberg, Excel, CSV, and Feishu Sheets. Dashboards are created with SQL or by dragging and dropping components, and custom metrics and dimensions are supported.

Doris in Xiaomi's user behavior analysis platform
The data comes from tracking points that each business embeds in its web pages or apps. Every user action on a web page or in an app is abstracted into an event entity. User behavior analysis is built by modeling on these events.

User behavior analysis platform: retention analysis

    SELECT retention_count(c.retention_info)
    FROM (
        SELECT distinct_id,
               retention_info(00, 'day', timestamp,
                              CASE WHEN event_name = 'view' THEN 1 ELSE 0 END |
                              CASE WHEN event_name = 'buy' THEN 2 ELSE 0 END) AS retention_info
        FROM retention_analysis_test
        WHERE timestamp = 00
        GROUP BY distinct_id
    ) c;
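
The retention query tags each event with a bit flag (1 for `view`, 2 for `buy`), folds the flags per user, and then aggregates across users. The slides do not define the semantics of `retention_info` and `retention_count`, so the Python below is only a sketch of one plausible reading: a user joins the cohort with the first event on day 0 and counts as retained on a later day if the return event occurs that day. The function bodies, the `start_ms` parameter, and the day-bucket layout are assumptions made for illustration, not the actual Doris functions.

```python
from collections import defaultdict

DAY_MS = 86_400_000  # milliseconds per day

FIRST_EVENT = 1   # flag for the initial event ("view" in the query)
RETURN_EVENT = 2  # flag for the return event ("buy" in the query)


def retention_info(events, start_ms):
    """Fold one user's (timestamp_ms, event_name) rows into {day_offset: flags}."""
    flags_by_day = defaultdict(int)
    for ts, name in events:
        if ts < start_ms:
            continue  # ignore events before the analysis start
        day = (ts - start_ms) // DAY_MS
        flag = (FIRST_EVENT if name == "view" else 0) | (RETURN_EVENT if name == "buy" else 0)
        flags_by_day[day] |= flag
    return dict(flags_by_day)


def retention_count(infos, num_days):
    """Return (cohort size on day 0, users with the return event on days 1..num_days-1)."""
    cohort = [info for info in infos if info.get(0, 0) & FIRST_EVENT]
    retained = [
        sum(1 for info in cohort if info.get(day, 0) & RETURN_EVENT)
        for day in range(1, num_days)
    ]
    return len(cohort), retained
```

Encoding events as bit flags lets a single integer per user and day record which events occurred, so the outer aggregation only needs bitwise tests.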

User behavior analysis platform: funnel analysis

    SELECT funnel_count(c.funnel_info)
    FROM (
        SELECT distinct_id,
               funnel_info(00, 604800000,
                           CASE WHEN event_name = 'view' THEN 1 ELSE 0 END |
                           CASE WHEN event_name = 'open' THEN 2 ELSE 0 END |
                           CASE WHEN event_name = 'buy' THEN 4 ELSE 0 END,
                           timestamp) AS funnel_info
        FROM funnel_analysis_test
        WHERE timestamp = 00
        GROUP BY distinct_id
    ) c;
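
The funnel query passes a window of 604800000 ms (7 days) and three flagged steps (`view`, `open`, `buy`). The slides do not show how `funnel_info` and `funnel_count` are implemented, so the following Python is a minimal sketch of a common windowed-funnel reading: starting from each first-step event, count how many steps a user completes in order within the window. The function bodies and the greedy matching rule are assumptions for illustration only.

```python
WINDOW_MS = 604_800_000          # 7 days, the window used in the query
STEPS = ("view", "open", "buy")  # flagged 1, 2, 4 in the query


def funnel_depth(events, window_ms=WINDOW_MS, steps=STEPS):
    """Return how many funnel steps one user completed, in order, within the window."""
    events = sorted(events)  # (timestamp_ms, event_name) pairs in time order
    best = 0
    for i, (start_ts, name) in enumerate(events):
        if name != steps[0]:
            continue  # a funnel attempt can only start at the first step
        depth = 1
        for ts, later in events[i + 1:]:
            if ts - start_ts > window_ms:
                break  # outside the window opened by the first step
            if depth < len(steps) and later == steps[depth]:
                depth += 1
        best = max(best, depth)
    return best


def funnel_count(depths, num_steps=len(STEPS)):
    """Count the users who reached at least step k, for k = 1..num_steps."""
    return [sum(1 for d in depths if d >= k) for k in range(1, num_steps + 1)]
```

Under these assumptions, a user who views on day 0 and opens on day 8 only reaches step 1, because the second event falls outside the 7-day window.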

Doris data job management at Xiaomi: the Xiaomi data ecosystem and data ingestion into Doris

Doris data job management at Xiaomi: atomic update of table data and atomic update of partition data

3. Optimization practice of Apache Doris at Xiaomi

Supporting Flink exactly-once semantics
- Problem: Doris did not support two-phase commit for Stream Load, so data could be written more than once after a Flink restart.
- Optimization: the data-write transaction flow was reworked to add a pre-committed transaction state, enabling two-phase commit for Stream Load.
- Result: Stream Load two-phase commit supports Flink exactly-once semantics and guarantees atomicity across multiple Stream Load tasks.
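
To illustrate why a pre-committed state prevents duplicates, here is a toy in-memory model in Python. It is not the Doris or Flink API: `ToyTable`, its methods, and the replay scenario are hypothetical names chosen for this sketch. The point it shows is that rows written in phase 1 stay invisible until phase 2, so a transaction left dangling by a crash can be aborted and its data replayed without appearing twice.

```python
class ToyTable:
    """In-memory stand-in for a table that accepts two-phase-commit loads."""

    def __init__(self):
        self.visible = []       # rows that readers can see
        self.precommitted = {}  # txn_id -> rows written but not yet visible
        self._next_txn = 1

    def stream_load(self, rows):
        """Phase 1: write rows and leave the transaction pre-committed."""
        txn_id = self._next_txn
        self._next_txn += 1
        self.precommitted[txn_id] = list(rows)
        return txn_id

    def commit(self, txn_id):
        """Phase 2: publish a pre-committed transaction; repeating the call is a no-op."""
        rows = self.precommitted.pop(txn_id, None)
        if rows is not None:
            self.visible.extend(rows)

    def abort(self, txn_id):
        """Discard a pre-committed transaction so its rows never become visible."""
        self.precommitted.pop(txn_id, None)


def replay_after_crash():
    """Simulate a job that crashes between pre-commit and commit, then replays."""
    table = ToyTable()
    t1 = table.stream_load(["a", "b"])  # checkpoint 1: pre-commit
    table.commit(t1)                    # checkpoint 1 completed: publish
    t2 = table.stream_load(["c"])       # checkpoint 2: pre-commit, then the job crashes
    table.abort(t2)                     # restart: discard the dangling transaction
    t3 = table.stream_load(["c"])       # replay the data since checkpoint 1
    table.commit(t3)
    return table.visible
```

In this model `replay_after_crash()` returns `['a', 'b', 'c']`: the row `"c"` appears once even though it was loaded twice.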

Single-replica data write support

Three-replica data write
- Replica roles: all three replicas are peers.
- All three replicas perform exactly the same sorting, aggregation, encoding, and compression at the same time, then flush files.
- CPU and memory usage is high.

Single-replica data write
- Replica roles: 1 master replica + 2 slave replicas.
- The master replica performs the resource-intensive operations (sorting, aggregation, encoding, compression) and flushes files.
- The slave replicas sync the master replica's files, which keeps the data highly available.
- CPU and memory usage is reduced.

Results
- High-concurrency write scenario: data write jobs run 1.5 times more efficiently.
- Single-task scenario: memory usage is reduced by 2/3 and CPU usage by 1/3.
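
The difference between the two write paths can be sketched with a toy Python model in which `sorted` stands in for the expensive sort, aggregate, encode, and compress work. The function names and the counter are hypothetical, and the model ignores the cost of copying files between replicas, so it does not try to reproduce the measured savings reported above.

```python
def write_three_replicas(rows, replicas=3):
    """Every replica processes the same batch itself."""
    files, expensive_ops = [], 0
    for _ in range(replicas):
        files.append(tuple(sorted(rows)))  # stand-in for sort/aggregate/encode/compress
        expensive_ops += 1
    return files, expensive_ops


def write_single_replica(rows, replicas=3):
    """Only the master processes the batch; slaves copy the finished file."""
    master_file = tuple(sorted(rows))  # the expensive work happens once
    files = [master_file] + [master_file for _ in range(replicas - 1)]
    return files, 1
```

Both functions end with three identical replica files, but the expensive step runs three times in the first and once in the second.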

4. Future plans for Doris at Xiaomi
- Adopt newer community releases of Apache Doris and support tenant isolation.
- Build a monitoring platform for metadata and user usage to support fine-grained service monitoring and governance.

Get more community news and best practices
Doris Summit website: doris-
Doris Summit replay: https:/
Doris website: doris.apache.org
Apache Doris GitHub:
Doris official platform: