Apache Doris in 2023 与创新者同行.pdf (compressed PDF, 44 pages, 5.96 MB)

Apache Doris in 2023: Moving Forward with Innovators
Yi Guolei (衣国垒), Apache Doris PMC member, VP of Technology at 飞轮科技
Doris Summit Asia 2023

Contents
1. Apache Doris in 2023
2. Technical progress review
3. The next step toward real-time analytics

1. Apache Doris in 2023
Major release iterations, a thriving community ecosystem, user base

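Looking back at past Apache Doris releases: evolution keeps accelerating

- Version 0.14 (2021.05), 59 contributors: added ODBC external tables, SQL Cache and other features.
- Version 0.15 (2021.11), 99 contributors: introduced Runtime Filter and Join Reorder, significantly improving query performance.
- Version 1.0 (2022.04), 114 contributors: first introduction of the vectorized execution engine; support for accessing Hive through external tables.
- Version 1.1 (2022.07), 90 contributors: vectorization introduced across the board, with query performance 2-3x higher than before; support for Iceberg external tables.
- Version 1.2 (2022.12), 118 contributors, 2400+ commits: vectorized execution engine fully enabled, with a further 3-11x performance gain; Merge-on-Write support, making queries on updated data 3-6x more efficient; Multi-Catalog, a unified framework for connecting to data lakes; millisecond-level Schema Change with automatic DDL synchronization.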

In 2023, Apache Doris fully entered the 2.0 era

Version 2.0 is a major milestone in the history of Apache Doris:
- It introduces an adaptive parallel execution model and a brand-new query optimizer. Performance improves 10x in blind tests, 13x for multi-table joins, 10x for single-table scenarios and 20x for high-concurrency point queries.
- It expands from typical OLAP scenarios such as reporting and ad-hoc analysis to lakehouse, high-concurrency data serving, and log search and analysis, supporting a more unified and more diverse set of analytics scenarios.
- It supports high-throughput real-time ingestion with second-level latency and has complete support for all kinds of data updates, giving a more efficient, easier-to-use and more stable pipeline for real-time data processing and analysis.

[Chart: Contributors and Commits for version 1.1 (reference), version 1.2 and version 2.0, with growth callouts of 70.8% and 133.1%. Statistics period: 2022.10 to 2023.10.]

A more mature and stable release iteration mechanism
Statistics period: 2022.10 to 2023.10

[Timeline, 2022.12 to 2023.11. Apache Doris 1.2 line: 1.2.0, 1.2.1, 1.2.2, 1.2.3, 1.2.4, 1.2.5, 1.2.6, 1.2.7. Apache Doris 2.0 line: 2.0 Alpha, 2.0 Beta, 2.0.0, 2.0.1, 2.0.2.]

- A more stable release experience: after two validation releases (Alpha and Beta) and large-scale invited testing, version 2.0 reached GA, with stability that better matches the requirements of enterprise production environments.
- Periodic releases: on top of each major version, minor versions ship steadily at a pace of one per month. 9 minor versions were released in total, with more than 1969 optimizations and bug fixes.

A more thriving community ecosystem: a larger developer base and higher developer activity
Statistics period: 2022.10 to 2023.10

The number of community contributors keeps growing, and the number of active contributors ranks among the highest in the world:
- The number of contributors grew to 574, up 67% from last year.
- Over the past year, the number of active contributors ranked first among open-source big data projects worldwide.

A more thriving community ecosystem: a larger developer base and higher developer activity
Statistics period: 2022.10 to 2023.10

Compared with the same period last year, developer activity rose across the board:
- GitHub Stars grew 49% year over year, a growth rate among the top five databases worldwide.
- Over the last six months, an average of 160+ commits were merged per week, placing activity in the top four on OSSRank.
- Forks grew 55% year over year, with more developers joining community development.
- The average PR/Issue response cycle is 300% faster than last year.

Callouts: Stars up 69%; Commits up 106%; 160+ PRs per week; activity in the global Top 4.

[Chart: Stars, Forks, Commits and Issues, 2022 vs 2023. Data labels as extracted: 9.8k, 5.8k, 2.8k, 1.8k, 14k, 6.8k, 6.1k, 3.0k.]

A more thriving community ecosystem: a larger developer base and higher developer activity
Statistics period: 2022.10 to 2023.10

Contributors come from increasingly diverse sources, and leading domestic cloud vendors are joining the effort:
- Contributors come from more than 100 companies across dozens of industries.
- Leading domestic cloud vendors are investing in co-development, and the related products cover almost all mainstream cloud platforms in China and abroad.

User base

Apache Doris has become the de facto standard for open-source real-time data warehousing. More than 4000 enterprises use it, and it is widely deployed in the core analytics workloads of many medium and large companies.

2. How we respond to the challenges of real-time analytics

Rethinking the positioning of Apache Doris
Open-source Data Warehouse for Real-time Analytics

01 Real-time analytics: extreme query performance on large-scale real-time data.
- Real-time high-throughput writes
- Real-time storage and updates
- Extreme query performance

02 Unification: support for multiple analytics workloads in a single system, reducing the operational and usage costs of complex architectures.
- Federated lakehouse analytics
- Log storage, search and analysis
- ETL/ELT acceleration
- High-concurrency data serving
- Reporting
- Ad-hoc analysis

03 Cloud native: re-architected for cloud infrastructure, using the elasticity of the cloud to lower storage and compute costs.
- Separation of storage and compute
- Multiple compute clusters
- Containerized deployment on K8s
- Elastic scaling

Development focus in 2023

- Extreme query performance: adaptive parallel execution model; cost-based query optimizer; fast multi-dimensional retrieval; high-concurrency point queries.
- Real-time data writes and updates: merge-on-write for the primary key model; complete support for data updates; load performance optimization.
- Support for more analytics scenarios: lakehouse; a more cost-effective log analysis platform; high-concurrency data serving.
- High availability and low cost: hot/cold data tiering; cross-cluster data synchronization; multi-tenant resource isolation.

Technical progress review
Query performance; real-time writes and updates; more analytics scenarios; high availability and low cost

New query optimizer and Pipeline parallel execution model

Pipeline parallel execution model (small queries, large queries and heavy-load queries on a single cluster):
- Both large and small queries are allocated CPU resources and make progress, so query performance is higher under mixed workloads.
- Resource queues and query queuing; resource-group scheduling with flexible CPU scheduling policies.
- Parallelization rework and resource pooling; Workload Group prevents resource preemption; blocking operations are made asynchronous.

CBO query optimizer:
- Cost-based selection of execution plans, covering external-data scenarios, Runtime Filter (RF) selection, Join Reorder, pushdown and complex SQL.
- Less manual tuning, higher query performance and observable execution.

More than 10x query performance improvement in both multi-table join and single-table scenarios

With the same data volume and machine configuration, performance exceeds that of a range of comparable projects (tested on 3 cloud hosts with 16 cores and 64 GB each, SF100):
- Multi-table joins and complex queries: Doris 2.0 is 13x faster than Doris 0.15 and has a clear advantage over other MPP databases.
- Single-table scenarios: Doris 2.0 is 10x faster than Doris 0.15 and outperforms ClickHouse, which is strong at single-table queries.

[Chart: SSB-Flat single-table query time (s), data labels 8.03 and 0.80; callout: query performance up 10x.]
[Chart: TPC-H multi-table join query time (s), data labels 223.33 and 16.85; callout: query performance up 13x.]
[Legend: versions 0.15, 1.1, 1.2 and 2.0.]
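The headline multipliers can be checked against the data labels that survive from the two charts. The sketch below assumes that the larger label in each chart belongs to Doris 0.15 and the smaller one to Doris 2.0; that pairing is inferred from the text and is not stated next to the numbers.

```python
# Chart data labels in seconds. Assumption: the first value of each pair is
# Doris 0.15 and the second is Doris 2.0; the slide does not label them.
benchmarks = {
    "SSB-Flat single-table": (8.03, 0.80),
    "TPC-H multi-table join": (223.33, 16.85),
}


def speedup(before_s: float, after_s: float) -> float:
    """Return how many times faster the 'after' run is than the 'before' run."""
    return before_s / after_s


speedups = {name: speedup(*pair) for name, pair in benchmarks.items()}
for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x")
```

Under that assumption the ratios round to 10x and 13x, which matches the figures quoted for the two scenarios.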

Point query concurrency improved 20x, with 30,000+ QPS on a single node

An optimization plan for high-concurrency Data Serving scenarios, improving primary-key point query capability by 20x:
- Row-column hybrid storage removes the IOPS bottleneck.
- PrepareStatement removes the parsing and planning bottleneck of the FE module under high concurrency.
- A short-circuit path for point queries skips the framework overhead of the execution engine and the query optimizer for simple queries.

[Charts: QPS and query latency (s), before vs after enabling the optimizations; latency data labels as extracted: 17.2 and 0.6.]
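Why an extra row-format copy helps point queries can be seen with a toy model. The sketch below is not Doris internals: each dictionary lookup simply stands in for one random read, and the table, column count and function names are made up for illustration.

```python
# Toy model, not Doris internals: each dict lookup stands in for one random read.
rows = {1: ("alice", 30, "NYC"), 2: ("bob", 25, "LA")}

# Columnar layout: one structure per column, keyed by primary key.
columns = [{pk: row[i] for pk, row in rows.items()} for i in range(3)]


def fetch_columnar(pk):
    """Rebuild a full row by touching every column; returns (row, reads)."""
    values = tuple(col[pk] for col in columns)
    return values, len(columns)


def fetch_row_store(pk):
    """Read the whole row from the row-format copy; returns (row, reads)."""
    return rows[pk], 1


print(fetch_columnar(1))
print(fetch_row_store(1))
```

In this model the number of reads for a full-row lookup grows with the number of columns in the columnar layout and stays at one with the row-format copy, which is the intuition behind relieving the IOPS bottleneck on wide tables.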

Up to 90x higher concurrent performance for multi-dimensional retrieval

An inverted index was introduced to meet the need for fast multi-dimensional retrieval, with up to a 90x performance gain in real business scenarios:
- Query performance and concurrency improve significantly for keyword fuzzy queries, equality queries and range queries.
- Before the inverted index is enabled, query latency rises sharply as concurrency increases. After it is enabled, latency stays at the millisecond level.

[Chart: query latency at 1, 50 and 100 concurrent queries for versions 1.1, 1.2 and 2.0. Data labels as extracted: 0.09, 2.45, 4.6; 0.08, 0.41, 0.97; 0.06, 0.04, 0.06.]
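As a reminder of what an inverted index does, here is a minimal sketch in plain Python. It illustrates the general technique only: the sample log lines, the whitespace tokenizer and the function names are assumptions for the example and say nothing about how Doris implements its index.

```python
from collections import defaultdict

# Hypothetical log lines keyed by row id.
docs = {
    1: "disk error on node a",
    2: "query timeout on node b",
    3: "disk full on node b",
}


def build_inverted_index(documents):
    """Map each token to the set of row ids that contain it."""
    index = defaultdict(set)
    for row_id, text in documents.items():
        for token in text.lower().split():
            index[token].add(row_id)
    return index


def search_all(index, *keywords):
    """Return sorted row ids that contain every keyword (set intersection)."""
    postings = [index.get(k.lower(), set()) for k in keywords]
    return sorted(set.intersection(*postings)) if postings else []


index = build_inverted_index(docs)
print(search_all(index, "disk", "b"))
```

A keyword lookup becomes a few set operations on posting lists instead of a scan over every row, which is why latency can stay flat as concurrency grows.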

Technical progress review: real-time writes and updates

Real-time write performance improved 2-8x

- The MemTable batching, sorting and flushing steps of the load process were optimized, and parallel MemTable flush was added, improving the efficiency of data transfer between upstream and downstream.
- A "single-replica load" data distribution mode was introduced. When loading multi-replica data, sorting no longer has to be repeated on multiple BEs; the primary replica is simply copied. This makes better use of cluster compute and memory resources and raises total load throughput, improving load performance by 2-8x.

[Chart: load performance, version 1.2 vs version 2.0, for three datasets: 7.5G/1 tablet, 150G/48 tablets, 96G/960 tablets.]

Merge-on-write for the primary key model: efficient data updates

Merge-on-write mode was introduced to support real-time updates of small batches, improving query performance by 5-10x:
- A lightweight merge is performed at write time by marking old rows as deleted, which improves both write and query efficiency.
- Upsert writes reach a peak throughput of 400,000 rows/s on a single node.

[Chart: query latency (ms) for queries q1.1 to q4.3, merge-on-write vs merge-on-read.]
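The difference between the two modes can be sketched with a toy table. This is a conceptual model under simplifying assumptions (in-memory segments, a Python set standing in for a delete bitmap, made-up class and method names), not the Doris storage engine.

```python
# Toy model: a table is a list of segments; each segment is a list of (key, value).


def read_merge_on_read(segments):
    """Resolve duplicate keys at query time: rows in later segments win."""
    latest = {}
    for segment in segments:
        for key, value in segment:
            latest[key] = value
    return latest


class MergeOnWriteTable:
    def __init__(self):
        self.segments = []   # list of lists of (key, value)
        self.deleted = set()  # (segment_no, row_no) marked stale, like a delete bitmap
        self.location = {}   # key -> (segment_no, row_no) of the live row

    def upsert(self, batch):
        """Write a new segment and mark the rows it supersedes as deleted."""
        seg_no = len(self.segments)
        self.segments.append(list(batch))
        for row_no, (key, _) in enumerate(batch):
            if key in self.location:
                self.deleted.add(self.location[key])
            self.location[key] = (seg_no, row_no)

    def scan(self):
        """Query path: skip marked rows; no per-key merge is needed."""
        return {
            key: value
            for seg_no, segment in enumerate(self.segments)
            for row_no, (key, value) in enumerate(segment)
            if (seg_no, row_no) not in self.deleted
        }


table = MergeOnWriteTable()
table.upsert([(1, "a"), (2, "b")])
table.upsert([(2, "B"), (3, "c")])
print(table.scan())
```

Both paths return the same rows; the merge-on-write table pays a small bookkeeping cost on each write so that reads only have to filter out marked rows.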

Merge-on-write for the primary key model: efficient data updates

Complete support for every kind of data update: Upsert, conditional update, conditional delete, partial-column update, partition overwrite.

    UPDATE test SET v1 = v1 + 1 WHERE k1 = 1;
    UPDATE t1 SET t1.c1 = t2.c1, t1.c3 = t2.c3 * 100 FROM t2 INNER JOIN t3 ON t2.id = t3.id WHERE t1.id = t2.id;
    DELETE FROM my_table PARTITIONS (p1, p2) WHERE k1 = 3 AND k2 = 'abc';
    DELETE FROM t1 USING t2 INNER JOIN t3 ON t2.id = t3.id WHERE t1.id = t2.id;

Technical progress review: more analytics scenarios

A more open lakehouse solution

With an extensible data source connection framework and rich data source support, query performance is 3-10x higher than Trino/Presto, making it easier to build a lakehouse:
- Lakehouse query acceleration: Apache Doris provides strong query acceleration for data lakes, Elasticsearch and all kinds of relational databases, with performance several times higher than query engines such as Hive, Presto and Spark.
- Data loading and integration: the extensible connection framework strengthens the data integration capabilities of Apache Doris and makes data easier to consume and process. Users can run unified incremental and full synchronization from multiple upstream data sources through Apache Doris, process and present the data with its processing capabilities, and either write the processed data back to the source or provide it to downstream systems.
- Unified data analysis gateway: Apache Doris offers a complete and extensible data source connection framework that makes it quick to connect many kinds of data sources. It provides fast query and write capabilities over heterogeneous data sources, turning Apache Doris into a unified data analysis gateway.

A more cost-effective log search and analysis platform

5x higher write throughput:
- CPU vectorized instructions speed up data parsing and index building.
- Index structures such as the forward index are removed, reducing the cost of index building.

80% lower storage cost:
- Removing index structures such as the forward index reduces inverted index data volume by 30%.
- Columnar storage and the ZSTD compression algorithm provide a 5-10x compression ratio.
- Hot/cold tiering reduces the storage cost of cold data by 60%.

Better stability:
- An isolation mechanism based on resource queues keeps workloads from affecting each other.
- A kill mechanism for abnormal queries prevents a single query from affecting the whole cluster.
- Intermediate data is spilled to disk, so large queries no longer fail when memory is insufficient.

[Charts: write speed (MB/s), storage space (GB) and query time (s), Apache Doris vs Elasticsearch.]

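Finer-grained multi-tenancy and resource isolation

Workload Group provides a finer-grained resource isolation scheme:
- Resource isolation reaches inside the process, allowing finer-grained resource allocation and scheduling and avoiding resource conflicts and preemption within a process.
- Resource groups can be given priorities and allowed to exceed their limits, which preserves isolation while making full use of resources.
- Query queues and task queuing further protect system stability under heavy workloads.

Resource Tag: hard resource isolation. Workload Group: soft resource limits.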
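The idea of a soft limit with over-limit usage can be illustrated with weighted sharing. The sketch below is a generic proportional-share allocator written for this example; the function, its parameters and the group names are hypothetical and do not describe the Doris scheduler.

```python
def allocate_cpu(shares, demand, total_cores):
    """Split cores by weight; capacity a group does not need is lent to others.

    shares: group -> relative weight; demand: group -> cores wanted.
    """
    alloc = {g: 0.0 for g in shares}
    active = {g for g in shares if demand.get(g, 0) > 0}
    remaining = float(total_cores)
    while active and remaining > 1e-9:
        weight_sum = sum(shares[g] for g in active)
        satisfied = set()
        granted_total = 0.0
        for g in active:
            fair = remaining * shares[g] / weight_sum
            grant = min(fair, demand[g] - alloc[g])
            alloc[g] += grant
            granted_total += grant
            if demand[g] - alloc[g] <= 1e-9:
                satisfied.add(g)
        remaining -= granted_total
        if not satisfied:
            break  # every active group is already at its weighted share
        active -= satisfied
    return alloc


# "report" has the larger weight but needs little, so "etl" may borrow the rest.
print(allocate_cpu({"etl": 1, "report": 3}, {"etl": 16, "report": 4}, 16))
# When both groups are busy, the split follows the weights.
print(allocate_cpu({"etl": 1, "report": 3}, {"etl": 16, "report": 16}, 16))
```

The first call shows over-limit usage when another group is idle; the second shows the weights being enforced under contention. A hard limit, by contrast, would cap "etl" at its own share in both cases.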

Technical progress review: high availability and low cost

Hot/cold data tiering on object storage, reducing storage cost by 70%

- Hot and cold data are stored on media with different costs, and the hierarchy grows from the original two tiers (SSD, HDD) to three (SSD, HDD, object storage).
- Cloud disks usually cost 5-10 times as much as object storage. If 80% of the data is cold and can be kept in object storage, storage cost can drop by at least 70%.
- Cold data Compaction compresses data efficiently, and a cold data Cache accelerates queries over hot and cold data, so cost goes down while performance is unaffected.
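The cost claim can be reproduced with a simple model. The sketch below assumes that cost scales linearly with capacity, that hot data stays on cloud disk at a relative price of 1, and that nothing else (compression, cache, request fees) changes the bill; the function and parameter names are made up for the example.

```python
def relative_cost(cold_fraction: float, price_ratio: float) -> float:
    """Cost of tiered storage relative to keeping everything on cloud disk.

    price_ratio = cloud disk price / object storage price for the same capacity.
    """
    hot_cost = (1.0 - cold_fraction) * 1.0
    cold_cost = cold_fraction / price_ratio
    return hot_cost + cold_cost


for ratio in (5, 10):
    saving = 1.0 - relative_cost(cold_fraction=0.8, price_ratio=ratio)
    print(f"price ratio {ratio}x: saving {saving:.0%}")
```

With 80% of the data in object storage, this model gives a saving of 64% at a 5x price ratio and 72% at 10x, so the 70% figure corresponds to the upper end of the quoted price range, or to additional savings such as compression that the model ignores.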

Deployment on public cloud, private cloud and K8s: a simpler K8s deployment and operations model

- Deployment, scale-out, scale-in, health checks and other operations are supported for all components: FE, BE, Compute Node and Broker.
- Compute Nodes scale out and in automatically according to machine load.
- Prometheus monitoring is supported.
- Rolling upgrades of services are supported.
- Several kinds of persistent Volume management are supported.

[Diagram: Doris-operator (2.1) with an fe controller, a be controller and cn controllers. The controllers deploy frontends (FE pod1, FE pod2, FE pod3), backends and compute nodes, with autoscaling for compute nodes.]

Cross-cluster data replication: a powerful tool for cluster availability

- Disaster recovery and backup: enterprise data is backed up to another cluster and data center. When an unexpected event interrupts the business or causes data loss, data can be restored from the backup or the system can quickly fail over from primary to standby. Disaster recovery is generally required where SLA requirements are high, for example in finance, healthcare and e-commerce.
- Read/write separation: queries and writes are separated to reduce their impact on each other and to protect database performance and stability.
- Isolated upgrades: when upgrading a cluster, a standby cluster is built in advance and run in parallel for verification, to avoid incompatibilities and unknown bugs.
- Performance: data latency can be as low as minutes, and synchronization speed can reach the limits of the network card and disks.

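3. The next step toward real-time analytics

Real-time analytics: extreme analytical performance, real-time writes, real-time updates

Real-time writes
- Unified write semantics ("All is Relation!"): writing becomes more convenient, with INSERT INTO SELECT unifying the different write paths (MySQL protocol, HTTP protocol, Kafka Load, S3/HDFS/data lake).
- Simpler, more stable and faster real-time writes: server-side batching with millisecond-level latency; MemTable flush moved earlier in the write path; built-in streaming synchronization from TP databases.

Real-time updates
- Primary key model optimization: merge-on-write and flexible data updates.
- Key columns are decoupled from sort columns, so data can be sorted on any column, specified with Cluster by at table creation.
- Merge-on-write is enabled by default.
- Flexible updates of arbitrary columns.
- Data models are unified on top of MOW.

Extreme analytical performance
- Row-column hybrid storage: more general queries, including prefix scan.
- A smarter cache mechanism: batch read, row-level cache, block-level cache and an improved eviction policy.
- Adaptive query engine: faster small queries; 2x faster joins; data-adaptive execution; spill to disk for large queries; fully automatic statistics collection; TPC-DS performance; richer hint syntax; parallel execution of the Union operator.
- Multi-table materialized views: full and incremental builds; transparent rewrite; aggregate rewrite and roll-up; intelligent operations.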

Server-side batching: faster and more stable high-concurrency writes

Automatic server-side batching, with millisecond-level high-concurrency writes:
- Write performance improves by a further 2x: moving the MemTable forward simplifies the load path and saves RPC serialization, compression and decompression overhead.
- More stable: it reduces the small-file problem caused by high-concurrency writes.
- Smarter: it removes the maintenance cost of batching on the client side.
- On a single 8-core, 16 GB machine, server-side batching reaches a write throughput of 100,000 rows/s with data visible within seconds.
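The batching idea itself is simple and can be sketched in a few lines. The class below is a toy model written for this example: the thresholds `max_rows` and `max_wait_s` are hypothetical names, not Doris settings, and time is passed in explicitly so that the behaviour is deterministic.

```python
class GroupCommitBuffer:
    """Collect small writes and commit them together (toy model)."""

    def __init__(self, max_rows=4, max_wait_s=0.01):
        self.max_rows = max_rows
        self.max_wait_s = max_wait_s
        self.pending = []
        self.first_arrival = None
        self.commits = []  # each entry is one committed batch

    def write(self, rows, now):
        """Accept rows from one client request at time `now` (seconds)."""
        if not self.pending:
            self.first_arrival = now
        self.pending.extend(rows)
        self._maybe_flush(now)

    def tick(self, now):
        """Called periodically so that a quiet buffer still gets committed."""
        self._maybe_flush(now)

    def _maybe_flush(self, now):
        if not self.pending:
            return
        full = len(self.pending) >= self.max_rows
        waited = now - self.first_arrival >= self.max_wait_s
        if full or waited:
            self.commits.append(self.pending)
            self.pending = []


buf = GroupCommitBuffer(max_rows=4, max_wait_s=0.01)
buf.write([1], now=0.000)
buf.write([2], now=0.001)
buf.write([3, 4], now=0.002)   # size threshold reached, one commit for three requests
buf.write([5], now=0.003)
buf.tick(now=0.020)            # time threshold reached, the lone row is committed
print(buf.commits)
```

Three client requests end up in one commit, which is how batching on the server cuts the number of small files, and the time threshold bounds how long a row waits before it becomes visible.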

Multi-table materialized views plus built-in scheduling: faster queries, simpler data modeling and processing logic

Accelerating multi-table join queries and simplifying data modeling:
- Trading space for time: precomputation significantly speeds up frequent or expensive queries.
- Layered data warehouse modeling and transparent rewrite reduce the manual effort needed for query acceleration.

A built-in job scheduling module reduces external dependencies:
- Lightweight ETL job orchestration.
- Simpler data processing logic.
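The two ideas on this slide, precomputing a join plus aggregation and answering queries from the stored result, can be shown with a small in-memory sketch. All table contents and function names below are invented for the example; real transparent rewrite works on query plans, not on a lookup by name.

```python
from collections import defaultdict

orders = [  # (order_id, customer_id, amount)
    (1, "c1", 10.0), (2, "c2", 5.0), (3, "c1", 7.5),
]
customers = {"c1": "east", "c2": "west"}  # customer_id -> region


def revenue_by_region_from_base():
    """Expensive path: join orders with customers, then aggregate."""
    totals = defaultdict(float)
    for _, customer_id, amount in orders:
        totals[customers[customer_id]] += amount
    return dict(totals)


materialized = {}  # view name -> precomputed result


def refresh(name):
    """What a scheduled refresh job would do: recompute and store the result."""
    materialized[name] = revenue_by_region_from_base()


def revenue_by_region():
    """Transparent rewrite: answer from the stored result when one exists."""
    if "revenue_by_region" in materialized:
        return materialized["revenue_by_region"]
    return revenue_by_region_from_base()


refresh("revenue_by_region")
print(revenue_by_region())
```

The caller uses the same `revenue_by_region()` either way, which is the point of transparent rewrite. The stored result is only as fresh as the last refresh, which is why a built-in scheduler matters.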

Profile fully reworked: easier to read, visualized and dynamically updated

Problems:
- A Profile is large (several MB to tens of MB) and runs to hundreds or thousands of lines, taking too much FE memory.
- It is tightly bound to the query execution logic, which makes it costly for users to understand and inconvenient for analyzing performance bottlenecks.

Improvements:
- Detailed information for each operator.
- Merged statistics across Instances.
- Metrics closer to what users care about, such as OutputRows and TotalTime.
- Visual presentation that makes bottlenecks easier to find.
- Dynamic updates as execution proceeds.

Unification

Log analysis and search
- Inverted index optimization: acceleration for Count, Join and similar operations; adaptive index selection.
- Variant type: more thorough schema-free support; arbitrarily nested JSON; changes in sub-field types; changes in nested structure; automatic splitting into sub-columns; inverted index support.

Data lake analytics
- High-speed data reading: data is passed directly from the BE to the Pandas client using columnar transfer, raising data throughput by 100x.
- Support for more data types: array, map, variant, geo.
- Data export and write-back, closing the data processing loop: Parquet/ORC files; Hive/Iceberg/Hudi/Paimon; RDBMS.

More general query acceleration
- Multi-table mv plus built-in scheduling, simplifying data modeling and processing logic.
- Asynchronously built mv for layered modeling, transparent rewrite and query acceleration.
- Lightweight ETL job orchestration and data processing.

Workload management (flexible management of mixed workloads)
- Workload Groups are created and managed with SQL, including adjusting their resource configuration.
- One cluster can have multiple Workload Groups.
- Query concurrency control, queuing, and soft and hard limits on compute resources are supported.

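High-speed data reading: 100x higher data throughput

- The MySQL protocol has good compatibility and broad tool support, but in data science and large-scale data export scenarios the FE easily becomes a bottleneck and the text protocol is inefficient.
- Arrow Flight is introduced for high-speed data reading: data is passed directly from the BE to the Pandas client using columnar transfer.
- In a Pandas test, data throughput improved by 100x.

[Chart: data throughput comparison (s).]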

Unified lakehouse analytics and data integration

- Lakehouse acceleration: fast query acceleration over data in the lake.
- Unified data analysis gateway: query and write capabilities over all kinds of heterogeneous data sources.
- Unified data integration: data synchronization, processing and export across multiple data sources.

The schema-less Variant type

    CREATE TABLE IF NOT EXISTS $table_name (k bigint, v variant)
    DUPLICATE KEY(k)
    DISTRIBUTED BY RANDOM BUCKETS 5
    properties("replication_num" = "1");
    -- write with INSERT INTO or Stream Load
    insert into $table_name values (1, '{"a": 1}');
    -- queries need a cast; the storage layer eliminates the cast where possible to speed up queries
    -- (the comparison operator below is an assumption; it was lost in the source)
    select cast(v:a as int) from $table_name where cast(v:a as int) = 1;

Schema less:
- Supports JSON documents of any type and any shape.
- Handles added columns and type changes automatically and dynamically.
- Needs no tedious DDL or Schema Change operations.

High performance:
- Types are inferred automatically from the data, and the data is stored in columnar form.
- Query efficiency is the same as for ordinary columns.
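The idea of splitting semi-structured documents into typed sub-columns can be sketched as follows. This is an illustration of the general approach only: the path naming with dots, the handling of missing fields as `None` and the function names are assumptions for the example, and Doris's actual type inference and handling of type changes are not shown.

```python
def flatten(doc, prefix=""):
    """Yield (path, value) for every leaf of a nested JSON-like dict."""
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value


def to_subcolumns(docs):
    """Split documents into per-path columns and record the types seen per path."""
    columns, types = {}, {}
    for row_no, doc in enumerate(docs):
        for path, value in flatten(doc):
            column = columns.setdefault(path, [None] * row_no)
            column.extend([None] * (row_no - len(column)))  # pad rows that lacked this path
            column.append(value)
            types.setdefault(path, set()).add(type(value).__name__)
    for column in columns.values():
        column.extend([None] * (len(docs) - len(column)))
    return columns, types


docs = [{"a": 1, "b": {"c": "x"}}, {"a": 2}, {"a": "three", "d": 1.5}]
columns, types = to_subcolumns(docs)
print(columns)
print(types)
```

New paths such as `d` appear as new columns without any DDL, and the path `a` ends up with two observed types, which is the kind of type change a Variant column has to reconcile when it chooses a storage type.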

Cloud native: separation of storage and compute for extreme elasticity

Code restructuring is completed in version 2.1, and the feature becomes officially available to the community in version 2.2.

- Multiple compute clusters: metadata and data are shared, while compute workloads are isolated.
- Elastic compute scaling: manual or automatic scale-out and scale-in, with automatic cluster start and stop.
- Shared storage and local cache: a shared storage system, with hot data cached locally either manually or automatically.

[Architecture diagram: applications and clients read and write through a cloud service layer to Warehouse 1 and Warehouse 2. Compute cluster layer: Cluster 1 to Cluster 6, each with compute (vCPU, RAM) and a cache. Shared storage layer: object storage (S3/OSS/COS/OBS), one per warehouse.]

More community news and best practices

Doris Summit official site: doris-
Doris Summit replays: https:/
Doris official website: doris.apache.org
Apache Doris GitHub:
Doris official platform:
