2017 New-Generation Data Warehouse: Apache HAWQ

Copyright 2016. All rights reserved.

New-Generation Data Warehouse: HAWQ

Agenda
- Company overview
- HAWQ
- Success stories

Data ecosystem
- Applications: user behavior analysis, anti-fraud, user profiling, credit models
- BI: Qlik, PowerBI
- Analytics and mining, machine learning/AI: SAS, SPSS, TensorFlow
- ETL: Informatica, Talend, Kettle
- OLAP data warehouse: Data Warehouse (MPP, SQL-on-Hadoop, New Data Warehouse)
- Data governance
- Data security
- OLTP: relational databases, NoSQL, NewSQL
- Cloud (public and private)

The global data warehouse market reached tens of billions of US dollars in 2016.

Databases: 55 years
Database: first appeared in 1962 (Inverted File Database System, System Development Corporation).

Stages of database history
- 1960s: Navigational DBMS (network and hierarchical models): Integrated Data Store (IDS), Information Management System (IMS)
- 1970s-1990s: SQL/Relational DBMS: OLTP, data warehouse, MPP
- 2000s-present: Post-relational: NoSQL (XML, KV, Graph, Tree), NewSQL, NewDW

The core of a database
- Data model and query language
- Query optimization and execution
- Indexing and storage
- Transaction processing

The relational model
- Edgar F. Codd, 1981 Turing Award
- Jim Gray, 1998 Turing Award
- Michael Stonebraker, 2014 Turing Award

Example: find all customers who live in Harrison.

    SELECT customer_name FROM customer WHERE customer_city = 'Harrison';

Reference: "A Relational Model of Data for Large Shared Data Banks."

Graph/Tree/KV models
- Key-Value: Cassandra (CQL), HBase (API)
- Graph model: Neo4j, Giraph/Pregel
- Tree: XML databases, MongoDB
- Streaming

Other ways to classify databases
- Transaction processing vs. analytical processing
- Parallel vs. serial
- Hardware: CPU vs. GPU vs. FPGA vs. memory
- Cloud database vs. non-cloud database?

The evolution of the data warehouse

| | Traditional data warehouse | MPP | New Data Warehouse |
|---|---|---|---|
| Architecture | Shared-storage hardware/software architecture: DB instances 1-4 on shared storage | Share-nothing hardware/software architecture: DB instances 1-4, each with its own disk | Share-nothing hardware architecture plus software-implemented distributed shared storage: DB instances on a distributed file system over local disks |
| Hardware | Mostly proprietary hardware platforms | Mostly industry-standard x86 servers | Industry-standard x86 servers |
| Configuration | Lacks elasticity, hard to adjust | Lacks elasticity, hard to adjust | Elastic scaling, supports CaaS platforms, flexible configuration |
| Scalability | A dozen or so nodes | Dozens of nodes | Thousands of nodes |
| Use cases | Traditional BI analysis | Traditional BI analysis with complex computation needs | Big data and AI, supports data lakes |
| Representative products | Oracle, DB2 | Teradata, Vertica, Greenplum, Redshift | Hive, HAWQ, SparkSQL, Snowflake |

Data warehouse engine comparison
[Quadrant chart. One axis runs from "proprietary, closed source, not linearly scalable" to "open source, open, linearly scalable"; the other runs from "limited performance and SQL compatibility" to "high performance and high SQL compatibility". Of the product labels, only Amazon Athena survives in the text.]

NewDW subcategories
- SQL on Hadoop: SparkSQL, Hive, HAWQ 2.x, Presto
- SQL on Object Store: Snowflake (on S3), Amazon Athena (on S3)
- Hybrid (own storage, with pluggable external storage): HAWQ 3.x, Oushu Database, Impala

NewDW feature comparison
Hive, SparkSQL and Presto are SQL on Hadoop; Snowflake and Athena are SQL on Object Store; HAWQ, Oushu and Impala are SQL on Hybrid Storage.

| Feature | Hive | SparkSQL | Presto | Snowflake | Athena | HAWQ | Oushu | Impala |
|---|---|---|---|---|---|---|---|---|
| Performance | low | middle | low | low | low | high | top | middle |
| Scalability | high | high | high | high | high | high | high | high |
| Update/Delete | bad | N/A | N/A | weak | N/A | N/A | good | weak |
| Indexes | bad | N/A | N/A | N/A | N/A | N/A | yes | weak |
| SQL compatibility | middle | middle | bad | middle | bad | good | good | middle |
| High-concurrency queries | no | no | no | no | no | no | yes | no |

HAWQ

Apache HAWQ history
- 2011: Dr. Lei Chang (常雷) proposes the idea at EMC/Pivotal and the HAWQ project starts.
- 2013: HAWQ 1.0 is released, with performance hundreds of times that of Hive.
- 2014: The HAWQ SIGMOD paper is published and recognized by the international database community.
- 2014: HAWQ is adopted by several large enterprise customers worldwide.
- 2015: HAWQ is open-sourced and becomes an Apache project.
- 2016: Dr. Lei Chang and the HAWQ core team found Oushu (偶数科技).
- 2017: Oushu receives investment from top international VCs and commits to developing HAWQ.
- 2017: Oushu Database 3.0 enterprise edition is released with a completely new executor, billed as the world's fastest data warehouse.

HAWQ major milestones
Callout on the slide: 10x performance improvement.

Greenplum Database (2003)
- One master host and four segment hosts, connected through the interconnect
- Each segment host runs two primary segments and two mirror segments, with replication from primaries to mirrors
- Data and catalog are stored on the segment hosts
- Degree of Parallelism = 8, #Segment Per Node = 4

HAWQ Alpha: Greenplum Database on HDFS (2011)
- Same master, interconnect, and primary/mirror segment layout as Greenplum Database
- Data is stored in HDFS: a Namenode handling meta ops, and Datanodes spread across Rack1 and Rack2 with HDFS replication
- The catalog stays with the segments and is still replicated
- Degree of Parallelism = 8, #Segment Per Node = 4
- Issues: recovery complexity, expansion complexity, management complexity (many segments per node), fixed degree of parallelism

HAWQ 1.0 GA Architecture (2013)
- One master host holding the catalog, and four segment hosts connected through the interconnect
- Each segment host runs two stateless segments, with no mirrors
- Data is stored in HDFS (Namenode for meta ops, Datanodes across Rack1 and Rack2, HDFS replication)
- Degree of Parallelism = 8, #Segment Per Node = 2
- Issues listed on the slide: recovery complexity, expansion complexity, management complexity (many segments per node), fixed degree of parallelism

HAWQ 2.0: Architecture Change (2016Q2)
- One master host holding the catalog and the ResourceManager, and four segment hosts connected through the interconnect
- Each segment host runs a single stateless segment, which hosts virtual segments (vsegs); the diagram shows four vsegs per segment
- Data is stored in HDFS (Namenode for meta ops, Datanodes across Rack1 and Rack2, HDFS replication)
- Degree of Parallelism = Any (#vseg), #Segment Per Node = 1
- Issues listed on the slide: recovery complexity, expansion complexity, management complexity (many segments per node), fixed degree of parallelism
- Described as the world's first parallel SQL engine natively integrated with PaaS/Docker cloud platforms
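The degree-of-parallelism figures quoted for each architecture follow from simple arithmetic on the diagrams. The sketch below is a hypothetical model, not HAWQ code: the `Layout` class, its field names, and `hawq2_layout` are invented for illustration. It assumes that only primary segments (not mirrors) execute query work, which is how the Greenplum figure of 8 with four segments per node is read here. For HAWQ 2.0 the number of virtual segments is chosen per query; the value 4 per host is taken from the diagram purely as an example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    """Hypothetical description of one cluster layout from the slides."""
    name: str
    hosts: int             # number of segment hosts
    workers_per_host: int  # units that execute query work on each host

    def degree_of_parallelism(self) -> int:
        # DoP is modeled as the total number of query-executing units.
        return self.hosts * self.workers_per_host


# Greenplum (2003): 4 hosts, each with 2 primaries + 2 mirrors.
# Assumption: mirrors do not run queries, so only the primaries count.
greenplum = Layout("Greenplum Database", hosts=4, workers_per_host=2)

# HAWQ 1.0 (2013): 4 hosts, 2 stateless segments each, no mirrors.
hawq1 = Layout("HAWQ 1.0", hosts=4, workers_per_host=2)


def hawq2_layout(hosts: int, vsegs_per_host: int) -> Layout:
    """HAWQ 2.0: one segment per host; vsegs are allocated per query."""
    return Layout("HAWQ 2.0", hosts=hosts, workers_per_host=vsegs_per_host)


for layout in (greenplum, hawq1, hawq2_layout(4, 4), hawq2_layout(4, 1)):
    print(f"{layout.name}: DoP = {layout.degree_of_parallelism()}")
```

In this model the first two layouts are pinned at 8 by their fixed segment counts, while the HAWQ 2.0 layout changes its parallelism simply by asking for a different number of vsegs on the same four hosts.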

HAWQ+ 3.0: Hornet Execution Engine (2017Q3)
- One master host holding the catalog and the ResourceManager, and four segment hosts connected through the interconnect
- Each segment host runs one stateless segment with vsegs (the diagram shows two per segment) and a Hornet executor
- Data is stored in HDFS (Namenode for meta ops, Datanodes across Rack1 and Rack2, HDFS replication)
- Hornet Execution Engine: SIMD/new hardware
- 10 times faster; "The Fastest Engine in the World"

Oushu Database 3.0 vs SparkSQL 2.2
Unit: milliseconds (ms). Ratio = Spark time / Oushu time.

| Query | Oushu | Spark | Ratio |
|---|---|---|---|
| select count(*) from lineitem; | 21.28 | 2555 | 120.06 |
| select count(*) from lineitem; | 22.77 | 2440 | 107.16 |
| AVERAGE | 22.03 | 2497.50 | 113.61 |
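The Ratio column can be reproduced from the two timing columns. The sketch below assumes the ratio is Spark time divided by Oushu time, which is consistent with the published figures; the variable names are illustrative. It also shows that the AVERAGE row's ratio matches the mean of the per-run ratios rather than the ratio of the two averaged times.

```python
from statistics import mean

# (Oushu ms, Spark ms) for the two count(*) runs reported in the table.
runs = [(21.28, 2555.0), (22.77, 2440.0)]


def speedup(oushu_ms: float, spark_ms: float) -> float:
    """How many times longer Spark took than Oushu for the same query."""
    return spark_ms / oushu_ms


per_run = [speedup(o, s) for o, s in runs]

# Two possible definitions of the "average ratio":
mean_of_ratios = mean(per_run)
ratio_of_means = mean(s for _, s in runs) / mean(o for o, _ in runs)

print("per-run ratios:", [f"{r:.2f}" for r in per_run])
print(f"mean of ratios: {mean_of_ratios:.2f}")
print(f"ratio of means: {ratio_of_means:.2f}")
```

The per-run values agree with the table to within rounding of the published inputs. The two averaging conventions give different numbers, so it is worth knowing which one a benchmark summary uses before comparing it with other results.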

count on columns of different data types
Unit: milliseconds (ms). Ratio = Spark time / Oushu time.

| Query | Oushu | Spark | Ratio |
|---|---|---|---|
| select count(l_orderkey) from lineitem; | 306.70 | 3925 | 12.80 |
| select count(l_partkey) from lineitem; | 274.35 | 3674 | 13.39 |
| select count(l_suppkey) from lineitem; | 244.77 | 3466 | 14.16 |
| select count(l_linenumber) from lineitem; | 133.67 | 3265 | 24.43 |
| select count(l_quantity) from lineitem; | 110.12 | 3689 | 33.50 |
| select count(l_extendedprice) from lineitem; | 112.05 | 3627 | 32.37 |
| select count(l_discount) from lineitem; | 108.64 | 3886 | 35.77 |
| select count(l_tax) from lineitem; | 115.14 | 3723 | 32.33 |
| select count(l_returnflag) from lineitem; | 70.41 | 4591 | 65.20 |
| select count(l_linestatus) from lineitem; | 73.01 | 4208 | 57.64 |
| select count(l_shipdate) from lineitem; | 127.12 | 4218 | 33.18 |
| select count(l_commitdate) from lineitem; | 135.43 | 4506 | 33.27 |
| select count(l_receiptdate) from lineitem; | 134.36 | 4193 | 31.21 |
| select count(l_shipinstruct) from lineitem; | 236.63 | 4311 | 18.22 |
| select count(l_shipmode) from lineitem; | 177.66 | 4173 | 23.49 |
| select count(l_comment) from lineitem; | 344.94 | 5885 | 17.06 |
| AVERAGE | 169.06 | 4083.75 | 29.88 |

sum/avg on columns of different data types
Unit: milliseconds (ms).

| Query | Oushu | Spark | Ratio |
|---|---|---|---|
| select sum(l_orderkey) from lineitem; | 323.16 | 3414 | 10.56 |
| select sum(l_partkey) from lineitem; | 298.30 | 3321 | 11.13 |
| select sum(l_suppkey) from lineitem; | 263.69 | 3243 | 12.30 |
| select sum(l_linenumber) from lineitem; | 154.20 | 3193 | 20.71 |
| select sum(l_quantity) from lineitem; | 128.39 | 4004 | 31.19 |
| select sum(l_extendedprice) from lineitem; | 138.48 | 4042 | 29.19 |
| select sum(l_discount) from lineitem; | 141.68 | 3500 | 24.70 |
| select sum(l_tax) from lineitem; | 143.07 | 3536 | 24.72 |
| select avg(l_orderkey) from lineitem; | 327.68 | 3511 | 10.71 |
| select avg(l_partkey) from lineitem; | 303.51 | 3583 | 11.81 |
| select avg(l_suppkey) from lineitem; | 269.36 | 3331 | 12.37 |
| select avg(l_linenumber) from lineitem; | 161.41 | 3196 | 19.80 |
| select avg(l_quantity) from lineitem; | 131.92 | 3614 | 27.40 |
| select avg(l_extendedprice) from lineitem; | 138.48 | 3554 | 25.66 |
| select avg(l_discount) from lineitem; | 134.01 | 3618 | 27.00 |
| select avg(l_tax) from lineitem; | 137.92 | 3549 | 25.73 |
| AVERAGE | 199.70 | 3513.06 | 20.31 |

group by a single column, taking count
Unit: milliseconds (ms). OOM means the Spark run ran out of memory, so no ratio is available (NAN).

| Query | Oushu | Spark | Ratio |
|---|---|---|---|
| select l_orderkey, count(*) from lineitem group by l_orderkey; | 14314.14 | OOM | NAN |
| select l_partkey, count(*) from lineitem group by l_partkey; | 4127.98 | 29299 | 7.10 |
| select l_suppkey, count(*) from lineitem group by l_suppkey; | 1142.61 | 18181 | 15.91 |
| select l_linenumber, count(*) from lineitem group by l_linenumber; | 363.51 | 9570 | 26.33 |
| select l_quantity, count(*) from lineitem group by l_quantity; | 370.15 | 11367 | 30.71 |
| select l_extendedprice, count(*) from lineitem group by l_extendedprice; | 4929.78 | 29736 | 6.03 |
| select l_discount, count(*) from lineitem group by l_discount; | 392.41 | 10371 | 26.43 |
| select l_tax, count(*) from lineitem group by l_tax; | 352.99 | 10371 | 29.38 |
| select l_returnflag, count(*) from lineitem group by l_returnflag; | 545.86 | 11346 | 20.79 |
| select l_linestatus, count(*) from lineitem group by l_linestatus; | 329.30 | 11217 | 34.06 |
| select l_shipdate, count(*) from lineitem group by l_shipdate; | 638.51 | 16077 | 25.18 |
| select l_commitdate, count(*) from lineitem group by l_commitdate; | 642.31 | 16161 | 25.16 |
| select l_receiptdate, count(*) from lineitem group by l_receiptdate; | 647.12 | 15649 | 24.18 |
| select l_shipinstruct, count(*) from lineitem group by l_shipinstruct; | 823.09 | 11539 | 14.02 |
| select l_shipmode, count(*) from lineitem group by l_shipmode; | 630.63 | 11371 | 18.03 |
| select l_comment, count(*) from lineitem group by l_comment; | 39032.16 | OOM | NAN |
| AVERAGE (excluding statements where Spark hit OOM) | 1138.30 | 15161.07 | 21.66 |

group by columns of different data types, taking sum and avg
Unit: milliseconds (ms).

| Query | Oushu | Spark | Ratio |
|---|---|---|---|
| select l_partkey, sum(l_partkey), avg(l_partkey) from lineitem group by l_partkey; | 8333.37 | 54470 | 6.54 |
| select l_suppkey, sum(l_suppkey), avg(l_suppkey) from lineitem group by l_suppkey; | 1527.32 | 19505 | 12.77 |
| select l_linenumber, sum(l_linenumber), avg(l_linenumber) from lineitem group by l_linenumber; | 416.03 | 9914 | 23.83 |
| select l_quantity, sum(l_quantity), avg(l_quantity) from lineitem group by l_quantity; | 390.82 | 11949 | 30.57 |
| select l_extendedprice, sum(l_extendedprice), avg(l_extendedprice) from lineitem group by l_extendedprice; | 9148.20 | 32005 | 3.50 |
| select l_discount, sum(l_discount), avg(l_discount) from lineitem group by l_discount; | 418.81 | 10757 | 25.68 |
| select l_tax, sum(l_tax), avg(l_tax) from lineitem group by l_tax; | 357.99 | 10733 | 29.98 |
| AVERAGE | 2941.79 | 21333.29 | 18.98 |
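The AVERAGE row of the single-column group-by count table leaves out the statements where Spark ran out of memory. A minimal sketch of that bookkeeping follows, using the figures from that table; representing an OOM run as `None` and the variable names are choices made here for illustration, not something the slides specify.

```python
from statistics import mean
from typing import Optional

# (grouping column, Oushu ms, Spark ms or None when Spark hit OOM)
rows: list[tuple[str, float, Optional[float]]] = [
    ("l_orderkey", 14314.14, None),
    ("l_partkey", 4127.98, 29299.0),
    ("l_suppkey", 1142.61, 18181.0),
    ("l_linenumber", 363.51, 9570.0),
    ("l_quantity", 370.15, 11367.0),
    ("l_extendedprice", 4929.78, 29736.0),
    ("l_discount", 392.41, 10371.0),
    ("l_tax", 352.99, 10371.0),
    ("l_returnflag", 545.86, 11346.0),
    ("l_linestatus", 329.30, 11217.0),
    ("l_shipdate", 638.51, 16077.0),
    ("l_commitdate", 642.31, 16161.0),
    ("l_receiptdate", 647.12, 15649.0),
    ("l_shipinstruct", 823.09, 11539.0),
    ("l_shipmode", 630.63, 11371.0),
    ("l_comment", 39032.16, None),
]

# Keep only the statements that completed on both engines.
completed = [(o, s) for _, o, s in rows if s is not None]
skipped = [col for col, _, s in rows if s is None]

avg_oushu = mean(o for o, _ in completed)
avg_spark = mean(s for _, s in completed)
avg_ratio = mean(s / o for o, s in completed)

print(f"skipped (Spark OOM): {skipped}")
print(f"Oushu {avg_oushu:.2f} ms, Spark {avg_spark:.2f} ms, ratio {avg_ratio:.2f}")
```

Dropping the OOM rows removes the two slowest Oushu queries (grouping by `l_orderkey` and `l_comment`) from the Oushu average as well, so the averaged times describe only the workloads both engines could finish.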

group by multiple columns
Unit: milliseconds (ms). OOM means the Spark run ran out of memory, so no ratio is available (NAN).

| Query | Oushu | Spark | Ratio |
|---|---|---|---|
| select l_partkey, l_suppkey, count(*) from lineitem group by l_partkey, l_suppkey; | 13074.79 | OOM | NAN |
| select l_partkey, l_linenumber, count(*) from lineitem group by l_partkey, l_linenumber; | 18091.03 | OOM | NAN |
| select l_suppkey, l_extendedprice, count(*) from lineitem group by l_suppkey, l_extendedprice; | 145543.51 | OOM | NAN |
| select l_partkey, l_shipmode, count(*) from lineitem group by l_partkey, l_shipmode; | 21298.14 | OOM | NAN |
| select l_partkey, l_shipdate, count(*) from lineitem group by l_partkey, l_shipdate; | 71890.82 | OOM | NAN |
| select l_suppkey, l_tax, count(*) from lineitem group by l_suppkey, l_tax; | 3994.25 | 28334 | 7.09 |
| select l_shipdate, l_commitdate, count(*) from lineitem group by l_shipdate, l_commitdate; | 3159.43 | 32811 | 10.39 |
| select count(l_orderkey) from lineitem group by l_linenumber, l_quantity, l_tax; | 1179.85 | 18080 | 15.32 |
| AVERAGE | 2777.84 | 26408.33 | 10.93 |

group by expressions
Unit: milliseconds (ms).

| Query | Oushu | Spark | Ratio |
|---|---|---|---|
| select l_partkey+l_suppkey, count(*) from lineitem group by l_partkey+l_suppkey; | 4050.55 | 31601 | 7.80 |
| select l_partkey+1000 from lineitem group by l_partkey+1000; | 2869.51 | 27083 | 9.44 |
| select l_tax*100 from lineitem group by l_tax*100; | 426.14 | 10005 | 23.48 |
| AVERAGE (group by expressions) | 2448.73 | 22896.33 | 13.57 |

Multiple aggregate functions
Unit: milliseconds (ms). Every query has the form
select <col>, count(*), count(l_orderkey), sum(l_orderkey), avg(l_orderkey) from lineitem group by <col>;
The l_shipmode row carries no timings in the source.

| Grouping column <col> | Oushu | Spark | Ratio |
|---|---|---|---|
| l_partkey | 11878.22 | OOM | NAN |
| l_suppkey | 2399.98 | 23745 | 9.89 |
| l_linenumber | 698.18 | 10943 | 15.67 |
| l_quantity | 702.60 | 13496 | 19.21 |
| l_discount | 741.17 | 12668 | 17.09 |
| l_tax | 670.63 | 12046 | 17.96 |
| l_returnflag | 913.23 | 12812 | 14.03 |
| l_linestatus | 675.94 | 12444 | 18.41 |
| l_shipdate | 1025.86 | 17846 | 17.40 |
| l_shipmode | | | |
| l_comment | 117636.74 | OOM | NAN |
| AVERAGE | 1722.58 | 17189.46 | 14.97 |

TPCH Query
Unit: milliseconds (ms).

| Query | Oushu | Spark | Ratio |
|---|---|---|---|
| TPCH Q1 | 1175.99 | 18626 | 15.84 |
| TPCH Q1 | 1140.01 | 18060 | 15.84 |
| TPCH Q1 | 1161.93 | 18096 | 15.57 |
| AVERAGE | 1159.31 | 18260.67 | 15.75 |

TPCH Q1 as run in this test:

    select l_returnflag, l_linestatus,
           sum(l_quantity) as sum_qty,
           sum(l_extendedprice) as sum_base_price,
           sum(l_extendedprice*(1-l_discount)) as sum_disc_price,
           sum(l_extendedprice*(1-l_discount)*(1+l_tax)) as sum_charge,
           avg(l_quantity) as avg_qty,
           avg(l_extendedprice) as avg_price,
           avg(l_discount) as avg_disc,
           count(*) as count_order
    from lineitem_1gorc_none
    where l_shipdate = '1998-08-20'
    group by l_returnflag, l_linestatus;

Oushu Database 4.0: GlobalScale (2017H1)
- GlobalScale: no master, P2P, geo-replication, mixed workload
- The diagram shows a mesh of Hornet nodes with no master host.

HAWQ users worldwide (partial list)
[Customer logos are not preserved in the text.]

Case study: a large manufacturing company
Background
- Large volumes of sensor data could not be processed in time.
- Faults could not be detected in time, causing heavy losses.
- Traditional solutions were too expensive.
Goals
- Build a big data platform to increase processing capacity
- An analytics cluster of 200+ nodes
- PB-scale data storage
- Applications such as real-time fault prediction

Case study: a large stock exchange
Challenges
- Replace the existing Oracle EDW platform to cope with daily growth in trading volume.
- Retain transaction data at the finest granularity for compliance.
- Process TB-scale incremental data every day in a cost-effective way.
Solution
- Load all transaction data into Hadoop and HAWQ.
- Load 1.2 billion records into HAWQ for query and analysis, with better performance.

About Oushu
- Founded by the creator of HAWQ at EMC/Pivotal together with members of the HAWQ core team
- Two data warehouse/AI products: Oushu Database (HAWQ+) and Apache HAWQ
- Most members are Apache committers and PMC members, and come from major cloud computing and big data companies such as EMC/Pivotal, Oracle, IBM, and Teradata
- Graduates of top universities in China and abroad, including several ACM programming contest medalists
- The team's research has been published at top international data management conferences (for example SIGMOD), and the team holds multiple international patents
- Backed by top international VCs: Redpoint and Sequoia

Thank you!
