Blaze: Design and Practice of SparkSQL Native Operator Optimization at Kuaishou
Wang Lei | Head of Big Data SQL Engine, Kuaishou
2023

Agenda
- What is Blaze
- Architecture and Implementation Details
- Current Progress and Future Work

What is Blaze

Roadmap of Engine
- Spark 1.0: Volcano Model
- Spark 2.0: Whole-stage-codegen
- Spark 3.0: Adaptive Query Execution
- What's next? Vectorized Execution

Related Works
Project | Corporation | Description
Velox | Meta | Vectorized acceleration library
Gluten | Intel & Kyligence | Plugin to offload SQL Engine to Native Library
Photon | Databricks | Native vectorized engine for Spark
Native Codegen | Alibaba | Generate native code for Spark

What is Blaze
- Blaze is an accelerator for Apache Spark.
- Blaze leverages native vectorized execution to accelerate query processing. It combines the power of the Apache Arrow-DataFusion library and the scale of the Spark distributed computing framework.

Relationship of Blaze & DataFusion
[Diagram labels: Spark, Blaze, Gluten, DataFusion, Velox, Physical Plan, ProtoBuf, Substrait]

Architecture and Implementation Details

Architecture Overview
[Diagram: Spark on Blaze]

A Simple Demo

High-level Components
- Blaze Session Extension: hooks the whole accelerator into the Spark execution lifetime
- Plan SerDe: serialization and deserialization of the DataFusion plan with protobuf
- JNI Gateways: passing data and control through JNI boundaries
- Native Operators: defines how each SparkPlan maps to its native execution counterpart

Execution Flow Details
- Physical Plan Conversion
- Generate and Submit Native Plan
- Native Execution

Physical Plan Conversion

Generate and Submit Native Plan

Native Execution

More Implementation Details
- Compatible with UDF
- Memory Management
- More Efficient Operator Implementation

Compatible with UDF

Memory Management

More Efficient Operator Implementation
- Comparison operation: arrow-row
- Sort operation: sort_unstable in Rust
- HashMap: hashbrown (SwissTable hashmap)
- Columnarized Shuffle: shuffle data files organized by column in a custom format

Contribution to DataFusion
- Memory Management
- Remote Storage API
- SortExec with spill
- SortMergeJoinExec
- SortPreservingMergeExec optimized by TournamentTree

Current Progress and Future Work

Operator Coverage
Categories: Supported Operator, Supported and Incomplete, Unsupported Operator
Operators listed: Project, Filter, Sort, ShuffleExchange, Expand, LocalLimit/GlobalLimit, BroadcastExchange, TakeOrderedAndProject, FileSourceScan, SortMergeJoin, BroadcastHashJoin, HashAggregate, SortAggregate, Window, ObjectHashAggregate, Generate, Struct/List/Map datatype

Benchmark
- Passes all TPC-DS queries
- Up to 10x performance boost for a single query (q82)
- 2x performance boost on average across all queries

Gray release and online revenue
Online CPU-bound jobs:
- Up to 4.3x performance boost for a single query
- 2x performance boost on average

Future Work
- Improve datatype & operator coverage
- Large-scale online use
- Abstract the interface and support more engines
- Contribute to the open source community

Thank you for watching.
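Appendix sketch: the "Contribution to DataFusion" slide mentions a SortPreservingMergeExec optimized by a tournament tree but does not show it. The block below is a minimal illustration of the general technique only, not DataFusion's or Blaze's code: a winner-tree k-way merge over plain Python lists of integers. The function name `tournament_merge` and the data layout are assumptions made for this example; the real operator works on Arrow record batches and multi-column sort keys.

```python
from typing import Iterator, Optional, Sequence


def tournament_merge(runs: Sequence[Sequence[int]]) -> Iterator[int]:
    """Merge already-sorted runs with a winner (tournament) tree.

    Hypothetical helper for illustration; not part of any library.
    """
    k = len(runs)
    if k == 0:
        return
    pos = [0] * k  # next unread index in each run

    def head(src: int) -> Optional[int]:
        # Current head of a run, or None once the run is exhausted.
        return runs[src][pos[src]] if pos[src] < len(runs[src]) else None

    def winner(a: int, b: int) -> int:
        # Smaller head wins; an exhausted run always loses;
        # ties go to the lower run index.
        ha, hb = head(a), head(b)
        if ha is None:
            return b
        if hb is None:
            return a
        if ha < hb or (ha == hb and a < b):
            return a
        return b

    # tree[k + i] is the leaf for run i; tree[1] holds the overall winner.
    tree = [0] * (2 * k)
    for i in range(k):
        tree[k + i] = i
    for node in range(k - 1, 0, -1):
        tree[node] = winner(tree[2 * node], tree[2 * node + 1])

    while True:
        src = tree[1]
        value = head(src)
        if value is None:
            return  # the winner is exhausted, so every run is exhausted
        yield value
        pos[src] += 1
        # Replay only the matches on the path from this leaf to the root.
        node = (k + src) // 2
        while node >= 1:
            tree[node] = winner(tree[2 * node], tree[2 * node + 1])
            node //= 2
```

After each emitted value, only the matches on one leaf-to-root path are replayed, so the work per output row grows with the logarithm of the number of input runs rather than with the number of runs itself.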