HC2022.KAIST.JihoonKim.v04.pdf

HOTCHIPS 2022
Trinity: End-to-End In-Database Near-Data Machine Learning Acceleration Platform for Advanced Data Analytics

Slide 1 of 16
Ji-Hoon Kim 1), Seunghee Han 1), Kwanghyun Park 2), Soo-Young Ji 3) and Joo-Young Kim 1)
Affiliations 1), 2), 3): not present in the extracted text

Slide 2 of 16: In-DBMS Data Analytics
- Three Important yet Independent Technology Trends
Figure labels: ML-based Advanced Data Analytics; Database; HW Acceleration; Near-Data/In-Storage Processing; Enterprise-level DBMS; In-DBMS ML support; GPU-based DBMS; ASIC/FPGA/GPU; Data-intensive application; SmartSSD/SmartNIC/HMC

Slide 3 of 16: Trinity: In-Database, In-Storage Platform
- Full Stack System for Near-Data-based In-DBMS ML Inference
Figure labels (related work): ML-based Data Analytics; Database; HW Acceleration; In/Near-Data Processing; Enterprise DBMSs; MADlib; Spark ML; Q100; BlazingSQL; CPU-FPGA; HMC; SmartSSD; SmartNIC; Gorgon; DAnA; Aquoman; Mondrian; Trinity
Software stack: SmartSSD-enabled DBMS (PostgreSQL+)
- Conventional SW stack: CPU Executor
- Extended SW stack: MADlib; SmartSSD Executor (XRT Platform); host code (XRT C/C++ API); device code (.sv); XRT Linux kernel driver
Hardware: SmartSSD
- NAND flash (3.84TB); PCIe switch; FPGA HW accelerator (i-DPA); FPGA DRAM (4GB); PCIe

Slide 4 of 16: Trinity: In-Database, In-Storage Platform
- Full Stack System for Near-Data-based In-DBMS ML Inference
- Trinity shows up to 57.18x faster query processing speed than CPU-based DBMS
Figure: the software and hardware stack from slide 3, annotated by layer (Software DBMS; SW-HW Interface; Hardware Accelerator) with these features: Data Format Converting; Dynamic Offloading Decision; Seamless Integration of SmartSSD; Direct Page Decoding; Dynamic Tuple Binding; Heterogeneous Core Arch. w/ Reconfig. On-chip Interconnect

Slide 5 of 16: Computational Storage Device
- New Hardware Backend: Samsung's SmartSSD
- Xilinx Kintex UltraScale+ FPGA, 4GB DRAM and 3.84TB NAND flash
- Direct FPGA-to-SSD data access using internal PCIe switch
Figure labels: device dimensions 69mm, 15mm, 100mm; CPU (Host); SSD; DRAM; SSD Controller; FPGA; PCIe Switch; SmartSSD; data paths: SSD R/W, FPGA DRAM R/W, P2P Comm.

Slide 6 of 16: Software Stack for Trinity
- Seamless Integration of SmartSSD in DBMS
- Extended SW stack: extended analyzer 1) + optimizer 2)
  1) Converting query information to the HW data format
  2) Making runtime offloading decision to select an optimal HW backend
Figure labels (query pipeline): Parser; Extended Analyzer (Query Extractor, Query Checker, Data Reconstructor); Extended Optimizer (Predictor, cost model); Executor; Query plan; SmartSSD Cost Model; CPU Cost Model; Final Decision; Yes/No; PostgreSQL pipeline (CPU Executor); Meta-Data (HW data format)
Example query info: Operation: Linregr; Table oid: 219,517; Filtering: =; Aggregation: CNT

Slide 7 of 16: Extended Query Optimizer
- Performance Cost Model
- Determining optimal hardware backend (SmartSSD vs CPU)
- Showing 5.3% & 12.97% average error + 96% offloading accuracy
Figure labels: SmartSSD Cost Model; CPU Cost Model; axes: Latency (ms) vs Database Size (GB), Latency vs Query Complexity; table columns: Hardware, Cost model, Error (%); Equation-based performance model; Regression-based performance model; Fine-tuning
[Cost-model equations not recoverable from the extracted text]
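The runtime offloading decision described on slides 6 and 7 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `QueryInfo` fields, both cost functions, and every coefficient are hypothetical placeholders standing in for the fitted SmartSSD and CPU cost models, whose actual form is not recoverable from the slides. Only the structure (predict latency on each backend, then pick the lower one) comes from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryInfo:
    """Hypothetical summary of what the extended analyzer hands to the optimizer."""
    operation: str      # e.g. "linregr"
    db_size_gb: float   # amount of table data the query scans
    complexity: float   # hypothetical scalar for query complexity


def smartssd_cost_ms(q: QueryInfo) -> float:
    # Assumed shape: fixed offload overhead plus a small per-GB term.
    return 50.0 + 20.0 * q.db_size_gb * q.complexity


def cpu_cost_ms(q: QueryInfo) -> float:
    # Assumed shape: small fixed overhead plus a large per-GB term.
    return 5.0 + 120.0 * q.db_size_gb * q.complexity


def choose_backend(q: QueryInfo) -> str:
    """Return the backend with the lower predicted latency."""
    return "SmartSSD" if smartssd_cost_ms(q) < cpu_cost_ms(q) else "CPU"


small = QueryInfo("linregr", db_size_gb=0.1, complexity=1.0)
large = QueryInfo("linregr", db_size_gb=10.0, complexity=1.0)
print(choose_backend(small), choose_backend(large))
```

With these made-up coefficients the small query stays on the CPU because the fixed offload overhead dominates, while the large scan is offloaded. Catching that kind of crossover is what a per-backend cost model is for.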

Slide 8 of 16: Overall Architecture of i-DPA*
(* i-DPA = in-Database Processing Accelerator)
1. Database Page Decoder
   - Page & data processing unit
   - Page-level parallelism
2. Database Tuple Binder
   - Dynamic tuple binding
   - Tuple-level parallelism
3. Heterogeneous Core Arch.
   - Reconfig. on-chip interconnect
   - Task-level parallelism
Figure labels: Database Page Decoders (CH#1-N): Data Decoders, Model Decoder, PB (8KB), PPU, DPU; legend: PB = Page Buffer (8KB), PPU = Page Processing Unit, DPU = Data Processing Unit; Unicasting; Broadcasting; Query Processing Core #1-N; IMEM (8KB); WMEM (16KB); OMEM (8KB); Ctrl.; Database Tuple Binder; Reconfigurable On-Chip Interconnect; Linear Comp. Unit (PE Array #0-7, each with PE#0 to PE#3; Adder Tree; Configurable Aggregation Unit); Relational Comp. Unit (Filtering Unit; Aggr. Unit); Tree Comp. Unit (Tree PE #0-7; Checker; Address Generator; Comparator); Multi-Func Comp. Unit; Top Aggregation Unit; Top Relational Aggregator; Output DMA

Slide 9 of 16: Database Page Decoder
- Direct Tuple Extraction from Database Page
- Removing host interaction
- Keep BW benefit of in-storage processing
- Faster page decoding w/ page-level parallelism
Figure labels (data path, w/o Page Decoder vs w/ Page Decoder): SSD (Database); Page Table; CPU; FPGA; PCIe; steps 1 and 2; Unnecessary Data Copy
Figure labels (decoder datapath): Database Page Decoders CH#1 to CH#N; Ctrl; Page Buffer; Page Processing Unit (RF, ALU, Reg, Mask info.); Data Processing Unit (Filtering, Reg); Page-level parallelism

Slide 10 of 16: Database Tuple Binder
- Dynamic Tuple Binding
- Dynamically varying tuple packing density according to the tuple size
- Tuple-level parallelism & hardware utilization
Figure labels: PE Arrays; Config. Aggregation Unit; Database Tuple Binder
Figure notes:
- Packing tuple w/ zero-padding
- Increasing parallelism up to 8x
- Set proper aggregation link for final output
Cases (the comparison operators for CASE 0 and CASE 3 were lost in extraction):
- CASE 0 (# of Attr 5)
- CASE 1 (# of Attr 5-8)
- CASE 2 (# of Attr 9-16)
- CASE 3 (# of Attr 16)

Slide 11 of 16: Query Processing Core
- Heterogeneous Core Architecture
- Reconfigurable on-chip interconnect
- Enabling flexible data streaming
- Task-level parallelism across the computing units
Figure labels: Meta-Data; Relational & ML Opcode Set; Filtering Unit; Linear Comp. Unit; Tree Comp. Unit; Aggregation Unit; MultiFunc Unit; steps 1 to 4
Figure (task-level pipelining): timeline over t, t+1, t+2, t+3 in which Tuple0 to Tuple3 enter successive units one step apart

Slide 12 of 16: FPGA Implementation Result
- System Setup & FPGA Implementation Result
Computational Storage Server:
- 2 Intel Xeon Silver 4210 CPUs
- PostgreSQL v12.6
- MADlib v1.17.0
- 156GB DIMM
- 3.84TB SmartSSD
Specifications: FPGA: Kintex UltraScale+; Freq.: 170MHz
Resource utilization (slide columns: Interface, Core0, Core1, Others, Utilization; the per-column values are fused in the extracted text and are reproduced as extracted):
  LUT   267.4%
  FF    92437975736.1%
  BRAM  311.515152246.9%
  URAM  83232056.3%
  DSP   93423421035.7%

Slide 13 of 16: End-to-End Trinity Evaluation
- Evaluate Against CPU-based DBMS Platform
- 0.85x to 57.18x faster query processing than CPU-based DBMS
- 15.21x faster than CPU-based DBMS on average
Figure: per-query charts comparing CPU-based DBMS and Trinity; axes: Latency (ms) and Speedup

Slide 14 of 16: Scaling-up with Multiple SmartSSDs
- Scale-up the Overall System
- SmartSSD: easy to scale out the number of devices w/ U.2 form factor
- Deploying 4 SmartSSDs: 200x faster than CPU-based DBMS
Figure: Latency (ms) for 1, 2, and 4 SmartSSDs
- With 2 SmartSSDs: 1.85x performance gain
- With 4 SmartSSDs: 3.66x performance gain
Figure labels: Linear Performance Improvement; Overall 200x Speedup Achieved; Uniformly distribute database; PCIe Sub-system

Slide 15 of 16: Conclusion
Trinity: In-Database, Near-Data Machine Learning Acceleration Platform for advanced Data Analytics
1. Full Stack System for In-DBMS Advanced Data Analytics
   - Up to 57.18x faster query processing than CPU-based DBMS
2. Software Stack (PostgreSQL+) for SmartSSD Integration
   - Dynamic offloading decision with 96% accuracy
3. Near-Storage based Hardware Accelerator (i-DPA)
   - Direct data page decoding & abundant parallel processing (3 levels)

Slide 16 of 16: Thank You!
Questions? Feel Free to Contact Me!
E-mail: jihoon0708@kaist.ac.kr
LinkedIn: https:/ (truncated in the source)
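The multi-SmartSSD numbers on slide 14 can be checked with a few lines of arithmetic. The sketch below computes scaling efficiency (reported gain divided by device count) from the two gains quoted on the slide. The helper name `scaling_efficiency` is made up for illustration. The final product rests on an assumption: the slides do not say how the roughly 200x figure was derived, so multiplying the single-device peak by the four-device gain is only one plausible reading.

```python
def scaling_efficiency(gain: float, devices: int) -> float:
    """Fraction of ideal linear scaling achieved with `devices` SmartSSDs."""
    return gain / devices


# Gains over a single SmartSSD, as quoted on slide 14.
reported_gain = {2: 1.85, 4: 3.66}

for devices, gain in reported_gain.items():
    eff = scaling_efficiency(gain, devices)
    print(f"{devices} SmartSSDs: {gain:.2f}x gain, {eff:.1%} of ideal")

# Assumption: combine the single-device peak speedup (57.18x) with the
# four-device gain to compare against the "200x" headline figure.
peak_single = 57.18
combined = peak_single * reported_gain[4]
print(f"peak x 4-device gain = {combined:.1f}x")
```

Both configurations stay above 90% of ideal, consistent with the slide's "Linear Performance Improvement" label, and the product comes out slightly above 200x.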
