BBCube 3D: Heterogeneous 3D Integration to Provide TB/s Bandwidth with Lowest Bit Access Energy for AI/HPC applications

Norio Chujo
IIR, WOW Alliance, Tokyo Institute of Technology
Research & Development Group, Hitachi, Ltd.
OCP Global Summit | October 18, 2023 | San Jose, CA

Motivation

- Demands for high data bandwidth are increasing
- HBM has been introduced
- High bandwidth with the same power
- 2D transmission prevents improvement of access energy
- Heterogeneous 3D integration
- Paving the way to 10 TB/s

[Figure: memory access power (W) versus memory bandwidth (GB/s, axis from 100 to 10000) against a 100 W power budget, shown for DDR (2D: xPU and DDR DIMMs on a PCB), HBM (2.5D: xPU and HBM2E stacks on a Si interposer), and heterogeneous 3D (xPU, cache die, and DRAM dies)]
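The relation behind the power-versus-bandwidth plot can be sketched as memory access power = bandwidth × bit access energy. The following is a minimal sketch, not the author's calculation. It assumes that the entire 100 W budget is spent on memory access and that 1 GB/s means 10^9 bytes per second. The function names are hypothetical.

```python
"""Minimal sketch: memory access power as bandwidth times energy per bit.

Assumptions (not from the slides): the whole power budget goes to memory
access, and 1 GB/s = 1e9 bytes/s. Function names are illustrative only.
"""

BITS_PER_BYTE = 8


def access_power_w(bandwidth_gb_per_s: float, energy_pj_per_bit: float) -> float:
    """Power in watts needed to move data at the given bandwidth."""
    bits_per_s = bandwidth_gb_per_s * 1e9 * BITS_PER_BYTE
    return bits_per_s * energy_pj_per_bit * 1e-12


def max_energy_pj_per_bit(bandwidth_gb_per_s: float, budget_w: float = 100.0) -> float:
    """Largest bit access energy (pJ/bit) that still fits the power budget."""
    bits_per_s = bandwidth_gb_per_s * 1e9 * BITS_PER_BYTE
    return budget_w / bits_per_s * 1e12


if __name__ == "__main__":
    # Bandwidth points taken from the axis of the plot (GB/s).
    for bw in (100, 1000, 10000):
        limit = max_energy_pj_per_bit(bw)
        print(f"{bw:>6} GB/s within 100 W needs <= {limit:.2f} pJ/bit")
```

Under these assumptions, reaching 10 TB/s inside a 100 W budget requires a bit access energy of roughly 1.25 pJ/bit or less, which is why lowering access energy matters as much as raising bandwidth.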
Heterogeneous 3DI challenges

Structure         Memories on top of xPU    xPU on top of memories
Cooling           Difficult                 Easy
Power delivery    Easy                      Difficult

BBCube 3D: Easy

- Cooling: xPU cannot dissipate heat sufficiently
- Power delivery: impedance of TSV causes supply voltage drop and large droop

BBCube has potential to solve 3DI issues

- Dense TSVs
- Thin dies

Process Flow of BBCube 3D (CoW)

(1) Attach face-down
(2) Molding
(3) Wafer thinning
(4) Litho, etching and liner deposition
(5) Cu ECD and flattening by CMP

[Figure labels: waffle shaped wafer, xPU dies, base wafer, adhesive]

Process Flow of BBCube 3D (WoW)

(1) Wafer thinning
(2) Bonding and debonding of carrier wafer
(3) Litho, etching and liner deposition
(4) Cu ECD and flattening by CMP

[Figure labels: DRAM wafer, carrier wafer, adhesive (temporary), base wafer, adhesive]

Bumpless Via-Last interconnect similar to Cu/Low-k BEOL process

- Stacking
  and thinning first, TSV formation last
- Wafer/die bonding uses SiOC adhesives; no need for nano-scale planarization
- BEOL-based, high-reliability interconnects with a low thermal budget
- Ultra-thinning
- Low TSV aspect ratio, down to 2.5
- Low-cost interconnects compared to conventional 3DIs using micro-bumps and hybrid bonding

Superior Connectivity of BBCube

[Figure: cross-section of CoW and cross-section of WoW; labels: Cu-TSV, device die, thinned Si wafer, adhesive, BEOL, Si, photoresist, DRAM1, DRAM2; scale marks 200 µm, 100 µm, 5 µm. Sources: T. Funaki et al., ECTC 2021; T. Takasaki et al., DPS 2021]

- Dense TSVs realize high BW
- Short and slim TSVs decrease C
- Direct Cu-Cu contact, thin Si
  decrease Rth

TSV characteristics

Fig. 1 Physical dimension of TSV

              Conventional 3DI        BBCube
Pitch         40.0 µm                 10.0 µm
TSV           7.0 µm (Cu: 6.0 µm)     4.0 µm (Cu: 3.0 µm)
Length        73.0 µm                 10.0 µm

Fig. 2 TSV capacitance

[Figure: TSV capacitance (fF) versus frequency (GHz); BBCube is 1/20 of conventional 3DI]

Table 1 Thermal resistance of TSV

                      Conventional 3DI    BBCube
Rth eff (K·mm²/W)     18.02               0.26 (1/70)

[Figure: temperature (°C, log scale) with surface heat of 10 MW/m² and a stationary wall at 0 °C; layer labels: Si, SiO2, underfill, µ-bump, TSV for conventional 3DI and Si, SiO2, adhesive, TSV for BBCube]

Sources: N. Chujo et al., VLSI Symposium 2020; N. Chujo et al., ECTC 2023

Power supply impedance analysis

[Figure: impedance (mΩ) and impedance ratio (axis from 1/10 to 1/500) versus frequency (GHz) for conventional 3DI and BBCube. Source: N. Chujo et al., ECTC 2023]

Impedance comparison between BBCube and conventional 3DI

- 22 times lower at 10 MHz
- 220 times lower at 5 GHz
- DC drop is decreased from 65.1 mV to 2.9 mV when a 45 W (50 A) xPU is stacked on laminated DRAMs

DRAM temperature

- In BBCube 3D, xPUs of over 47 W can be stacked
- With 4 BBCubes, xPUs of over 188 W can be stacked

[Figure: Fig. 1 DRAMs on xPU and Fig. 2 xPU on DRAMs; temperature (°C, 50 to 90) versus xPU power (W, 20 to 50) for DRAMs and xPU, BBCube and conventional 3DI, with 85 °C marked. Source: N. Chujo et al., ECTC 2023]

Bit Access Energy Calculation

- Calculate energy from row activation to last level cache in xPU

[Figure: access route of HBM; labels: DRAM, word line, bit line, TSV, base die, Si interposer, xPU, memory controller, LLC, core]

Bit access energy

- BBCube 3D reaches 30X higher bandwidth and 20X lower access energy than DDR5
- 4X higher bandwidth and 5X lower access energy than HBM2E

[Figure: access energy (pJ/bit) versus memory bandwidth (GB/s) for DDR5, HBM2E, and BBCube, with a 1/5 annotation; bar chart of power consumption (W, 0 to 20) for HBM2E, BBCube, and 1/4 BBCube, broken down into WL activation, sense amp, in-chip bus, TSV, PCB/interposer, and in xPU. Source: N. Chujo et al., VLSI Symposium 2023]

Acknowledgments

This study was carried out in the WOW Alliance program of the Tokyo Institute of Technology.
The authors thank Zuken Inc. for the 3D drawing of BBCube.

Our goal: AI Robotic Bee (50 mm³, 0.5 mW)

1. Observe Surroundings
2. Protect Humans
3. Administrative Secretary

BBCube is an abbreviation of Bee (like system) Brain Cube
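Several numbers quoted in the TSV, power supply impedance, DRAM temperature, and bit access energy slides can be cross-checked with simple arithmetic. The sketch below is an illustration only, not the author's analysis. It assumes that the 45 W and 50 A figures describe the same operating point, that the 10.0 µm and 4.0 µm BBCube dimensions are TSV length and TSV diameter, and that both bandwidth and energy comparisons refer to the same BBCube 3D configuration. All names are hypothetical.

```python
"""Cross-check of figures quoted in the slides (a sketch under stated assumptions)."""

# Values quoted in the slides.
DC_DROP_CONVENTIONAL_MV = 65.1
DC_DROP_BBCUBE_MV = 2.9
XPU_POWER_W = 45.0
XPU_CURRENT_A = 50.0
RTH_CONVENTIONAL = 18.02   # K*mm^2/W
RTH_BBCUBE = 0.26          # K*mm^2/W
XPU_POWER_PER_BBCUBE_W = 47.0

# Assumption: these two BBCube dimensions are TSV length and TSV diameter.
BBCUBE_TSV_LENGTH_UM = 10.0
BBCUBE_TSV_DIAMETER_UM = 4.0


def dc_resistance_mohm(drop_mv: float, current_a: float) -> float:
    """Ohm's law: millivolts divided by amperes gives milliohms."""
    return drop_mv / current_a


def summary() -> dict:
    """Derive ratios implied by the quoted values."""
    supply_v = XPU_POWER_W / XPU_CURRENT_A  # implied supply voltage
    return {
        "supply_voltage_v": supply_v,
        "r_conventional_mohm": dc_resistance_mohm(DC_DROP_CONVENTIONAL_MV, XPU_CURRENT_A),
        "r_bbcube_mohm": dc_resistance_mohm(DC_DROP_BBCUBE_MV, XPU_CURRENT_A),
        "dc_drop_ratio": DC_DROP_CONVENTIONAL_MV / DC_DROP_BBCUBE_MV,
        "drop_fraction_conventional": DC_DROP_CONVENTIONAL_MV * 1e-3 / supply_v,
        "drop_fraction_bbcube": DC_DROP_BBCUBE_MV * 1e-3 / supply_v,
        "rth_ratio": RTH_CONVENTIONAL / RTH_BBCUBE,
        "tsv_aspect_ratio": BBCUBE_TSV_LENGTH_UM / BBCUBE_TSV_DIAMETER_UM,
        "xpu_power_four_cubes_w": 4 * XPU_POWER_PER_BBCUBE_W,
        # BBCube 3D is 30X DDR5 and 4X HBM2E in bandwidth, 20X and 5X in energy.
        "hbm2e_vs_ddr5_bandwidth": 30 / 4,
        "hbm2e_vs_ddr5_energy": 20 / 5,
    }


if __name__ == "__main__":
    for name, value in summary().items():
        print(f"{name}: {value:.4g}")
```

Under these assumptions the DC drop ratio comes out near 22.4, close to the 22-times impedance reduction quoted at 10 MHz, and the thermal resistance ratio comes out near 69, in line with the 1/70 annotation. The implied supply voltage is 0.9 V, so the DC drop falls from roughly 7% of the supply to well under 1%.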