
hc34.SKhynix.YongkeeKwon.v03.pdf


System Architecture and Software Stack for GDDR6-AiM

Yongkee Kwon, Kornijcuk Vladimir, Nahsung Kim, Woojae Shin, Jongsoon Won, Minkyu Lee, Hyunha Joo, Haerang Choi, Guhyun Kim, Byeongju An, Jeongbin Kim, Jaewook Lee, Ilkon Kim, Jaehan Park, Chanwook Park, Yosub Song, Byeongsu Yang, Hyungdeok Lee, Seho Kim, Daehan Kwon, Seongju Lee, Kyuyoung Kim, Sanghoon Oh, Joonhong Park, Gimoon Hong, Dongyoon Ka, Kyudong Hwang, Jeongje Park, Kyeongpil Kang, Jungyeon Kim, Junyeol Jeon, Myeongjun Lee, Minyoung Shin, Minhwan Shin, Jaekyung Cha, Changson Jung, Kijoon Chang, Chunseok Jeong, Euicheol Lim, Il Park, and Junhyun Chun
SK hynix

SK hynix Inc. This material is proprietary of SK hynix Inc. and subject to change without notice. / Confidential

Abstract

This poster presents the system architecture, software stack, and performance analysis for SK hynix's very first GDDR6-based processing-in-memory (PIM) product sample, called Accelerator-in-Memory (AiM). AiM is designed for the in-memory acceleration of matrix-vector product operations, which are commonly found in machine learning applications. The strength of AiM comes primarily from two design factors: 1) all-bank operation support and 2) an extended DRAM command set. All-bank operations allow AiM to fully utilize the abundant internal DRAM bandwidth, which makes it an attractive solution for memory-bound applications. The extended command set allows the host to address these new operations efficiently and provides a clean separation of concerns between the AiM architecture and its software stack design. We present a dedicated FPGA-based reference platform with a software stack, which is used to validate the AiM design and evaluate its system-level performance. We also demonstrate FMC-based AiM extension cards that are compatible with off-the-shelf FPGA boards and serve as an open research platform, allowing potential collaborators and academic institutes to access our hardware and software systems.

AiM Concept

The Accelerator-in-Memory (AiM) is a GDDR6-based Processing-in-Memory device designed to accelerate memory-intensive Machine Learning applications in memory.

[Figure: Conventional System vs. AiM System. Both panels show a CPU/GPU attached to a DRAM with sixteen banks (BK) and a peripheral region (PERI), running a block made of Normalization, FC Layer, Scaled Dot Product Attention, and GELU stages. Conventional System labels: 100%, Compute on CPU. AiM-Enabled System (a PU per bank plus a global buffer, GB) labels: 96%, Offload to AiM; 4%, Compute on CPU.]

AiM FPGA Platform (w/ CPU Host)

[Figure: a host x86 CPU (Application, Framework, and Runtime Library in user space; Device Driver in kernel space) connects through PCIe RC, PCIe EP, and QDMA to a Xilinx UltraScale+ FPGA.]

AiM Subsystem and Software Stack

[Figure: the AiM Subsystem on the FPGA (AXI Master/Slave, AiM DMA with a 512 KB register, Multicasting Interconnect, AiM CTRL 0 and AiM CTRL 1) drives AiM CH A and AiM CH B (2 CH/Chip).]

CONTENTS

I. GDDR6-AiM Overview
II. AiM Subsystem
III. AiM Software Stack
IV. FPGA Platform & Performance
V. Open Research Platform

[Figure: system overview block diagram. Legend: User Space SW, Kernel Space SW, PCIe IPs, AiM Subsystem IPs, AiM HW.]

GDDR6-AiM Feature Summary

SK hynix's very first GDDR6-based processing-in-memory (PIM) product sample, called Accelerator-in-Memory (AiM)

                                 GDDR6-AiM*
DRAM Type                        GDDR6
Process Technology               1y
Memory Density                   8 Gb (4 Gb DDP)
Organization                     2 CH/Chip, x16 mode only
IO Data Rate                     16 Gb/s/pin (1.25 V)
Bandwidth                        64 GB/s
Processing Unit (PU)             16 PU/die, 32 PU/Chip
Operating Speed                  1 GHz
Compute Throughput               1 TFLOPS/Chip
Numeric Precision                Brain Floating Point 16 (BF16)
Activation Function support**    Sigmoid, tanh, GELU, ReLU, Leaky ReLU, ...
Targets                          Memory-bound DNN applications

* S. Lee, et al. "A 1ynm 1.25V 8Gb, 16Gb/s/pin GDDR6-based Accelerator-in-Memory supporting 1TFLOPS MAC Operation and Various Activation Functions for Deep-Learning Applications", 2022 International Solid-State Circuits Conference (ISSCC). IEEE, 2022
** Using an internal lookup table and linear interpolation unit. Any customized function may be applied, with limited accuracy.

[Figure: GDDR6-AiM Floorplan, showing a PU placed next to each bank (BK).]
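The bandwidth and compute figures in the Feature Summary can be cross-checked with a little arithmetic. The sketch below is only a sanity check, not a statement of how the device is built: it assumes that "x16 mode" means 16 data pins per channel and that every PU completes one 16-element BF16 MAC per 1 GHz cycle (the 16-element width comes from the Key Operation slide; the per-cycle rate is an assumption).

```python
# Figures taken from the Feature Summary table.
PIN_RATE_GBPS = 16          # Gb/s per pin
PINS_PER_CHANNEL = 16       # assumption: x16 mode = 16 data pins per channel
CHANNELS_PER_CHIP = 2
PUS_PER_CHIP = 32
PU_CLOCK_GHZ = 1.0

# Assumptions used only for this estimate.
MACS_PER_PU_PER_CYCLE = 16  # one 16-element BF16 MAC per cycle
FLOPS_PER_MAC = 2           # one multiply plus one add


def chip_bandwidth_gbytes_per_s() -> float:
    """External bandwidth of one chip in GB/s (8 bits per byte)."""
    return PIN_RATE_GBPS * PINS_PER_CHANNEL * CHANNELS_PER_CHIP / 8


def chip_throughput_gflops() -> float:
    """Peak compute of one chip in GFLOPS under the assumptions above."""
    return PUS_PER_CHIP * MACS_PER_PU_PER_CYCLE * FLOPS_PER_MAC * PU_CLOCK_GHZ


print(chip_bandwidth_gbytes_per_s())  # 64.0
print(chip_throughput_gflops())       # 1024.0
```

Under these assumptions the estimate reproduces the 64 GB/s and roughly 1 TFLOPS/Chip entries of the table.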

GDDR6-AiM Key Operation: Matrix Vector

MAC and Activation Function operations can be performed in all banks in parallel.
Weight matrix data is sourced from Banks; Vector data is sourced from the Global Buffer.
MAC results are stored in latches collectively referred to as MAC_REG.
Activation Function results are stored in latches collectively referred to as AF_REG.

Multiply-And-Accumulate (MAC)
Performs a MAC operation on sixteen BF16 weight matrix and vector elements (corresponds to a single DRAM column access, i.e. 32 B). Computation results are stored in a dedicated MAC_REG set and can be accessed later by the user.

[Figure: sixteen weight/vector pairs (W0 V0 ... W15 V15) feed an accumulator that writes MAC_REG. BANK0 to BANK15 supply weights (W) and the GLOBAL_BUFFER supplies activations (V) to a MAC unit per bank; MAC_REG feeds the AF(x) unit, which writes AF_REG. Size labels: 32B, 2KB, 2KB.]

Activation Function Module
Performs Activation Function (AF) computation by linearly interpolating pre-stored AF template data using MAC calculation results. Interpolation results are stored in a dedicated AF_REG set and can be accessed later by the user.

[Figure: MAC Result indexes LUT0, LUT1, and LUT2 to produce the AF Result. Equation panel: f(Weight Matrix x Activation Vector + ...) = result, with a 32B label.]

New Commands introduced in AiM

Bank Activation
  ACT4, ACT16            Activate four/sixteen banks in parallel
  ACTAF4, ACTAF16        Activate rows storing Activation Function LUTs in four/sixteen banks in parallel

Compute Commands
  MACSB, MAC4B, MACAB    Perform MAC in one/four/sixteen banks in parallel
  AF                     Compute Activation Function in all banks
  EWMUL                  Perform element-wise multiplication

Data Commands
  RDCP                   Copy data from a bank to the Global Buffer
  WRCP                   Copy data from the Global Buffer to a bank
  WRGB*                  Write to Global Buffer (often Activation vector data)
  RDMAC*                 Read from MAC result register
  RDAF*                  Read from Activation Function result register
  WRMAC*                 Write to MAC result register (or WRBIAS, as BIAS data is often written)
  WRBK                   Write to all activated banks in parallel

Commands marked as CMD* require a special Mode Register set to be recognized.
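To make the MAC and AF descriptions concrete, here is a small functional sketch. It is an illustration under stated assumptions, not the hardware behaviour: plain Python floats stand in for BF16, the function names `mac_all_banks` and `af_interpolate` are hypothetical, the LUT is modelled as sorted (x, y) sample points, and clamping outside the LUT range is an assumption the slides do not confirm.

```python
from typing import List, Sequence

NUM_BANKS = 16         # banks that operate in parallel
ELEMS_PER_ACCESS = 16  # sixteen BF16 values per 32 B column access


def mac_all_banks(bank_columns: Sequence[Sequence[float]],
                  vector_slice: Sequence[float],
                  mac_reg: List[float]) -> List[float]:
    """One all-bank MAC step: each bank multiplies its 16 weights by the
    shared 16-element vector slice and accumulates into its MAC_REG entry."""
    for bank in range(NUM_BANKS):
        weights = bank_columns[bank]
        assert len(weights) == len(vector_slice) == ELEMS_PER_ACCESS
        mac_reg[bank] += sum(w * v for w, v in zip(weights, vector_slice))
    return mac_reg


def af_interpolate(x: float, lut_x: Sequence[float],
                   lut_y: Sequence[float]) -> float:
    """Piecewise-linear interpolation between stored sample points.
    Inputs outside the table are clamped (an assumption)."""
    if x <= lut_x[0]:
        return lut_y[0]
    if x >= lut_x[-1]:
        return lut_y[-1]
    for i in range(len(lut_x) - 1):
        x0, x1 = lut_x[i], lut_x[i + 1]
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)
            return lut_y[i] + t * (lut_y[i + 1] - lut_y[i])
    raise ValueError("lut_x must be sorted in ascending order")


# Usage: bank b holds the constant weight b; the vector slice is all ones.
columns = [[float(b)] * ELEMS_PER_ACCESS for b in range(NUM_BANKS)]
mac_reg = mac_all_banks(columns, [1.0] * ELEMS_PER_ACCESS, [0.0] * NUM_BANKS)
relu_x, relu_y = [-1.0, 0.0, 1.0], [0.0, 0.0, 1.0]  # three-point ReLU table
print(mac_reg[3], af_interpolate(0.5, relu_x, relu_y))  # 48.0 0.5
```

A ReLU table is used in the usage lines because three points represent it exactly; smoother functions such as GELU would need more points and would carry an interpolation error.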

AiM Subsystem: High-Performance Portable Reference Design

AiM Subsystem is a hardware bridge between the host and the AiM devices. It is designed to 1) maximize compute throughput for a set of AiM devices and 2) minimize software stack overhead.

[Figure: the host (x86 CPU over PCIe/DMA, or ARM, RISC-V, or an AI accelerator) connects over AXI to three numbered blocks, labelled "Main Focus": the AiM Command/Data DMA, the AiM Multicasting Interconnect, and the AiM Controller.]

1. AiM Command/Data DMA
   Decodes AiM instructions from software and provides direct memory access for the host.
2. AiM Multicasting Interconnect
   Enables efficient workload distribution through flexible instruction parallelism. Supports unicast, multicast, and broadcast modes.
3. AiM Controller
   Generates and schedules low-level AiM and typical DRAM commands.

[Figure: three panels (BROADCAST, MULTICAST, UNICAST), each showing the Multicasting Interconnect above controllers CTRL0 to CTRL7.]
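The three delivery modes of the Multicasting Interconnect can be pictured as a bit mask over the controllers. The sketch below assumes one mask bit per AiM controller; the slides name a CH_MASK field in the ISR command but do not spell out its encoding, and `target_controllers` and `delivery_mode` are hypothetical helper names.

```python
from typing import List


def target_controllers(ch_mask: int, num_ctrl: int = 8) -> List[int]:
    """Indices of the controllers selected by the mask (bit i -> CTRLi)."""
    return [i for i in range(num_ctrl) if (ch_mask >> i) & 1]


def delivery_mode(ch_mask: int, num_ctrl: int = 8) -> str:
    """Classify a mask as unicast, multicast, or broadcast."""
    selected = len(target_controllers(ch_mask, num_ctrl))
    if selected == 0:
        raise ValueError("mask selects no controller")
    if selected == 1:
        return "unicast"
    if selected == num_ctrl:
        return "broadcast"
    return "multicast"


print(target_controllers(0b00000111), delivery_mode(0b00000111))
# [0, 1, 2] multicast
print(delivery_mode(0b00010000), delivery_mode(0b11111111))
# unicast broadcast
```

With this encoding a single packet from the DMA reaches any subset of controllers, which is one way to realise the "flexible instruction parallelism" the slide mentions.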

AiM Command/Data DMA

[Figure: AiM DMA block diagram. The AXI Interface and AXI Interconnect feed the AXI Packet Decoder and the Operations Engine, which connect to the General Purpose Register (GPR, 512 KB), the Instruction Set Register (ISR) FSM, the Response Ordering Engine, and the Multicasting Interconnect.]

Supported operations:
  DRAM Operations    DRAM_WRITE, DRAM_READ
  GPR Operations     GPR_WRITE, GPR_READ
  ISR Operations     ISR_WRITE_SBK, ISR_WRITE_GB, ISR_WRITE_MAC, ISR_WRITE_AFLUT, ISR_READ_MAC, ISR_READ_AF, ISR_COPY, ISR_MAC_SBK, ISR_MAC_ABK, ISR_AF, ISR_EWMUL

1. Operations Engine
   Interprets AXI operations from the host and accordingly generates either memory access or AiM compute requests. Memory requests are passed to the memory directly, while AiM requests are additionally processed by the dedicated FSM called ISR.
   AXI Packet Decoder performs host (software) packet interpretation.
   General Purpose Register (GPR) is an SRAM used for 1) holding bias data to be rewritten multiple times during GEMV computation, 2) holding activation results to be used as vectors for the following layers, and 3) holding diagnostic and debugging data, e.g. temperature.
   Instruction Set Register (ISR) FSM generates streams of low-level AiM requests based on the higher-level software commands.

   See the next slide for an example.

2. Response Ordering Engine
   An optional block used for restoring the order of DRAM read transactions, which can be reordered by the AiM controllers.

AiM HW/SW Interface: ISR Command Format

[Figure: bit layout of the ISR command and of the host memory operation. ISR command (label: 192b) field names: COL, OPCODE, OPSIZE, RESERVED, T, CH_MASK, BK/AF_IDX, ROW/GPR_ADDR, G. Field widths as printed: 5b, 2b, 10b, 1b, 12b, 1b, 8b, 4b, 14b, 6b. Host memory operation field names: O, T, ADDR, DATA, BYTE MASK, C, IO. Widths as printed: 1b, 32b, 256b, 16b, 1b.]

Example: ISR_MAC_ABK

[Figure: the host processor sends an ISR command to the AiM DMA; the AiM controller turns it into AiM commands (MAC) on the CA bus, clocked by CK_t/CK_c. Callouts on the command fields: "Number of MAC commands to send to AiM", "Target addresses for the MAC commands", "An all-bank MAC operation".]

AiM Controller

GDDR6-AiM Controller = GDDR6 Controller + AiM Command Scheduling

[Figure: the Multicasting Interconnect feeds the Scheduler (FR-FCFS Arbiter, Bank Engine 0 onward, AiM Engine, Refresh Engine, Engine Arbiter), which drives the PHY.]

1. Scheduler
   FR-FCFS Arbiter rearranges memory requests to reduce the number of row activations. AiM requests are always scheduled in-order.
   Bank Engines and AiM Engine generate and schedule DRAM commands (ACT, PREPB, MAC, ...) from the memory requests.
   Refresh Engine periodically generates and schedules refresh commands.
   Engine Arbiter multiplexes between the engines and issues prioritized DRAM commands to PHY.
2. PHY
   A standard GDDR6 DRAM PHY block. Details depend on the selected implementation technology (e.g. FPGA).
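The row-hit preference behind the FR-FCFS Arbiter can be sketched in a few lines. This is the textbook first-ready, first-come-first-served policy, not the controller's actual implementation: the `Request` type and `pick_next` helper are hypothetical, timing constraints are ignored, and AiM requests (which the slide says are scheduled in order) are left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Request:
    bank: int
    row: int
    arrival: int  # smaller means older


def pick_next(queue: List[Request],
              open_rows: Dict[int, int]) -> Optional[Request]:
    """Pick the oldest request that hits an already open row; if there is
    no row hit, fall back to the oldest request overall."""
    if not queue:
        return None
    hits = [r for r in queue if open_rows.get(r.bank) == r.row]
    chosen = min(hits or queue, key=lambda r: r.arrival)
    queue.remove(chosen)
    open_rows[chosen.bank] = chosen.row  # the chosen row is now open
    return chosen


# Usage: three requests to bank 0; rows 5, 7, 5 in arrival order.
queue = [Request(0, 5, 0), Request(0, 7, 1), Request(0, 5, 2)]
open_rows: Dict[int, int] = {}
order = []
while queue:
    order.append(pick_next(queue, open_rows).arrival)
print(order)  # [0, 2, 1]
```

Serving the second row-5 request before the row-7 request needs two row activations instead of the three that strict arrival order would cost.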

AiM FPGA Platform: AiM Subsystem Prototyped in Xilinx FPGA

[Figure: a GDDR6-AiM FMC Extension Card attached to a Xilinx VCU118 board. The AiM Subsystem (AXI Slave, AiM DMA with a 512 KB register, Multicasting Interconnect, AiM CTRL 0/1) sits in the Xilinx UltraScale+ FPGA next to the PCIe IP (QDMA, AXI Master, PCIe EP) and drives AiM CH A and CH B (2 CH/Chip). Inside the AiM Controller: Scheduler and PHY; inside the PHY: Packet Handler, Calibration Handler, MCS, and XIPHY (PLL, IOB, RX/TX BITSLICES).]

PHY is implemented using the configurable source-synchronous interface technology (SelectIO) available in Xilinx FPGAs.
Interface calibration is performed by a specialized RTL FSM (Calibration Handler) controlled by an MCS microcontroller.

AiM Software Stack

[Figure: the software stack on top of the AiM hardware system. From top to bottom: DNN Applications (GPT, RNN, LSTM, MNIST); DNN Framework with the AiM Extension and the AiM Execution Provider; AiM Runtime Library (AiM OP Kernel, Memory Management, AiM Instruction Generator) next to the AiM Software Emulator (AiM Function Simulator, AiM Performance model), all in user space; AiM Device Driver (Memory Allocator, AiM Instruction Dispatcher) in kernel space; AiM Subsystem and AiM devices (2 CH/Chip).]

1. AiM Runtime Library
   AiM OP Kernel includes memory-bound DNN operations such as Linear OP, Sequential OP, RNN OP, and LSTM OP.
   Memory Management allocates buffers and reshapes data.
   AiM Instruction Generator generates the AiM Instruction (ISR Command) Stream for AiM DMA.
2. AiM Device Driver
   Memory Allocator allocates chunk memories and manages buffers.
   AiM Instruction Dispatcher dispatches the AiM Instruction Stream to AiM DMA.

PyTorch and ONNX Runtime Support

PyTorch
  Implemented the AiM extension as a C++ extension.
  Exposes AiM operators to PyTorch via the AiM extension.
  Provides ease of programmability with the "to_aim" API.

[Figure: DNN Application -> Conversion API (to_aim()) -> PyTorch Custom Ops -> PyTorch C++ extension -> AiM Runtime Library.]

ONNX Runtime
  Implemented using the AiM EP (Execution Provider).
  Some nodes in the ONNX graph are offloaded, depending on the capability of the AiM EP.
  Add "AiMExecutionProvider" to the "EP_List".

[Figure: ONNX Model (.onnx) -> In-Memory Graph -> Graph Partitioner (with Provider Registry) -> Distributed Graph Runner -> Execution Providers (AiM Execution Provider, CPU Execution Provider) -> AiM Runtime Library; Input Data in, Output Result out.]

Offloading Coverage: Linear layer; Sequential layers; RNN, GRU, LSTM

AiM Software Emulator

Provides an AiM function simulator that emulates the functionality of the AiM FPGA platform.
Offers a flexible hardware model through a HW configuration file.
Provides an analytical AiM performance model.
Allows users to develop their own AI applications for AiM and to estimate performance without the AiM FPGA platform.

[Figure: AiM Software Emulator. AiM Function Simulator: Function Simulator Interface, HW Config (#Channels, #Banks, #AiMs), and the AiM Functional Model (AiM DMA Model with 512 KB, AiM CTRL Model per channel, GDDR6-AiM Model per channel). AiM Performance Model: AiM Packets in; Total Exec. Time, ISR Commands Latency, and PCIe Latency out.]

ISR Command Flow: From DNN Framework to AiM

DNN Frameworks (e.g. PyTorch)
  Convert the DNN model to the AiM OPs.
  Call the AiM OPs defined in the AiM runtime library.
  Send the weight matrix information to the AiM runtime library.

Runtime Library
  The AiM runtime library reshapes the matrix and stores it into AiMs.
  Generate an ISR command stream from the weight matrix information.
  Send the ISR command stream to AiM DMA via QDMA.

AiM DMA
  Decode the ISR command stream and generate AiM packets (e.g., ISR_MAC_ABK generates 64 AiM MAC_ABK packets).

Interconnect and AiM Controllers
  Send or broadcast AiM packets to the AiM controllers.
  AiM controllers decode AiM packets into AiM commands according to their state.
  Send AiM commands to the AiM devices and execute the operation (e.g., execute all-bank MAC).

[Figure: Result = f(Weight Matrix x Vector + Bias). The weight matrix is split into Tile0 to Tile3 (Job0 to Job3), the vector into V0 and V1, the bias into B0 and B1, and the result into R0 and R1. Example decode path: ISR_MAC_ABK (OPSIZE=63, CH_MASK=7, ROW, COL) -> MAC_ABK packets (ROW, COL, CH_MASK=7) -> MAC_ABK commands (ROW, COL), executed in every bank (BK) by its PU with the vector V0 held in the global buffer (GB).]

AiM Instruction Stream
  Tile0: ISR_WR_GB(V0), ISR_WR_BIAS(B0), ISR_MAC_ABK, ISR_RD_MAC(B0)
  Tile1: ISR_WR_BIAS(B1), ISR_MAC_ABK, ISR_RD_MAC(B1)
  Tile2: ISR_WR_GB(V1), ISR_WR_BIAS(B0), ISR_MAC_ABK, ISR_AF, ISR_RD_AF(R0)
  Tile3: ISR_WR_BIAS(B1), ISR_MAC_ABK, ISR_AF, ISR_RD_AF(R1)
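The decode step in the AiM DMA, where one ISR command becomes many AiM packets, can be sketched as below. The example on the slide pairs OPSIZE=63 with 64 MAC_ABK packets, so the sketch assumes the packet count is OPSIZE + 1. How the target address advances between packets is not stated; incrementing the column by one per packet is an assumption, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class IsrMacAbk:
    opsize: int   # assumed to encode (number of MAC commands - 1)
    ch_mask: int  # which AiM controllers receive the packets
    row: int
    col: int      # starting column


@dataclass(frozen=True)
class MacAbkPacket:
    ch_mask: int
    row: int
    col: int


def expand(cmd: IsrMacAbk) -> List[MacAbkPacket]:
    """Expand one ISR_MAC_ABK command into its stream of MAC_ABK packets."""
    return [MacAbkPacket(cmd.ch_mask, cmd.row, cmd.col + i)
            for i in range(cmd.opsize + 1)]


packets = expand(IsrMacAbk(opsize=63, ch_mask=7, row=0, col=0))
print(len(packets), packets[0].col, packets[-1].col)  # 64 0 63
```

If each MAC consumes one 32 B column access, 64 packets cover 2 KB, which matches the 2KB label in the Key Operation figure; that reading is an inference from the numbers, not something the slides state.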

AiM Matrix Tiling: Column-major vs. Row-major Tiling

[Figure: Result = Weight Matrix x Activation Vector + Bias, with the weight matrix split into Tile0 to Tile3, the activation vector into V0 and V1, the bias into B0 and B1, and the result into R0 and R1.]

1. Column-major tiling keeps the activation vector in the Global Buffer.
   Tile 0: ISR_WR_GB(V0), ISR_WR_BIAS(B0), ISR_MAC_ABK, ISR_RD_MAC(B0)
   Tile 1: ISR_WR_BIAS(B1), ISR_MAC_ABK, ISR_RD_MAC(B1)
   Tile 2: ISR_WR_GB(V1), ISR_WR_BIAS(B0), ISR_MAC_ABK, ISR_RD_MAC(R0)
   Tile 3: ISR_WR_BIAS(B1), ISR_MAC_ABK, ISR_RD_MAC(R1)

2. Row-major tiling keeps partial sums accumulated in MAC_REG.
   Tile 0: ISR_WR_GB(V0), ISR_WR_BIAS(B0), ISR_MAC_ABK
   Tile 2: ISR_WR_GB(V1), ISR_MAC_ABK, ISR_RD_MAC(R0)
   Tile 1: ISR_WR_GB(V0), ISR_WR_BIAS(B1), ISR_MAC_ABK
   Tile 3: ISR_WR_GB(V1), ISR_MAC_ABK, ISR_RD_MAC(R1)

The tradeoffs between column- and row-major tiling may result in different performance, depending on the size and shape of the matrix.
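The two tiling orders can be generated programmatically, which makes the tradeoff easy to count. The sketch below follows the 2 x 2 example on the slide and generalises it to any tile grid; the generalisation and the function names are assumptions, and each ISR_MAC_ABK stands for the whole MAC sequence of one tile.

```python
from typing import List


def column_major_stream(tile_rows: int, tile_cols: int) -> List[str]:
    """Walk down each tile column: the vector slice is written to the
    Global Buffer once per column, partial sums go through MAC_REG."""
    stream: List[str] = []
    for c in range(tile_cols):
        stream.append(f"ISR_WR_GB(V{c})")
        last = c == tile_cols - 1
        for r in range(tile_rows):
            stream.append(f"ISR_WR_BIAS(B{r})")  # bias or earlier partial sum
            stream.append("ISR_MAC_ABK")
            # Intermediate sums are read back as B, final results as R.
            stream.append(f"ISR_RD_MAC({'R' if last else 'B'}{r})")
    return stream


def row_major_stream(tile_rows: int, tile_cols: int) -> List[str]:
    """Walk along each tile row: partial sums stay in MAC_REG, the vector
    slice is rewritten to the Global Buffer for every tile."""
    stream: List[str] = []
    for r in range(tile_rows):
        for c in range(tile_cols):
            stream.append(f"ISR_WR_GB(V{c})")
            if c == 0:
                stream.append(f"ISR_WR_BIAS(B{r})")
            stream.append("ISR_MAC_ABK")
        stream.append(f"ISR_RD_MAC(R{r})")
    return stream


col = column_major_stream(2, 2)
row = row_major_stream(2, 2)
print(len(col), sum(s.startswith("ISR_WR_GB") for s in col))  # 14 2
print(len(row), sum(s.startswith("ISR_WR_GB") for s in row))  # 12 4
```

For the 2 x 2 grid, column-major issues fewer Global Buffer writes but more MAC_REG reads and writes, while row-major does the opposite, which is the tradeoff the slide attributes to matrix size and shape.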

AiM FPGA Platform: Xilinx FPGA board + x86 CPU

[Figure: the host (x86 CPU) runs the Application (GPT-2, LSTM, ...), the Software Framework (PyTorch and ONNX Runtime), the AiM Runtime Library, and the Device Driver, and connects through PCIe RC and PCIe EP to the Xilinx UltraScale+ FPGA (PCIe IP, AXI QDMA, AiM DMA with 512 KB, Multicasting Interconnect, AiM CTRL 0/1), which drives the AiM devices (2 CH/Chip, AiM CH A and CH B). Legend: Xilinx IPs, User Space SW, PCI Express HW, Kernel Space SW, AiM Subsystem IPs, AiM HW.]

Xilinx VCU118 + 2 AiM FMC cards (2 AiM chips, 4 channels, 2 GB)
Running at 2 Gb/s/pin (4 GB/s per channel, 256 GFLOPS per chip)
Note: GDDR6-AiM can run at up to 16 Gb/s/pin
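The platform totals follow from the per-chip figures in the Feature Summary. The sketch below is a quick consistency check that assumes "x16 mode" means 16 data pins per channel; the helper names are hypothetical.

```python
AIM_CHIPS = 2            # one chip per FMC card, two cards
CHANNELS_PER_CHIP = 2
CHIP_DENSITY_GBIT = 8
PINS_PER_CHANNEL = 16    # assumption: x16 mode = 16 data pins


def platform_channels() -> int:
    return AIM_CHIPS * CHANNELS_PER_CHIP


def platform_capacity_gbytes() -> float:
    return AIM_CHIPS * CHIP_DENSITY_GBIT / 8


def channel_bandwidth_gbytes_per_s(pin_rate_gbps: float) -> float:
    """Bandwidth of one channel at the given per-pin data rate."""
    return pin_rate_gbps * PINS_PER_CHANNEL / 8


print(platform_channels(), platform_capacity_gbytes())  # 4 2.0
print(channel_bandwidth_gbytes_per_s(2))                # 4.0 on the FPGA
print(channel_bandwidth_gbytes_per_s(16))               # 32.0 at full speed
```

Four channels at 4 GB/s give 16 GB/s in aggregate on the FPGA platform, against 128 GB/s if the same four channels ran at 16 Gb/s/pin.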

Performance Evaluation: GPT-3 13B configuration

Workload: Huggingface Text Generation
CPU: Intel Xeon Gold 6230, 4ch * DDR4-3200
AiM: 4ch, 2 Gb/s/pin, column-major tiling

[Figure: normalized execution time, broken down into AiM offloadable, Others, and Data movement, for CPU (FP32), AiM (BF16) Measured, and AiM (BF16) Projected. Peak BW (External) labels: 102.4 GB/s, 16 GB/s, 128 GB/s, 128 GB/s. Speedup labels: 6.73x, 16.64x, 18.05x.]

[Figure: GPT architecture. Input Embedding -> N layers of Decoder Block -> Fully Connected -> Softmax. Each decoder block contains Normalization, Fully Connected Layers, Scaled Dot Product Attention, Concatenation, Fully Connected Layer + Elementwise-Add, Normalization, Fully Connected Layer + GELU, and Fully Connected Layer + Elementwise-Add.]

Higher gains are expected if AiM is deployed directly on memory channels running at 16 Gb/s/pin, as demonstrated by "AiM Projected" with our analytical performance model.
Even higher gains can be achieved when unnecessary data movement over the PCIe interface is eliminated.

AiM Analytical Performance Model Validation

AiM: 8ch, 2 Gb/s/pin

[Figure: execution time (us), Measured vs. Analytical model, for matrix sizes 768 x 768 (Base), 1024 x 1024 (Medium), 1280 x 1280 (Large), and 1600 x 1600 (Xlarge).]

Our analytical performance model matches the measured performance within 5% across varying numbers of channels and AiM frequencies.

Open Research Platform

Future Research Topics
  AiM Architecture Exploration
    AiM Controller scheduling algorithm: Normal DRAM Read/Write vs. Refresh vs. AiM commands
  AiM Software Stack and Applications
    Mapping more applications to AiMs
    AiM optimal tiling algorithm
    Automatic decision at runtime whether or not to offload to AiM

Platform contents
  AiM SDK with AiM Function Simulator
  AiM FMC extension card with AiM Subsystem (FPGA bitstream)

SKhynix_PIM
