Source: SNIA-SDC23-Ki-Is-SSD-with-CXL-Interfaces-Brilliantly-Stupid-or-Stupidly-Brilliant.pdf (30 pages)

Is SSD with CXL Interfaces Brilliantly Stupid or Stupidly Brilliant?
Yang Seok Ki, Ph.D.
Vice President, Memory Solutions Lab., Samsung Electronics

Stupid or Brilliant?
- Draisine (running machine), 1817: the first bicycle on record
- The Jazz Singer, 1927: the first movie with an audio track

SSD with CXL Interfaces
- Storage with memory and/or storage interfaces
[Diagram: SSD]

Technical Needs

Memory Hierarchy
- Keep hot data close to the CPU using data locality
[Figure: memory hierarchy for a traditional workload, ordered from hot to cold]

Needs (1): Persistent Memory
- Discontinuation of the leading technology
- Database: Oracle Exadata Redo Log
- Storage: DAOS (Distributed Asynchronous Object Storage)
[Diagram, Oracle Exadata: Database Server; Persistent Memory; Redo buffers for Instance 1; Redo Record; Commit Record. 2 RDMA PUTs (25+ us): 1 write for the redo record, 1 write for the commit record]
[Diagram, DAOS: Storage Server; Integrated Fabric; DAOS Storage Engine. Persistent Memory (metadata, low-latency I/O, and indexing/query) through the memory interface with PMDK; Block Storage (bulk data) through the NVMe interface with SPDK]

Needs (2): Secondary Memory
- High overhead of virtual memory implementation
- Swap for memory extension on disk
- Redis Auto Tiering for memory extension on SSD
[Diagram, swap: CPU virtual addresses 0 to N-1; Page Table; Memory physical addresses 0 to N-1; Disk]
[Diagram, Redis Auto Tiering: Storage engine (LRU/LFU); DRAM holds hot values; SSD holds warm values]

Needs (3): Fast Small IO
- High overhead of IOs smaller than 4 KB
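
The overhead of sub-4 KB I/O can be illustrated with simple arithmetic: if a device is accessed in 4 KB blocks, a 64-byte or 128-byte request still moves a whole block. The sketch below is only an illustration under that assumption (a fixed 4 KB transfer unit, aligned requests, no software-stack cost); the function name `read_amplification` is hypothetical and does not come from the presentation.

```python
import math

BLOCK_BYTES = 4096  # assumption: block interface transfers whole 4 KB units


def read_amplification(request_bytes: int, block_bytes: int = BLOCK_BYTES) -> float:
    """Return bytes transferred divided by bytes requested.

    Assumes the request is block-aligned and rounded up to whole blocks.
    """
    if request_bytes <= 0:
        raise ValueError("request_bytes must be positive")
    blocks = math.ceil(request_bytes / block_bytes)
    return blocks * block_bytes / request_bytes


if __name__ == "__main__":
    for size in (64, 128, 4096):
        factor = read_amplification(size)
        print(f"{size:>5} B request -> {factor:.0f}x bytes moved")
```

Under these assumptions a 64-byte read moves 64 times the data it needs and a 128-byte read moves 32 times, which is the kind of waste a byte-addressable interface avoids.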
- DLRM size is rapidly growing
[Figure: model size in billions (log scale) by year for AlexNet, GoogLeNet, VGG, ResNet, BERT, AlphaZero, GPT-3, and Switch Transformer (G), showing growing compute needs and growing memory needs. Source: Meta]
[Diagram, DLRM: dense features, bottom MLP, sparse features, embeddings, interactions, top MLP]

CXL-based SSD
- A hybrid device of DRAM and NAND with CXL interfaces
[Diagram: four usage modes]
- Persistent Memory: Application; DAX (kernel); CXL.mem; DRAM (persistent memory); NAND (backing storage)
- Secondary Memory: Application; virtual memory (kernel/user); CXL.mem; DRAM (cache/buffer); NAND (volatile memory)
- Storage: Application; page cache; CXL.io; DRAM (cache/buffer); NAND (storage)
- Combo (Memory & Storage): Application; page cache and SW; CXL.mem and CXL.io; DRAM (memory); NAND (storage)

CXL (Compute Express Link)
- Asynchronous blocking memory interface with optional coherency
[Diagram: CXL comprises CXL.mem, CXL.cache, and CXL.io]

CXL Device Types
- Device types are based on protocols, not functions
- Type 1 device: CXL.cache, CXL.io
- Type 2 device: CXL.cache, CXL.io, CXL.mem
- Type 3 device: CXL.mem, CXL.io

CXL-based SSD as Persistent Memory
- Type 3 device similar to NVDIMM
[Diagram: Application; Load/Store; Linux (DAX); CXL.mem; RoCE. Used for metadata, low-latency IO, and high-priority indexing/logging/query]

Memory-Semantic SSD with Persistence Feature
[Diagram: Host (App, CPU cache, CXL Root Port); CXL bus; Controller; DRAM; NAND flash backing store; battery; power supply]

Dump Size    Dump Time (s)    Dump Energy
16 GB        3.2              58 J
32 GB        6.4              115 J
64 GB        12.8             229 J
128 GB       25.6             457 J
256 GB       51.2             913 J
512 GB       102.4            1,825 J

Power Failure Protection
- Battery-backed DRAM with speed comparable to DDR5
- Persistence achieved with data dumps to NAND flash
- Supports flush-on-fail with the CXL 2.0 GPF feature

Persistent Memory Performance
[Chart: operations per second (million), axis 0 to 160, for three workloads (100% write; 50% write : 50% read; 10% write : 90% read). Series: DDR5 DRAM, Memory-Semantic SSD persistent memory, persistent memory competitor]

Memory-Semantic SSD Persistency Demo

Small I/O
- AI and ML applications usually need relatively small data chunks
- Applications can write data to the DRAM cache at DRAM speed
- Low latency enabled by the CXL.memory protocol
[Diagram: CPU; CXL.memory; small I/O of 64 bytes and 128 bytes versus 4 KB]

CXL-based SSD as Secondary Memory
- Example of memory configuration with TM mode

Secondary Memory Options
[Diagram: three configurations, each with a CPU, two 128 GB DDR5 DIMMs, and a 2 TB MS-SSD (2 TB SSD capacity, 16 GB DDR5)]
- DIMMs as L4$, MS-SSD as L5$; system memory space: 2 TB MS-SSD
- DIMMs as local DRAM; system memory space: local DRAM, local DRAM, HDM-T2 2 TB, HDM-T1 16 GB
- DIMMs as local DRAM; system memory space: local DRAM, local DRAM, HDM-T2 2 TB

Option 1 Performance
- Small-granularity data access lets performance scale with cache hits
- Direct memory access advantage; no software cache overhead
- Large memory capacity at lower TCO*
*Compared to PCIe Gen4 NVMe SSD
[Chart: memory reads per second (million, log scale) versus cache hit rate (%), with series for 64 B, 128 B, 256 B, and 512 B accesses. Data labels in extraction order: 1.1, 1.7, 3.2, 43.0; 1.1, 1.5, 2.7, 32.7; 1.0, 1.4, 2.0, 16.3; 0.9, 1.1, 1.4, 8.0]

CXL-based SSD as Fast Small IO Storage
[Diagram: Application on Linux; Load/Store over CXL.mem for small I/O (64 bytes, 128 bytes); LBA over CXL.io for normal I/O (4 KB)]

CXL.mem read
- Random performance (128 B): 0.8 MIOPS at 0% cache hit; 1.5 MIOPS at 50% cache hit; 35.0 MIOPS at 100% cache hit
- Latency (cache hit / miss): 1 us / 70 us
CXL.io
- Sequential performance (128 KB): read 5,500 MB/s; write 2,000 MB/s
- Random performance (4 KB): read 800 KIOPS; write 85 KIOPS

Fine-grain Access to Storage Data
- Method 1: byte addressable (CXL.mem)
- Method 2: sector/block (CXL.io)
[Diagram: Memory-Semantic SSD; system memory space; mmap; initiate DMA; data transfer]

DLRM Performance with Fast Small IOs
[Diagram: DLRM (dense features, bottom MLP, sparse features, embeddings, interactions, top MLP) with a compute-dense part and a memory-dense part; Memory-Semantic SSD with device DRAM cache and NAND flash memory]

DLRM* performance (Meta), inferences per second*
- I/O based: 4,450
- Host software cache based: 16,824
- Hardware device cache based: 30,445
Chart annotations: high overhead; low DRAM data reuse; high software overhead
*Results based on publicly available DLRM workload traces from Meta and an FPGA-based PoC Memory-Semantic SSD
*DLRM: Deep Learning Recommendation Model

Movie Recommendation System Demo

Challenges and Opportunities

Standard and Eco
- No definition and spec
[Diagram: the four usage modes of the CXL-based SSD (persistent memory, secondary memory, storage, combo), repeated from the "CXL-based SSD" slide]

Latency Tolerance
- Impact of long latency on CPU performance
[Chart: bandwidth (GB/s) versus latency (ns, log scale) for L1/L2 cache, LLC, DDR, CMM, flash, and HDD, divided into a latency domain and a throughput domain]

Latency or Throughput
- Close to DRAM end-to-end performance at a lower TCO*
- Up to 10x better end-to-end performance with the FPGA-based PoC*
*At 100% hit ratio
*Compared to PCIe Gen4 NVMe SSD
[Chart: inferences per second versus cache hit ratio (%). Series: Block IO, Block IO + Host Memory Cache, Memory-Semantic SSD, DRAM Memory]

Cache Management
- Managing in-device DRAM is the key!

Wrap Up
- SSD with CXL interfaces for
  - Persistent memory
  - Performant secondary memory
  - Storage for AI and HPC
  - Near data processing platform
- Community efforts
  - Standard for SSD with CXL interfaces (+ cache management)
  - Software ecosystem
  - CPU architecture to tolerate long latency
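
Why managing the in-device DRAM matters can be sketched from the CXL.mem latencies quoted in the deck (1 us on a cache hit, 70 us on a miss). The weighted-average model below is an assumption added for illustration, not something the presentation states, and `avg_latency_us` is a hypothetical helper name.

```python
HIT_US = 1.0    # cache-hit latency quoted in the deck
MISS_US = 70.0  # cache-miss latency quoted in the deck


def avg_latency_us(hit_ratio: float) -> float:
    """Expected access latency, assuming each access independently hits
    the device DRAM cache with probability hit_ratio (a simplification)."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * HIT_US + (1.0 - hit_ratio) * MISS_US


if __name__ == "__main__":
    for h in (0.0, 0.5, 0.9, 0.99, 1.0):
        print(f"hit ratio {h:>4.0%}: about {avg_latency_us(h):.2f} us per access")
```

Under this simplified model the average latency stays dominated by misses until the hit ratio is very close to 100%, which is in line with the steep rise near full hit rate in the reported throughput figures (0.8, 1.5, and 35.0 MIOPS at 0%, 50%, and 100% hits).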