Hot Chips 2022 CXL MemoryChallenges.pdf

PDF, 39 pages, 2.52 MB

CXL Memory Challenges
August 21, 2022
Prakash Chauhan, Meta & Mahesh Wagh, AMD
AMD Official Use Only
Slide footer: 8/17/22 | Copyright CXL Consortium 2022 | Hot Chips 2022 CXL Tutorial

Agenda
- Memory trends and challenges: cost, bandwidth, capacity
- CXL enabled solutions: heterogeneous, tiered, cost and performance optimized
- Evolution of CXL attached memory: simple expanders, interleaved expanders, pooled memory, FAM
- AI/ML specific topologies
- CXL attached memory challenges
  - Tiered memory performance: making it transparent
  - Deployment challenges: RAS, telemetry, servicing, security

Server Memory: Challenging Trends
- Memory is an increasing fraction of system cost
- Memory price (cost/bit) is flat due to scaling challenges
- Increasing core counts and new workloads are driving memory demand
  - Increased capacity
  - Increased bandwidth
Data source: De Dios & Associates

System Level Challenges
- Adding DDR channels to the CPU for bandwidth and capacity
  - Large CPU sockets: cost, reliability
  - PCB layer count: additional layer per channel
  - Board form factor: difficulty fitting in standard widths
- Increasing data rates for bandwidth
  - PCB technology: back-drill, SMT connectors, blind vias
  - Equalization circuits: complexity and cost added to both ends
  - 1DPC (one DIMM per channel): capacity/granularity issues

CXL: a media-agnostic memory interface
- A common, standard interface for many types of memory
- Enables system flexibility to make use of different media characteristics (persistence, latency, BW, endurance, etc.)
- Enables usage of heterogeneous memory tiers
- Differential signalling
[Figure: three host CPUs, each connected over CXL to a CXL memory expander whose media controller fronts a different medium: DDR3/4/5 DRAM, LPDDR DRAM, or a PM component.]

Tiers of happiness
[Figure only.]

CXL enables system design flexibility
[Figure: a common motherboard. A low core count CPU uses DDR5 DIMMs on native DDR channels; a high core count CPU uses the same DDR5 DIMMs on native DDR channels plus CXL memory on CXL links.]
- Enables flexibility to add a variety of memory without impacting natively attached DIMMs
- CXL memory can be optimized independently for system cost, capacity, power, bandwidth

CXL cost savings example
[Figure: an 80-core CPU with 8 DDR5 DIMMs on native DDR, plus CXL memory.]
- Assuming 2GB per core, the system needs 160GB of memory
- 16GB DIMMs x8 = 128GB
- 32GB DIMMs x8 = 256GB
- 32GB DIMMs x5: lost BW and perf
- 16GB DIMMs x8 + 32GB CXL memory: right size with added bandwidth

CXL Evolution: Flexible, Fungible memory
- Direct attached: host, CXL link, CXL memory expander (MC), memory channels, memory
  - Add capacity
  - Add bandwidth
  - Slower, cheaper tier
- Pooled memory: hosts 0-3, CXL links, multi-port CXL memory expander (MC 1, MC 2), memory channels, memory pool
  - Amortize CXL infra cost
  - Flexible allocation
- Shared memory: hosts 0-3, CXL links, multi-port CXL memory expander (MC 1, MC 2), memory channels, shared memory pool
  - Deduplication
  - Host-to-host communication
  - Large datasets
- Fabric memory: hosts, CXL links, switches joined by CXL ISL links, CXL MCs
  - Scaling to huge datasets
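The sizing arithmetic in the cost savings example above can be checked with a short script. This is only a sketch of the slide's numbers: the `Config` class, the `evaluate` helper and the "all channels populated" check are illustrative names and assumptions, not anything defined in the talk.

```python
from dataclasses import dataclass

CORES = 80
GB_PER_CORE = 2
DDR_CHANNELS = 8  # native DIMM slots in the slide's example


@dataclass(frozen=True)
class Config:
    """One memory population option (hypothetical helper type)."""
    name: str
    dimm_gb: int
    dimm_count: int
    cxl_gb: int = 0

    @property
    def capacity_gb(self) -> int:
        return self.dimm_gb * self.dimm_count + self.cxl_gb


def evaluate(cfg: Config, required_gb: int) -> dict:
    """Compare a configuration against the required capacity."""
    return {
        "capacity_gb": cfg.capacity_gb,
        "surplus_gb": cfg.capacity_gb - required_gb,
        # Leaving DDR channels empty is what the slide calls "lost BW and perf".
        "all_channels_used": cfg.dimm_count == DDR_CHANNELS,
    }


required = CORES * GB_PER_CORE  # 160 GB

configs = [
    Config("16GB x8", 16, 8),
    Config("32GB x8", 32, 8),
    Config("32GB x5", 32, 5),
    Config("16GB x8 + 32GB CXL", 16, 8, cxl_gb=32),
]

for cfg in configs:
    print(cfg.name, evaluate(cfg, required))
```

Only the last option reaches exactly 160GB while keeping every native DDR channel populated, which is the point the slide makes with "right size with added bandwidth".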

AI Memory Topologies
- AI/ML/HPC/HPDA workloads consume a significant amount of power
- Optimize compute to happen near the data
- Enable near and in-memory compute
[Figure: memory topology managed by a Fabric Manager.]

Tiered Memory
- Capacity and bandwidth expansion
- NUMA domains
  - Roundtrip latency to CXL ~ NUMA socket-to-socket latency
  - Exposed to the HV, guest OS, apps
  - Based on ACPI objects: CDAT, SRAT, SLIT, HMAT
  - Affinity/NUMA info describes characteristics
- QoS required for balanced performance
- OS-assisted optimization of the memory subsystem
- Several software and hardware techniques under development for managing hot/cold pages
[Figure: NUMA domains as tiers, from mission critical / RT analytics data down to less frequently accessed, voluminous data, with page migration between tiers.]

Transparent Tiered memory
[Figure: CPU with native memory on DDR and CXL memory behind a CXL ASIC.]
- CXL latency is good but nearly 2x native
- Need smart page placement to keep performance intact
  - "Hot" memory pages in native memory
  - "Cold" memory pages in CXL memory

Tiering: issues with existing software
- With current NUMA schemes hot pages can get stuck in CXL memory
- Hurts application performance
[Figure: native memory is almost full, a new allocation goes into CXL memory, and a previously "cold" page becomes hot.]

Example: Tiering with TPP
- Cold page detection using existing kernel mechanism
- Demotion: moving cold pages to CXL memory proactively to maintain headroom in native memory
- Promotion: moving hot pages to native memory
- Additional optimization to avoid ping-ponging between hot/cold
[Figure: used memory initiates demotion; AutoNUMA-like sampling; kernel notified on access; access triggers promotion.]
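The demotion and promotion steps listed for TPP can be illustrated with a toy model. This is a minimal sketch under loud assumptions, not the TPP kernel code: a logical timestamp stands in for the kernel's cold page detection, a per-page access counter stands in for AutoNUMA-like sampling, and `promote_threshold` is a made-up knob representing the "avoid ping-ponging" optimization. All class and field names are hypothetical, and `headroom` is assumed to be at least 1.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    pid: int
    tier: str              # "native" or "cxl"
    last_access: int = 0   # logical time of the most recent touch
    hits_in_cxl: int = 0   # accesses observed while resident in CXL


@dataclass
class TieredMemory:
    native_capacity: int        # pages that fit in native DRAM
    headroom: int               # free native pages demotion tries to keep (>= 1)
    promote_threshold: int = 2  # CXL hits required before promotion
    clock: int = 0
    pages: dict = field(default_factory=dict)

    def native_used(self) -> int:
        return sum(1 for p in self.pages.values() if p.tier == "native")

    def demote_cold(self) -> None:
        """Proactively move the coldest native pages to CXL to restore headroom."""
        while self.native_capacity - self.native_used() < self.headroom:
            native = [p for p in self.pages.values() if p.tier == "native"]
            if not native:
                break
            coldest = min(native, key=lambda p: p.last_access)
            coldest.tier = "cxl"
            coldest.hits_in_cxl = 0

    def allocate(self, pid: int) -> None:
        """New pages land in native memory when there is room, else in CXL."""
        self.clock += 1
        tier = "native" if self.native_used() < self.native_capacity else "cxl"
        self.pages[pid] = Page(pid, tier, self.clock)
        self.demote_cold()

    def access(self, pid: int) -> None:
        """Record an access; repeated hits on a CXL page trigger promotion."""
        self.clock += 1
        page = self.pages[pid]
        page.last_access = self.clock
        if page.tier == "cxl":
            page.hits_in_cxl += 1
            if page.hits_in_cxl >= self.promote_threshold:
                page.tier = "native"
                page.hits_in_cxl = 0
                self.demote_cold()  # give the borrowed headroom back


mem = TieredMemory(native_capacity=4, headroom=1)
for pid in range(5):
    mem.allocate(pid)
mem.access(0)  # first hit on a demoted page: not promoted yet
mem.access(0)  # second hit: promoted, coldest native page is demoted
print({p.pid: p.tier for p in mem.pages.values()})
```

Because demotion keeps one native page free at all times, a promotion never has to wait for space; the threshold keeps a page that is touched once from bouncing between tiers.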

Deployment Challenges

RAS: CXL ERROR HANDLING

Error type: Poison
  Error reporting: Reported synchronously with the poisoned data and traverses to the requestor.
  Error handling: Consistent with handling poison from direct attached memory. Host implementation mechanisms (e.g. Machine Check).

Error type: Viral
  Error reporting: Reported via CXL link.
  Error handling: Consistent with handling fatal error from direct attached memory (host implementation specific mechanisms).

Error type: CXL protocol errors
  Error reporting: Reported via PCIe AER. AER "Internal Error" (UIE/CIE) indicates information is also logged in the CXL RAS capability structure.
  Error handling (new): PCIe RCEC collects the error information and signals the FW/OS. Handling is analogous to PCIe Root Port handling.

Error type: CXL component events
  Error reporting: Reported via CXL Device/Component Command Interface (mailbox, etc.) using CXL-defined record formats.
  Error handling (new): Device may signal FW using VDM or OS using MSI/MSI-X. FW/OS issues mailbox commands to the device to gather records.

CXL protocol errors and component events require new software development.

RAS: CXL PROTOCOL ERROR HANDLING
- FW first
  - Platform firmware follows the AER path to decode the errors.
  - Platform firmware collects CXL RAS capability information as part of AER handling.
  - Platform firmware will use the UEFI "CXL Protocol Error Section" CPER format to represent data.
  - Platform firmware may signal the OS through existing ACPI methods.
- OS first
  - OS must handle AER events through existing methods.
  - If AER status indicates an "Internal Error" (UIE/CIE) and the source is an RCEC, then the OS must inspect the AER and CXL RAS structures of all associated DPs.
  - If AER status indicates an "Internal Error" (UIE/CIE) and the source is an Endpoint, then the OS must inspect the CXL RAS structure of the associated UP.

RAS: CXL COMPONENT EVENT HANDLING
OS CXL driver development required.
- Boot-time tasks
  - OS must request and be granted "Memory Error reporting" control through _OSC.
  - OS may load a "CXL Memory Device" driver to manage the Device/Component Command Interface.
  - OS may load a "CXL Event" service driver.
    - Leverage the "CXL Memory Device" driver to interact with the mailbox, etc.
    - Enable polling timers and MSI/MSI-X interrupts for events.
- Run-time tasks
  - Event driver handlers will execute mailbox commands to get and clear records.
  - Event driver will parse records and report to the user.
  - Event driver may call other OS facilities to act on error information.

QoS Telemetry
- Access throttling (rate metering) mechanisms are required for balanced performance across memory tiers
  - Mechanisms are host implementation specific
- QoS telemetry mechanisms enable CXL devices to provide timely device load notifications in responses
  - Immediate and on-going
  - Covers device internal loading, egress port backpressure, and temporary throughput reduction
  - Enables the host to optimize rate metering mechanisms
  - Multiple QoS classes supported, covers MLD
- QoS is ever more important as use models expand
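Returning to the OS-first rules on the protocol error handling slide above, the decision they describe can be written as a small dispatch function. This is a hedged sketch of the stated rules only: `AerEvent`, `Source` and `structures_to_inspect` are hypothetical names, the port identifiers are plain strings, and nothing here corresponds to a real kernel or firmware API.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import List, Optional, Tuple


class Source(Enum):
    RCEC = auto()
    ENDPOINT = auto()
    OTHER = auto()


@dataclass(frozen=True)
class AerEvent:
    """Simplified view of an AER notification (illustrative fields only)."""
    source: Source
    internal_error: bool                    # UIE or CIE set in AER status
    associated_dps: Tuple[str, ...] = ()    # downstream ports tied to an RCEC
    associated_up: Optional[str] = None     # upstream port of an endpoint


def structures_to_inspect(event: AerEvent) -> List[Tuple[str, str]]:
    """Return (port, structure) pairs the OS should read for this event."""
    if not event.internal_error:
        # Ordinary AER event: existing AER handling applies, nothing extra.
        return []
    if event.source is Source.RCEC:
        # RCEC source: inspect AER and CXL RAS structures of all associated DPs.
        result = []
        for dp in event.associated_dps:
            result.append((dp, "AER"))
            result.append((dp, "CXL RAS"))
        return result
    if event.source is Source.ENDPOINT and event.associated_up is not None:
        # Endpoint source: inspect the CXL RAS structure of the associated UP.
        return [(event.associated_up, "CXL RAS")]
    return []


rcec_event = AerEvent(Source.RCEC, True, associated_dps=("dp0", "dp1"))
print(structures_to_inspect(rcec_event))
```

Keeping the rule as a pure function that returns what to read, rather than reading registers directly, makes the two branches easy to test without hardware.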

Lifecycle Management
- CXL devices encounter several events/telemetry that need to be managed
  - Firmware events
  - RAS events
  - Thermal events
  - Device health
- Run-time management of CXL devices requires a mechanism to
  - Notify events
  - Read event logs
  - Issue commands to manage/repair the device
  - Read info for diagnostics
  - Get device capabilities information
- In-band and out-of-band mechanisms to meet manageability requirements
- Ecosystem software development required for lifecycle management

Security
- Link IDE, providing link-level encryption and integrity, is defined.
- Proposals are under development to meet emerging security and confidential compute requirements.
- CXL 2.0 provides Integrity and Data Encryption of traffic across all entities (Root Complex, Switch, Device).
[Figure: area of protection spanning the CPU/SoC root complex (home agent, MC, host memory, IO bridge), the CXL 2.0 switch, and the CXL device (MC, device memory), covering CXL.io and CXL.memory traffic.]

CXL MEMORY DEVICE TYPES AND POOLING

CXL SLD Memory Device
- Supports one or more PCIe Endpoint Functions
  - Type 0 header in PCI configuration space
- Primary function (device number 0, function number 0) must carry one instance of CXL DVSEC ID 0 with Revision 1 or greater
- Non-CXL Function Map DVSEC to advertise non-CXL functions
- Must support operating in CXL 1.1 mode: PCIe Endpoint appears as RCiEP
- Type 3 device Component Register Block includes HDM Decoder registers
- Connected to a single Virtual Hierarchy
[Figure: device D0 on a CXL link; its HDM decoders map HPA to DPA.]
HPA: Host Physical Address. DPA: Device Physical Address.
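The HPA to DPA mapping performed by HDM decoders can be pictured with a deliberately simplified model. This sketch assumes each decoder covers one contiguous, non-interleaved range; real HDM decoders also handle interleaving, and the field names below (`hpa_base`, `size`, `dpa_base`) are illustrative rather than the register layout from the specification.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class HdmDecoder:
    """One contiguous HPA window mapped onto device memory (no interleave)."""
    hpa_base: int  # first host physical address claimed by this decoder
    size: int      # length of the window in bytes
    dpa_base: int  # device physical address the window starts at

    def translate(self, hpa: int) -> Optional[int]:
        offset = hpa - self.hpa_base
        if 0 <= offset < self.size:
            return self.dpa_base + offset
        return None  # address is outside this decoder's window


def hpa_to_dpa(decoders: Iterable[HdmDecoder], hpa: int) -> Optional[int]:
    """Return the DPA for an HPA, or None when no decoder claims it."""
    for decoder in decoders:
        dpa = decoder.translate(hpa)
        if dpa is not None:
            return dpa
    return None


GIB = 1 << 30
decoders = [
    HdmDecoder(hpa_base=4 * GIB, size=1 * GIB, dpa_base=0),
    HdmDecoder(hpa_base=8 * GIB, size=1 * GIB, dpa_base=1 * GIB),
]

print(hex(hpa_to_dpa(decoders, 4 * GIB + 0x1000)))
```

Two host windows that are far apart in HPA space land back to back in DPA space here, which is the kind of remapping the decoder registers exist to express.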

CXL Multi-Ported Memory Device (pooled memory device)
- Represents an SLD behind each CXL port
- Device vendor-specific mechanisms to configure resources per SLD
- Example: 4-ported CXL memory device with equal allocation of pooled resources across ports
[Figure: four CXL ports, each presenting a device D0 whose HDM decoders map HPA to DPA.]
HPA: Host Physical Address. DPA: Device Physical Address.

CXL Multi-Logical Memory Device (pooled memory device)
- A pooled Type 3 device can partition its resources into Logical Devices (LD)
  - Up to 16 LDs (Type 3 only) and one Fabric Manager (FM) owned LD
- Each LD
  - Appears as a Type 3 SLD device
  - Is identified by LD-ID
  - Is bound to a Virtual Hierarchy by the FM
- FM owned LD
  - Accessible by the FM only, using LD-ID 0xFFFF
  - Manages link and device
  - Memory resources are not assigned to the FM owned LD
  - Error messages generated by LDs are routed to the FM
  - Does not participate in GPF flows
- MLD link
  - MLD link discovery and link operation are configured via Alternate Protocol Negotiation
[Figure: one CXL link fanning out to LD#00 through LD#15, each with its own HDM decoders and IO decoders, plus the FM owned LD reached through the FM API.]

Fabric Manager Endpoint and API
- Fabric Manager is a control entity that manages the CXL 2.0 switch and the memory controller
- FM can be an external BMC, a host, or firmware internal to the switch
- FM Endpoint is a required feature for any switch that supports MLD ports or that supports dynamic SLD port binding
- FM API is the standardized interface for the FM to communicate with devices
- FM API uses an MCTP interface between the Fabric Manager and devices
- The MCTP physical interface is switch vendor specific but could be PCIe, CXL.io VDM, SMBus, Ethernet, UART, USB, internal, etc.
- In general there are no real-time response requirements for the Fabric Manager, so it needn't be performant
[Figure: Fabric Manager connected to the Fabric Manager Endpoint of a CXL switch or device.]

Fabric Manager
- Fabric Manager plays a critical role in CXL for systems supporting memory pooling
- The Fabric Manager enables dynamic system changes supporting memory disaggregation
- Some examples:
  - Managing all devices that support traffic from multiple hosts, including:
    - Downstream ports connected to MLD ports
    - FM-owned Logical Device within an MLD component
  - Unbinding and rebinding of Logical Devices within an MLD between hosts
  - Unbinding and rebinding of an SLD
  - Re-allocation of memory within an MLD
  - Re-allocation of memory within a multi-port SLD

Memory Pooling with Single Logical Devices
[Figure: hosts H1 and H2 above a CXL 2.0 switch, devices D1 to D4 and D# below it, and the FM attached through the Fabric Manager API.]
1. H2 notifies FM that D4 memory is no longer needed.
2. FM tells the switch to UNBIND D4. The switch notifies H2 of the managed hot remove.
3. FM tells the switch to BIND D4 to H1. The switch notifies H1 using managed hot add. H1 enumerates and configures accesses to D4.
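The three-step SLD pooling flow above can be modelled as a small state machine. This is a sketch for illustration only: `Switch`, `FabricManager`, `bind`, `unbind` and `reassign` are hypothetical Python names that mirror the BIND and UNBIND operations named on the slides, not the FM API's actual command encoding, and notifications are recorded as log strings instead of hot-plug events.

```python
from typing import Dict, Iterable, List, Optional


class Switch:
    """Tracks which host each single logical device (SLD) is bound to."""

    def __init__(self, hosts: Iterable[str], devices: Iterable[str]) -> None:
        self.hosts = set(hosts)
        self.binding: Dict[str, Optional[str]] = {d: None for d in devices}
        self.log: List[str] = []

    def unbind(self, device: str) -> None:
        host = self.binding[device]
        if host is None:
            raise ValueError(f"{device} is not bound")
        self.binding[device] = None
        self.log.append(f"notify {host}: managed hot remove of {device}")

    def bind(self, device: str, host: str) -> None:
        if host not in self.hosts:
            raise ValueError(f"unknown host {host}")
        if self.binding[device] is not None:
            raise ValueError(f"{device} is already bound to {self.binding[device]}")
        self.binding[device] = host
        self.log.append(f"notify {host}: managed hot add of {device}")


class FabricManager:
    """Drives the switch on behalf of hosts that release or need memory."""

    def __init__(self, switch: Switch) -> None:
        self.switch = switch

    def reassign(self, device: str, new_host: str) -> None:
        # Step 2: UNBIND from the current owner, step 3: BIND to the new one.
        self.switch.unbind(device)
        self.switch.bind(device, new_host)


switch = Switch(hosts=["H1", "H2"], devices=["D1", "D2", "D3", "D4"])
switch.bind("D4", "H2")              # initial state: H2 owns D4
fm = FabricManager(switch)
fm.reassign("D4", "H1")              # H2 told the FM it no longer needs D4
print(switch.binding["D4"], switch.log[-2:])
```

Refusing to bind a device that is still bound captures the constraint that an SLD belongs to one host at a time, so a whole device moves between hosts rather than a slice of it.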

Memory Pooling with Multi-Logical Devices
[Figure: hosts H1 and H2 above a CXL 2.0 switch with the FM attached; devices D1 to D4 and D# below it. D2 is an MLD whose memory is split between the hosts; blue and yellow refer to regions in the slide diagram.]
1. H2 notifies FM that some D2 memory is no longer needed.
2. FM tells D2 to de-allocate some blue memory. D2 notifies H2.
3. FM tells D2 to allocate some yellow memory. D2 notifies H1. H1 updates HDM ranges and starts using the memory.

Memory Pooling without a Switch
[Figure: hosts H1 and H2 connected directly to devices D1 and D2, with the FM attached.]
1. H2 notifies FM that some D2 memory is no longer needed.
2. FM tells D2 to de-allocate some blue memory. D2 notifies H2.
3. FM tells D2 to allocate some yellow memory. D2 notifies H1, and H1 updates HDM ranges.

Memory Pooling with CXL
- Other FM API features beyond BIND, UNBIND, and SET LD allocations:
  - Switch discovery including capacity, capabilities, and connected devices
  - Event notification such as switch link events and Advanced Error Reporting for FM owned resources
  - Manage MLD QoS parameters
- Benefits of memory pooling
  - Effective utilization of memory resources within a system
  - Dynamic allocation/deallocation of memory resources
  - Total Cost of Ownership (TCO) savings

Summary
- CXL attached memory use models continue to grow, from direct attached to realizable memory centric computing
- CXL consortium continues to address the challenges to support CXL attached memory
- Large opportunities lie ahead for the ecosystem
- Get involved!
