CXL Overview and Evolution
Ishwar Agarwal, Intel
CXL Board of Directors

CXL Consortium: 180+ member companies. Industry open standard for high-speed communications.

CXL Specification Release Timeline
- March 2019: CXL 1.0 specification released
- September 2019: CXL Consortium officially incorporates; CXL 1.1 specification released
- November 2020: CXL 2.0 specification released
- Q3 2022: CXL 3.0 specification release

CXL Overview
- New breakthrough high-speed interconnect
- Enables a high-speed, efficient interconnect between CPU, memory and accelerators
- Builds upon PCI Express infrastructure, leveraging the PCIe physical and electrical interface
- Maintains memory coherency between the CPU memory space and memory on CXL-attached devices
- Enables fine-grained resource sharing for higher performance with heterogeneous processing
- Enables memory disaggregation, memory pooling and sharing, persistent memory and emerging memory media
- Delivered as an open industry standard
- The CXL 3.0 specification is available now, with full backward compatibility with CXL 2.0 and CXL 1.1
- Future CXL specification generations will continue to innovate to meet industry needs while preserving backward compatibility
What is CXL?
- An alternate protocol that runs across the standard PCIe physical layer
- Uses a flexible processor port that can auto-negotiate to either the standard PCIe transaction protocol or the alternate CXL transaction protocols
- CXL 2.0 and CXL 1.1 align to 32 GT/s PCIe 5.0
- CXL 3.0 aligns to 64 GT/s PCIe 6.0 and is backward compatible

CXL Protocols
The CXL transaction layer is comprised of three dynamically multiplexed sub-protocols on a single link: CXL.io, CXL.cache and CXL.memory.
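For a rough sense of what these link rates mean, the sketch below computes raw per-direction bandwidth for a x16 link at the CXL 2.0 and CXL 3.0 rates. It is a back-of-the-envelope illustration only; flit framing, CRC/FEC and protocol overheads (discussed later) are ignored.

```c
#include <stdio.h>

/* Raw per-direction bandwidth of a CXL/PCIe link.
 * GT/s counts bits per lane per second (PCIe 6.0 reaches 64 GT/s by
 * signaling PAM-4 at 32 GBaud). Framing and FEC/CRC overheads ignored. */
static double raw_gb_per_s(double gts, int lanes)
{
    return gts * lanes / 8.0;           /* Gbit/s across the link -> GB/s */
}

int main(void)
{
    printf("CXL 1.1/2.0 x16 @ 32 GT/s: %.0f GB/s per direction\n",
           raw_gb_per_s(32.0, 16));     /* ~64 GB/s  */
    printf("CXL 3.0     x16 @ 64 GT/s: %.0f GB/s per direction\n",
           raw_gb_per_s(64.0, 16));     /* ~128 GB/s */
    return 0;
}
```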
CXL Stack: Designed for Low Latency
- All three representative usages have latency-critical elements: CXL.cache, CXL.memory and CXL.io
- The CXL.cache and CXL.memory stack is optimized for latency:
  - Separate transaction and link layer from IO
  - Fixed message framing
- CXL.io flows pass through a stack that is largely identical to a standard PCIe stack:
  - Dynamic framing
  - Transaction Layer Packets (TLP) / Data Link Layer Packets (DLLP) encapsulated in CXL flits
- Net result: low-latency CXL.cache and CXL.memory transactions

Representative CXL Usages
- Type 1: Caching devices / accelerators (accelerator or NIC with a cache, attached to a processor with DDR)
  - Protocols: CXL.io, CXL.cache
  - Usages: PGAS NIC, NIC atomics
- Type 2: Accelerators with memory (accelerator with cache and HBM, attached to a processor with DDR)
  - Protocols: CXL.io, CXL.cache, CXL.memory
  - Usages: GP GPU, dense computation
- Type 3: Memory buffers (memory controller fronting memory, attached to a processor with DDR)
  - Protocols: CXL.io, CXL.memory
  - Usages: memory bandwidth expansion, memory capacity expansion, storage-class memory
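The device-type/protocol pairing above is the part of CXL that software and sizing discussions refer to most often. The fragment below restates it as a small C table; the identifiers are invented for this sketch and only the type/protocol/usage pairing comes from the slide.

```c
#include <stddef.h>
#include <stdio.h>

/* Which CXL sub-protocols each device type uses (illustrative names). */
enum cxl_proto { CXL_IO = 1u << 0, CXL_CACHE = 1u << 1, CXL_MEM = 1u << 2 };

struct cxl_device_type {
    const char *name;
    unsigned    protocols;          /* bitmask of enum cxl_proto */
    const char *example_usages;
};

static const struct cxl_device_type cxl_types[] = {
    { "Type 1: caching device / accelerator", CXL_IO | CXL_CACHE,
      "PGAS NIC, NIC atomics" },
    { "Type 2: accelerator with memory",      CXL_IO | CXL_CACHE | CXL_MEM,
      "GP GPU, dense computation" },
    { "Type 3: memory buffer",                CXL_IO | CXL_MEM,
      "memory BW/capacity expansion, storage-class memory" },
};

int main(void)
{
    for (size_t i = 0; i < sizeof cxl_types / sizeof cxl_types[0]; i++)
        printf("%-40s protocols=0x%x  (%s)\n", cxl_types[i].name,
               cxl_types[i].protocols, cxl_types[i].example_usages);
    return 0;
}
```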
Industry Trends
- Use cases driving the need for higher bandwidth: e.g., high-performance accelerators, system memory, SmartNICs, etc.
- CPU capability requiring more memory capacity and bandwidth per core
- Efficient peer-to-peer resource sharing/messaging across multiple domains
- Memory bottlenecks due to CPU pin and thermal constraints need to be overcome

CXL 3.0 Specification
CXL 3.0 introduces:
- Double the bandwidth and zero added latency over CXL 2.0
- Fabric capabilities
  - Multi-headed and fabric-attached devices
  - Enhanced fabric management
  - Composable disaggregated infrastructure
- Improved capability for better scalability and resource utilization
  - Enhanced memory pooling
  - Multi-level switching
  - Direct memory / peer-to-peer accesses by devices
- New symmetric memory capabilities
- Improved software capabilities
- Full backward compatibility with CXL 2.0, CXL 1.1, and CXL 1.0

CXL 3.0 is a huge step function with fabric capabilities while maintaining full backward compatibility with prior generations.
CXL 3.0 Spec Feature Summary

Features                                     | CXL 1.0/1.1   | CXL 2.0       | CXL 3.0
Release date                                 | 2019          | 2020          | 2022
Max link rate                                | 32 GT/s       | 32 GT/s       | 64 GT/s
Flit: 68 byte (up to 32 GT/s)                | Supported     | Supported     | Supported
Flit: 256 byte (up to 64 GT/s)               | Not supported | Not supported | Supported
Type 1, Type 2 and Type 3 devices            | Supported     | Supported     | Supported
Memory pooling w/ MLDs                       | Not supported | Supported     | Supported
Global Persistent Flush                      | Not supported | Supported     | Supported
CXL IDE                                      | Not supported | Supported     | Supported
Switching (single-level)                     | Not supported | Supported     | Supported
Switching (multi-level)                      | Not supported | Not supported | Supported
Direct memory access for peer-to-peer        | Not supported | Not supported | Supported
Enhanced coherency (256 byte flit)           | Not supported | Not supported | Supported
Memory sharing (256 byte flit)               | Not supported | Not supported | Supported
Multiple Type 1/Type 2 devices per root port | Not supported | Not supported | Supported
Fabrics (256 byte flit)                      | Not supported | Not supported | Supported
CXL 3.0: Doubles Bandwidth with the Same Latency
- Uses the PCIe 6.0 PHY at 64 GT/s with PAM-4 signaling; the higher BER is mitigated by PCIe 6.0 FEC and CRC (a different CRC is used for the latency-optimized flit)
- Standard 256B flit, plus an additional 256B latency-optimized flit (0-latency adder over CXL 2.0)
- The 0-latency adder trades FIT (failures in time, per 10^9 hours) from 5x10^-8 to 0.026 and link efficiency from 0.94 to 0.92, in exchange for 2-5 ns of latency savings (x16 to x4 links)^1
- Extends to lower data rates (8 GT/s, 16 GT/s, 32 GT/s)
- Enables several new CXL 3.0 protocol enhancements with the 256B flit format

Flit layouts (256B each, shown as two 128B halves):
- PCIe 6.0 flit: 128B TLP | 108B TLP + 6B DLP + 8B CRC + 6B FEC
- CXL 256B standard flit: 2B flit header + 126B data | 114B data + 8B CRC + 6B FEC
- CXL 256B latency-optimized flit: even flit-half = 2B flit header + 120B data + 6B CRC; odd flit-half = 116B data + 6B CRC + 6B FEC

1: D. Das Sharma, "A Low-Latency and Low-Power Approach for Coherency and Memory Protocols on PCI Express 6.0 PHY at 64.0 GT/s with PAM-4 Signaling," IEEE Micro, Mar/Apr 2022 (https://ieeexplore.ieee.org/document/9662217)
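A quick way to sanity-check those byte budgets is to write them down as C structs and let the compiler confirm that each layout sums to 256 bytes. This is only an illustration of the field sizes listed above, not a faithful wire-format definition (real flits interleave slots, and header/CRC placement is more involved than flat byte arrays).

```c
#include <assert.h>
#include <stdint.h>

/* Byte budgets from the flit layouts above, expressed as structs of byte
 * arrays purely so static_assert can check each adds up to 256 bytes. */

struct pcie6_flit {                  /* PCIe 6.0 flit                      */
    uint8_t tlp[236];                /* 128B + 108B of TLP bytes           */
    uint8_t dlp[6];
    uint8_t crc[8];
    uint8_t fec[6];
};
static_assert(sizeof(struct pcie6_flit) == 256, "PCIe 6.0 flit");

struct cxl_std_flit {                /* CXL 256B standard flit             */
    uint8_t flit_hdr[2];
    uint8_t data[240];               /* 126B + 114B of slot bytes          */
    uint8_t crc[8];
    uint8_t fec[6];
};
static_assert(sizeof(struct cxl_std_flit) == 256, "CXL standard flit");

struct cxl_latopt_flit {             /* CXL 256B latency-optimized flit    */
    uint8_t flit_hdr[2];             /* even flit-half ...                 */
    uint8_t even_data[120];
    uint8_t even_crc[6];             /* ... usable on its own CRC check    */
    uint8_t odd_data[116];           /* odd flit-half                      */
    uint8_t odd_crc[6];
    uint8_t fec[6];
};
static_assert(sizeof(struct cxl_latopt_flit) == 256, "latency-optimized flit");

int main(void) { return 0; }         /* the static_asserts are the point   */
```

Note that the quoted link efficiencies line up with the data-byte fractions: 240/256 ≈ 0.94 for the standard flit and (120+116)/256 ≈ 0.92 for the latency-optimized flit, which is where the roughly 2% efficiency cost of the per-half CRC comes from.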
RECAP: CXL 2.0 Feature Summary: Memory Pooling
- Multi-Logical Devices (MLDs) allow for finer-grained memory allocation
- Device memory can be allocated across multiple hosts
- Diagram: hosts H1-H# connected to a CXL 2.0 switch, with pooled devices D1-D# below it

RECAP: CXL 2.0 Feature Summary: Switch Capability
- Supports single-level switching
- Enables memory expansion and resource allocation
- Diagram: host H1 connected through a single CXL 2.0 switch to devices D1-D#
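To make the pooling idea concrete, here is a hypothetical sketch of a pooled Type 3 device exposing multiple logical devices (an MLD can be partitioned into up to 16 LDs), with a fabric-manager-style routine binding each LD to one host. The names and data structures are invented for illustration and are not the standardized Fabric Manager API.

```c
#include <stdint.h>
#include <stdio.h>

#define MAX_LDS 16                   /* an MLD partitions into up to 16 LDs */

struct logical_device {
    uint64_t capacity_bytes;         /* slice of the device's memory        */
    int      bound_host;             /* -1 while the LD sits in the pool    */
};

struct pooled_device {
    struct logical_device ld[MAX_LDS];
};

/* Bind the first free LD to a host; returns the LD index, or -1 if the
 * pool is exhausted. Each LD is owned by exactly one host at a time.     */
static int pool_assign(struct pooled_device *dev, int host_id)
{
    for (int i = 0; i < MAX_LDS; i++) {
        if (dev->ld[i].bound_host < 0) {
            dev->ld[i].bound_host = host_id;
            return i;
        }
    }
    return -1;
}

int main(void)
{
    struct pooled_device dev;
    for (int i = 0; i < MAX_LDS; i++)
        dev.ld[i] = (struct logical_device){ .capacity_bytes = 1ULL << 34,
                                             .bound_host = -1 };
    printf("host 3 granted LD %d\n", pool_assign(&dev, 3));
    printf("host 7 granted LD %d\n", pool_assign(&dev, 7));
    return 0;
}
```

The key property of pooling is that each slice is dedicated to one host at a time; letting several hosts map the same region coherently only arrives with CXL 3.0 memory sharing, covered below.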
CXL 3.0: Multiple-Level Switching, Multiple Type-1/2 Devices (preliminary as of August 2022)
- Each host's root port can connect to more than one device type (up to 16 CXL.cache devices)
- Multiple switch levels (aka cascading); supports fanout of all device types
- Diagram: under CXL 2.0, host H1 reaches Type 3 memory devices through a single switch level; under CXL 3.0, cascaded CXL switches fan out to a mix of Type 1 (NIC), Type 2 (FPGA with memory) and Type 3 (memory) devices

CXL 3.0 Protocol Enhancements (UIO and BI) for Device-to-Device Connectivity
- CXL 3.0 enables non-tree topologies and peer-to-peer communication (P2P) within a virtual hierarchy of devices
  - Virtual hierarchies are associations of devices that maintain a coherency domain
- P2P access to HDM-DB memory is I/O coherent: a new Unordered I/O (UIO) flow in CXL.io
  - The Type-2/3 device that hosts the memory generates a new Back-Invalidation flow (CXL.mem) to the host to ensure coherency if there is a coherency conflict (sketched below)
- Diagram: hosts H1-H# and devices D1-D# attached through CXL switch(es), with direct P2P access between devices, including to HDM memory
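The sequence below is a simplified sketch of that UIO plus Back-Invalidation interaction, assuming the owning device keeps a per-line snoop-filter/directory. The function and type names are invented for this illustration and are not the specification's message names or flows.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* A peer device writes to HDM-DB memory on a Type-2/3 device via UIO.
 * If the directory says a host may hold the line, the owning device first
 * sends a Back-Invalidate on CXL.mem, then commits the I/O-coherent write. */

struct hdm_line_state {
    bool host_may_cache;             /* directory / snoop-filter hint       */
    int  caching_host;               /* host to back-invalidate             */
};

static void send_back_invalidate(int host, uint64_t addr)
{
    printf("CXL.mem Back-Invalidate -> host %d, line 0x%llx\n",
           host, (unsigned long long)addr);
}

static void wait_for_bi_response(int host, uint64_t addr)
{
    printf("host %d acknowledged, copy of 0x%llx dropped\n",
           host, (unsigned long long)addr);
}

static void commit_write(uint64_t addr, const void *data, size_t len)
{
    (void)data;
    printf("commit %zu-byte UIO write at 0x%llx\n",
           len, (unsigned long long)addr);
}

void handle_p2p_uio_write(struct hdm_line_state *dir, uint64_t addr,
                          const void *data, size_t len)
{
    if (dir->host_may_cache) {                    /* coherency conflict     */
        send_back_invalidate(dir->caching_host, addr);
        wait_for_bi_response(dir->caching_host, addr);
        dir->host_may_cache = false;
    }
    commit_write(addr, data, len);                /* safe to make visible   */
}

int main(void)
{
    struct hdm_line_state dir = { .host_may_cache = true, .caching_host = 2 };
    uint8_t payload[64] = {0};
    handle_p2p_uio_write(&dir, 0x1000, payload, sizeof payload);
    return 0;
}
```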
CXL 3.0: Coherent Memory Sharing
- Device memory can be shared by all hosts to increase data-flow efficiency and improve memory utilization
- A host can keep a coherent copy of the shared region, or of portions of the shared region, in its cache
- CXL 3.0 defines mechanisms to enforce hardware cache coherency between copies
- Diagram: shared segments S1 and S2 in pooled memory behind CXL switch(es); copies of S1 and S2 are cached in hosts H1-H#, coordinated through a standardized CXL Fabric Manager

CXL 3.0: Pooling and Sharing (preliminary as of August 2022)
- Expanded use case showing memory sharing and pooling together: devices D1-D# behind CXL switch(es), with shared segments S1-S3 whose copies appear in hosts H1-H#
- A CXL Fabric Manager is available to set up, deploy, and modify the environment
- Shared coherent memory across hosts uses hardware coherency (directory plus Back-Invalidate flows), allowing large clusters to attack large problems through shared-memory constructs
- Defines a Global Fabric Attached Memory (GFAM) device, which can provide access to up to 4095 entities
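The distinction between pooling and sharing in these two slides can be captured in a small data-model sketch: a pooled segment has exactly one owning host, while a shared segment carries a set of mapping hosts plus directory state so the device can issue Back-Invalidates on conflicts. The types below are invented for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_HOSTS 16

enum seg_mode { SEG_POOLED, SEG_SHARED };

struct segment {
    enum seg_mode mode;
    uint64_t      base, size;

    /* Pooled: dedicated to exactly one host. */
    int owner_host;                          /* valid when SEG_POOLED       */

    /* Shared: several hosts may map and cache it, so the device keeps
     * per-host directory state and back-invalidates on write conflicts.  */
    bool mapped_by[MAX_HOSTS];               /* valid when SEG_SHARED       */
    bool may_be_cached_by[MAX_HOSTS];
};

/* A write touching a shared segment must back-invalidate every other host
 * that may still hold a cached copy before it becomes globally visible.  */
static bool write_needs_back_invalidate(const struct segment *s, int writer)
{
    if (s->mode != SEG_SHARED)
        return false;
    for (int h = 0; h < MAX_HOSTS; h++)
        if (h != writer && s->may_be_cached_by[h])
            return true;
    return false;
}
```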
Fabrics Overview
- Nodes can be any combination of:
  - Hosts
  - Type 1 devices with cache (example: Smart NIC)
  - Type 2 devices with cache and memory (example: AI accelerator)
  - Type 3 devices with memory (example: memory expander)
  - PCIe devices
- Diagram: hosts, nodes and end-points attached to an interconnected mesh of CXL switches

CXL 3.0: Global Fabric Attached Memory (GFAM) Device
- CXL 3.0 enables a Global Fabric Attached Memory (GFAM) architecture, which differs from the traditional processor-centric architecture by disaggregating memory from the processing units and implementing a large shared memory pool
- The memory can be of the same type or of different types, and can be accessed by multiple processors connected to the GFAM device either directly or through a CXL switch
- Diagram: a GFAM device whose fabric interface fronts DRAM and NVM, accessed by CPUs, a NIC and a GPU
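As a toy illustration of "same type or different types" of media behind one fabric interface, the sketch below decodes a flat GFAM-visible address range into per-media regions (DRAM and NVM here). The layout, sizes and names are assumptions made for this example, not how a real GFAM device organizes its decoders.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Toy GFAM-style decode: one flat fabric-visible address space backed by
 * several media regions. Sizes and ordering invented for the sketch.     */

enum media { MEDIA_DRAM, MEDIA_NVM };

struct region {
    uint64_t   base, size;           /* fabric-visible address range        */
    enum media media;
};

static const struct region gfam_map[] = {
    { 0,          1ULL << 38, MEDIA_DRAM },    /* 256 GiB of DRAM           */
    { 1ULL << 38, 1ULL << 40, MEDIA_NVM  },    /* 1 TiB of NVM behind it    */
};

static const struct region *gfam_decode(uint64_t addr)
{
    for (size_t i = 0; i < sizeof gfam_map / sizeof gfam_map[0]; i++)
        if (addr - gfam_map[i].base < gfam_map[i].size)
            return &gfam_map[i];
    return NULL;                                /* address not backed       */
}

int main(void)
{
    uint64_t probes[] = { 1ULL << 30, (1ULL << 38) + (1ULL << 30) };
    for (size_t i = 0; i < 2; i++) {
        const struct region *r = gfam_decode(probes[i]);
        printf("0x%012llx -> %s\n", (unsigned long long)probes[i],
               r ? (r->media == MEDIA_DRAM ? "DRAM" : "NVM") : "unbacked");
    }
    return 0;
}
```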
CXL 3.0: Fabrics Example Use Case: Machine Learning
- A machine-learning accelerator and GFAM devices in a fabric architecture: host-plus-accelerator pairs and GFAM devices (DRAM and NVM) attached to a CXL switch, managed by an FM vHost
- GFAM enables multiple media types, i.e., DRAM, Flash and future memory types

CXL 3.0: Fabrics Example Use Case: HPC/Analytics
- Hosts with accelerators, GFAM devices and NICs attached to a CXL switch, managed by an FM vHost
- Sharing memory and networking devices reduces cost and improves efficiency

CXL 3.0: Fabrics Example Use Case: Composable Systems with Spine/Leaf Architecture
- CXL 3.0 fabric architecture: an interconnected spine-switch system with leaf switches in front of NIC, CPU, accelerator and memory enclosures
- Spine switches, leaf switches and end devices (accelerators with memory, CPUs, GFAM devices, NICs) are coordinated by a fabric manager; example traffic flows traverse leaf and spine switches

CXL 3.0 Summary
- CXL 3.0 features:
  - Enhanced memory pooling, enabling new memory usage models
  - Multi-level switching with multiple hosts, fabric capabilities and enhanced fabric management
  - New symmetric coherency capabilities
  - Improved software capabilities
- CXL 3.0 introduces new usage models, delivers on the industry's need for higher bandwidth, and optimizes system-level flows with advanced switching, efficient peer-to-peer and fine-grained resource sharing across multiple domains

Call to Action
- Join the CXL Consortium
- Follow us on Twitter and LinkedIn for updates!

Thank you.

Compute Express Link and CXL Consortium are trademarks of the Compute Express Link Consortium.