Hot Chips 2022 CXL3 Coherence Deep Dive.pdf


Coherence Deep Dive for CXL
Rob Blankenship, Intel Corporation and CXL Protocol Working Group co-chair
August 2022
Copyright | CXL Consortium 2020 - Hot Chips 2022 CXL Tutorial, 8/18/2022 | Public

Agenda
Coherence/Caching Primer
CXL Cache Hierarchy
CXL.Cache Deep Dive
  What is new in CXL3 (Device Scaling)
CXL.Mem Deep Dive
  What is new in CXL3: Direct P2P to HDM / Multi-Host Coherence

Caching Primer

Caching Overview
Caching temporarily brings data closer to the consumer.
It improves latency and bandwidth using prefetching and/or locality.
  Prefetching: loading data into the cache before it is required.
  Spatial locality (locality in space): access to address X, then X+n.
  Temporal locality (locality in time): multiple accesses to the same data.
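As a rough illustration of why caching improves latency, the sketch below computes an average access latency for a given cache hit rate. The 10 ns and 200 ns defaults are the local-cache and host-memory latencies quoted in the Caching Overview figure. The function name and the simple weighted-average model (a miss costs only the memory latency) are assumptions made for illustration and are not part of the tutorial.

```python
def average_access_latency_ns(hit_rate, cache_ns=10.0, memory_ns=200.0):
    """Weighted average of cache and memory latency (hypothetical model).

    hit_rate is the fraction of accesses served by the local cache.
    The defaults are the example latencies from the Caching Overview figure.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cache_ns + (1.0 - hit_rate) * memory_ns


if __name__ == "__main__":
    for rate in (0.0, 0.5, 0.9, 1.0):
        print(rate, average_access_latency_ns(rate))
```

Under this model, every additional hit replaces a 200 ns access with a 10 ns one, which is the benefit that prefetching and locality are meant to capture.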

Figure (Caching Overview): an accelerator reads data either from its local data cache (access latency: 10 ns; dedicated bandwidth: 100+ GB/s) or from host memory (access latency: 200 ns; shared bandwidth: 100+ GB/s).

CPU Cache/Memory Hierarchy with CXL
Modern CPUs have 2 or more levels of coherent cache.
  Lower levels (L1) are smaller in capacity, with the lowest latency and highest bandwidth per source.
  Higher levels (L3) have less bandwidth per source but much higher capacity, and support more sources.
  Device caches are expected to be up to 1 MB.
Note: Cache/memory capacities are examples and are not aligned to a specific product.
Figure: CPU Socket 0 and CPU Socket 1 connected by coherent CPU-to-CPU symmetric links. Labels: CPU L1 (50 KB), L2 (500 KB), L3 aka LLC (10 MB), Home Agent, directly connected memory aka DDR (10 GB), CXL.io/PCIe, CXL.Cache, CXL.mem (10 GB), device cache (1 MB), write cache (50 KB).

Cache Consistency
How do we make sure updates in a cache are visible to other agents?
  Invalidate all peer caches prior to the update.
  This can be managed with software or hardware.
    CXL uses hardware coherence.
Define a point of "Global Observation" (aka GO) when new data from writes is visible.
Tracking granularity is a "cacheline" of data.
  64 bytes for CXL.
All addresses are assumed to be Host Physical Addresses (HPA) in the CXL cache and memory protocols.
  Translations are done using Address Translation Services (ATS).

Cache Coherence Protocol
Modern CPU caches and CXL are built on the M, E, S, I protocol/states.
  Modified: only in one cache; can be read or written; data is NOT up to date in memory.
  Exclusive: only in one cache; can be read or written; data IS up to date in memory.
  Shared: can be in many caches; can only be read; data IS up to date in memory.
  Invalid: not in the cache.
M, E, S, I is tracked for each cacheline address in each cache.
  The cacheline address in CXL is Addr[51:6].
Notes:
  Each level of the CPU cache hierarchy follows MESI, and layers above must be consistent.
  Other extended states and flows are possible but are not covered in the context of CXL.
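The state properties listed above can be written down as a small lookup. This is a minimal sketch, not an excerpt from the CXL specification: the helper names are hypothetical, and the `compatible` check is one reading of the "only in one cache" wording for Modified and Exclusive.

```python
from enum import Enum

CACHELINE_BYTES = 64  # tracking granularity stated for CXL


class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"


def can_read(state):
    """Any valid state can be read."""
    return state is not State.I


def can_write(state):
    """Only a unique copy (Modified or Exclusive) can be written."""
    return state in (State.M, State.E)


def memory_is_stale(state):
    """Only Modified means memory does not hold the latest data."""
    return state is State.M


def compatible(a, b):
    """True if two caches may hold the same line in states a and b."""
    if a in (State.M, State.E):
        return b is State.I
    if b in (State.M, State.E):
        return a is State.I
    return True


def cacheline_address(hpa):
    """Extract Addr[51:6] from a host physical address."""
    return (hpa & ((1 << 52) - 1)) >> 6


if __name__ == "__main__":
    print(hex(cacheline_address(0x1040)))
    print(compatible(State.S, State.S), compatible(State.M, State.S))
```

Dropping the low 6 bits is what makes every byte inside the same 64-byte line map to one tracked address.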

How are Peer Caches Managed?
All peer caches are managed by the "Home Agent" within the cache level.
A "Snoop" is the term for the Home checking cache state and causing cache state changes.
Example CXL snoops:
  Snoop Invalidate (SnpInv): causes a cache to degrade to I-state; the cache must return any Modified data.
  Snoop Current (SnpCurr): does not change cache state, but does return an indication of the current state and any modified data.

CXL Cache Protocol

Cache Protocol Summary
Simple set of 15 reads and writes from the device to host memory.
Keep the complexity of global coherence management in the host.
CXL3 enables up to 16 cache devices below each root port.
  Prior generations were limited to 1 per root port.

Cache Protocol Channels
3 channels in each direction: D2H vs. H2D.
Data and RSP channels are pre-allocated.
D2H Requests come from the device.
H2D Requests are snoops from the host.
Ordering: H2D Req (Snoop) pushes H2D RSP.
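Since H2D requests are snoops, the two example snoops from the caching primer (SnpInv and SnpCurr) can be sketched from the point of view of the snooped cache. This is a minimal illustration: the function name and the dictionary used as a response are hypothetical and do not reproduce the response opcodes defined by the CXL specification.

```python
def handle_snoop(snoop, state, data=None):
    """Return (new_state, response) for one cacheline.

    state is one of "M", "E", "S", "I"; data is the cached line contents.
    The response layout is a made-up stand-in for real H2D responses.
    """
    # Only a Modified line carries data that memory does not already have.
    modified_data = data if state == "M" else None

    if snoop == "SnpInv":
        # Degrade to I-state and return any Modified data.
        return "I", {"prior_state": state, "data": modified_data}
    if snoop == "SnpCurr":
        # Leave the state unchanged; report it, plus any modified data.
        return state, {"prior_state": state, "data": modified_data}
    raise ValueError("snoop type not covered by this sketch: " + snoop)


if __name__ == "__main__":
    print(handle_snoop("SnpInv", "M", b"dirty"))
    print(handle_snoop("SnpCurr", "S", b"clean"))
```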

Read Flow
Diagram to show message flows in time.
  X-axis: agents.
  Y-axis: time.

Mapping Flow Back to CPU Hierarchy
Figure: the CPU cache/memory hierarchy diagram from the primer, repeated and annotated with the roles CXL Device, Peer Cache, Home and Memory Controller.
Peer Cache can be:
  Peer CXL device with cache
  CPU cache in local socket
  CPU cache in remote socket
Memory Controller can be:
  Native DDR on local socket
  Native DDR on remote socket
  CXL.mem on peer device

Example #2: Write
For cache writes there are three phases:
  Ownership
  Silent Write
  Cache Eviction
Figure: message flow between CXL Device, Peer Cache, Home and Memory Controller, built up phase by phase. State labels shown: I and S at the start; I to E and S to I (Ownership); E to M (Write); Data and M to I (Eviction). Legend: cache state (Modified, Exclusive, Shared, Invalid), Allocate Tracker, Deallocate Tracker.

Example #3: Streaming Write
Direct write to host.
Ownership + write in a single flow.
  Rely on completion to indicate ordering.
  May see reduced bandwidth for ordered traffic.
Host may install data into the LLC instead of writing to memory.
Figure: message flow between CXL Device, Peer Cache, Home and Memory Controller. State labels shown: I, S, S to I, Data. Legend: cache state (Modified, Exclusive, Shared, Invalid), Allocate Tracker, Deallocate Tracker.

15 Requests in CXL
Reads: RdShared, RdCurr, RdOwn, RdAny
Read-0: RdOwnNoData, CLFlush, CacheFlushed
Writes: DirtyEvict, CleanEvict, CleanEvictNoData
Streaming Writes: ItoMWr, WrCur, WOWrInv, WrInv(F)
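The three phases of the cached write in Example #2 can be traced as a sequence of state changes. This is a sketch under stated assumptions: attributing I to E, E to M and M to I to the device cache and S to I to the peer cache is a reading of the figure labels, and the message names attached to each phase (RdOwn, SnpInv, GO-E, DirtyEvict, GO-WrPull, Data) are borrowed from the HDM-D flow figures later in the deck rather than taken from the Example #2 slide itself. The function name and trace layout are hypothetical.

```python
def write_flow(new_value, memory_value=0):
    """Trace a cached write as (phase, device_state, peer_state, messages)."""
    device, peer = "I", "S"  # starting states shown in the figure
    trace = []

    # Phase 1: Ownership. The device asks for ownership, the Home
    # invalidates the peer copy, and the device is granted Exclusive.
    peer = "I"
    device = "E"
    trace.append(("Ownership", device, peer, ["RdOwn", "SnpInv", "GO-E"]))

    # Phase 2: Silent Write. The device already owns the line, so the
    # write happens locally with no messages: E to M.
    cached_value = new_value
    device = "M"
    trace.append(("Silent Write", device, peer, []))

    # Phase 3: Cache Eviction. The modified data is written back and the
    # device drops the line: M to I.
    memory_value = cached_value
    device = "I"
    trace.append(("Cache Eviction", device, peer,
                  ["DirtyEvict", "GO-WrPull", "Data"]))

    return trace, memory_value


if __name__ == "__main__":
    steps, memory = write_flow(42)
    for step in steps:
        print(step)
    print("memory:", memory)
```

A streaming write, by contrast, folds ownership and write into a single flow, so the device never holds the line in E or M.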

CXL Memory Protocol

Memory Protocol Summary
Simple reads and writes from host to memory.
Memory technology independent.
  HBM, DDR, PMem
  Architected hooks to manage persistence
Includes 2 bits of "meta-state" per cacheline.
  Memory-only device: up to the host to define usage.
  For accelerators: host encodes the required cache state.
Host-managed Device Memory (HDM) comes in 3 types:
  Host Managed Coherence (HDM-H)
  Device Managed Coherence (HDM-D)
  Device Managed Coherence with Back-Invalidation (HDM-DB), new in CXL3

Memory Protocol Channels
3 channels in each direction.
  M2S Request (Req) and Request with Data (RwD).
  S2M Non-Data Response (NDR) and Data Response (DRS), which are pre-allocated.
  M2S BIRsp and S2M BISnp, used for HDM-DB to manage coherence. New in CXL3.
Limited ordering.
  Req channel for HDM-D memory (CXL2 accelerators).
  NDR channel for conflict flows with HDM-DB.

Example #1: Write
Media ECC is handled by the device.
HDM-H provides 2 bits of host-defined Meta Value, which the device optionally supports.
Note: only the host caches HDM-H (host-only coherent).
Figure (CXL Memory Only Device): flow between Host and CXL Device (Memory Controller, Memory Media). Labels: "Sent when data visible to future reads", "New MetaValue".

Example #2: Read
A Meta Value change requires the device to write.
Figure (Memory Only Device): flow between Host and CXL Device (Memory Controller, Memory Media). Labels: "New MetaValue", "Old MetaValue".

Example #3: Read no Meta
Host may indicate that no meta-state update is required on reads.
Figure (Memory Only Device): flow between Host and CXL Device (Memory Controller, Memory Media). Label: "No Meta Update Needed".

Example #4: MemInv
Used to read/update meta-state without reading the data itself.
Figure (Memory Only Device): flow between Host and CXL Device (Memory Controller, Memory Media). Labels: "New MetaValue", "Old MetaValue".
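The four memory-only examples above can be modeled with a toy device that stores a 2-bit Meta Value next to each cacheline. This is a minimal sketch: the class, method names and the `media_writes` counter are hypothetical, and returning the old Meta Value to the host is a reading of the "New MetaValue"/"Old MetaValue" figure labels rather than a statement of the exact message fields.

```python
class MemoryOnlyDevice:
    """Toy model of a memory-only device with 2 bits of meta-state per line."""

    def __init__(self):
        self.data = {}         # cacheline address -> payload
        self.meta = {}         # cacheline address -> 2-bit Meta Value
        self.media_writes = 0  # counts writes to the memory media

    @staticmethod
    def _check_meta(value):
        if not 0 <= value <= 0b11:
            raise ValueError("Meta Value is 2 bits")

    def _update_meta(self, line, new_meta):
        self._check_meta(new_meta)
        # A Meta Value change requires the device to write the media.
        if self.meta.get(line, 0) != new_meta:
            self.meta[line] = new_meta
            self.media_writes += 1

    def write(self, line, payload, new_meta):
        """Example #1: write data together with a new Meta Value."""
        self._check_meta(new_meta)
        self.data[line] = payload
        self.meta[line] = new_meta
        self.media_writes += 1

    def read(self, line, new_meta=None):
        """Examples #2 and #3: read, optionally updating the Meta Value."""
        old_meta = self.meta.get(line, 0)
        if new_meta is not None:
            self._update_meta(line, new_meta)
        return self.data.get(line), old_meta

    def mem_inv(self, line, new_meta):
        """Example #4: read/update meta-state without reading the data."""
        old_meta = self.meta.get(line, 0)
        self._update_meta(line, new_meta)
        return old_meta


if __name__ == "__main__":
    dev = MemoryOnlyDevice()
    dev.write(0x41, b"x" * 64, 0b10)
    print(dev.read(0x41, new_meta=0b01), dev.media_writes)
```

The counter makes the cost visible: a read that asks for no meta update leaves the media untouched, while a read that changes the Meta Value adds a media write.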

HDM-D/HDM-DB Common Attributes
"Device Coherent": provides the ability for both host and device to cache.
Request MetaValue field indicates host cache state.
  Any: host can be in M, E, S or I state.
  Shared: host can be in S or I state; indicates the host is requesting S-state.
  Invalid: host is in I-state and is not requesting cache state.
Request SnpType indicates the device cache state change.
  SnpInv: invalidate device cache.
  SnpData: device cache in I or S state.
Device Coherence Engine (DCOH) is the final conflict resolution arbiter between host and device accesses for HDM-D* memory.

Device Coherent (HDM-D) Specifics
CXL.mem requests indicate the coherence required from the host.
CXL.Cache is used for the device to change host cache state.
  Host must detect a device accessing its own memory and trigger special flows which return a "Forward" message.
  Can be blocked behind access to host memory.
Requires the device to implement full directory tracking (aka Bias Table).

Full Directory Bias Table
Device state of 1 or 2 bits per cacheline indicating if the host has a cached copy.
  Device Bias: no host caching, allowing direct reads.
  Host Bias: host may have a cached copy, so the read goes through the host.
Optionally tracks Shared vs. Any state in the host.
  With Shared state, the device may directly read data, but must not modify it.
Host tracks which peer caches have copies.

Host Bias Read
MemRdFwd message sent after coherence is resolved, on the M2S Request channel.
Figure (Accelerator Device with HDM-D): columns Peer Cache, Host/HA, DCOH, Dev$, Dev Mem. Labels shown: BIAS=HOST, BIAS=DEVICE, RdOwn X, SnpInv X, RspI, HitS, S to I, MemRdFwd X, Data, GO-E, I to E.

Device Bias Read
No messages on the CXL interface.
Figure (Accelerator Device with HDM-D): columns Peer Cache, Host/HA, DCOH, Dev$, Dev Mem. Labels shown: BIAS=DEVICE, RdOwn X, MemRdFwd X, Data, GO-E, I to E.

Device Cache Evictions
E/M in cache imply Bias=Device, so there is no indication to the host.
Figure (Accelerator Device with HDM-D): columns Peer Cache, Host/HA, DCOH, Dev$, Dev Mem. Labels shown: BIAS=DEVICE, DirtyEvict X, GO-WrPull, M to I, Data X, MemWr X, Cmp.

Host Bias Streaming Write
MemRdFwd message sent after coherence is resolved.
Figure (Accelerator Device with HDM-D): columns Peer Cache, Host/HA, DCOH, Dev$, Dev Mem. Labels shown: BIAS=HOST, BIAS=DEVICE, WOWr X, SnpInv X, RspI, HitS, S to I, MemWrFwd X, GO-WrPull, MemWr X, Data, Cmp, ExtCmp.

Device Bias Streaming Write
No message to host.
Figure (Accelerator Device with HDM-D): columns Peer Cache, Host/HA, DCOH, Dev$, Dev Mem. Labels shown: BIAS=DEVICE, WOWr X, MemWr X, Data, GO-WrPull, Cmp, ExtCmp.

HDM-DB New in CXL3

HDM-DB
"Device Coherent with Back-Invalidation" (HDM-DB) adds BISnp and BIRsp channels to optimize coherence management, enabling inclusive Snoop Filter (SF) architectures.
Same "BIAS Table" states tracking host coherence: I, S, A.
An inclusive SF architecture may block an M2S Request while waiting for a Back-Invalidation Snoop (BISnp) to complete, which enables sizing to match expected host caching instead of memory capacity.
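The bias table decision shared by HDM-D and HDM-DB (host state I, S or A per cacheline) can be sketched as a lookup that tells the device whether it may touch its own memory directly. This is an illustration under assumptions: the class and function names are hypothetical, treating an untracked line as A is a conservative choice made here, and "via_host" stands for whichever mechanism applies (CXL.Cache requests for HDM-D, BISnp for HDM-DB).

```python
from enum import Enum


class HostState(Enum):
    I = "Invalid"  # no host caching (device bias)
    S = "Shared"   # host may hold a shared copy
    A = "Any"      # host may hold the line in any state (host bias)


def device_access_path(host_state, is_write):
    """Decide how the device may access a line of its own memory."""
    if host_state is HostState.I:
        return "direct"
    if host_state is HostState.S and not is_write:
        # With Shared state the device may read directly, but not modify.
        return "direct"
    return "via_host"


class BiasTable:
    """Per-cacheline record of what the host may be caching."""

    def __init__(self):
        self._lines = {}

    def lookup(self, line):
        # Assumption: a line with no entry is treated conservatively as A.
        return self._lines.get(line, HostState.A)

    def record_host_request(self, line, meta_value):
        # The request MetaValue tells the device the host cache state.
        self._lines[line] = meta_value

    def record_host_invalidated(self, line):
        # Once coherence is resolved, the line returns to device bias.
        self._lines[line] = HostState.I


if __name__ == "__main__":
    table = BiasTable()
    table.record_host_request(0x41, HostState.S)
    print(device_access_path(table.lookup(0x41), is_write=False))
    print(device_access_path(table.lookup(0x41), is_write=True))
```

In an inclusive snoop filter design, the same record only needs entries for lines the host may actually cache, which is why the filter can be sized to host caching rather than to memory capacity.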

Back-Invalidation Snooping (HDM-DB)
Enables an inclusive Snoop Filter (SF) to track host caching.
Device can block new requests while waiting for an SF victim.

Block Access with BISnp
To improve efficiency, there are BISnp messages that cover more than one cacheline (aka "Block").
Either 2 (128 B) or 4 (256 B) cachelines are supported.

New Use Models with HDM-DB

Direct Peer-to-Peer (P2P) to HDM
HDM-DB enables direct P2P from CXL or PCIe sources in CXL3.
In prior generations, all HDM accesses had to go through the host CPU to resolve coherence.
HDM-DB will directly resolve coherence with the host before committing the P2P.
Figure: labels H1, CXL Switch, D1 (CXL Type-2/3), D2 (PCIe Accel), D3 (CXL Accel), HDM-DB.

Pooled and Shared Memory
Pooled memory and CXL switching, added in CXL2, allow for dedicated assignment of memory resources to a host.
Shared memory assigned to multiple hosts is enabled in CXL3.
Multi-host hardware-coherent shared memory is possible with HDM-DB.
More on these uses in the Fabric Tutorial.
Figure: hosts H1, H2, H3, ..., H# connected through a CXL Switch to devices D1, D2, D3, D4, ..., D#. Labels: "Hardware Coherent Shared", "Software Coherent Shared".

Summary
CXL protocols are evolving.
  CXL2 added switching and pooled memory capabilities.
  CXL3 is enabling new capabilities:
    CXL.Cache scaling
    CXL.Mem Back-Invalidation channel for SF, Direct P2P, Multi-Host Coherence
    Port Based Routing (covered in the Fabric Tutorial)

Thank You
Join Today! puteexpresslink.org/join
Follow Us on Social Media

Audience Q&A
