OCP Global Summit, October 18, 2023 | San Jose, CA
Jeff Hutchins, Ranovus
Energy Efficient Optical Links for PCIe AI/ML Scale-Out

Key objectives for energy efficient links for AI/ML

Key PCIe enablers for AI/ML scale-out:
- Reduced power consumption
- Improved density
- Reduced latency

Moore's law is not keeping pace, further increasing pressure on networking.

Presented at LightCounting's “Linear Drive Enables Green All-Optical Connectivity for Datacenters Webinar”, March 2023

AI datacenter applications for energy efficient PCIe links

[Figure: Categorization of data center links. Adapted from oif2023.269, Ram Huggahalli, Microsoft. A second build of the figure adds the labels: AI/ML Interfaces, PCIe-like Interfaces, Characteristics.]

FRONT END (Cloud DC)
- Front-end w/ TCP/IP, UDP, etc.
- Ethernet (200GbE)
- Network-type I/Fs
- Characteristics: standards based; energy efficient; potentially high density for some applications

BACKEND/COMPUTE
- Back-end w/ low latency (GPU-GPU)
- Compute-type I/Fs (PCIe-like interfaces) and network-type I/Fs
- Figure labels: Comp-IO, Comp-Mem, Comp-Comp, High-Perf; HBM, DDR/LPDDR, CXL.mem; UPI, xGMI, NVLink, CXL.cache; NVLink, InfiniBand, Ethernet, PCIe
- Characteristics: bookended or standards based; low latency; energy efficient; potentially high density for some applications

Architectures to achieve AI/ML PCIe scale-out objectives

[Figure: block diagrams of retimed, partially retimed (CDR), and non-retimed (linearly or non-amplified) optical engines. Labels: Root Complex, RDL, Host PCB, HDI, non-retimed OE; ASIC, eTx/eRx, oTx/oRx, DRV, TIA, CDR, PAM4 CDR, AGC, CTLE.]

- Eliminating the DSP (power, latency)
- Better electrical channel, non-retimed co-packaged case (power, density, link accountability)
- Latency can be reduced by eliminating the retimer (non-retimed)
- Link accountability & density can be improved through short channels (e.g. co-packaging)

Each approach presents tradeoffs!

[Table: approaches (retimed; remove DSP with a partially or non-retimed OE; co-packaging for a better electrical channel) rated against the AI/ML scale-out PCIe application objectives (latency, density, energy efficiency, link accountability). The ratings are not recoverable from the extracted text.]

Two key issues for PCIe links containing optics

Two key issues for PCIe links containing optics, which we will explore:
- Impact of meeting PCIe electrical compliance
- Pain points supporting PCIe protocol

PCIe electrical compliance with optical links

[Diagrams]
- PCIe Electrical Link: Root Complex, Electrical Link, End Point; PCIe compliance points
- Optically Enabled PCIe Retimer Link: Root Complex, Optically Enabled PCIe Retimer, Optical Link, Optically Enabled PCIe Retimer, End Point; PCIe compliance points at each end; potential optical compliance points
- Optical Link without PCIe Retimer: Root Complex, Non-PCIe-Retimed Optical Transceiver, Optical Link, Non-PCIe-Retimed Optical Transceiver, End Point; PCIe compliance points

- PCIe defines electrical PCIe compliance points which must be met to be “compliant”.
- An optical link can be inserted in the link while maintaining PCIe electrical compliance points by including “optically enabled” PCIe retimers which “hide” the optical link from the root complex and end point.
- Note that PCIe hasn't defined, and isn't planning to define, optical compliance.
- The impact of optics in the link can be hidden from each end with PCIe retimers or redrivers.
- Without a PCIe retimer, the system must be designed to meet the PCIe electrical compliance points at either end of the link when interfaced with an optical link.

Compliance methodology for links with optics

- One of the key challenges is to define a robust specification methodology for less than fully retimed links.
- The optical link can be bookended (e.g., AOC) or standardized when coupled with a PCIe retimer.
- Without a PCIe retimer, it's challenging to define compliance points for interoperability.
- Without a retimer, there is no partitioning of the optical and electrical portions of the link.
- The optical signal depends on the signal quality of the incoming electrical signal, thereby potentially impacting PCIe compliance at the receiving end of the link.
- Without a PCIe retimer, interoperability compliance is challenging to define without sacrificing significant link budget.

PCIe protocol was architected for copper links, not optical links

Protocol & Optics
- Protocol may need adaptations to support optics and meet the requirements. For example:
  - Ethernet works well with optics
  - PCIe is less accommodating

Various challenges with PCIe protocol and optics to be fully PCIe compliant
- Rx Detect
  - Impedance is used to determine if the remote electrical receiver is present and ready for traffic, but this doesn't work with optics!
- Inconsistent Signaling
  - Electrical idle (EI) (zero differential voltage) is difficult with AC coupling. The eTx of the optical link may not achieve compliance for EI.
  - Time to achieve sufficient BER upon traffic resumption may be longer than the eRx's timeout.
- Other considerations
  - Some signaling states may approach the lower cutoff frequency of the optics.
  - If needed, accommodations for sideband signals need to be considered.
  - Spread spectrum clocks may not work well with optics; prefer SRNS mode (Separate Reference clocks with no Spread spectrum).
- Non-PCIe retimed links: some adaptations to the PCIe protocol are helpful to be more “optics friendly”.
- Optically enabled retimers: support the existing PCIe protocol electrically while presenting “optics friendly” behavior for the optical link.

[Diagram: Root Complex, Non-PCIe Retimed Optical Transceiver, Non-PCIe Retimed Optical Transceiver, End Point; PCIe compliance points; EI marked at three points along the link.]

End-to-end modeling is key to developing interoperable standards

Many suppliers have developed accurate optical link models:
- They are often proprietary
- They are often computationally intensive
- These models will be important to define interoperability standards
- However, a generic, technology independent model would be a significant benefit to the industry to propel standardization progress

[Figure: end-to-end link between two co-packaged assemblies over 2 km of fiber. Each end: Host PCB, Co-Packaged Assembly Substrate, ASIC, Substrate, Analog Optical Engine, MR/LR SerDes.]

Link elements to be modeled:
- Tx SerDes (bare die)
- Electrical Channel + LGA (1)
- Optical Engine Tx
- Fiber Model
- Optical Engine Rx
- Electrical Channel + LGA
- Rx SerDes (bare die)

Model types labeled along the chain, in the order they appear: IBIS-AMI, s-parameters, IBIS-AMI redriver, IBIS-AMI redriver, s-parameters, IBIS-AMI.

GNOEM: Generalized non-linear model of a hypothetical optical transmitter
- GNOEM is a technology independent model of an optical engine which provides sufficient accuracy.
- Example with hypothetical OE demonstrates non-linearity.

[Figure: SerDes, 12 dB TL, DRV, MOD, with input Yi and output Yo (TP2). Plots of INPUT, OUTPUT, and MODEL OUTPUT. Signals are normalized; the time window is selected to show the worst case residual.]

Energy Efficient Optical Links for PCIe AI/ML Scale-Out

There has been significant interest in a next generation of energy efficient links to achieve some combination of:
- Reduced power consumption
- Reduced latency
- Improved density

There are a variety of architectural approaches to achieve the various combinations of the objectives:
- Optically enabled PCIe retimer (or a redriver)
- Non-PCIe retimed optical links

There are several challenges in achieving the objectives, especially at the higher signaling rates:
- Specifying compliance for interoperability (especially where clear retiming partitions aren't present)
- Employing end-to-end modeling with potentially technology independent, computationally efficient models
- Choosing a strategy to address PCIe protocols not being optics friendly:
  - Adapt the PCIe protocol to be optics friendly
  - Hide the presence of optics with an optically enabled PCIe retimer

Optically enabled PCIe links will enable AI/ML PCIe scale-out, but there are numerous challenges to be overcome.

Technology Comparison

[Block diagrams]
- Optically Enabled PCIe Retimer Link: Root Complex, PCIe Retimer + Non-retimed OE, Optical Link, Non-retimed OE + PCIe Retimer, End Point
- Co-packaged without PCIe retimer: Root Complex, Non-retimed OE, Optical Link, Non-retimed OE, End Point

| Architectural Parameter | Description |
|---|---|
| Application requirement or technical capability | Analog (non-retimed) optical engine, protocol agnostic, low latency, co-packaged or used in a pluggable |
| Use case | AI/ML PCIe scale-out: co-packaged with host ASIC and non-retimed; near or co-packaged with PCIe retimer |
| Links/Host | Dependent on customer application |
| Host interface (power) | Application may use MR, LR, or custom LR+ SerDes |
| Un-retimed/retimed? | Non-retimed. Retiming can be provided when packaged as an optically enabled PCIe retimer |
| FEC requirements | PCIe: PCIe standard FEC |

Technology Comparison

| Parameter | Units | Min | Max | Comments |
|---|---|---|---|---|
| Data rate per lane | Gbps | - | 64/128; 100/200 | PCIe6/7; Ethernet rate capable module |
| # lanes/link | # | 8 | 32 | PICs support 8, 16, or 32 lanes |
| Aggregate BW/link | Gbps | 800 | 6400 | |
| Linear BW density | Gbps/mm | 100 | 800 | Based on full PIC width |
| Areal BW density | Gbps/mm² | 8.3 | 66.7 | Based on full PIC die size |
| Energy efficiency, single direction link | pJ/b | 2.0/4.0 (single value in source) | | OE + laser |
| Reach | m | 500 | 500 | |
| Latency (no FEC) | ns | 0.1+5*d | 0.1+5*d | OE + ToF |
| BER (no FEC) | - | 1e-8 | 1e-8 | |
| Latency (post FEC) | ns | FEC+0.1+5*d | FEC+0.1+5*d | Host + OE + ToF |
| BER (post FEC) | - | 1e-12 | 1e-12 | |
| Optical channel | MM, SM, PM | SM | SM | Standard SMF |
| Optical connectors | Yes/No | Yes | Yes | |
| Max operating temp (OE lid) | °C | 70 | 70 | |
| Liquid cooling compatibility | - | Air & Immersion | Air & Immersion | |