OCP Global Summit, October 18, 2023 | San Jose, CA
Karen Liu, Nubis Communications

High-Density Optical Interconnect for ML Clusters

Intra-cluster network needs to look more like chip-to-chip I/O than networking I/O

Technology Gap for Pod-Pod within a Cluster

Figure labels: To external network. Chip-to-chip: low power/small die area. External networking: traditional copper/optics. Single xPU, Local Pod, Full Cluster. Cluster I/O need: optical reach with chip edge density. Pod-pod interconnect. 30 mm. 24 dB each end).

Architectural Parameter Description (Capability)
Application requirement or technical capability

Parameter | Application | Technical Capability
Use case | Shelf-shelf interconnect between xPU within a cluster. | High-density Linear I/O
Links/Host | 32-144 lanes/chip; 2x2 to 5x5 chips per local pod "substrate"; 2-3 links per pod, depends on topology | 100 Gbps/lane; any aggregation of lanes per link
Host interface (power) | 32 Gbps, 64 Gbps, 100 Gbps PAM4; 56 NRZ. No hard power cut-off, but less is better | MR, LR Serdes
Un-retimed/retimed? | Unretimed preferred for latency & power | Unretimed preferred for density
FEC requirements | FEC required, KP4 mostly, varies by application | KP4 is default

Parameter | Units | Min | Max | (Capability) & Assumptions
Data rate per lane | Gbps | 25 | 200 |
# lanes/link | # | 16 | 32 |
Aggregate BW/link | Gbps | 1600 | 3200 |
Linear BW density | Gbps/mm | 100 | 500 | 1 link per outward facing edge chip edge
Areal BW density | Gbps/mm2 | 3 | 10 | 15-20 mm edge width x 30 mm reticle dimension
Energy efficiency single direction link | pJ/bit | Less is better | 10 (4.5 typ) |
Energy efficiency with external light source | pJ/bit | Less is better | 10 (6) | External laser is in
 | | | 3 (500) | "Once optical, people will likely want more reach"
Latency (no FEC) | nanosec | | 1 n | "Same as copper for same reach."
BER (no FEC) | | | 1e-8 | Includes 2 orders of magnitude desired for margin
Latency (post FEC) | nanosec | | | Latency with FEC for the optical link. Host to host may also have FEC
BER (post FEC) | | | |
Optical channel | MM, SM, PM | | MM/SM | Standard fiber, nothing exotic
Optical connectors | | | | Not required but nice to have
Max Operating Temp | C | 45 | 75 | Varies (75C for module, 55C for laser)
Reliability/link | FITS | | |
Maximum operating temperature | | | |
Liquid cooling | | | | Liquid cooling is preferred but not strictly required.
Cost (initial volumes) | $/Gbps | | |
Cost (large volumes) | | | | 2x cost of copper
Commercial GA timing | CY | | | 2H2024
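The link-level figures in the parameter table can be related with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes that aggregate bandwidth is lanes times per-lane rate (using the 100 Gbps/lane capability), that density is bandwidth divided by edge width or by edge width times reticle depth, and that link power is energy per bit times bit rate. Whether the deck's density numbers were derived exactly this way is not stated in the slides.

```python
def aggregate_bw_gbps(lanes: int, lane_rate_gbps: float) -> float:
    """Aggregate link bandwidth, assuming all lanes run at the same rate."""
    return lanes * lane_rate_gbps


def linear_density_gbps_per_mm(link_bw_gbps: float, edge_width_mm: float) -> float:
    """Bandwidth per mm of chip edge, assuming one link per edge."""
    return link_bw_gbps / edge_width_mm


def areal_density_gbps_per_mm2(link_bw_gbps: float, edge_width_mm: float,
                               depth_mm: float) -> float:
    """Bandwidth per mm^2 over an edge_width x depth area (assumed footprint)."""
    return link_bw_gbps / (edge_width_mm * depth_mm)


def link_power_w(link_bw_gbps: float, pj_per_bit: float) -> float:
    """One-direction link power: pJ/bit * Gbit/s = mW, converted to W."""
    return link_bw_gbps * pj_per_bit / 1000.0


if __name__ == "__main__":
    # 16 and 32 lanes at 100 Gbps/lane reproduce the table's 1600 and 3200 Gbps.
    for lanes in (16, 32):
        bw = aggregate_bw_gbps(lanes, 100)
        print(lanes, "lanes:", bw, "Gbps")
        # Edge widths of 15 and 20 mm and a 30 mm reticle dimension, from the table notes.
        for edge in (15, 20):
            print("  edge", edge, "mm:",
                  round(linear_density_gbps_per_mm(bw, edge), 1), "Gbps/mm,",
                  round(areal_density_gbps_per_mm2(bw, edge, 30), 2), "Gbps/mm2")
        # Power at the 10 pJ/bit ceiling and the 4.5 pJ/bit typical value.
        print("  power:", link_power_w(bw, 10), "W max,",
              link_power_w(bw, 4.5), "W typ")
```

At the 10 pJ/bit ceiling, a 3200 Gbps link would draw 32 W in one direction under these assumptions, which illustrates why the table marks energy efficiency as "less is better".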