SmartNICs and DPUs Accelerate Generative AI at Data Center Scale
Kevin Deierling, VP of Networking | SmartNICs Summit, June 2023

Democratizing AI Across Diverse Fields

AI Workloads Accelerating Data Center Transformation
ChatGPT is the fastest-growing application in history
[Chart: Time to 100 Million Users (months) for ChatGPT, WhatsApp, Facebook, Snapchat, Instagram, TikTok]

NVIDIA Full Stack Compute and Networking
Fueling giant-scale AI infrastructure
[Figure label: DPU]

Modern AI is a Data Center Scale Computing Workload
Data centers are becoming AI factories: data as input, intelligence as output
[Chart: AI Training Computational Requirements. Y-axis: Training Compute (petaFLOPs), 100 to 10,000,000,000. X-axis: year, through 2023.]
Models plotted: AlexNet, VGG-19, Seq2Seq, ResNet, InceptionV3, Xception, ResNeXt, DenseNet201, ELMo, MoCo ResNet50, Wav2Vec 2.0, Transformer, GPT-1, BERT Large, GPT-2, XLNet, Megatron-NLG, Microsoft T-NLG, GPT-3, MT NLG 530B, BLOOM, Chinchilla, PaLM
Before Transformers = 8x / 2 yrs
Transformers = 215x / 2 yrs
Chart annotations: ChatGPT | Single GPU | HGX 8-GPU | 100s-1000s HGX 8-GPU Systems

Networking for AI Data Centers
AI Factories: Single or few users | Extremely large AI models | NVLink and InfiniBand AI fabric
AI Cloud: Multi-tenant | Variety of workloads | Ethernet network

The Core of AI Factories
NVIDIA AI Compute Networking

AI Factories and Clouds Require Different Infrastructure Networking
[Chart axes: # of GPU, Throughput. Series: Ethernet, InfiniBand, InfiniBand + NVLink. Regions: AI Factory, AI Cloud.]

AI Clouds Going Through A Major Change
Generative AI workloads require new class of Ethernet
Control/User Access Network (North-South) | AI Fabric (East-West)
Loosely Coupled Applications | Distributed Computing
TCP (Low Bandwidth Flows and Utilization) | RoCE (High Bandwidth Flows and Utilization)
High Jitter Tolerance | Low Jitter Tolerance
Oversubscribed Topologies | Performance Optimized Topologies
Heterogeneous Traffic, Average Multi-Pathing | Bursty Network Capacity, Predictive Performance
[Diagram labels: Control Network, AI Fabric]

NVIDIA Spectrum-X Platform
World's first high-performance Ethernet for AI
Full stack optimized for Generative AI clouds
Spectrum-4 Ethernet Switch, BlueField-3 DPU
RoCE adaptive routing and performance isolation
End-to-end cloud provisioning and security
[Figure labels: Spectrum-4 Ethernet Switch, BlueField-3 DPU]

NVIDIA Spectrum-X Delivers the Highest GPT-3 Performance
400GbE/GPU
1.7x Performance/GPU | 1.7x Performance/TCO$ | 1.7x Performance/Power
[Chart: GPT3 175B Performance per GPU, TCO$, and Watts with 16K H100 GPUs. Y-axis: %, 0 to 200. Categories: Perf/GPU, Perf/TCO$, Perf/Watt. Series: Traditional Ethernet, Spectrum-X AI Ethernet.]

Sustainable Cloud Computing
Nearly all data centers are power limited
There are hard limits on power inputs in existing data centers
Increasing electricity costs are becoming a long-term trend
Need to get more out of your data centers
Energy efficiency becomes a high priority
BlueField DPU enables more while consuming less power

BlueField-3 enables power-efficient cloud data centers
[Chart: Power Usage (W). Idle Load: 344. IPsec on CPU: 728 (+384*). BlueField DPU-Accelerated: +137*.]
*Compared to idle load power consumption
IPsec: 34% Power Savings/Server
OVS Networking: 29% Power Savings/Server
Redis: 34% Less Power per Transaction

Israel-1
Hyperscale generative AI full stack performance optimization platform
256 Dell PowerEdge XE9680 Servers | 2048 H100 GPUs
80 Spectrum-4 Switches | 2560 BlueField-3 DPUs

Smart Switch
High Throughput Switch + Fully Programmable DPU Accelerated Networking
Replaces Traditional T1 Switch
[Diagram: FLEXIBLE SWITCH SYSTEM. Labels: QSFP-DD, SWITCH ASIC, DPUS = CPS/PPS.]

Rapidly Evolving BlueField Ecosystem
Unlocking the potential of the DPU
Cybersecurity | Cloud | Platform | Storage
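
The IPsec power-savings figure on the BlueField-3 power slide can be checked against the chart's own numbers. The sketch below assumes the chart reads as 344 W at idle, +384 W over idle for IPsec on the CPU, and +137 W over idle for the BlueField DPU-accelerated case; the function and variable names are illustrative and do not come from the deck.

```python
# Chart values as read from the slide (watts); treat them as assumptions.
IDLE_W = 344               # server at idle load
CPU_IPSEC_DELTA_W = 384    # added draw over idle, IPsec on CPU
DPU_IPSEC_DELTA_W = 137    # added draw over idle, BlueField DPU-accelerated


def total_power_w(idle_w: float, delta_w: float) -> float:
    """Absolute server draw given idle power and the increase over idle."""
    return idle_w + delta_w


def savings_pct(baseline_w: float, accelerated_w: float) -> float:
    """Percent of baseline power saved by the accelerated configuration."""
    return 100.0 * (baseline_w - accelerated_w) / baseline_w


cpu_w = total_power_w(IDLE_W, CPU_IPSEC_DELTA_W)
dpu_w = total_power_w(IDLE_W, DPU_IPSEC_DELTA_W)
ipsec_savings = savings_pct(cpu_w, dpu_w)

print(f"IPsec on CPU: {cpu_w} W, DPU-accelerated: {dpu_w} W")
print(f"Power savings per server: {ipsec_savings:.1f}%")
```

Under these assumptions the savings round to the 34% quoted for IPsec. The percentage is taken against the full server draw with IPsec on the CPU, not against the increase over idle.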
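
The two growth rates on the training-compute chart (8x per 2 years before Transformers, 215x per 2 years after) are easier to compare as doubling times. The sketch below assumes smooth exponential growth over each two-year window, which is a simplification of the chart's trend lines; the helper names are illustrative.

```python
import math


def doubling_time_months(growth_factor: float, period_years: float = 2.0) -> float:
    """Months needed to double, assuming steady exponential growth
    of `growth_factor` over `period_years`."""
    return 12.0 * period_years * math.log(2) / math.log(growth_factor)


def annual_factor(growth_factor: float, period_years: float = 2.0) -> float:
    """Equivalent per-year multiplier for the same growth."""
    return growth_factor ** (1.0 / period_years)


pre_transformer = doubling_time_months(8)      # 8x / 2 yrs
transformer_era = doubling_time_months(215)    # 215x / 2 yrs

print(f"Before Transformers: doubles every {pre_transformer:.1f} months")
print(f"Transformers: doubles every {transformer_era:.1f} months")
```

Read this way, training compute doubled roughly every 8 months before Transformers and roughly every 3 months afterwards.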