Jingrong Zhang, GTC CHINA
CUDA 11 NEW FEATURES

AGENDA
NVIDIA A100 Highlights
Programming with CUDA 11
  o Warp Synchronous Reduction
  o L2 Cache Residency Control
  o Asynchronous copy
  o Asynchronous barrier

NVIDIA A100 HIGHLIGHTS
5 Miracles of NVIDIA A100 https:/

UNIFIED AI ACCELERATION
All results are
measured BERT Large Training (FP32

stream_attribute.accessPolicyWindow
cudaStreamSetAttribute(stream, cudaStreamAttributeAccessPolicyWindow,
Set an L2 persisting access window using a CUDA Stream.

RESIDENCY CONTROLS
Set an L2 persisting access window
CUDA Graph
cudaKernelNodeAttrValue node_attribute
node_attribute.accessPolicyWindow
cudaGraphKernelNodeSetAttribute(node, cudaKernelNodeAttributeAccessPolicyWindow,
Set an L2 persisting access window using a CUDA Graph.

EXAMPLE: HISTOGRAM
Dataset Size = 1024 MB (256 Million integers)
Size of Histogram bins = 20 MB (5 Million in
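
The stream-based residency control named in the slides survives only as fragments, so the following is a minimal sketch of how such a window might be set up, not the presenter's code. It assumes a 20 MB buffer (the size matches the histogram bins in the example, but treating the bins as the persisting region is an assumption), a hit ratio of 1.0, and a streaming miss property. The kernel `touch`, the helper `clampBytes`, and the `CHECK` macro are hypothetical names introduced for illustration; the CUDA runtime calls and struct fields are those of the CUDA 11 runtime API.

```cuda
#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical error-checking macro (not from the slides).
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s failed: %s\n", #call,            \
                         cudaGetErrorString(err_));                   \
            std::exit(1);                                             \
        }                                                             \
    } while (0)

// Hypothetical helper: clamp a requested byte count to a device limit.
static size_t clampBytes(size_t requested, size_t limit) {
    return std::min(requested, limit);
}

// Hypothetical kernel that touches every element of the persisting region.
__global__ void touch(int *data, size_t n) {
    size_t i = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x;
    if (i < n) data[i] += 1;
}

int main() {
    int deviceCount = 0;
    if (cudaGetDeviceCount(&deviceCount) != cudaSuccess || deviceCount == 0) {
        std::printf("No CUDA device available.\n");
        return 0;
    }
    cudaDeviceProp prop;
    CHECK(cudaGetDeviceProperties(&prop, 0));
    if (prop.persistingL2CacheMaxSize == 0) {
        std::printf("Device does not support L2 persistence.\n");
        return 0;
    }

    const size_t n = 5u * 1024u * 1024u;    // assumed: 5 Mi integers
    const size_t bytes = n * sizeof(int);   // 20 MB
    int *d_data = nullptr;
    CHECK(cudaMalloc(&d_data, bytes));
    CHECK(cudaMemset(d_data, 0, bytes));

    // Set aside part of L2 for persisting accesses, within the device limit.
    CHECK(cudaDeviceSetLimit(
        cudaLimitPersistingL2CacheSize,
        clampBytes(bytes, static_cast<size_t>(prop.persistingL2CacheMaxSize))));

    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));

    // Describe the access window and attach it to the stream.
    cudaStreamAttrValue stream_attribute = {};
    stream_attribute.accessPolicyWindow.base_ptr = d_data;
    stream_attribute.accessPolicyWindow.num_bytes =
        clampBytes(bytes, static_cast<size_t>(prop.accessPolicyMaxWindowSize));
    stream_attribute.accessPolicyWindow.hitRatio = 1.0f;  // assumed value
    stream_attribute.accessPolicyWindow.hitProp = cudaAccessPropertyPersisting;
    stream_attribute.accessPolicyWindow.missProp = cudaAccessPropertyStreaming;
    CHECK(cudaStreamSetAttribute(stream, cudaStreamAttributeAccessPolicyWindow,
                                 &stream_attribute));

    // Kernels launched into this stream now use the window.
    const unsigned blocks = static_cast<unsigned>((n + 255) / 256);
    touch<<<blocks, 256, 0, stream>>>(d_data, n);
    CHECK(cudaGetLastError());
    CHECK(cudaStreamSynchronize(stream));

    // Remove the window and release persisting lines once they are not needed.
    stream_attribute.accessPolicyWindow.num_bytes = 0;
    CHECK(cudaStreamSetAttribute(stream, cudaStreamAttributeAccessPolicyWindow,
                                 &stream_attribute));
    CHECK(cudaCtxResetPersistingL2Cache());

    CHECK(cudaStreamDestroy(stream));
    CHECK(cudaFree(d_data));
    std::printf("Done.\n");
    return 0;
}
```

The CUDA Graph variant named in the slides follows the same pattern: fill the `accessPolicyWindow` member of a `cudaKernelNodeAttrValue` and pass it to `cudaGraphKernelNodeSetAttribute` with `cudaKernelNodeAttributeAccessPolicyWindow`. If the window is larger than the L2 set-aside, a hit ratio below 1.0 may reduce thrashing of the persisting lines.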