WebNN: The Future of Web On-Device Inference
Ningxin Hu, Min Zhang
Intel SATG Web Platform Engineering
December 2023

WebML: Advantages of Client-Side Inference
- Privacy: sensor data from cameras, microphones, etc. stays on the device
- Offline: once initial resources are cached, no further network dependency
- Latency: no cloud round-trips; inference runs in real time in the browser
- Cost: no cloud compute required
- Zero install: runs in the browser with nothing extra to install, and is easy to share
- Cross-platform: AI applications run on virtually every platform

WebML Client-Side Inference
Diverse client AI scenarios — bursty and latency-sensitive, sustained and power-sensitive, periodic and throughput-sensitive — and multiple compute units to serve them:
- CPU: ubiquitous; low latency for single inference tasks
- GPU: high parallelism and large batch sizes; integrates with 3D/rendering/media pipelines
- NPU: dedicated low-power AI accelerator; high performance per watt, improved power efficiency

What Web Developers Are Asking For
- "The web needs its own neural networks specification to leverage Apple Silicon, Tensor Cores, and others."
- "Delighted to find the working drafts of WebNN. Incredible new power unlocked for the free, open and competitive Web!"
- "Native Tensor support! Would be amazing to have Tensor objects and ops built into Chrome, and available as an 'ML API'."
- "Although some scientific computing libraries exist for JS/TS, having built-in support would be far more desirable!"
- "If go through the code of utils, maths, audio, tensor in JS, it is annoying that I had to implement these ops myself in JS."
- "llama2-7b in the browser using WebNN is going to be on-device, local ML cc xenovacom"

WebNN: Introduction
- An emerging W3C web standard API
- A unified abstraction for neural networks
- Access to AI hardware accelerators through native ML APIs
- Near-native AI inference performance with reliable results
- Currently available in Chrome and Edge Canary (behind a runtime flag)

WebNN Standard Specification
- Drafted by the W3C Web Machine Learning Working Group
- Co-edited by Intel and Microsoft
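Because WebNN currently sits behind a runtime flag, a page should feature-detect it before choosing a backend; the draft spec exposes the API as `navigator.ml`. A minimal sketch — the helper names are made up for illustration, and the navigator-like object is passed in explicitly so the logic can run outside a browser:

```javascript
// Hypothetical feature-detection helpers (not part of any spec).
// WebNN is exposed as `navigator.ml` in the W3C draft, so probing
// for that property is a reasonable way to decide on a fallback.
function supportsWebNN(navigatorLike) {
  return Boolean(navigatorLike && 'ml' in navigatorLike && navigatorLike.ml);
}

// Pick a backend name for a framework that accepts several.
function pickBackend(navigatorLike) {
  return supportsWebNN(navigatorLike) ? 'webnn' : 'wasm';
}

console.log(pickBackend({ ml: {} })); // "webnn"
console.log(pickBackend({}));         // "wasm"
```

In a real page one would call `pickBackend(navigator)`; frameworks such as ONNX Runtime Web accept the resulting name as an execution-provider choice.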
WebNN Specification Progress
Delivered (March 2023: W3C Candidate Recommendation):
- 60 CNN/RNN ops; float16/32, int32/uint32, int8/uint8
- Image classification: SqueezeNet, MobileNet, ResNet
- Object detection: TinyYOLO
- Noise suppression: RNNoise, NSNet
Latest (December 2023: W3C CR update):
- 19 new Transformer ops; int64/uint64
- 2024: NPU support and quantization
Target models:
- Text-to-image: Stable Diffusion
- Image segmentation: Segment Everything
- Speech-to-text: Whisper Tiny
- Text-to-text (encoder-decoder): T5 and M2M100
- Text generation (decoder): LLaMA2
WebNN Architecture

WebNN Programming Model

WebNN Code Example

```javascript
const context = await navigator.ml.createContext({deviceType: 'gpu'});

// The following code builds a graph as:
//   constant1 ---+
//                +--- Add ---> intermediateOutput1 ---+
//   input1 ------+                                    |
//                                                     +--- Mul ---> output
//   constant2 ---+                                    |
//                +--- Add ---> intermediateOutput2 ---+
//   input2 ------+

// Use tensors in 4 dimensions.
const TENSOR_DIMS = [1, 2, 2, 2];
const TENSOR_SIZE = 8;

const builder = new MLGraphBuilder(context);

// Create MLOperandDescriptor object.
const desc = {dataType: 'float32', dimensions: TENSOR_DIMS};

// constant1 is a constant MLOperand with the value 0.5.
const constantBuffer1 = new Float32Array(TENSOR_SIZE).fill(0.5);
const constant1 = builder.constant(desc, constantBuffer1);

// input1 is one of the input MLOperands. Its value will be set before execution.
const input1 = builder.input('input1', desc);

// constant2 is another constant MLOperand with the value 0.5.
const constantBuffer2 = new Float32Array(TENSOR_SIZE).fill(0.5);
const constant2 = builder.constant(desc, constantBuffer2);

// input2 is another input MLOperand. Its value will be set before execution.
const input2 = builder.input('input2', desc);

// intermediateOutput1 is the output of the first Add operation.
const intermediateOutput1 = builder.add(constant1, input1);

// intermediateOutput2 is the output of the second Add operation.
const intermediateOutput2 = builder.add(constant2, input2);

// output is the output MLOperand of the Mul operation.
const output = builder.mul(intermediateOutput1, intermediateOutput2);

// Compile the constructed graph.
const graph = await builder.build({'output': output});

// Setup the input buffers with value 1.
const inputBuffer1 = new Float32Array(TENSOR_SIZE).fill(1);
const inputBuffer2 = new Float32Array(TENSOR_SIZE).fill(1);
const outputBuffer = new Float32Array(TENSOR_SIZE);

// Execute the compiled graph with the specified inputs.
const inputs = {'input1': inputBuffer1, 'input2': inputBuffer2};
const outputs = {'output': outputBuffer};
const results = await context.compute(graph, inputs, outputs);
console.log('Output value: ' + results.outputs.output);
// Output value: 2.25,2.25,2.25,2.25,2.25,2.25,2.25,2.25
```
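Since the graph is tiny, the expected result can be sanity-checked without WebNN at all: element-wise, the output is (0.5 + 1) × (0.5 + 1) = 2.25. A plain-JavaScript re-computation of the same dataflow, for verification only (no WebNN API involved):

```javascript
// Plain-JS mirror of the example graph:
// out[i] = (constant1[i] + input1[i]) * (constant2[i] + input2[i])
const SIZE = 8;
const c1 = new Float32Array(SIZE).fill(0.5);
const c2 = new Float32Array(SIZE).fill(0.5);
const in1 = new Float32Array(SIZE).fill(1);
const in2 = new Float32Array(SIZE).fill(1);

const out = new Float32Array(SIZE);
for (let i = 0; i < SIZE; i++) {
  out[i] = (c1[i] + in1[i]) * (c2[i] + in2[i]);
}
console.log(out.join(',')); // 2.25,2.25,2.25,2.25,2.25,2.25,2.25,2.25
```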
WebNN Implementation in Chromium

WebNN Operator Implementation Status (partial)

| WebNN Spec | XNNPack (CPU backend) | DirectML (GPU backend) | TensorFlow Lite Execution Delegate | ONNX Runtime Execution Provider |
|---|---|---|---|---|
| clamp | clamp | ELEMENT_WISE_CLIP | ReluN1To1, Relu6 | Clip |
| concat | concatenate2/3/4 | JOIN | Concatenation | Concat |
| conv2d | convolution_2d | CONVOLUTION | Conv2d, DepthwiseConv2d | Conv |
| convTranspose2d | deconvolution_2d | CONVOLUTION | TransposeConv, Convolution2DTransposeBias | ConvTranspose |
| add (element-wise binary) | add2 | ELEMENT_WISE_ADD | Add | Add |
| sub (element-wise binary) | subtract | ELEMENT_WISE_SUBTRACT | Sub | Sub |
| mul (element-wise binary) | multiply2 | ELEMENT_WISE_MULTIPLY | Mul | Mul |
| div (element-wise binary) | divide | ELEMENT_WISE_DIVIDE | Div | Div |
| max (element-wise binary) | maximum2 | ELEMENT_WISE_MAX | Maximum | Max |
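Several mappings in the table work because framework-specific activations are special cases of WebNN's clamp, which bounds every element to [minValue, maxValue]. A scalar sketch — the function shapes here are illustrative, not the actual tensor-level MLGraphBuilder API:

```javascript
// Scalar model of WebNN clamp: y = min(max(x, minValue), maxValue).
function clamp(x, { minValue = -Infinity, maxValue = Infinity } = {}) {
  return Math.min(Math.max(x, minValue), maxValue);
}

// TFLite's Relu6 and ReluN1To1 are clamp with fixed bounds;
// ONNX Clip passes its bounds through directly.
const relu6 = (x) => clamp(x, { minValue: 0, maxValue: 6 });
const reluN1To1 = (x) => clamp(x, { minValue: -1, maxValue: 1 });

console.log(relu6(8));       // 6
console.log(relu6(-2));      // 0
console.log(reluN1To1(0.5)); // 0.5
```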
WebNN Operator Implementation Status (partial, continued)

| WebNN Spec | XNNPack (CPU backend) | DirectML (GPU backend) | TensorFlow Lite Execution Delegate | ONNX Runtime Execution Provider |
|---|---|---|---|---|
| abs (element-wise unary) | abs | ELEMENT_WISE_ABS | Abs | Abs |
| ceil (element-wise unary) | ceiling | ELEMENT_WISE_CEIL | Ceil | Ceil |
| floor (element-wise unary) | floor | ELEMENT_WISE_FLOOR | Floor | Floor |
| neg (element-wise unary) | negate | ELEMENT_WISE_NEGATE | Neg | Neg |
| elu | elu | ACTIVATION_ELU | Elu | Elu |
| gemm | fully_connected | GEMM | FullyConnected | Gemm |
| hardSwish | hardswish | y = x * max(0, min(6, x + 3)) / 6 | HardSwish | HardSwish |
| leakyRelu | leaky_relu | ACTIVATION_LEAKY_RELU | LeakyRelu | LeakyRelu |
| prelu | prelu | ACTIVATION_PARAMETERIZED_RELU | Prelu | Prelu |
| relu | clamp | ACTIVATION_RELU | Relu | Relu |
| sigmoid | sigmoid | ACTIVATION_SIGMOID | Logistic | Sigmoid |
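The hardSwish row spells out the formula a backend can compose from simpler primitives when it has no dedicated op: y = x * max(0, min(6, x + 3)) / 6, a piecewise-linear gate on x. A scalar sketch:

```javascript
// hardSwish(x) = x * max(0, min(6, x + 3)) / 6
// x scaled by a cheap piecewise-linear approximation of sigmoid(x).
function hardSwish(x) {
  return x * Math.max(0, Math.min(6, x + 3)) / 6;
}

console.log(hardSwish(-4)); // 0 (gate fully closed for x <= -3)
console.log(hardSwish(4));  // 4 (passes through unchanged for x >= 3)
console.log(hardSwish(1));  // 1 * 4 / 6, about 0.667
```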
WebNN Operator Implementation Status (partial, continued)

| WebNN Spec | XNNPack (CPU backend) | DirectML (GPU backend) | TensorFlow Lite Execution Delegate | ONNX Runtime Execution Provider |
|---|---|---|---|---|
| pad | static_constant_pad | PADDING | Pad | Pad |
| averagePool2d (pooling) | average_pooling_2d | AVERAGE_POOLING | AveragePool2d, Mean | AveragePool, GlobalAveragePool |
| maxPool2d (pooling) | max_pooling_2d | MAX_POOLING2 | MaxPool2d | MaxPool, GlobalMaxPool |
| resample2d | static_resize_bilinear_2d | RESAMPLE | ResizeBilinear | Resize |
| reshape | static_reshape | DML_TENSOR_DESC | Reshape | Reshape |
| split | even_split2/3/4, static_slice (uneven split) | SPLIT | Split | Split |
| slice | static_slice | SLICE | Slice, StridedSlice | Slice |
| softmax | softmax | ACTIVATION_SOFTMAX | Softmax | Softmax |
| transpose | static_transpose | DML_TENSOR_DESC | Transpose | Transpose |

Op counts per column: 78 / 67 / 32 / 66 / 34 / 70

WebNN Implementation Status (DirectML)
- 66 ops currently supported (GPU)
- Transformer ops are largely supported
- Adaptation work for NPU support is underway

Integration of WebNN with Mainstream JavaScript ML Frameworks

Code Example: WebNN Integration with ONNX Runtime Web
WebAssembly backend:

```typescript
import { InferenceSession, env } from 'onnxruntime-web';
// ...
// Initialize the ONNX model
const initModel = async () => {
  env.wasm.numThreads = 1; // 4
  env.wasm.simd = true;
  env.wasm.proxy = true;
  const options: InferenceSession.SessionOptions = {
    // provider name: 'wasm', 'webnn'
    // deviceType: 'cpu', 'gpu'
    // powerPreference: 'default', 'high-performance'
    executionProviders: [
      { name: 'wasm' }, // WebAssembly CPU
    ],
    // ...
  };
  // ...
};
const results = await model.run(feeds);
const output = results[model.outputNames[0]];
```

WebNN backend:

```typescript
import { InferenceSession, env } from 'onnxruntime-web';
// ...
// Initialize the ONNX model
const initModel = async () => {
  env.wasm.numThreads = 1; // 4
  env.wasm.simd = true;
  env.wasm.proxy = true;
  const options: InferenceSession.SessionOptions = {
    // provider name: 'wasm', 'webnn'
    // deviceType: 'cpu', 'gpu'
    // powerPreference: 'default', 'high-performance'
    executionProviders: [
      {
        name: 'webnn',
        deviceType: 'gpu',
        powerPreference: 'default',
      },
    ],
    // ...
  };
  // ...
};
const results = await model.run(feeds);
const output = results[model.outputNames[0]];
```

WebNN XNNPack (CPU) Performance Data (Normalized)
MediaPipe models; higher is better.

W3C Machine Learning for the Web
- Community Group: discusses and explores new ideas, incubating new proposals for machine learning inference. 39 member organizations, 126 participants.
- Working Group: standardizes Web APIs for machine learning inference based on proposals incubated in the Community Group. 17 member organizations, 43 participants (including 3 invited experts).

Thanks!
https://webnn.dev
WebNN discussion group