
HC2022.Google.Pienaar.v1.pdf


MLIR Fundamentals
Hot Chips 34, 2022
Jacques Pienaar, Google

Outline
- Brief MLIR introduction
- MLIR philosophy
- What you get in the box
- Questions

A collection of modular and reusable software components that enables the progressive lowering of high-level operations to efficiently target hardware in a common way.
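
As a rough illustration of progressive lowering, the Python sketch below models a module as a flat list of operations named dialect.op and lowers it one dialect step at a time. It is a toy under stated assumptions, not MLIR's API: Op, lower, dialects, and both lowering tables are hypothetical, and the specific mappings (for example tf.AddV2 to mhlo.add to llvm.fadd) are chosen for illustration rather than taken from the real TensorFlow, MHLO, or LLVM conversion pipelines.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Op:
    name: str                                   # "<dialect>.<op>", e.g. "tf.AddV2"
    operands: List[str] = field(default_factory=list)
    result: str = ""

    @property
    def dialect(self) -> str:
        return self.name.split(".", 1)[0]


# Hypothetical one-step lowering tables (illustrative, not real conversion patterns).
TF_TO_MHLO: Dict[str, str] = {"tf.AddV2": "mhlo.add", "tf.Exp": "mhlo.exponential"}
MHLO_TO_LLVM: Dict[str, str] = {"mhlo.add": "llvm.fadd"}


def lower(ops: List[Op], table: Dict[str, str]) -> List[Op]:
    """Rewrite each op that has an entry in `table`; keep every other op as-is.

    Ops without a lowering stay in their dialect, so the output may mix
    dialects: a partially lowered module is a valid intermediate state.
    """
    return [Op(table.get(op.name, op.name), list(op.operands), op.result) for op in ops]


def dialects(ops: List[Op]) -> List[str]:
    return sorted({op.dialect for op in ops})


module = [Op("tf.AddV2", ["%a", "%b"], "%0"), Op("tf.Exp", ["%0"], "%1")]
step1 = lower(module, TF_TO_MHLO)     # every op is now in mhlo
step2 = lower(step1, MHLO_TO_LLVM)    # the add reaches llvm, the exponential stays in mhlo
print(dialects(module), dialects(step1), dialects(step2))
```

The second step leaves mhlo.exponential untouched because the toy table has no entry for it, so the module mixes llvm and mhlo ops; in this model that mixed state is a legitimate IR rather than an error.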

Multi-Level Intermediate Representation
- New compiler infrastructure
- Originally built by a team in the TensorFlow ecosystem
- Under neutral governance as part of the LLVM project

Origin: many graph compilers
[Figure: the not-ideal (old) state]
- Many broken user journeys

LLVM: Industry Standard for Compiler Infrastructures
- LLVM IR has proved itself as a versatile "mid-level" representation similar to C, with vectors and SSA.
- LLVM IR is not enough for low-level representations: multiple lower levels of abstraction (Machine IR, GlobalISel, MC IR) were introduced over time.
- LLVM IR is not enough for high-level representations either: there is a huge abstraction gap between ASTs and LLVM IR, covered in a one-shot conversion in Clang (from the clang AST for C, C++, ObjC, CUDA, OpenCL, ...).
- Clang has a representation parallel to the AST, the clang CFG, used in, e.g., the static analyzer and various advanced diagnostics.
- Some tools (e.g., Polly) resort to raising from LLVM IR (ScopInfo) to represent higher-level constructs such as loops.
- Newer languages/compilers define custom intermediate representations between the AST and LLVM IR for language-specific analyses and transformations: SIL for Swift, MIR for Rust, Julia IR for Julia, FIR for Fortran.
- As we saw, modern ML frameworks include their own compiler infrastructures, albeit domain-specific ones; TensorFlow alone spans the TF API and Keras, the TF graph, XLA HLO, TensorRT, and TFLite. It's not even only source languages!

How much code is in common but reimplemented?
- Type system support
- CSE, DCE, and other "canonicalizations"
- Location tracking and diagnostics
- Pass management
- Regions, basic blocks, statements
- Conversions and validations
- Tooling for tests, benchmarks, etc.

Domain-specific intermediate representations
Great!
- High-level domain-specific optimizations
- Progressive lowering encourages reuse between levels
- Great location tracking enables flow-sensitive "type checking"
Not great!
- Huge expense to build this infrastructure
- Reimplementation of all the same stuff: pass managers, location tracking, use-def chains, inlining, constant folding, CSE, testing tools, ...
- Innovations in one community don't benefit the others

A toolkit for representing and transforming "code"
- Represent and transform IR.
- Represent multiple levels of IR at the same time: tree-based IRs (ASTs), graph-based IRs (TF Graph, HLO), machine instructions (LLVM IR).
- While enabling common compiler infrastructure: location tracking, a richer type system, a common set of passes (analysis/optimization), and much more.

Missing direction? Sort of.
- It is almost always easier to preserve information than to recover it.
- Lifting is fragile.
- User intent is impossible to recover.
Principle: don't destroy information/structure you'll need to recover later.

Example: TensorFlow Control Flow v1

User writes:

    from tensorflow.python.framework import constant_op
    from tensorflow.python.ops import control_flow_ops, gradients, math_ops

    x = constant_op.constant(10.0, name="x")
    pred = math_ops.less(1, 2)
    fn1 = lambda: math_ops.exp(x, name="fn1")
    fn2 = lambda: constant_op.constant(20.0)
    r = control_flow_ops.cond(pred, fn1, fn2)
    r = r * x
    _ = gradients.gradients(r, x)

XLA wants (using the MLIR MHLO dialect):

    %c0 = mhlo.constant dense<...> : tensor<...>
    %c1 = mhlo.constant dense<...> : tensor<...>
    %0 = mhlo.while(%arg1 = %arg0) : (tensor<...>) -> tensor<...>
    cond {
      %1 = "mhlo.compare"(%c0, %c1) {comparison_direction = "LT"} : (tensor<...>, tensor<...>) -> tensor<...>
      mhlo.return(%1) : (tensor<...>) -> ()
    } do {
      // ...
      mhlo.return(%4) : (tensor<...>) -> ()
    }

[Figure: dataflow computing]

Mix and match in a single IR
Ops from different levels coexist in one module, and lowering moves code from one level to the next:

    // TensorFlow
    %x = "tf.Conv2d"(%input, %filter) {strides: [1,1,2,1], padding: "SAME", dilations: [2,1,1,1]} : (tensor<...>, tensor<...>) -> tensor<...>
    // XLA (MHLO)
    %m = "mhlo.AllToAll"(%z) {split_dimension: 1, concat_dimension: 0, split_count: 2} : (memref<...>) -> memref<...>
    // LLVM IR (floating-point add, so llvm.fadd rather than the integer llvm.add)
    %f = llvm.fadd %a, %b : f32

I very rarely work with only one dialect (even at a given time). The same holds in software and in hardware (IP blocks, generator libraries).

Don't create artificial boundaries
Rather than forcing everything into one IR, MLIR enables building domain-specific IRs that represent the problem domain:
- without reinventing the wheel;
- without being forced to abstract over, and drop, semantics before you want to;
- using different mechanisms for abstracting (ops, interfaces, types).

Core design principles
1. Parsimony
2. Traceability
3. Progressivity
See "MLIR: Scaling Compiler Infrastructure for Domain Specific Computation", CGO, March 1, 2021, for further expansion.

Design principles

Parsimony
- "Entities should not be multiplied without necessity."
- In compilers, some things are intrinsically complex; avoid making easy things incidentally complex.
- A small set of versatile built-in concepts enables wide extensibility of the system.

Traceability
- It is almost always easier to preserve information than to recover it.
- Keep the compiler accountable by making its operation transparent and analyzable.
- Declarative specification helps unless it becomes more complex than algorithms.

Progressivity
- In compilers, premature lowering is the predecessor of all evil.

- Preserve high-level abstractions as long as necessary; lower them consciously.
- Embrace diverging flows and extensibility.
- Intermediate state is important in an IR.

Design requirements
- Parsimony: everything extensible; SSA + graphs + regions
- Traceability: pervasive source locations; declarative definitions
- Progressivity: support for high-level abstractions; progressive lowering

How is MLIR different?
State-of-the-art compiler technology, from graph representation through optimization to code generation.
- MLIR is NOT just a common graph serialization format, nor is there anything like it.
- Modular and extensible.
- Not opinionated: choose the level of representation that is right for your device.

MLIR: Reusable Compiler Abstraction Toolbox
- No forced IR impedance mismatch.
- A fresh look at problems: IR/optimization format design involves multiple trade-offs; it is an iterative process and a constant learning experience.
- MLIR allows mixing levels of abstraction, with non-obvious compounding benefits:
  - Dialect-to-dialect lowering is easy.
  - Ops from different dialects can mix in the same IR.
  - Lowering from "A" to "D" may skip "B" and "C".
  - Avoid lowering too early and losing information (premature lowering is the predecessor of all evil).
- It helps define hard analyses away.
- It doesn't think for you, but it enables iterating.

What's in the box

What's in the box: looking at code
- Modelling operations
- Defining passes/transforms
- Testing
Not shown:
- Defining custom attributes and types
- Dataflow analysis frameworks: sparse and dense, lattices of values, with multiple analyses combinable
- Existing sets of optimizations, analyses, and dialects

Operations
MLIR provides a general system for creating and modelling operations. Operations enable defining the level of abstraction/optimization. There are very few built-in concepts in MLIR: function, module, and for-loop are all just operations, which a user could just as well have defined. Operations need not even be defined:

    func @some_func(%arg0: !random_dialect<...>) -> !another_dialect<...> {
      %result = "custom.operation"(%arg0) : (!random_dialect<...>) -> !another_dialect<...>
      return %result : !another_dialect<...>
    }

Syntax in a nutshell

    %res:2 = "mydialect.morph"(%input#3) {some.attribute = true, other_attribute = 1.5}
             : (!mydialect.custom_type) -> (!mydialect.other_type, !mydialect.other_type)
             loc(callsite("foo" at "mysource.cc":10:8))

- %res: name of the results; :2 is the number of values returned.
- "mydialect.morph": the op id; "mydialect" is the dialect prefix.
- %input#3: the argument; #3 is the index in the producer's results.
- {...}: list of attributes, i.e., constant named arguments.
- !mydialect.custom_type: "mydialect" is the dialect prefix for the type; the rest is an opaque string / dialect-specific type.
- loc(...): mandatory and rich location.

Dialects
An MLIR dialect is a logical grouping that includes:
- a consistent collection of abstractions/library;
- a prefix ("namespace" reservation);
- a list of custom types;
- a list of operations, each with its name and implementation:
  - a verifier for operation invariants,
  - semantics (has-no-side-effects, constant-folding, CSE-allowed, ...),
  - possibly a custom parser and assembly printer;
- a list of passes (for analysis, transformations, and dialect conversions).

Defining a Dialect
The dialect and its custom types are defined in C++. A dialect can define hooks for type parsing and printing, constant folding, and so on.
Ops can be defined:
- programmatically (in C++), or
- using the Op Definition Spec (ODS, in TableGen).
All (almost all?) ops in TF, TFLite, and MLIR core are defined using ODS, which supports modelling hierarchies, multiclasses, ...; custom printing, parsing, folding, canonicalization, and verification; and documentation.

    def TF_LeakyReluOp : TF_Op<...> {
      let summary = "Computes rectified linear: max(features, features * alpha).";

      let arguments = (ins
        TF_FloatTensor:$features,
        DefaultValuedAttr<...>:$alpha
      );

      let results = (outs
        TF_FloatTensor:$activations
      );

      // Derived attributes are infrequent outside TF.
      TF_DerivedOperandTypeAttr T = TF_DerivedOperandTypeAttr<...>;

      let hasFolder = 1;
    }

Progressive disclosure: op modelling is a sliding scale
- Start with a conservative definition of the op and refine it over time.
- The more modelled, the better:
  - Verification: good invariants result in shorter debugging sessions.
  - Side effects enable greater optimizations: an op that may change the world has to run and cannot be deleted; declaring it side-effect free is an opt-in to performance.
  - For data flow, naming operands, results, and basic attributes goes a long way.
  - Declarative assembly format.

    def TF_Log1pOp : TF_Op<...> {
      let summary = "Computes natural logarithm of (1 + x) element-wise.";

      let description = [{
    I.e., y = log_e(1 + x).

    Example:

        x = tf.constant([0, 0.5, 1, 5])
        tf.math.log1p(x) ==> [0., 0.4054651, 0.6931472, 1.7917595]
      }];

      let arguments = (ins TF_FpOrComplexTensor:$x);
      let results = (outs TF_FpOrComplexTensor:$y);

      TF_DerivedOperandTypeAttr T = TF_DerivedOperandTypeAttr<...>;
    }

See https://www.tensorflow.org/mlir/tf_ops#tflog1p_tflog1pop

Passes/transforms/patterns
Now that you have modelled your problem as operations, what next?
- Optimize the model: mostly computationally (make it go faster); quantize it; compress operations.
- Analyze the graph: e.g., find the maximum memory usage.
- Compile to a target architecture: lower to loops, target raw libraries (talk later today by Harsh).

Writing a pattern
There are two ways:
1. a C++ pattern;
2. a declarative rewrite specification:

    def : Pat<...>;

Goal: reduce boilerplate and make simple cases easy to express.
- Specify simple patterns simply.
- Support M-N patterns.
- Support constraints on operations, operands, and attributes.
- Support specifying dynamic predicates.
- Support native C++ code rewrites.
- There is always a long tail; don't make the common case hard for the sake of the tail!

Declarative Rewrite Rule frontend
- Currently uses the TableGen DAG (S-expression) format, widely used in LLVM backends; still an acquired taste :)
- It is intended to keep the simple case simple.
- Also working on PDL, a lower-level transformation bytecode that is frontend independent, with the goal of being targeted by multiple frontends.
- Others are building Python rewrite specifications on top, and others generate these from YAML.

Define a pass

    include "mlir/Pass/PassBase.td"

    // This defines the structure for a pass. Normally one would have multiple
    // patterns or transformations per pass, and so defining a new pass isn't
    // that frequent.
    // The format below is used both to dictate what the pass operates on and to
    // add a description from which documentation can be generated. It can also
    // have additional options as well as specify dependent dialects.
    def AddRewritePass : Pass<"...",     // name of pass on CLI
                              "..."> {   // type of op it operates on
      let summary = "Example addition rewrite pass";
      let description = [{ Does cool stuff. }];

      // Constructor that will return an instance of the AddRewrite pass.
      let constructor = "::mlir::TF::CreateAddRewritePass()";

      let options = [
        // This pass doesn't have any options (which is the default), but they are
        // listed here for awareness, as this allows passing options to the pass.
      ];

      let statistics = [
        Statistic<...>
      ];
    }

Pass driver (opt tool)
Normally there is one per project, and it is pretty easy to add a new driver:

    #include "add_rewrite_pass.h"
    #include "tensorflow/compiler/mlir/init_mlir.h"
    #include "tensorflow/compiler/mlir/tensorflow/dialect_registration.h"
    #include "third_party/llvm/mlir/include/mlir/InitAllDialects.h"
    #include "third_party/llvm/mlir/include/mlir/InitAllPasses.h"
    #include "third_party/llvm/mlir/include/mlir/Support/MlirOptMain.h"

    int main(int argc, char** argv) {
      tensorflow::InitMlir y(&argc, &argv);

      // Register every dialect the tool can parse and every pass it can run.
      mlir::DialectRegistry registry;
      mlir::registerAllDialects(registry);
      mlir::RegisterAllTensorFlowDialects(registry);
      mlir::registerAllPasses();

      // New pass being tested.
      mlir::TF::registerTensorFlowAddRewritePasses();

      // Parse the command line and run the requested pass pipeline;
      // a nonzero exit code signals failure.
      return failed(
          mlir::MlirOptMain(argc, argv, "Rewrite test pass driver\n", registry));
    }

Hint: develop your passes iteratively, like your tests

    // RUN: mlir-opt %s -affine-loop-unroll=unroll-full | FileCheck %s --check-prefix=UNROLL-FULL
    func @loop_nest_simplest() {
      // UNROLL-FULL: affine.for %arg0 = 0 to 100 step 2 {
      affine.for %i = 0 to 100 step 2 {
        // UNROLL-FULL:      %c1_i32 = constant 1 : i32
        // UNROLL-FULL-NEXT: %c1_i32_0 = constant 1 : i32
        // UNROLL-FULL-NEXT: %c1_i32_1 = constant 1 : i32
        // UNROLL-FULL-NEXT: %c1_i32_2 = constant 1 : i32
        affine.for %j = 0 to 4 {
          %x = constant 1 : i32
        }
      }       // UNROLL-FULL: }
      return  // UNROLL-FULL: return
    }

The input may be written by hand or be the result of tools such as tf-translate, or a dumped reproducer module (e.g., in a unit test or a Colab).
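
To make the anatomy from the Syntax in a Nutshell slide concrete, here is a minimal Python sketch that models one operation and prints it in a generic form close to the slide's. Operation, print_generic, and _attr are hypothetical helpers, not MLIR's Python bindings, and the printer is deliberately simplified: for instance, it prints attribute values with str() instead of MLIR's typed attribute syntax.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Operation:
    """Toy model of one operation in generic form (hypothetical, not MLIR's class)."""
    name: str                  # op id "<dialect>.<op>", e.g. "mydialect.morph"
    operands: List[str]        # SSA uses; "%v#3" means result 3 of the op defining %v
    operand_types: List[str]
    result_types: List[str]    # one entry per returned value
    attributes: Dict[str, object] = field(default_factory=dict)  # constant named arguments
    loc: str = "unknown"       # every op carries a location

    @property
    def dialect(self) -> str:
        return self.name.split(".", 1)[0]


def _attr(value: object) -> str:
    # Booleans print as true/false; everything else uses str().
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def print_generic(op: Operation, result: str = "%res") -> str:
    n = len(op.result_types)
    lhs = "" if n == 0 else (f"{result} = " if n == 1 else f"{result}:{n} = ")
    attrs = ", ".join(f"{k} = {_attr(v)}" for k, v in op.attributes.items())
    attrs = f" {{{attrs}}}" if attrs else ""
    outs = ", ".join(op.result_types)
    outs = outs if n == 1 else f"({outs})"
    args = ", ".join(op.operands)
    ins = ", ".join(op.operand_types)
    return f'{lhs}"{op.name}"({args}){attrs} : ({ins}) -> {outs} loc({op.loc})'


morph = Operation(
    name="mydialect.morph",
    operands=["%input#3"],
    operand_types=["!mydialect.custom_type"],
    result_types=["!mydialect.other_type", "!mydialect.other_type"],
    attributes={"some.attribute": True, "other_attribute": 1.5},
    loc='callsite("foo" at "mysource.cc":10:8)',
)
print(morph.dialect)
print(print_generic(morph))
```

The number of results after the colon is derived from result_types rather than stored separately, which keeps the printed count and the type list from disagreeing.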
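
The Dialects slide describes a dialect as a namespace-reserving grouping of operations, each with a verifier for its invariants, while the Operations slide notes that operations need not be defined. The sketch below models both ideas in Python under simplifying assumptions: Dialect, Context, Op, and the toy.leaky_relu verifier are hypothetical (the 0.2 default for alpha is chosen for illustration), and the allow_unregistered switch only plays a role similar to mlir-opt's --allow-unregistered-dialect flag.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Op:
    name: str                                            # "<namespace>.<opname>"
    operands: List[str] = field(default_factory=list)
    attrs: Dict[str, object] = field(default_factory=dict)


Verifier = Callable[[Op], Optional[str]]                 # returns an error message or None


@dataclass
class Dialect:
    namespace: str                                       # the reserved prefix
    ops: Dict[str, Verifier] = field(default_factory=dict)

    def op(self, name: str):
        """Decorator: register `<namespace>.<name>` together with its verifier."""
        def register(verifier: Verifier) -> Verifier:
            self.ops[name] = verifier
            return verifier
        return register


class Context:
    def __init__(self, allow_unregistered: bool = True):
        self.dialects: Dict[str, Dialect] = {}
        self.allow_unregistered = allow_unregistered

    def load(self, dialect: Dialect) -> None:
        self.dialects[dialect.namespace] = dialect

    def verify(self, op: Op) -> Optional[str]:
        prefix, _, short = op.name.partition(".")
        dialect = self.dialects.get(prefix)
        if dialect is None or short not in dialect.ops:
            # Unknown ops are opaque: allowed, but nothing is known about them.
            return None if self.allow_unregistered else f"unregistered operation '{op.name}'"
        return dialect.ops[short](op)


toy = Dialect("toy")   # hypothetical dialect for illustration


@toy.op("leaky_relu")
def _verify_leaky_relu(op: Op) -> Optional[str]:
    if len(op.operands) != 1:
        return "expects exactly one operand"
    alpha = op.attrs.get("alpha", 0.2)   # default used when the attribute is absent
    if not isinstance(alpha, float):
        return "attribute 'alpha' must be a float"
    return None


ctx = Context()
ctx.load(toy)
print(ctx.verify(Op("toy.leaky_relu", ["%x"], {"alpha": 0.1})))   # None: valid
print(ctx.verify(Op("toy.leaky_relu", ["%x", "%y"])))             # error message
```

Keeping verification in the dialect, keyed by the op name under the reserved prefix, is what lets unregistered ops pass through untouched while registered ones get their invariants checked.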
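
The Progressive disclosure slide says that modelling side effects enables greater optimizations, and CSE and DCE appear early in the talk among the infrastructure each IR ends up reimplementing. The Python sketch below shows why purity is opt-in: ops are assumed to have side effects unless marked pure, and only pure ops may be merged by CSE or deleted by DCE. It handles straight-line code only and compares ops by name and operands alone (real CSE also considers attributes and types). debug.print is a made-up side-effecting op, and the arith op names serve only as labels.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Op:
    result: str                  # SSA value defined by the op ("" if none)
    name: str
    operands: Tuple[str, ...]
    pure: bool = False           # conservative default: the op may have side effects


def cse(ops: List[Op]) -> List[Op]:
    """Fold a pure op into an earlier identical one (same name and operands)."""
    seen: Dict[tuple, str] = {}      # (name, operands) -> result that already computes it
    rename: Dict[str, str] = {}      # eliminated result -> surviving result
    out: List[Op] = []
    for op in ops:
        operands = tuple(rename.get(v, v) for v in op.operands)
        key = (op.name, operands)
        if op.pure and key in seen:
            rename[op.result] = seen[key]
            continue
        if op.pure:
            seen[key] = op.result
        out.append(Op(op.result, op.name, operands, op.pure))
    return out


def dce(ops: List[Op]) -> List[Op]:
    """Delete pure ops whose results are never used; side-effecting ops always stay."""
    live = set()
    kept: List[Op] = []
    for op in reversed(ops):
        if op.pure and op.result not in live:
            continue
        live.update(op.operands)
        kept.append(op)
    return kept[::-1]


prog = [
    Op("%a", "arith.addf", ("%x", "%y"), pure=True),
    Op("%b", "arith.addf", ("%x", "%y"), pure=True),   # same computation as %a
    Op("%c", "arith.mulf", ("%a", "%b"), pure=True),
    Op("%d", "arith.mulf", ("%x", "%x"), pure=True),   # result never used
    Op("", "debug.print", ("%c",)),                     # hypothetical op with side effects
]
for op in dce(cse(prog)):
    print(op.result or "<none>", "=", op.name, op.operands)
```

Because pure defaults to False, forgetting to model an op only costs optimization opportunities; it never lets the passes delete or merge something that must run.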
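
The Writing a pattern slides contrast declarative patterns with native C++ rewrites. As a hedged illustration of the underlying idea (not DRR, PDL, or MLIR's C++ pattern API), the Python sketch below expresses one constrained pattern, add(x, const 0) -> x, and one native-code rewrite that folds two constants. It then applies both with a small greedy, bottom-up driver that revisits each replacement until nothing more matches. The tuple-based IR and every function name are hypothetical.

```python
from typing import Callable, List, Optional, Union

# Toy expression IR: a leaf is an SSA name such as "%x"; an op is a tuple
# ("opname", operand, ...); constants are ("const", value).
Expr = Union[str, tuple]
Pattern = Callable[[tuple], Optional[Expr]]   # returns a replacement, or None if no match


def is_const(e: Expr, value: Optional[float] = None) -> bool:
    """Constraint: `e` is a constant, optionally with a specific value."""
    return isinstance(e, tuple) and e[0] == "const" and (value is None or e[1] == value)


def add_zero(e: tuple) -> Optional[Expr]:
    # Declarative-style pattern with a constraint: add(x, const 0) -> x
    if e[0] == "add" and is_const(e[2], 0):
        return e[1]
    return None


def fold_add(e: tuple) -> Optional[Expr]:
    # Native-code rewrite: add(const a, const b) -> const (a + b)
    if e[0] == "add" and is_const(e[1]) and is_const(e[2]):
        return ("const", e[1][1] + e[2][1])
    return None


def rewrite(e: Expr, patterns: List[Pattern]) -> Expr:
    """Apply patterns bottom-up, re-visiting each replacement until nothing matches."""
    if isinstance(e, str) or e[0] == "const":
        return e
    e = (e[0], *(rewrite(child, patterns) for child in e[1:]))
    for pattern in patterns:
        replacement = pattern(e)
        if replacement is not None:
            return rewrite(replacement, patterns)   # a rewrite may expose new matches
    return e


expr = ("add", ("add", ("const", 2), ("const", 3)), ("add", "%x", ("const", 0)))
print(rewrite(expr, [add_zero, fold_add]))
```

Each pattern only states what it matches and what it produces; the traversal order and the fixed-point iteration live in the driver, so adding a pattern never requires touching the driver.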
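
The Define a pass and Pass driver slides register passes under a command-line name and hand the command line to an opt-style main function. The Python sketch below mimics that flow with argparse. Each --<name> flag appends a pass to the pipeline in command-line order, and each pass carries a statistics dictionary loosely modelled on the TableGen statistics field. Pass, REGISTRY, register_pass, opt_main, and the two example passes are hypothetical; none of this is MLIR's pass infrastructure.

```python
import argparse
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Ops = List[str]                      # toy IR: one string per operation
Stats = Dict[str, int]


@dataclass
class Pass:
    argument: str                    # name of the pass on the command line
    summary: str                     # used as the --help text
    run: Callable[[Ops, Stats], Ops]
    statistics: Stats = field(default_factory=dict)


REGISTRY: Dict[str, Pass] = {}


def register_pass(p: Pass) -> None:
    REGISTRY[p.argument] = p


def _drop_nops(ops: Ops, stats: Stats) -> Ops:
    kept = [op for op in ops if op != "nop"]
    stats["num-removed"] = stats.get("num-removed", 0) + len(ops) - len(kept)
    return kept


def _count_ops(ops: Ops, stats: Stats) -> Ops:
    stats["num-ops"] = len(ops)      # analysis only: the IR is returned unchanged
    return ops


register_pass(Pass("drop-nops", "Remove no-op operations", _drop_nops))
register_pass(Pass("count-ops", "Record the number of operations", _count_ops))


def opt_main(argv: List[str], ops: Ops) -> Ops:
    """Toy opt-style driver: each --<argument> flag appends that pass to the pipeline."""
    parser = argparse.ArgumentParser(prog="toy-opt", description="Rewrite test pass driver")
    for p in REGISTRY.values():
        parser.add_argument(f"--{p.argument}", dest="pipeline", action="append_const",
                            const=p.argument, help=p.summary)
    args = parser.parse_args(argv)
    for name in args.pipeline or []:   # passes run in command-line order
        p = REGISTRY[name]
        ops = p.run(ops, p.statistics)
    return ops
```

For instance, opt_main(["--drop-nops", "--count-ops"], ["add", "nop", "mul"]) returns ["add", "mul"], after which the count-ops pass has recorded num-ops = 2. Registering passes before parsing is what lets the driver expose every pass as a flag without knowing about any of them in advance.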
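
The testing hint relies on FileCheck: the directives in the test file must match the tool's output in order, and the -NEXT variants must match on the line immediately after the previous match. FileCheck only reads directives that use its active prefix (CHECK by default, changed with --check-prefix), so UNROLL-FULL lines are checked only when that prefix is selected. The Python function below is a toy re-implementation of just the PREFIX: and PREFIX-NEXT: forms, written for illustration. The filecheck name and its substring-only matching are assumptions, and the real tool supports much more.

```python
import re


def filecheck(check_text: str, output: str, prefix: str = "CHECK") -> bool:
    """Toy subset of LLVM FileCheck supporting only PREFIX: and PREFIX-NEXT:.

    PREFIX: text       `text` must appear on a later output line (checks match in order).
    PREFIX-NEXT: text  `text` must appear on the line right after the previous match.
    Matching is a plain substring search; the real tool also supports regexes,
    variables, CHECK-LABEL, CHECK-NOT, whitespace canonicalization, and more.
    """
    directive = re.compile(rf"(?<![\w-]){re.escape(prefix)}(-NEXT)?:\s*(.*?)\s*$")
    lines = output.splitlines()
    pos = -1                   # index of the most recently matched output line
    found_any = False
    for raw in check_text.splitlines():
        m = directive.search(raw)
        if not m:
            continue
        found_any = True
        is_next, pattern = m.group(1) is not None, m.group(2)
        if is_next:
            if pos + 1 >= len(lines) or pattern not in lines[pos + 1]:
                return False
            pos += 1
            continue
        for i in range(pos + 1, len(lines)):
            if pattern in lines[i]:
                pos = i
                break
        else:
            return False
    # Like FileCheck, finding no directive for the chosen prefix counts as a failure.
    return found_any


print(filecheck("// CHECK: a\n// CHECK-NEXT: b", "a\nb"))      # True
print(filecheck("// CHECK: a\n// CHECK-NEXT: b", "a\nc\nb"))   # False: b is not on the next line
```

Treating an empty set of directives as a failure mirrors why a test written with UNROLL-FULL lines needs the matching --check-prefix: without it, nothing would be checked at all.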

Getting involved

MLIR is a community project
An important takeaway from looking around, internally and externally, from the Compilers for Machine Learning (C4ML) and HPC (SC) communities to hardware folks (ISSCC): everyone is solving similar problems over and over, and effort on the common (but very important, and not really common) parts takes away from the value add.

MLIR is OSS with an active community:
- mlir.dev/forum for the Discourse forum (RFCs and longer discussions happen here)
- mlir.dev/chat for Discord chat (quick conversations, often across time zones)

Thank you to the team!

Questions?
