HC2022.PNNL.Curzel.v01.pdf (13 pages)
From High-Level Frameworks to custom Silicon with SODA
Serena Curzel, Nicolas Bohm Agostini, Reece Neff, Ankur Limaye, Jeff (Jun) Zhang, Vinay Amatya, Marco Minutoli, Vito Giovanni Castellana, Joseph Manzano, David Brooks, Gu-Yeon Wei, Fabrizio Ferrandi, Antonino Tumeo

Overview
- The SODA Synthesizer is a modular, multi-level, interoperable, extensible, open-source hardware compiler from high-level programming frameworks to silicon
- Compiler-based frontend, leveraging the Multi-Level Intermediate Representation (MLIR)
- Compiler-based backend, leveraging state-of-the-art High-Level Synthesis (HLS) techniques
- Generates synthesizable Verilog for a variety of targets, from Field Programmable Gate Arrays (FPGAs) to Application Specific Integrated Circuits (ASICs)
- Optimizations at all levels are performed as compiler optimization passes

Results
- ASIC accelerators for LeNet layers

Useful links
- SODA-OPT
- SODA Docker Image
- Panda-Bambu HLS (v 0.9.7)
- SODA Tutorial: DATE 2022

Motivations
- Data Science algorithms, Machine Learning models and frameworks are quickly evolving
- Increased complexity and tight performance/power/area constraints (especially on edge devices) require domain-specific accelerators
- Trends:
  - Increasing number of layers and parameters (ResNet, VGG, Transformers)
  - New network architectures (GNN, LSTM, Reinforcement Learning)
  - Compression techniques (Quantization, pruning)
  - New programming environments (TensorFlow, PyTorch, MXNet)
Y. Lecun, et al., "Gradient-based learning applied to document recognition," Proc. IEEE, 1998

Motivations
- Existing accelerators start from specific models (e.g., CNNs) or only try to accelerate specific computational patterns
- Designing hardware accelerators by hand is complex and time-consuming
- Hardware designers may want to explore different design trade-offs, depending on the application requirements
- Agile Hardware Design and Prototyping is required
  - Quickly transition from algorithm formulation to accelerator implementation
  - Sufficient design space exploration knobs
  - Minimal human interaction
[Figure: LeNet architecture; ASIC]

Our solution: the SODA Synthesizer
- The SODA Synthesizer is a modular, multi-level, interoperable, extensible, open-source hardware compiler from high-level programming frameworks to silicon
- Compiler-based frontend, leveraging the Multi-Level Intermediate Representation (MLIR)
- Compiler-based backend, leveraging state-of-the-art High-Level Synthesis (HLS) techniques
- Generates synthesizable Verilog for a variety of targets, from Field Programmable Gate Arrays (FPGAs) to Application Specific Integrated Circuits (ASICs)
- Optimizations at all levels are performed as compiler optimization passes
J. Zhang, N. Bohm Agostini, S. Song, C. Tan, A. Limaye, V. Amatya, J.B. Manzano, M. Minutoli, V.G. Castellana, A. Tumeo, G. Wei, D. Brooks: Towards Automatic and Agile AI/ML Accelerator Design with End-to-End Synthesis. ASAP 2021: 218-225
N. Bohm Agostini, S. Curzel, J. Zhang, A. Limaye, C. Tan, V. Amatya, M. Minutoli, V.G. Castellana, J.B. Manzano, D. Brooks, G. Wei, A. Tumeo: Bridging Python to Silicon: The SODA Toolchain. To appear in IEEE Micro Magazine

Frontend: SODA-OPT
- SODA-OPT: Search, Outline, Dispatch, Accelerate frontend optimizer
- Employs and embraces the MLIR framework
  - MLIR: Multi-Level Intermediate Representation
  - Used in TensorFlow, TFRT, ONNX-MLIR, others
- Uses MLIR and compiler passes to:
  - Identify code regions for hardware generation
  - Perform high-level optimizations (dataflow transformations, data-level and instruction-level parallelism extraction)
  - Generate interfacing code and runtime calls for microcontroller
N. Bohm Agostini, S. Curzel, V.C. Amatya, C. Tan, M. Minutoli, V.G. Castellana, J. Manzano, D. Kaeli, A. Tumeo, An MLIR-based Compiler Flow for System-Level Design and Hardware Acceleration. To appear at ICCAD 2022

Frontend: SODA-OPT
- SODA-OPT implements optimizations as compiler passes
- Single basic block containing the compute intensive part of the kernel
  - More freedom to schedule operations
- Increased instruction-level parallelism
  - Schedule independent arithmetic operations on the same cycle when
    their inputs are available
- Increased data-level parallelism
  - Schedule operations into different memory units on the same cycle
- Avoid unnecessary reads from kernel arguments
  - Reduce expensive accesses to external memory
- Reuse read results, aggregate on scalars
  - Save scalar values loaded from memory and intermediate results in registers rather than performing repeated memory accesses
- Early alias analysis
  - Schedule memory operations independently on regions that don't alias
- Remove redundant or unnecessary operations
  - Avoid wasting resources
Passes: Tiling, Unrolling, Temporary Buffer Allocation, Alloca Buffer Promotion, Scalar Replacement of Aggregates, Early Alias Analysis, Outlining, Common Sub-expression Elimination, Dead Code Elimination
Pass categories: Structural, Memory, Avoid Redundancy and Promote Reuse, Avoid Unnecessary Operations

Backend: High-Level Synthesis
- The synthesizer backend takes as input the optimized low-level IR and
  generates the hardware descriptions of the accelerators
- The main HLS backend is PandA-Bambu, an open-source state-of-the-art high-level synthesis (HLS) tool
  - We are key contributors to Bambu, with parallel accelerator designs, modular HLS, and ASIC support
  - Automated testing and verification
F. Ferrandi,
V.G. Castellana, S. Curzel, P. Fezzardi, M. Fiorito, M. Lattuada, M. Minutoli, C. Pilato, A. Tumeo: Invited: Bambu: an Open-Source Research Framework for the High-Level Synthesis of Complex Applications. DAC 2021:

Backend: High-Level Synthesis
- We also support integration with Xilinx Vitis HLS through its open-source LLVM frontend
- The SODA Synthesizer has interfaces with multiple open-source and commercial backends
  - Xilinx Vivado, Intel Quartus (FPGA)
  - OpenROAD, Synopsys Design Compiler (ASIC)
- Automated path to FPGA bitstream or GDS2 files
[Figure: Backend: HLS; IR downgrading; To: FPGA Design; Vitis LLVM frontend https:/]

Examples of generated accelerators
- LeNet model imported from TensorFlow
- Each operator is synthesized to an ASIC accelerator (OpenROAD FreePDK 45nm)
- SODA-OPT optimized accelerators are bigger, but also much faster

Examples of generated accelerators
- PolyBench kernels
- Outperforming state-of-the-art HLS tools and frontends
N. Bohm Agostini, S. Curzel, V.C. Amatya, C. Tan, M. Minutoli, V.G. Castellana, J. Manzano, D. Kaeli, A. Tumeo, An MLIR-based Compiler Flow for System-Level Design and Hardware Acceleration. To appear at ICCAD 2022

Conclusions
- The SODA toolchain provides an end-to-end compiler-based design flow from the formulation of an
  algorithm to the deployment of custom hardware accelerators
- Multi-level, modular, and extensible
- Promotes agile hardware design
- Based on open-source technologies, and integrated with proprietary tools
- Start using SODA today with these links:
  - SODA-OPT
  - SODA Docker Image
  - Panda-Bambu HLS (v 0.9.7)
  - SODA Tutorial: DATE 2022
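
Illustrative sketch (not taken from the slides): the frontend slides list unrolling and the reuse of read results in scalars among the SODA-OPT optimizations. The Python below hand-applies both ideas to a small matrix-vector kernel, only to show what the transformed loop nest looks like. The function names, the kernel, and the unroll factor are assumptions chosen for illustration; SODA-OPT itself performs such rewrites as MLIR passes, not on Python source.

```python
def matvec_baseline(a, x):
    """Reference kernel: y[i] is read and written on every inner iteration."""
    rows, cols = len(a), len(x)
    y = [0] * rows
    for i in range(rows):
        for j in range(cols):
            y[i] = y[i] + a[i][j] * x[j]  # repeated access to y[i]
    return y


def matvec_optimized(a, x, unroll=4):
    """Same kernel after hand-applied scalar replacement and partial unrolling.

    Hypothetical illustration: 'unroll' independent partial sums stand in for
    registers, so the multiply-adds of one unrolled step do not depend on each
    other and could be scheduled in the same cycle by an HLS tool.
    """
    rows, cols = len(a), len(x)
    y = [0] * rows
    main = cols - cols % unroll  # portion of the row covered by full steps
    for i in range(rows):
        row = a[i]                # read the row reference once
        acc = [0] * unroll        # partial sums kept in local scalars
        for j in range(0, main, unroll):
            for k in range(unroll):   # stands for the replicated loop body
                acc[k] += row[j + k] * x[j + k]
        total = sum(acc)
        for j in range(main, cols):   # leftover iterations
            total += row[j] * x[j]
        y[i] = total              # single write to the output
    return y
```

With integer inputs both versions return identical results. With floating-point inputs the reordered additions can change rounding, which is one reason a compiler needs permission to reassociate before applying this kind of transformation.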