Circuit IR for Compilers and Tools
Heterogeneous Compilation in MLIR
Hot Chips 34 Tutorial
Andrew Lenharth (SiFive), John Demme (Microsoft)

Demo: Creating hardware for ML
PyTorch to SystemVerilog with cosimulation

CIRCT: Circuit IR for Compilers and Tools
LLVM Incubator Project, https://circt.llvm.org
- MLIR-based tech for HW design and verification
- Composable toolchain for hardware design/EDA processes
- Focuses: high quality, usability, performance
- Modular, library-based design to power a next-gen ecosystem: drive an innovation explosion for HW (like LLVM did for SW)

Making a chip is easy, right?
[Figure labels: HW Designer, Simulation, Synthesis + SDF, P&R, GDSII, FPGA]

Hardware design is a team sport; not one thing
[Figure labels: HW Designer, Verification Engineers*, Physical Designer, Power Engineer, Simulation, Formal Methods, Emulation, FPGAs, Analysis Tools (e.g. clock, power domains, etc.), Synthesis + SDF, P&R, DEF, GDSII. *Approximately to scale]
- Design is more than just RTL and testbenches
- Verification is a whole set of sub-disciplines, each with their own standards

Pervasive redundancy, no single source of truth, little consistency
- Each aspect of the design has different (sub-)languages
- Many languages are vendor- or tool-dependent
- Specs are not orthogonal: they reuse abstractions (despite poor abstraction capability)
- Redundancy: multiple sources of truth
- These IRs are loosely coupled to the original design intent
- Long turnaround and lots of effort to make changes
- Fragile layering
- Designs become a mess of scripts, TCL, vendor-specific files, and duct tape
All aspects of a chip need specification. Lots of
interacting tools and development flows!

Tools are great, but not as great as they could be!
Wonderful ecosystem of tools, but:
- Not always using best practices in software/compiler engineering
- Monolithic designs connected by unfortunate standards like Verilog
- Each framework/tool/tech stack is its own technology island
- Each has a small developer community: little shared code slows progress, and each tool is missing features
- Features, quality of results, and user experience trail proprietary tools
- Poor SystemVerilog compatibility harms interoperation
Problem: no one is tackling the IR/representational issues!
Problem: duplication of effort on "uninteresting" parts!

Library-based design in LLVM enabled a technology explosion!
OpenCL, CUDA, HLS, JIT'd database query engines, new languages like Swift, Rust, Julia, ...
We need this for hardware design!
(Clipart by Pedro Neves. See LLVM 2021 Developers Conference Keynote.)

CIRCT is tackling interesting problems
- Non-DAG logic (feedback)
- Hierarchical path specification
- SV verification constructs
- Side-band design information
- Code generator integration
- Human-readable source code generation
- Interoperating with diverse, fragmented, and idiosyncratic tools
- Very large designs

CIRCT has useful libraries and tech!
CIRCT dialects + implementations:
- Full, production FIRRTL implementation
- Core dialects stable
- Excellent SystemVerilog generation pipeline
- Active work on all dialects
- Used both in production and research projects
Frontend use cases stable and well used:
- Tools that want to generate SystemVerilog
Early interest in simulation, synthesis, P&R, etc.

Active CIRCT sub-projects (details later)
- Core dialects
- Emitting SystemVerilog
- HLS flows
- PyCDE (CIRCT Design Entry)
- Elastic Silicon Interconnect
- FIRRTL/Chisel
Honorable mentions: fast simulation, scheduling framework, FSM, SystemC, FPGA physical design/high-level constructs, + several more

Honorable mentions (a selection of projects we won't be discussing further)*
Fast simulation
- Keep high-level dialect ops for fast functional simulation. (Linear algebra runs a lot faster when compiled directly than in RTL simulation.)
- Lower to RTL to get cycle-level timing, then correlate.
Scheduling framework [1]
- HLS and other high-level lowerings need a scheduling system.
- The CIRCT scheduling framework abstracts away specific schedulers into a diverse set of "problem" models to suit numerous uses.
FSM
- Abstraction for finite state machines, allowing easier reasoning and manipulation.
- Target-agnostic, generating efficient code for simulation and target hardware (e.g. ASIC, AMD/Intel FPGA).
SystemC
- Models SystemC constructs and enables generating SystemC implementations.
- Work includes automatically generating SystemC models from core dialects.
FPGA physical design [2]
- Specify the placements of instances in a design instance hierarchy.
- Allows PD-conscious designers to define application-specific heuristics, then produce a set of constraints with correct instance paths.
High-level constructs [2]
- Design systems at the level of unscheduled FSMs, data pipelines, broadcasts, systolic arrays, etc.
- The compiler can schedule/pipeline correctly while physically optimizing: "placement-first pipelining".
[1] "How to Make Hardware with Maths: An Introduction to CIRCT's Scheduling Infrastructure", video and slides, 2022 European LLVM Developers Meeting, Julian Oppermann (Technical University of Darmstadt).
[2] "Using CIRCT for FPGA Physical Design", paper and video, LATTE '22, John Demme and Aaron Landy (Microsoft).
*Various levels of stability/maturity, tending towards the experimental/infant end.

Core dialects: the common denominator
HW: core abstractions
- Operations like module, instance (of a module)
- Also contains standard data types (int, array, struct, etc.)
- Status: complete and stable
Combinational: computational ops without a sense of cycles or time
- Operations like add, shift, multiply, etc.
- Status: complete and stable
Sequential: contains clocked storage elements
- Introduces a sense of time measured in cycles
- Status: incomplete but stable
SV: SystemVerilog weirdness

Exporting SystemVerilog: the good, the bad, and the ugly
TL;DR: No tool implements the whole "standard", and no two tools implement it the same way; corner cases abound!
CIRCT abstracts away (most of) the pain.
Goals: readable, yet compatible with most tools.
- Requires supporting tool-specific SV "variants".
Tool-specific annoyances:
- Token parse length
- Register syntactic form for some power tools
- Async reset register syntactic form for some synthesis and lint tools
- Automatic logic compatibility, or lack thereof
- No multi-dimensional arrays, no unpacked arrays, no structs(!)
Tool-specific optimizations:
- Muxes: so much complexity, compounded by
  weak pattern matching in some synthesis tools
- Vendor-specific annotations needed to get the desired output

Demo details
PyTorch kernel (vector dot product):

    import torch

    class DotModule(torch.nn.Module):
        def forward(self, a, b):
            return torch.matmul(a, b)

Compiling with MLIR:

    import torch
    import torch_mlir
    from dot import DotModule

    shape = torch_mlir.TensorPlaceholder([5], torch.int32)
    module = torch_mlir.compile(DotModule(), [shape, shape],
                                output_type="linalg-on-tensors")
    print(module)

Step 1: Get to SystemVerilog
Step 2: Assemble the system

Handshake: a dynamic dataflow IR
- Dynamically-scheduled dataflow: rather than following a pre-computed schedule, tokens run through a dataflow graph at runtime.
- Contains the operations necessary to implement control constructs via dataflow (e.g. fork, join, conditional branch, conditional merge, etc.).
- Lowered to valid/ready semantics per operation.
- The handshake implementation of the PyTorch dot product is on the
  right.
- Status: experimental

PyCDE: CIRCT Design Entry Python API
System assembly
- Used to stitch together IP blocks and host connectivity with either:
  - ESI (used in the demo)
  - Ad-hoc wiring
- The demo imports the PyTorch kernel module, wraps the raw wire ports in ESI, then specifies ESI cosim as the external interface.
- Future work: have the HLS lowering produce ESI "channel" ports to eliminate manual wrapping.
Design entry
- CIRCT operations with a lot of syntactic sugar. NOT a Python-based HDL!
- Replaces ad-hoc "generator" scripts which spit out RTL code.
- Gets all the goodness of the full CIRCT stack (optimizations, SV compatibility, etc.).
- Has an instance hierarchy browser API through which physical placements can be given, typically with a user-defined heuristic. (Used inside Microsoft to physically optimize an internal FPGA design, yielding a 60% FMax increase.)
- Enter any CIRCT operation, including the aforementioned high-level constructs.
Status: experimental

Hand-written PyCDE wrapping code. In the future, this will not be necessary.

    @module
    class HandshakeToESIWrapper:
        # Control Ports
        # Generic ports always present
        clock = Input(types.i1)
        reset = Input(types.i1)
        # Go signal
        go = InputChannel(types.i1)
        # Done signal
        done = OutputChannel(types.i1)

        # Input 0 Ports
        # Channels from Memory
        in0_ld_data0 = InputChannel(types.i32)
        # Channels to Memory
        in0_ld_addr0 = OutputChannel(types.i64)

        # Input 1 Ports
        # Channels from Memory
        in1_ld_data0 = InputChannel(types.i32)
        # Channels to Memory
        in1_ld_addr0 = OutputChannel(types.i64)

        # Output 0 Ports
        # Channels to Host
        result = OutputChannel(types.i32)

        @generator
        def generate(ports):
            # Typedefs
            ctrl_channel_type = types.channel(types.i1)
            i32_channel_type = types.channel(types.i32)
            i64_channel_type = types.channel(types.i64)

            # Instantiate the top-level module to wrap with backedges for most ports.
            wrapped_top = top(clock=ports.clock, reset=ports.reset)

            # Control Ports
            # Go signal: only the channel's valid is used, its data is dropped.
            _, in_ctrl_valid = ports.go.unwrap(wrapped_top.inCtrl_ready)
            wrapped_top.inCtrl_valid.connect(in_ctrl_valid)
            # Done signal
            out_ctrl_channel, out_ctrl_ready = ctrl_channel_type.wrap(
                1, wrapped_top.outCtrl_valid)
            wrapped_top.outCtrl_ready.connect(out_ctrl_ready)
            ports.done = out_ctrl_channel

            # Input 0 Ports
            # Channels from Memory: one ESI channel drives both the load-data
            # and load-done handshakes of the kernel.
            in0_ready = wrapped_top.in0_ldData0_ready & wrapped_top.in0_ldDone0_ready
            in0_ld_data0_data, in0_ld_data0_valid = ports.in0_ld_data0.unwrap(in0_ready)
            wrapped_top.in0_ldData0_data.connect(in0_ld_data0_data)
            # Hold valid low while the kernel is not ready to take the word.
            in0_valid = Mux(in0_ready, types.i1(0), in0_ld_data0_valid)
            wrapped_top.in0_ldData0_valid.connect(in0_valid)
            wrapped_top.in0_ldDone0_valid.connect(in0_valid)
            # Channels to Memory
            in0_ld_addr0_channel, in0_ld_addr0_ready = i64_channel_type.wrap(
                wrapped_top.in0_ldAddr0_data, wrapped_top.in0_ldAddr0_valid)
            wrapped_top.in0_ldAddr0_ready.connect(in0_ld_addr0_ready)
            ports.in0_ld_addr0 = in0_ld_addr0_channel

            # Input 1 Ports
            # Channels from Memory
            in1_ready = wrapped_top.in1_ldData0_ready & wrapped_top.in1_ldDone0_ready
            in1_ld_data0_data, in1_ld_data0_valid = ports.in1_ld_data0.unwrap(in1_ready)
            wrapped_top.in1_ldData0_data.connect(in1_ld_data0_data)
            in1_valid = Mux(in1_ready, types.i1(0), in1_ld_data0_valid)
            wrapped_top.in1_ldData0_valid.connect(in1_valid)
            wrapped_top.in1_ldDone0_valid.connect(in1_valid)
            # Channels to Memory
            in1_ld_addr0_channel, in1_ld_addr0_ready = i64_channel_type.wrap(
                wrapped_top.in1_ldAddr0_data, wrapped_top.in1_ldAddr0_valid)
            wrapped_top.in1_ldAddr0_ready.connect(in1_ld_addr0_ready)
            ports.in1_ld_addr0 = in1_ld_addr0_channel

            # Output 0 Ports
            out0_channel, out0_ready = i32_channel_type.wrap(
                wrapped_top.out0_data, wrapped_top.out0_valid)
            wrapped_top.out0_ready.connect(out0_ready)
            ports.result = out0_channel

IP composability: Elastic Silicon Interconnect
Our FPGA system assembly dialect
"Plumbing" is tedious and error-prone:
- Endianness, off-by-one cycle bugs, type mismatches, etc.
- CDCs!
- Too low-level!
Solution: raise the level of abstraction! Compiler technology to the rescue.

ESI: static type safety, high-level types
- Untyped buses (i.e. bundles of wires) are very brittle! When the data types on the wires change, consumers misinterpret the data: bug!
- Static type safety has been wildly successful in software at preventing bugs.
- ESI has (plans for) a rich, hardware-centric type system to enable accurate modeling.
  - Currently: structs, arrays, enums, ints, etc. (table stakes).
  - Coming: variable-length types.
  - "Data windows" to specify source and destination port bandwidth.
- Allows ESI to build rich, system-tailored software APIs automatically.
  - Extends static type safety into software.
  - Same API regardless of hardware bridge: PCIe, network, simulation, etc.
"Elastic Silicon Interconnects: Abstracting Communication in Accelerator Design", J. Demme, LATTE '21 paper talk
Status: experimental

ESI: latency-insensitive model
- Communications modeled as latency-insensitive "channels".
  - Gives the compiler flexibility in IP-to-IP communication (e.g. pipeline links, optimize bus widths, etc.).
  - Build host or inter-device communication bridges anywhere.
- HLS and some newer HDLs provide easy host connectivity (which is great), BUT:
  - It can create vendor lock-in.
  - Since they need control of the top level, it creates "language lock-in".
  - Each HDL/HLS needs to add support for every FPGA board (a many-to-many problem).
- ESI provides a language- and vendor-agnostic system assembly
  system:
  - HDLs/HLS produce IP blocks with ESI interfaces.
  - Board vendors provide an ESI board support package.
  - Use ESI to assemble a language-heterogeneous system and wire it to any board!
Status: experimental
"Elastic Silicon Interconnects: Abstracting Communication in Accelerator Design", J. Demme, LATTE '21 paper talk
[Figure "Automatable complexity": a PCIe hardware accelerator containing a Data Compressor (C++ via HLS), XML Encoder (OpenCL), JSON Encoder (Chisel), and TCP/IP network interface (Verilog), attached to the network and to a host PC running a structured data producer (C#)]

Demo ESI system
1. Vector inputs are sent from Python to a Python API (hand-built for now, automatically generated in the future).
2. They are transferred via the co-simulation RPC/DPI link to the PyTorch kernel.
3. Results are transferred back through the same links.
[Figure labels: Host PC (Python shell, ESI generated API), Cap'n Proto RPC, RTL simulation (DPI simulation bridge, Dot Product (PyTorch)), vector inputs, dot product result. Status: experimental]
[Figure (simplified) labels: Host PC (Python shell, ESI generated API, host runtime), PCIe, Azure cloud FPGA (Dot Product (PyTorch), FPGA "runtime" in a region close to ...), vector inputs, dot product result. Coming soon to a cloud FPGA near you!]

ESI system assembly through PyCDE, cosim edition

    @esi.ServiceDecl
    class HandshakeServices:
        go = esi.FromServer(types.i1)
        done = esi.ToServer(types.i1)
        read_mem = esi.ToFromServer(to_server_type=types.i64,
                                    to_client_type=types.i32)
        result = esi.ToServer(types.i32)


    @module
    class Top:
        clock = Input(types.i1)
        reset = Input(types.i1)

        @generator
        def generate(ports):
            DotProduct(clock=ports.clock, reset=ports.reset)
            esi.Cosim(HandshakeServices, ports.clock, ports.reset)


    @module
    class DotProduct:
        """An ESI-enabled module which only communicates with the host and
        computes dot products."""
        clock = Input(types.i1)
        reset = Input(types.i1)

        @generator
        def generate(ports):
            # Get the go signal from the host.
            go = HandshakeServices.go("dotprod_go")
            # Instantiate the wrapped PyTorch dot product module.
            wrapped_top = HandshakeToESIWrapper(clock=ports.clock,
                                                reset=ports.reset,
                                                go=go)
            # Connect up the channels from the pytorch module.
            HandshakeServices.done("dotprod_done", wrapped_top.done)
            HandshakeServices.result("result", wrapped_top.result)
            # Connect up the memory ports.
            port0_data = HandshakeServices.read_mem("port0", wrapped_top.in0_ld_addr0)
            wrapped_top.in0_ld_data0.connect(port0_data)
            port1_data = HandshakeServices.read_mem("port1", wrapped_top.in1_ld_addr0)
            wrapped_top.in1_ld_data0.connect(port1_data)

This uses ESI "services", which we won't have time to go into today.

Using the system in Python

    import torch
    import numpy as np
    from esi_cosim import HandshakeCosimBase, get_cosim_port
    from dot import DotModule


    class DotProduct(HandshakeCosimBase):
        pytorch_dot = DotModule()

        def run(self, a, b):
            # Load both input vectors into the host-side memories, start the
            # kernel, and wait for its result.
            self.memories[0] = a
            self.memories[1] = b
            self.go()
            return self.read_result()

        def run_checked(self, a, b):
            print(f"Computing dot product of {a} and {b}")
            result = self.run(a, b)
            tensor_a = torch.IntTensor(a)
            tensor_b = torch.IntTensor(b)
            dot = self.pytorch_dot.forward(tensor_a, tensor_b)
            print(f"from cosim: {result}, from pytorch: {dot}")


    def rand_vec():
        return [np.random.randint(0, 100) for _ in range(5)]


    cosim = DotProduct(get_cosim_port())
    cosim.run_checked(rand_vec(), rand_vec())

Hand-built API. In the future, this will be built automatically.

    import os
    import time


    class HandshakeCosimBase(CosimBase):

        def __init__(self, port):
            super().__init__("PyCDESystem/schema.capnp", f"{os.uname()[1]}:{port}")
            self.done = self.openEP(1001, sendType=self.schema.I1,
                                    recvType=self.schema.I1)
            self.memory_ports = [
                self.openEP(1003, sendType=self.schema.I64, recvType=self.schema.I32),
                self.openEP(1004, sendType=self.schema.I64, recvType=self.schema.I32),
            ]
            self.result = self.openEP(1002, sendType=self.schema.I32,
                                      recvType=self.schema.I1)
            self.go_chan = self.openEP(1005, sendType=self.schema.I1,
                                       recvType=self.schema.I1)
            # Host-side contents of the kernel's two input memories.
            self.memories = [[0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]

        def go(self):
            self.go_chan.send(self.schema.I1.new_message(i=False))
            # Answer the kernel's memory reads until it signals done.
            while self.readMsg(self.done, self.schema.I1) is None:
                self.service_memories()
                time.sleep(0.01)

        def read_result(self):
            result = None
            while result is None:
                result = self.readMsg(self.result, self.schema.I32)
                time.sleep(0.01)
            return result.i

        def service_memories(self):

            def service(mem, port):
                # A load request carries an address; reply with that word.
                addr = self.readMsg(port, self.schema.I64)
                if addr is not None:
                    port.send(self.schema.I64.new_message(i=mem[addr.i]))

            for port, mem in zip(self.memory_ports, self.memories):
                service(mem, port)

Running the demo:

    $ ./cosim_demo.sh
    # Generating the ESI system with PyCDE.
    # Outputs: SystemVerilog and Cosimulation schema.
    # ...done.
    # Compiling the SystemVerilog to simulation with Verilator.
    # make: Entering directory '/code/hot-chips-2022-pytorch-circt-hls-demo/obj_dir'
    # ccache g++ -I. -MMD -I/code/circt/ext/share/verilator/include ...
    # make: Leaving directory '/code/hot-chips-2022-pytorch-circt-hls-demo/obj_dir'
    # ...done.
    # Running the Verilator simulation.
    # ...started.
    # Connecting to the simulation via ESI cosim, and dropping into a python shell.
    Computing dot product of [42, 90, 24, 13, 38] and [21, 41, 12, 2, 14]
    from cosim: 5418, from numpy: 5418
    >>> cosim.run_checked(rand_vec(), rand_vec())
    Computing dot product of [19, 52, 11, 75, 80] and [21, 59, 58, 92, 56]
    from cosim: 15485, from numpy: 15485
    Killing simulation... killed.

Demo "caveats"
- Doesn't produce efficient HW.
- Only works for a subset
  of PyTorch kernels.
- A few days' worth of ad-hoc scripting (e.g. wrapping the kernel).
BUT
- The PyTorch/HLS flow doesn't have major funding (interns, students, and part-time contributors).
- Demonstrates the potential of CIRCT!

Chisel/FIRRTL Compiler
Chisel: a Scala API for generating SystemVerilog
- Supports high-abstraction design, type checking to detect errors, and application of SW techniques
SiFive uses Chisel pervasively:
- All SiFive RISC-V cores and several SoCs are built with Chisel
- We have many extensions and custom things built into and around it
[Figure labels: HW Designer, .scala, Chisel Generator Framework, .sv]

Chisel is a compiler built on the FIRRTL IR
One-shot lowering to Verilog was too complicated, so FIRRTL was introduced:
- Progressive (multi-level) lowering of complex types and operations
- Analyses like width inference, a form of dataflow-based type checking
- Correct generation of Verilog text is more complicated than it should be
The imperative code that builds a graph model crosscuts domains: e.g. imperative Keras Python API code building a TensorFlow graph.
Well-designed IRs make it much easier to analyse and transform the design!
[Figure labels: HW Designer, .scala, .fir, .sv]

Building robust tools is fast and cheap using the FIRRTL IR:
- DFT/scan chain insertion, time-multiplexing transformations, host/target clock decoupling for pause-able models, module hierarchy transforms (e.g. for power domains), run-time fault injection, circuit obfuscation, etc.
- Custom checks: clock domain crossing, clock/reset synchronization safety, width inference checks, ...
- Simulators: ESSENT simulator, AWS F1 FPGA-accelerated system simulator, ...
A compiler IR for hardware enabled many new tools!
[Figure labels: HW Designer, .scala, .fir, Custom Transforms, Other Tools, .sv]

firtool
Drop-in replacement for the Scala FIRRTL compiler:
- Lives entirely in the CIRCT project; heavily builds on (and often drives) its infrastructure work
- Production quality for SiFive flows (among others)
- Will be the standard Chisel FIRRTL compiler in a future Chisel release
firtool is an implementation of the FIRRTL compiler.
[Figure labels: .fir file, Parser, FIRRTL Dialect (lowering, transforms, checkers), CIRCT dialects (HW Dialect, Comb Dialect, SV Dialect), firtool, outputs (SystemVerilog, IP-XACT, JSON, ...) consumed by standard EDA tools, external memory compilers etc., other tools, and custom transforms. Legend: FIRRTL dialect; HW/Comb/SV dialects]

Generate all the metadata in the HW dialect next to the FIRRTL dialect. Both can coexist in the same module!
Progressively lower in passes by mixing dialects.
[Figure: firtool pass pipeline. Labels: .fir file, FIR Parser; FIRRTL passes: Lower Annotations, CSE/etc., Lower CHIRRTL, Infer Widths, Blackbox Mems, Lower Types, Expand Whens, Canonicalize, Module Inlining, IM ConstantProp, BlackBoxReader, Verbatim Metadata, Grand Central, SV Tap Interfaces, Canonicalize, Create SiFive MD (many verbatims), Emit OMIR (OMIR JSON verbatims); FIRRTL to HW; core hardware IR passes: Lower Mem Simulation, Extract Test Code, HW Cleanups, CSE/Canonicalize, Legalize Modules, Prettify Verilog, Export Verilog, Export Hierarchy (JSON); outputs: SystemVerilog, IP-XACT, JSON, other text files. Legend: CIRCT pass, MLIR pass, IR constructs]

This cuts 10 minutes out of the iteration cycle for a large configuration of our OoO core!
Directly drives increased designer and verification productivity and faster design space
exploration.

MLIR/CIRCT rapidly accelerates the designer iteration cycle
[Chart: speedups of 6.5x, 11.2x, 9.9x, 5x, 6.7x, and 9.4x; bar labels not recoverable]

Chisel/FIRRTL Demo
Scala source:

    // SPDX-License-Identifier: Apache-2.0
    import chisel3._
    import chisel3.stage.ChiselStage

    object NoSourceInfo {
      implicit val noInfo = chisel3.internal.sourceinfo.UnlocatableSourceInfo
    }
    import NoSourceInfo._

    /* A totally generic ALU */
    class ALUGeneric[A <: Data](gen: A, ops: Seq[(A, A) => UInt]) extends RawModule {
      val a, b = IO(Input(gen))
      val width = a.getWidth
      val f_lo, f_hi = IO(Output(UInt(width.W)))
      val opcode = IO(Input(UInt(util.log2Up(ops.size).W)))
      val result = if (ops.size == 1) ops(0)(a, b).asUInt else {
        val _ops = ops.zipWithIndex.map { case (f, op) => (op.U, f(a, b)) }
        util.MuxLookup(opcode, 0.U, _ops).asUInt
      }
      f_hi := result(width * 2 - 1, width)
      f_lo := result(width - 1, 0)
    }

    object Ops {
      val Ops8 = Seq(
        (a: SInt, b: SInt) => (a % b).asUInt,
        (a: SInt, b: SInt) => (a * b).asUInt,
        (a: SInt, b: SInt) => (a + b).asUInt,
        (a: SInt, b: SInt) => (a - b).asUInt,
        (a: SInt, b: SInt) => (a / b).asUInt,
        (a: SInt, b: SInt) => (b.abs() ## a.abs()).asUInt,
        (a: SInt, b: SInt) => a.max(b).asUInt,
        (a: SInt, b: SInt) => a.min(b).asUInt
      )
    }

    class ALUGeneric_S8_8op extends ALUGeneric(SInt(8.W), Ops.Ops8)

    println(ChiselStage.emitChirrtl(new ALUGeneric_S8_8op))

FIRRTL for ALU:

    circuit ALUGeneric_S8_8op :
      module ALUGeneric_S8_8op :
        input a : SInt<8>
        input b : SInt<8>
        output f_lo : UInt<8>
        output f_hi : UInt<8>
        input opcode : UInt<3>

        node _result_T = rem(a, b)
        node result_defaultx = asUInt(_result_T)
        node _result_T_1 = mul(a, b)
        node _result_T_2 = asUInt(_result_T_1)
        node _result_T_3 = add(a, b)
        node _result_T_4 = tail(_result_T_3, 1)
        node _result_T_5 = asSInt(_result_T_4)
        node _result_T_6 = asUInt(_result_T_5)
        node _result_T_7 = sub(a, b)
        node _result_T_8 = tail(_result_T_7, 1)
        node _result_T_9 = asSInt(_result_T_8)
        node _result_T_10 = asUInt(_result_T_9)
        node _result_T_11 = div(a, b)
        node _result_T_12 = asUInt(_result_T_11)
        node _result_T_13 = lt(b, asSInt(UInt<1>("h0")))
        node _result_T_14 = sub(asSInt(UInt<1>("h0")), b)
        node _result_T_15 = tail(_result_T_14, 1)
        node _result_T_16 = asSInt(_result_T_15)
        node _result_T_17 = mux(_result_T_13, _result_T_16, b)
        node _result_T_18 = lt(a, asSInt(UInt<1>("h0")))
        node _result_T_19 = sub(asSInt(UInt<1>("h0")), a)
        node _result_T_20 = tail(_result_T_19, 1)
        node _result_T_21 = asSInt(_result_T_20)
        node _result_T_22 = mux(_result_T_18, _result_T_21, a)
        node _result_T_23 = cat(_result_T_17, _result_T_22)
        node _result_T_24 = lt(a, b)
        node _result_T_25 = mux(_result_T_24, b, a)
        node _result_T_26 = asUInt(_result_T_25)
        node _result_T_27 = lt(a, b)
        node _result_T_28 = mux(_result_T_27, a, b)
        node _result_T_29 = asUInt(_result_T_28)
        node _result_T_30 = eq(UInt<1>("h1"), opcode)
        node _result_T_31 = mux(_result_T_30, _result_T_2, result_defaultx)
        node _result_T_32 = eq(UInt<2>("h2"), opcode)
        node _result_T_33 = mux(_result_T_32, _result_T_6, _result_T_31)
        node _result_T_34 = eq(UInt<2>("h3"), opcode)
        node _result_T_35 = mux(_result_T_34, _result_T_10, _result_T_33)
        node _result_T_36 = eq(UInt<3>("h4"), opcode)
        node _result_T_37 = mux(_result_T_36, _result_T_12, _result_T_35)
        node _result_T_38 = eq(UInt<3>("h5"), opcode)
        node _result_T_39 = mux(_result_T_38, _result_T_23, _result_T_37)
        node _result_T_40 = eq(UInt<3>("h6"), opcode)
        node _result_T_41 = mux(_result_T_40, _result_T_26, _result_T_39)
        node _result_T_42 = eq(UInt<3>("h7"), opcode)
        node result = mux(_result_T_42, _result_T_29, _result_T_41)
        node _f_hi_T = bits(result, 15, 8)
        f_hi <= _f_hi_T
        node _f_lo_T = bits(result, 7, 0)
        f_lo <= _f_lo_T

Verilog for ALU:

    // Generated by CIRCT sifive/1/10/0-72-gd4dbdb1fd
    module ALUGeneric_S8_8op(
      input  [7:0] a, b,
      input  [2:0] opcode,
      output [7:0] f_lo, f_hi);

      wire [7:0]       result_defaultx;
      wire [15:0]      result;  // Mux.scala:80:57
      assign result_defaultx = $signed(a) % $signed(b);
      wire             _result_T_24 = $signed(a) < $signed(b);
      wire [7:0][15:0] _GEN =
        {{8'h0, _result_T_24 ? a : b},
         {8'h0, _result_T_24 ? b : a},
         {$signed(b) < 8'sh0 ? 8'h0 - b : b, $signed(a) < 8'sh0 ? 8'h0 - a : a},
         {7'h0, $signed({a[7], a}) / $signed({b[7], b})},
         {8'h0, a - b},
         {8'h0, a + b},
         {{8{a[7]}}, a} * {{8{b[7]}}, b},
         {8'h0, result_defaultx}};  // Mux.scala:80:{57,60}
      assign result = _GEN[opcode];  // Mux.scala:80:57
      assign f_lo = result[7:0];     // Mux.scala:80:57
      assign f_hi = result[15:8];    // Mux.scala:80:57
    endmodule

Call To Action
[Figure "Problem areas CIRCT is tackling so far". Labels: HW Designer, Verification Engineers*, Physical Designer, Power Engineer, Simulation, Formal Methods, Emulation, FPGA Prototyping, Analysis Tools (e.g. clock, power domains, etc.), Synthesis + SDF, P&R, DEF, GDSII. *Approximately to scale]
So far, we are just scratching the surface!
Still early days: many open frontiers yet to be explored!
- Standardized dialects for key HW design features: SoC assembly (IP-XACT) and power modeling (UPF) dialects
- Libraries for key ecosystem features: VLang (a Clang-like Verilog parser), formal verification tools, high-performance simulators
- Physical design backend technologies: floor planning, synthesis, place and route algorithms, technology-specific MLIR dialects (e.g. iCE40 FPGA, SkyWater PDK, TSMC 5, ...)
- New design approaches: new approaches for MLIR-based high-level synthesis (HLS), and new generator frameworks that expose and utilize these capabilities!
- Integrate a first-class verification system into the design flow

Fostering collaboration in the HW tools community
Weekly Open Design Meeting:
- Topic: broad discussions about hardware design tools, challenges, and technologies
- Flexible format: spontaneous discussions, invited talks, discussions about patches, etc.
- Public Zoom meeting, recorded: meeting notes include videos
- History goes back to May 2020
- Both industrial and academic attendees: 20-40 people/week
- Everyone is welcome to attend, lurk, or present
- Use/knowledge of CIRCT is not required!

The future is built by an open and collaborative community:
- Pulling together the small group of passionate HW tool engineers
The future is built from large amounts of shared code:
- Extended, improved, and leveraged across the ecosystem in many tools
The future has high-quality implementations:
- Fast compile times, great Clang-like error messages, hackable code base
CIRCT: Lifting hardware development out of the 20th century

Join us! https://circt.llvm.org
CIRCT wants YOU!
- https://circt.llvm.org/
- Discussion board
- Weekly discussions Wed. 11am PT
Join us in disrupting the hardware world!
"If you want to go fast, go alone. If you want to go far, go together."

Credits
Mike Urbach (SiFive) wielded the duct tape for the demo (and some of the zip ties).
All the CIRCT contributors!
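The latency-insensitive channels that ESI uses, and the valid/ready semantics that the Handshake dialect lowers to (both described earlier), can be illustrated with a small cycle-by-cycle model in plain Python. This is a minimal sketch under simplifying assumptions: a channel is modeled as a bounded FIFO, and `Channel` and `run_dot_product` are hypothetical names for illustration only, not part of ESI, PyCDE, or CIRCT. It streams the element products of the demo's first dot product through a channel to a consumer that randomly stalls, showing that backpressure changes the cycle count but never the result.

```python
from collections import deque
import random


class Channel:
    """Hypothetical model of a latency-insensitive channel (not an ESI API).

    The channel is a bounded FIFO. A word moves from the producer into the
    channel only on a cycle where the producer has valid data AND the
    channel is ready (not full): the valid/ready handshake.
    """

    def __init__(self, depth: int = 2):
        self.depth = depth
        self.fifo = deque()

    def ready(self) -> bool:
        # Backpressure: not ready while the buffer is full.
        return len(self.fifo) < self.depth

    def try_send(self, data) -> bool:
        # The producer presents valid data; the transfer fires only if ready.
        if self.ready():
            self.fifo.append(data)
            return True
        return False

    def try_recv(self):
        # Returns (valid, data); a word is consumed only when one is valid.
        if self.fifo:
            return True, self.fifo.popleft()
        return False, None


def run_dot_product(a, b, stall_prob, seed=0, depth=2):
    """Stream element products through a Channel to a consumer that is
    randomly not ready (stalls). Returns (dot_product, cycles)."""
    if not 0.0 <= stall_prob < 1.0:
        raise ValueError("stall_prob must be in [0, 1)")
    rng = random.Random(seed)
    ch = Channel(depth)
    products = [x * y for x, y in zip(a, b)]
    sent = received = acc = cycles = 0
    while received < len(products):
        cycles += 1
        # Consumer side: accept at most one word per cycle unless stalled.
        if rng.random() >= stall_prob:
            valid, data = ch.try_recv()
            if valid:
                acc += data
                received += 1
        # Producer side: hold the current word until the channel accepts it.
        if sent < len(products) and ch.try_send(products[sent]):
            sent += 1
    return acc, cycles


if __name__ == "__main__":
    a = [42, 90, 24, 13, 38]
    b = [21, 41, 12, 2, 14]
    # The dot product is always 5418; only the cycle count changes
    # (6 cycles when the consumer never stalls).
    for p in (0.0, 0.5, 0.9):
        print(p, run_dot_product(a, b, stall_prob=p))
```

Because a transfer only happens when valid and ready coincide, neither side needs to know the other's latency, which is what lets a compiler pipeline or re-buffer a link without changing the computed result.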