Copyright 2023 Syntacore. All trademarks, product, and brand names belong to their respective owners.

Compiler Optimizations Challenges for High-Performance RISC-V Cores
August 25, 2023
Presenter: Sergey Yakushkin
Contributors: Konstantin Vladimirov and Syntacore Compiler T...

Example High-Performance Core - SCR9

Linux-capable application CPU with entry-level server class features:
- 8-16 cores per cluster (SMP and heterogeneous)
- Multi-issue OOO uArch
- Coherent NoC-based L3
- CHI external i/f
- SV39, SV48, SV57
- RVV
- Hypervisor
- AIA
- Accelerators support
- Early access program*

(*) some features may not be available in the initial release

SCR7/9 FPGA-based DevKit

- Fully-integrated system and DevKit based on VCU118 (https:/ http:/)
- Up to 24GB RAM, up to 100-150 MHz, 1GB Ethernet, PCI/SSD storage
- Boots upstream Debian Linux (kernel 5.15/6.1 LTS)
- Integrated toolchain with IDE (supports Bare Metal and Linux targets)
- Extra SW including OpenJDK stable builds

Syntacore Development Toolkit

SC-DT 2023.08 release (hosts Bare Metal + Linux targets, https:/):
- LLVM 16.x with optimizations
- GNU GDB 13.2
- Open On-Chip Debugger 0.12.x
- QEMU 8.0.3
- GCC 12.2.1
- GNU Binutils 2.38
- Newlib 4.10
- Visual Studio Code and Eclipse

Simulators:
- QEMU
- Spike
- SAIL
- 3rd party vendors

JTAG-based debug solutions:
- Segger J-link
- Olimex ARM-USB-OCD family
- Digilent JTAG-HS2
- more vendors soon

Also available:
- COMPCERT
- Profiling tools
- OpenJDK

LLVM Compiler and Optimizations

Upstream LLVM for RISC-V in 2022:
- Gaps vs AArch64/GCC, 10% diffs on some workloads
- ABI and conventions are evolving
- Functional issues, e.g. vectorization

Syntacore LLVM Toolchain:
- Improved stability
- SCR uArch-aware optimizations
- Advanced optimizations for RISC-V
- Enhanced RVV auto-vectorization

[Diagram: compiler structure. Frontends (high level languages: C/C++, RUST, Kotlin, ...) feed the middle-end (a sequence of passes), which feeds the backends (executable code: RISC-V, x86, ARM, ...). Label: example contributions from SW/HW vendors.]

LLVM is an open-source compiler framework:
- 20+ years, 500 000+ commits, 1500+ developers
- primary for OS (Android), languages (Swift, Kotlin, RUST), commercial tools (Intel C++ Compiler), and code analyzers (Coverity)

Loop Optimizations: Unsigned Wrap

Compiler optimization improves codegen for 32-bit loop counters:
- Identify code fragments
- Add dynamic range checks
- Generate multiple specialized code versions for different ranges
- Apply further transformations for optimal code generation

Improves some industry benchmarks by up to 18% (config without Bitmanip).
Patch submitted to LLVM: https://reviews.llvm.org/D132208

Source (extra 32/64 conversions inside loop):

    for (unsigned i = 0; i < N; ++i)
      for (unsigned j = 0; j < N; ++j)
        C[i*N + j] = foo();

Dynamic range checks (new):

    for (unsigned i = 0; i < N; ++i)
      for (unsigned j = 0; j < N; ++j)
        if (overflow(N*N))
          *(C + zext(i*N + j)) = foo();
        else /*nuw*/
          *(C + zext(i*N + j)) = foo();

Unswitching:

    if (overflow(N*N))
      two loops with unsigned wrap
    else
      for (unsigned i = 0; i < N; ++i)
        for (unsigned j = 0; j < N; ++j)
          *(C + zext(i*N + j /*nuw*/)) = foo();

IndVar simplify:

    if (overflow(N*N))
      two loops with unsigned wrap
    else
      for (uint64_t i = 0; i < N; ++i)
        for (uint64_t j = 0; j < N; ++j)
          *(C + (i*N + j)) = foo();

Loop Optimizations: CRC-like Patterns

- Compiler detects loop patterns with generalized CRC computations
- Transforms the code to use the Bitmanip instruction CLMUL
- Improves the function by 10x, and some industry benchmarks by up to 15%

[Diagram: CRCU16 built from two CRCU8 steps]

Loop body as shown on the slide (the loop header and the x16 computation were lost in extraction):

    for (i = 0; i ...
      if (x16 == 1) {
        crc ^= 0x4002;
        carry = 1;
      } else
        carry = 0;
      crc >>= 1;
      if (carry)
        crc |= 0x8000;
      else
        crc &= 0x7fff;

Compiler detects the CRC-like loop and replaces it with the sequence:
xor, zext, clmul, and, clmul, lshr, trunc -> Result
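
For the unsigned-wrap slide, the effect of the versioning can be sketched by hand in C. This is only an illustration of the idea, not the code the Syntacore pass generates: the function names are hypothetical, a local counter stands in for foo(), and the range check is written conservatively as N*N > UINT32_MAX.

```c
#include <stddef.h>
#include <stdint.h>

/* Original form: 32-bit counters, so i * N + j is computed modulo 2^32
 * and has to be zero-extended to pointer width on every iteration. */
void fill_original(int *C, uint32_t N) {
    int next = 0; /* stands in for foo() on the slide */
    for (uint32_t i = 0; i < N; ++i)
        for (uint32_t j = 0; j < N; ++j)
            C[i * N + j] = next++;
}

/* Hand-written equivalent of the versioned loop: one dynamic range check
 * selects between the wrapping version and a 64-bit version. */
void fill_versioned(int *C, uint32_t N) {
    int next = 0;
    if ((uint64_t)N * N > UINT32_MAX) {
        /* The index may wrap: keep the original 32-bit semantics. */
        for (uint32_t i = 0; i < N; ++i)
            for (uint32_t j = 0; j < N; ++j)
                C[i * N + j] = next++;
    } else {
        /* No wrap is possible (nuw), so the counters can be widened and
         * the per-iteration zero-extension disappears. */
        for (uint64_t i = 0; i < N; ++i)
            for (uint64_t j = 0; j < N; ++j)
                C[i * N + j] = next++;
    }
}
```

Both functions write the same values for any N; the second one only makes explicit the fast path that the compiler is described as deriving automatically.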
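
For the CRC-like pattern slide, the following sketch shows why a bit-by-bit CRC loop can be replaced by two carry-less multiplications. It is not the slide's loop or the compiler's actual output: it assumes an MSB-first CRC-16 with polynomial 0x8005, uses a software clmul64 as a stand-in for the CLMUL instruction, and all function names are hypothetical. The reduction is a Barrett-style one: the first multiply estimates the quotient, the second recovers the remainder.

```c
#include <stdint.h>

#define CRC_POLY 0x18005u /* x^16 + x^15 + x^2 + 1, including the top bit */

/* Software carry-less multiply; a stand-in for the hardware CLMUL. */
uint64_t clmul64(uint64_t a, uint64_t b) {
    uint64_t r = 0;
    for (int i = 0; i < 64; ++i)
        if ((b >> i) & 1)
            r ^= a << i;
    return r;
}

/* Reference: classic bit-by-bit update for one input byte (MSB first). */
uint16_t crc16_bitwise(uint8_t data, uint16_t crc) {
    crc ^= (uint16_t)(data << 8);
    for (int i = 0; i < 8; ++i)
        crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x8005)
                             : (uint16_t)(crc << 1);
    return crc;
}

/* mu = floor(x^24 / P), computed by polynomial long division over GF(2). */
uint32_t barrett_mu(void) {
    uint32_t rem = 1u << 24, q = 0;
    for (int bit = 24; bit >= 16; --bit) {
        if ((rem >> bit) & 1) {
            q |= 1u << (bit - 16);
            rem ^= CRC_POLY << (bit - 16);
        }
    }
    return q;
}

/* Same update with two carry-less multiplies instead of the loop. */
uint16_t crc16_clmul(uint8_t data, uint16_t crc) {
    uint64_t a = (uint64_t)((crc >> 8) ^ data);               /* xor, zext  */
    uint64_t q = clmul64(a, barrett_mu()) >> 8;               /* quotient   */
    uint16_t r = (uint16_t)(clmul64(q, CRC_POLY) & 0xFFFFu);  /* remainder  */
    return (uint16_t)(((uint32_t)crc << 8) ^ r);
}
```

In a real implementation mu would be a precomputed constant; it is derived at run time here only to avoid hard-coding a value.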

Global Array Access Optimization

Example:

    char arr[42];
    char foo(int i) { return arr[i]; }

Assembler:

    lui  a5, %hi(arr)        # R_RISCV_HI20, R_RISCV_RELAX
    addi a5, a5, %lo(arr)    # R_RISCV_LO12_I, R_RISCV_RELAX
    add  a5, a5, a0
    lbu  a0, 0(a5)
    ret

Optimization attempt (RISCVMergeBaseOffset):

    lui  a5, %hi(arr)        # R_RISCV_HI20, R_RISCV_RELAX
    addi a5, a5, %lo(arr)
    add  a5, a5, a0
    lbu  a0, %lo(arr)(a5)    # R_RISCV_LO12_I, R_RISCV_RELAX
    ret

After the addi is folded into the load offset:

    lui  a5, %hi(arr)        # R_RISCV_HI20, R_RISCV_RELAX
    add  a5, a5, a0
    lbu  a0, %lo(arr)(a5)    # R_RISCV_LO12_I, R_RISCV_RELAX
    ret

Relaxation: the linker attempts to find shorter instructions.
Original GP-relaxation breaks the code in this example.

Wrong (original):

    lui  a5, %hi(arr)
    add  a5, a5, a0
    lbu  a0, offset(gp)
    ret

Correct (expected):

    lui  a5, %hi(arr)
    add  a5, gp, a0
    lbu  a0, offset(a5)
    ret

New GPREL_* relocations

Three new relocation types are supported in the Syntacore LLVM toolchain:
- R_RISCV_GPREL_ADD
- R_RISCV_GPREL_LO12_I
- R_RISCV_GPREL_LO12_S

Generated code example (3 instructions instead of 4):

    lui  a5, %hi(arr)                 # R_RISCV_HI20, R_RISCV_RELAX
    add  a5, a5, a0, %gprel_add(arr)  # R_RISCV_GPREL_ADD, R_RISCV_RELAX
    lbu  a0, %gprel_lo(arr)(a5)       # R_RISCV_GPREL_LO12_I, R_RISCV_RELAX
    ret

Similar proposal to fold a non-constant offset into an lui/addi/lw sequence in the RISC-V GNU toolchain: link

Example impact: 3% for 445.gobmk SPEC2k

More Compiler Optimizations

- Partial Redundancy Elimination for Loads: 11 patches (D141664 D143255), 3% on SPEC2k6 471.omnetpp
- Fold terminating condition for any icmp (-lsr-term-fold, D145929), 0.5% overall on SPEC2k6

RVV Fixes and Improvements

- Fixed unwinding for RVV spills using DWARF CFA (D136263 D136264)
- Proposed further optimizations for tail-agnostic policy VMV/VFMV (D130895)

Summary

- Syntacore LLVM is close to or better than GCC, with many patches submitted upstream
  - 10%+ on some SPEC benchmarks
  - Fixed functional bugs, e.g. RVV spill/unwinding
- Areas for further compiler improvements vs AArch64/x86 and GCC
  - Enhancement of generic LLVM passes, MachineCombiner, Peephole, new ISA extensions
  - RVV generic and TA-specific optimizations
  - Loop optimizations and recognition of patterns
  - Opportunities for new link-time optimizations with GPREL_* relocations

Thank you!
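
Supplementary sketch for the global array access slides: the arithmetic behind the %hi/%lo split and the GP-relative reach can be written out in a few lines of C. This assumes a 32-bit address space and the usual signed 12-bit immediate; the helper names are hypothetical and this is not the linker's or the Syntacore toolchain's code.

```c
#include <stdbool.h>
#include <stdint.h>

/* %hi(sym): upper 20 bits, rounded so that adding the sign-extended
 * low part reproduces the address. */
uint32_t hi20(uint32_t addr) {
    return (addr + 0x800u) >> 12;
}

/* %lo(sym): low 12 bits interpreted as a signed immediate. */
int32_t lo12(uint32_t addr) {
    int32_t low = (int32_t)(addr & 0xFFFu);
    return low >= 0x800 ? low - 0x1000 : low;
}

/* A symbol can be addressed as offset(gp) only if its distance from gp
 * fits in a signed 12-bit immediate. */
bool fits_gp_relative(uint32_t addr, uint32_t gp) {
    int64_t delta = (int64_t)addr - (int64_t)gp;
    return delta >= -2048 && delta <= 2047;
}
```

When fits_gp_relative holds for arr, the lui/%lo pair can in principle be replaced by a single gp-based offset, but the register index in a0 still has to be added to the base. That is the step the "wrong" sequence on the slide drops.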