30-Compiler opt challenges Yakushkin RVSC2023 Final.pdf

Compiler Optimizations Challenges for High-Performance RISC-V Cores
August 25, 2023
Presenter: Sergey Yakushkin
Contributors: Konstantin Vladimirov and Syntacore Compiler T…

Copyright 2023 Syntacore. All trademarks, product, and brand names belong to their respective owners.

Example High-Performance Core - SCR9
Linux-capable application CPU with entry-level server class features:
- 8-16 cores per cluster (SMP and heterogeneous)
- Multi-issue OOO uArch
- Coherent NoC-based L3
- CHI external i/f
- SV39, SV48, SV57
- RVV
- Hypervisor
- AIA
- Accelerators support
- Early access program*
(*) Some features may not be available in the initial release.

SCR7/9 FPGA-based DevKit
- Fully-integrated system and DevKit based on VCU118
- … to 24GB RAM, up to 100-150 MHz, 1GB Ethernet, PCI/SSD storage
- Boots upstream Debian Linux kernel 5.15/6.1 LTS
- Integrated toolchain with IDE (supports Bare Metal and Linux targets)
- Extra SW including OpenJDK stable builds

Syntacore Development Toolkit
SC-DT 2023.08 release:
- LLVM 16.x with optimizations
- GNU GDB 13.2
- Open On-Chip Debugger 0.12.x
- QEMU 8.0.3
- GCC 12.2.1
- GNU Binutils 2.38
- Newlib 4.10
- Visual Studio Code and Eclipse
Simulators:
- QEMU
- Spike
- SAIL
3rd party vendors, JTAG-based debug solutions:
- Segger J-Link
- Olimex ARM-USB-OCD family
- Digilent JTAG-HS2
- more vendors soon
Also available:
- CompCert
- Profiling tools
- OpenJDK
(Diagram labels: Hosts; Bare Metal + Linux)

LLVM Compiler and Optimizations
Upstream LLVM for RISC-V in 2022:
- Gaps vs AArch64/GCC, 10% diffs on some workloads
- ABI and conventions are evolving
- Functional issues, e.g. vectorization
Syntacore LLVM Toolchain:
- Improved stability
- SCR uArch-aware optimizations
- Advanced optimizations for RISC-V
- Enhanced RVV auto-vectorization
LLVM is an open-source compiler framework:
- 20+ years, 500 000+ commits, 1500+ developers
- Primary for OS (Android), languages (Swift, Kotlin, Rust), commercial tools (Intel C++ Compiler), and code analyzers (Coverity)
(Diagram: high-level languages (C/C++, Rust, Kotlin, ...) enter the compiler Frontends, go through a series of Middle-end passes, then the Backends, which emit executable code (RISC-V, x86, ARM, ...). Caption: Example contributions from SW/HW vendors.)

Loop Optimizations: Unsigned Wrap
Compiler optimization improves codegen for 32-bit loop counters:
- Identify code fragments
- Add dynamic range checks
- Generate multiple specialized code versions for different ranges
- Apply further transformations for optimal code generation
Improves some industry benchmarks by up to 18% (config without Bitmanip).
Patch submitted to LLVM: https://reviews.llvm.org/D132208

Source (problem: extra 32/64 conversions inside the loop):

    for (unsigned i = 0; i < N; ++i)
      for (unsigned j = 0; j < N; ++j)
        C[i*N + j] = foo();

After dynamic range checks (new):

    for (unsigned i = 0; i < N; ++i)
      for (unsigned j = 0; j < N; ++j)
        if (overflow(N*N))
          *(C + zext(i*N + j)) = foo();
        else /*nuw*/
          *(C + zext(i*N + j)) = foo();

After unswitching:

    if (overflow(N*N))
      two loops with unsigned wrap
    else
      for (unsigned i = 0; i < N; ++i)
        for (unsigned j = 0; j < N; ++j)
          *(C + zext(i*N + j /*nuw*/)) = foo();

After IndVar simplify:

    if (overflow(N*N))
      two loops with unsigned wrap
    else
      for (uint64_t i = 0; i < N; ++i)
        for (uint64_t j = 0; j < N; ++j)
          *(C + (i*N + j)) = foo();

Loop Optimizations: CRC-like Patterns
- Compiler detects loop patterns with generalized CRC computations
- Transforms the code to use the Bitmanip instruction CLMUL
- Improves the function by 10x, and some industry benchmarks by up to 15%
(Diagram labels: CRCU16, CRCU8, CRCU8)

Compiler detects CRC-like loop (the loop header and the x16 computation are lost in the extracted text):

    for (i = 0; i < …; …) {
        … >>= 1;
        if (x16 == 1) {
            crc ^= 0x4002;
            carry = 1;
        } else
            carry = 0;
        crc >>= 1;
        if (carry)
            crc |= 0x8000;
        else
            crc &= 0x7fff;
    }

Result (operation labels from the diagram): xor, zext, clmul, and, clmul, lshr, trunc.
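To make the CRC-to-CLMUL idea more concrete, here is a minimal sketch of how a bit-serial CRC loop can be replaced by two carry-less multiplications using Barrett reduction over GF(2). It is not the transformation from the Syntacore patch. The CRC-8 polynomial 0x107, the function names (`clmul32`, `crc8_bitwise`, `barrett_mu`, `crc8_clmul`), and the single-byte input are all assumptions chosen to keep the example small, and `clmul32` is a software stand-in for the Zbc `clmul` instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative CRC-8 polynomial x^8 + x^2 + x + 1 (assumption, not from the slides). */
#define CRC8_POLY 0x107u

/* Software carry-less multiply; stands in for the Zbc clmul instruction. */
static uint32_t clmul32(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; ++i)
        if ((b >> i) & 1u)
            r ^= a << i;
    return r;
}

/* Reference: bit-serial CRC-8, MSB first, zero initial value.
 * Computes (byte * x^8) mod P one bit per iteration. */
static uint8_t crc8_bitwise(uint8_t byte)
{
    uint32_t crc = byte;
    for (int i = 0; i < 8; ++i) {
        crc <<= 1;
        if (crc & 0x100u)
            crc ^= CRC8_POLY;
    }
    return (uint8_t)crc;
}

/* mu = floor(x^16 / P) over GF(2), by polynomial long division. */
static uint32_t barrett_mu(void)
{
    uint32_t rem = 1u << 16, quot = 0;
    for (int i = 16; i >= 8; --i) {
        if (rem & (1u << i)) {
            quot |= 1u << (i - 8);
            rem ^= CRC8_POLY << (i - 8);
        }
    }
    assert(rem < 0x100u); /* remainder has degree < 8 */
    return quot;
}

/* Loop-free variant: two carry-less multiplies, a shift, and a mask. */
static uint8_t crc8_clmul(uint8_t byte)
{
    uint32_t mu = barrett_mu();               /* a constant in practice */
    uint32_t q = clmul32(byte, mu) >> 8;      /* quotient of byte*x^8 by P */
    /* remainder = (byte << 8) ^ q*P; the low 8 bits of byte << 8 are zero */
    return (uint8_t)(clmul32(q, CRC8_POLY) & 0xFFu);
}
```

With `mu` precomputed as a constant, the eight-iteration loop reduces to a short straight-line sequence, which is the kind of saving the slide attributes to CLMUL. On a core with Zbc, each `clmul32` call would presumably map to a single instruction.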

Global Array Access Optimization
Relaxation: the linker attempts to find shorter instructions.

Example:

    char arr[42];
    char foo(int i) { return arr[i]; }

Assembler:

    lui  a5, %hi(arr)         # R_RISCV_HI20, R_RISCV_RELAX
    addi a5, a5, %lo(arr)     # R_RISCV_LO12_I, R_RISCV_RELAX
    add  a5, a5, a0
    lbu  a0, 0(a5)
    ret

Optimization attempt (RISCVMergeBaseOffset), with %lo(arr) moved into the load:

    lui  a5, %hi(arr)         # R_RISCV_HI20, R_RISCV_RELAX
    addi a5, a5, %lo(arr)
    add  a5, a5, a0
    lbu  a0, %lo(arr)(a5)     # R_RISCV_LO12_I, R_RISCV_RELAX
    ret

The same sequence with the addi removed:

    lui  a5, %hi(arr)         # R_RISCV_HI20, R_RISCV_RELAX
    add  a5, a5, a0
    lbu  a0, %lo(arr)(a5)     # R_RISCV_LO12_I, R_RISCV_RELAX
    ret

Original GP-relaxation breaks the code in this example.

Wrong (original):

    lui  a5, %hi(arr)
    add  a5, a5, a0
    lbu  a0, offset(gp)
    ret

Correct (expected):

    lui  a5, %hi(arr)
    add  a5, gp, a0
    lbu  a0, offset(a5)
    ret

New GPREL_* relocations
Three new relocation types are supported in the Syntacore LLVM toolchain:
- R_RISCV_GPREL_ADD
- R_RISCV_GPREL_LO12_I
- R_RISCV_GPREL_LO12_S

Generated code example (3 instructions instead of 4):

    lui  a5, %hi(arr)                  # R_RISCV_HI20, R_RISCV_RELAX
    add  a5, a5, a0, %gprel_add(arr)   # R_RISCV_GPREL_ADD, R_RISCV_RELAX
    lbu  a0, %gprel_lo(arr)(a5)        # R_RISCV_GPREL_LO12_I, R_RISCV_RELAX
    ret

Similar proposal to fold a non-constant offset into an lui/addi/lw sequence in the RISC-V GNU toolchain: (link)
Example impact: 3% for 445.gobmk (SPEC2k6)

More Compiler Optimizations
- Partial Redundancy Elimination for Loads: 11 patches (D141664 D143255)
  - 3% on SPEC2k6 471.omnetpp
- Fold terminating condition for any icmp (-lsr-term-fold, D145929)
  - 0.5% overall on SPEC2k6
RVV Fixes and Improvements
- Fixed unwinding for RVV spills using DWARF CFA (D136263 D136264)
- Proposed further optimizations for tail-agnostic policy VMV/VFMV (D130895)

Summary
Syntacore LLVM is close to or better than GCC, with many patches submitted upstream:
- 10%+ on some SPEC benchmarks
- Fixed functional bugs, e.g. RVV spill/unwinding
Areas for further compiler improvements vs AArch64/x86 and GCC:
- Enhancement of generic LLVM passes, MachineCombiner, Peephole, new ISA extensions
- RVV generic and TA-specific optimizations
- Loop optimizations and recognition of patterns
- Opportunities for new link-time optimizations with GPREL_* relocations

Thank you!
