RISC-V China Summit, English edition, 2023-08-25 (final)

Ubiquitous Intelligent Computing on RISC-V: Challenges, Methods and Tools
Presenter: Prof. Xichuan Zhou, Chair of the AI/ML SIG

CONTENTS
- Background Information
- Challenges of Ubiquitous Intelligent Research
- Ubiquitous Intelligent Computing: Technology and Software

Background | Our Team
Ubiquitous intelligent computing: algorithms, AI chips and software tools research.
- Nearly 400P of domestic AI computing power, ranking 2nd nationwide and 1st in western China.
- Affiliation: Institute of Science on Brain-Inspired Intelligence & College of Microelectronics and Communication Engineering, Chongqing University; 120 faculty members and 500+ master's and PhD students.
- AICC Center: Chongqing Artificial Intelligence Innovation Center, 300+ TOPS for AI training.

Background | Personal Information
Prof. Xichuan Zhou
- Chair of the RISC-V AI/ML SIG, leading the promotion of global RISC-V AI hardware and software research and standardization.
- Director of the Institute of Science on Brain-Inspired Intelligence, Chongqing University.
- Dean of the School of Microelectronics and Communication Engineering, Chongqing University.
- Outstanding Scientist of the Chinese Institute of Electronics.

Welcome to Join the RISC-V AI/ML SIG
- Participate in international and industrial standardization of embedded AI technology, and promote the ecological development of AI chips.
- Three study groups on emerging technologies: AutoML SG, SNN SG, Matrix Multiply Extension SG.
Charter:
- ISA for AI: organize global research, development and evaluation of RISC-V ISA extensions for AI/ML acceleration.
- AI software: optimize a full software stack, especially a graph compiler that translates ONNX graph models to the RISC-V architecture, to support portability of ML, AI and NLP applications.
- Systems & applications: specify the benchmarks, performance metrics and evaluation methodology used to track performance of optimized vs. current implementations, and vs. other popular architectures.
- Co-optimization: propose software-hardware co-optimizations for better support of the target applications.
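One of the charter goals above is a graph compiler that lowers ONNX graph models onto RISC-V backends. As an illustrative sketch (not OpenGADL code), the first pass of any such compiler schedules the operator graph so every input tensor is produced before it is consumed; the operator names below are hypothetical:

```python
from collections import deque

def topological_order(nodes, edges):
    """Order operators so each node runs only after all of its inputs;
    this is the scheduling pass a graph compiler runs before codegen."""
    indegree = {n: 0 for n in nodes}
    successors = {n: [] for n in nodes}
    for src, dst in edges:
        indegree[dst] += 1
        successors[src].append(dst)
    ready = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(nodes):
        raise ValueError("operator graph has a cycle")
    return order

# Hypothetical four-operator model: input -> conv -> relu -> fc
ops = ["input", "conv", "relu", "fc"]
deps = [("input", "conv"), ("conv", "relu"), ("relu", "fc")]
print(topological_order(ops, deps))  # ['input', 'conv', 'relu', 'fc']
```

A real ONNX lowering would walk `GraphProto` nodes the same way before emitting backend kernels; only the scheduling idea is shown here.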

Research Background | Ubiquitous Computing
- Origin: Mark Weiser proposed the concept of ubiquitous computing in 1991.
- Concept: integrating computing and communication technologies into physical environments to achieve ubiquitous intelligence.
- Milestone 1: AIoT (Artificial Intelligence of Things) technology developed rapidly from 2017; an estimated 500 billion devices are expected to be connected by 2025.
- Milestone 2: large-scale model technology emerged in 2022; "AI + everything" becomes possible. We are entering the era of ubiquitous computing.
(Figure: timeline of ubiquitous intelligence technology development, from Mark Weiser's "The Computer for the 21st Century" (Scientific American) and Kevin Ashton's Internet of Things (MIT), through IBM's Smart Planet and the AIoT Summit "the New Era of Intelligent Everything" (2017), to OpenAI's ChatGPT (2022).)
Ref: Baccour E, Mhaisen N, Abdellatif A A, et al. Pervasive AI for IoT applications: a survey on resource-efficient distributed artificial intelligence. IEEE Communications Surveys & Tutorials, 2022.

Research Background | Development Status
- AIoT RISC-V demand: as the AI market continues to expand, demand for RISC-V edge devices is growing. RISC-V architecture chips are predicted to exceed 80 billion units in 2025, accounting for 28% of the market.
- RISC-V drives the era of ubiquitous intelligence: 10 billion global shipments in 2022, half from China, rising to 80 billion in 2025.
Ref: Redmond C. Beyond removing barriers, RISC-V fuels our community to seize growing opportunities. World Internet Conference 2022, China.

Research Background | Algorithm Explosion Challenge
- The number of AI algorithms has increased exponentially since 2012; popular models change constantly, e.g. CNN, YOLO, Transformer, LLM.
- Hardware implementation is comparatively slow, e.g. ASIC time-to-market of about 15 months.
- Parameter counts of LLMs and the number of published AI papers keep increasing.

Research Background | Technical Challenges
- Heterogeneous RISC-V hardware: CPU, NPU, SoC and NoC architectures (challenge 1: hardware-heterogeneous design-tool optimization).
- Growing DNN model size: CNNs and Transformers have MB to GB of weights (challenge 2: large-scale models, hardware-aware algorithm optimization).
- Limited power consumption and delay: battery-powered and real-time computing (challenge 3: low power and latency).

Our work:
- Embedded deep learning: 100+ papers in IEEE TCAS-I, ISSCC, IEEE TNNLS, et al.
- Academic book: Deep Learning on Edge Computing Devices.
- Patents: 20+ patents in the field of ubiquitous intelligent computing.
- OpenGADL technological innovation.

Research Background | Key Problem
To address the challenge of heterogeneous hardware deployment (challenge 1: design-tool optimization), a freely accessible, generally compatible and easy-to-use software tool is the key to the promotion and ecological construction of RISC-V AI software and hardware.
- Easy to learn and use: overcomes the learning barrier for application developers who are not AI experts or RISC-V experts.
- Generally compatible: across software (TensorFlow, PyTorch, MindSpore) and hardware (RISC-V CPUs, NPUs, NoCs); overcomes the challenge of heterogeneous computing.
- Freely accessible: free of charge for non-commercial use and education, and open source for optimization; overcomes the challenge of algorithm design diversity.

Open General Automated Deep Learning (OpenGADL)
Open-source models, automated algorithm design, RISC-V compatible: an AI-chip algorithm design software tool that is automatic, graphical and heterogeneous-compatible, with PC deployment plus cloud acceleration and a full stack from labeling, training and evaluation to deployment.
- OpenGADL data management tool: automated data annotation using LLM technology.
- OpenGADL model design tool: easy-to-use graphical customized model design.
- OpenGADL model training tool: automated black-box model design and quantization, with deployment for inference on RISC-V chips.
- Community: GADL WeChat group.

OpenGADL Model Optimization and Deployment on RISC-V
Platform: Allwinner Nezha D1 development board, supporting the RISC-V vector (V) extension with a XuanTie C906 processor at 1 GHz and 1 GB of memory. Training and deployment tools: OpenGADL and the T-Head HHB compiler.
(Figure: deployment stack. OpenGADL performs quantization and graph optimization; HHB performs graph codegen and target deployment through the CSI-NN2 API; a heterogeneous scheduler and executor drive the matrix/vector units, CPU and NPU driver on the Nezha D1 hardware. Automatic tools cover quantization, pruning, training, evaluation and hardware deployment.)

Comparison of performance under different configurations:

| Model               | Quantization | Latency (ms) | FPS  |
|---------------------|--------------|--------------|------|
| resnet18            | float32      | 233.99       | 4.27 |
| resnet18            | float16      | 108.72       | 9.2  |
| lraspp_mobilenet_v3 | float32      | 8228.03      | 0.12 |
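As a sanity check on the table above, FPS is just the reciprocal of per-frame latency, so the float16 speedup on resnet18 can be recomputed directly. The helper functions are illustrative, not part of OpenGADL:

```python
def fps(latency_ms):
    """Frames per second from a per-frame latency in milliseconds."""
    return 1000.0 / latency_ms

def speedup(before_ms, after_ms):
    """Latency ratio between two configurations of the same model."""
    return before_ms / after_ms

# Figures from the table: resnet18 float32 233.99 ms, float16 108.72 ms
print(round(fps(233.99), 2))             # 4.27, matching the table
print(round(fps(108.72), 1))             # 9.2, matching the table
print(round(speedup(233.99, 108.72), 2)) # ~2.15x from float16
```

The lraspp_mobilenet_v3 row checks out the same way: 1000 / 8228.03 rounds to the listed 0.12 FPS.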

Progress and Main Challenges | 1. Structured Model Sparsification Method
(Recap of challenge 2: the growing size of DNN models calls for hardware-aware algorithm optimization.)
- Challenge: it is difficult to achieve computational acceleration on hardware with unstructured sparsity.
- Method: we select features with group-based sparsification and structurally prune redundant connections of the neural network, accelerating model inference.
- Benefit: high efficiency; inference speed increased by 2x-5x, with models as small as 50 KB.
(Figure: group-based sparsification.)
[1] Xichuan Zhou, et al. Deep learning with grouped features for spatial spectral classification of hyperspectral images. IEEE Geoscience and Remote Sensing Letters, 2016, 14(1): 97-101.
[2] Xichuan Zhou, et al. MicroNet: realizing micro neural network via binarizing GhostNet. International Conference on Intelligent Computing and Signal Processing (ICSP), IEEE, 2021: 1340-1343.
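A minimal sketch of the group-based structured pruning idea: whole groups of weights are kept or zeroed together, so the surviving structure stays dense and hardware-friendly, unlike element-wise unstructured pruning. Contiguous 1-D groups and selection by group L2 norm are assumptions for illustration, not details taken from the papers:

```python
import math

def group_l2_norms(weights, group_size):
    """L2 norm of each contiguous weight group (a stand-in for the
    channel/feature groups used in group-based sparsification)."""
    return [math.sqrt(sum(w * w for w in weights[i:i + group_size]))
            for i in range(0, len(weights), group_size)]

def prune_groups(weights, group_size, keep_ratio):
    """Zero out entire low-norm groups, keeping `keep_ratio` of them."""
    norms = group_l2_norms(weights, group_size)
    n_keep = max(1, round(len(norms) * keep_ratio))
    keep = set(sorted(range(len(norms)),
                      key=lambda g: norms[g], reverse=True)[:n_keep])
    out = list(weights)
    for g in range(len(norms)):
        if g not in keep:
            for i in range(g * group_size, min((g + 1) * group_size, len(out))):
                out[i] = 0.0
    return out

# Toy layer of 8 weights in 4 groups of 2; keep the strongest half.
print(prune_groups([1.0, 1.0, 0.1, 0.1, 2.0, 2.0, 0.2, 0.2], 2, 0.5))
```

Zeroed groups can then be skipped wholesale at inference time, which is where the reported 2x-5x speedup on real hardware comes from.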

Roadmap: 1. structured model sparsification; 2. low-bit quantization; 3. parallel enhancement; 4. latency-constrained optimization.

Progress and Main Challenges | 2. Low-bit Quantization Method
- Challenge: simple applications on RISC-V have high-bit representation redundancy, leaving room for further compression of neural network models.
- Method: co-optimization of structured sparsity and low-bit quantization (binary quantization of the weights).
- Benefit: higher efficiency; 80% of weights pruned, saving up to 99% of storage compared with a 16-bit non-sparse model.
(Figures: DANoC system for multi-sensor fusion and vision; feedforward sparse network structure.)
Xichuan Zhou, et al. DANoC: an efficient algorithm and hardware codesign of deep neural networks on chip. IEEE Transactions on Neural Networks and Learning Systems, 2017, 29(7): 3176-3187.
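A sketch of 1-bit weight quantization, together with the storage arithmetic behind the "up to 99%" figure. Scaling the signs by the mean absolute value is an assumed XNOR-style choice; the slides only say the weights are binarized:

```python
def binarize(weights):
    """1-bit quantization: keep only sign(w) plus one shared scale
    alpha = mean(|w|) per weight tensor (assumed scheme)."""
    alpha = sum(abs(w) for w in weights) / len(weights)
    return alpha, [1 if w >= 0.0 else -1 for w in weights]

def dequantize(alpha, signs):
    """Reconstruct the approximate full-precision weights."""
    return [alpha * s for s in signs]

alpha, signs = binarize([0.5, -0.5, 1.0, -1.0])
print(alpha, signs)  # 0.75 [1, -1, 1, -1]

# Storage arithmetic behind the claim: pruning 80% of the weights and
# storing the rest at 1 bit instead of 16 bits leaves
# 0.2 * (1/16) = 1.25% of the original size, i.e. ~98.75% ("up to 99%") saved.
remaining_fraction = 0.2 * (1 / 16)
print(round(1 - remaining_fraction, 4))  # 0.9875
```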

Progress and Main Challenges | 3. Parallel Enhancement Method
- Challenge: low-bit quantization methods suffer serious performance degradation on complex vision tasks.
- Method: parallel binarized sub-network structures with lateral connections enhance the representation ability and reconfigurability of binarized neural networks.
- Benefit: classification accuracy increased by 10.6% on downstream vision tasks, with storage complexity reduced by 16x and overall computational complexity reduced by 58x.
(Figure: autonomous driving application.)
Xichuan Zhou, et al. Cellular binary neural network for accurate image classification and semantic segmentation. IEEE Transactions on Multimedia, 2022.

Progress and Main Challenges | 3. Parallel Enhancement Architecture
- Challenge: it is difficult to meet the different model deployment requirements of different application scenarios.
- Method: DRNoC, a dynamically reconfigurable binarized-model computing architecture, meets deployment requirements across scenarios.
- Benefit: improved hardware flexibility; the FPGA implementation of the binarized model can be dynamically reconfigured to achieve a range of accuracy and throughput trade-offs.
(Figures: overview of DRNoC; hardware test platform; experimental results.)
Xichuan Zhou, et al. DRNoC: a dynamic reconfigurable binary neural network on chip with adjustable throughput-accuracy tradeoff. IEEE Transactions on Circuits and Systems for Video Technology (in preparation).

Progress and Main Challenges | 4. Latency-Constrained Optimization Method
- Challenge: designing hardware-efficient networks manually requires significant human and material resources, and existing hardware-aware neural architecture search (NAS) methods have not been optimized for RISC-V devices.
- Method: we propose the first inference-latency dataset tailored to RISC-V devices, design a DNN-based nonlinear latency predictor, and seamlessly integrate it with NAS algorithms to reduce search time.
- Benefit: all networks obtained by the search meet the latency constraint; at similar accuracy, NAS search time is reduced by 30% and inference latency by 20-30%.

| Dataset        | Method             | Search Time (s) | Top-1 Acc (%) | Inference Time (ms) |
|----------------|--------------------|-----------------|---------------|---------------------|
| CIFAR-10       | EPE-NAS (baseline) | 190.82 ± 1.27   | 91.50 ± 2.04  | 481 ± 314           |
| CIFAR-10       | Ours (300 ms)      | 49.60 ± 2.36    | 89.07 ± 1.32  | 185 ± 31            |
| CIFAR-10       | Ours (500 ms)      | 125.57 ± 2.42   | 90.85 ± 1.88  | 331 ± 115           |
| ImageNet16-120 | EPE-NAS (baseline) | 190.86 ± 1.37   | 37.59 ± 6.74  | 122 ± 54            |
| ImageNet16-120 | Ours (100 ms)      | 65.12 ± 1.70    | 34.78 ± 4.67  | 65 ± 23             |
| ImageNet16-120 | Ours (150 ms)      | 125.81 ± 2.23   | 37.41 ± 5.29  | 94 ± 37             |

Xichuan Zhou, et al. Latency-constrained NAS method for efficient model deployment on RISC-V devices. IEEE Transactions on Circuits and Systems II (submitted).
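The latency-constrained search can be sketched as a filter-then-select loop: a latency predictor rejects over-budget architectures before any accuracy evaluation, which is where the search-time saving comes from. The linear latency model and accuracy proxy below are toy stand-ins for the paper's DNN-based predictor and real evaluation:

```python
def latency_constrained_search(candidates, predict_latency, score, budget_ms):
    """Keep only candidates whose predicted latency fits the budget,
    then return the feasible candidate with the best (proxy) score."""
    feasible = [c for c in candidates if predict_latency(c) <= budget_ms]
    if not feasible:
        raise ValueError("no candidate meets the latency budget")
    return max(feasible, key=score)

# Hypothetical toy search space: (depth, width) pairs.
space = [(d, w) for d in (8, 12, 16) for w in (32, 64, 128)]
pred = lambda c: 2.0 * c[0] + 0.5 * c[1]  # toy latency model (ms)
acc_proxy = lambda c: c[0] * c[1]         # toy accuracy proxy
print(latency_constrained_search(space, pred, acc_proxy, budget_ms=60.0))
```

Every returned architecture satisfies the constraint by construction, matching the claim above that all searched networks meet the target latency.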

Future Work
- LLM training: the OpenGADL Pro version will be released in October 2023, supporting vertical-domain fine-tuning of large models.
- RISC-V inference: OpenGADL intends to be compatible with mainstream RISC-V chips by 2025.
- SNN ISA extension (from SIG vice chair Suresh): an instruction extension for SNN inference.
- Other interesting research directions include spiking LLMs and RISC-V implementations for low-power ubiquitous computing.

Conclusion
- Discussed three main challenges of implementing artificial intelligence on RISC-V: hardware heterogeneity, high model complexity and restricted hardware resources.
- To address these challenges, proposed the Open General Automated Deep Learning (OpenGADL) software platform.
- Introduced our latest research on improving the computing efficiency of deep learning on RISC-V devices, with technologies ranging from structured model sparsification and low-bit quantization to parallel enhancement and latency-constrained optimization.

Thank you.
