A High-fidelity Flow for High-Performance RISC-V CPU Design
Luke Yen, Yuanbo Fan, Wei-Han Lien
Confidential

Challenges in High-Performance RISC-V CPU Design
  Tedious Process
    - Years of effort
    - Design, implementation & tape-out
    - Many turn-arounds
    - uArch tuning based on PPA goals
  IP Progress Tracking
    - Internal verification & regression
    - External visibility & validation
  Performance Projections
    - Starting from pre-silicon stage
    - Target applications/benchmarks
  Collaboration
    - Customized features
    - Branch predictor & prefetcher
    - Sub-system
    - Cache & memory
    - Vector unit design

Methodology

Instruction Set Simulator (i.e. Whisper)
  Available: Github Link
  Simulation Speed: 100M instructions per second
    - Static trace generation
    - Co-simulation (with performance model)
  Tracing Methodology
    - Random/systematic sampling
    - Representative phases (i.e. simpoints)
    - Warmup traces (for branch and I/D cache)
  Limitation
    - Static linking
    - Single-thread simulation

Cycle-Accurate Performance Model
  Cycle-accurate, event-driven model
    - Highly flexible and configurable
    - Trace-driven simulation
  Performance Metrics
    - Such as IPC, total execution cycles, cache misses, TLB misses, branch mispredicts
  Visualization & Debug
    - Pipetrace: a visual representation of the pipeline execution over time

PPA-oriented Tuning
  Tradeoff: Projection Accuracy vs. Simulation Speed
  The tracing flow consists of several steps, including:
    - Whisper simulation
    - Benchmark profiling
    - Snapshot generation
    - Trace generation
  Customized microbenchmark
    - C/C++ tests
    - Major uarch components & timing paths
    - Critical perf. metrics (e.g. data cache hit latency, issue bandwidth)
  Calibration & Validation
    - Tradeoffs: low projection error vs. 10x simulation speedup
    - End-to-end co-simulation
  Note: normalized by performance projected by 100M simpoints

Tenstorrent RISC-V O-o-O Processor Family
  [Figure: processor family arranged by decode width and performance; labels: "Higher Performance", "Open & Free"]
    - 4-wide decode: Sonic Boom with Vector
    - 6-wide decode: Alastor (Client and Edge)
    - 8-wide decode: Ascalon (Server, Laptop, and HPC)
    - Also shown: 4-wide, 3-wide, and 2-wide decode designs
  One Design and 5 IPs in a year

Future Collaboration
    - Instruction Set Simulator (Whisper)
    - Target applications/benchmarks & sharable traces
    - Compiler development & optimization
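
The sampling and simpoint items on the Instruction Set Simulator and PPA-oriented Tuning slides can be illustrated with a small sketch. The deck does not say how per-simpoint results are combined, so the code below assumes the usual SimPoint-style approach: each representative phase carries a weight, and whole-program CPI is the weight-averaged CPI of the simulated phases. The names `Simpoint`, `systematic_sample`, `projected_cpi`, and `projection_error`, and all numbers, are hypothetical and are not taken from Whisper or the performance model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Simpoint:
    """One representative phase (hypothetical record layout)."""
    weight: float       # fraction of program intervals this phase represents
    instructions: int   # instructions simulated for this phase
    cycles: int         # cycles reported by a performance model for this phase


def systematic_sample(num_intervals: int, period: int, offset: int = 0) -> List[int]:
    """Pick every `period`-th interval index, starting at `offset`."""
    if period <= 0:
        raise ValueError("period must be positive")
    return list(range(offset, num_intervals, period))


def projected_cpi(simpoints: List[Simpoint]) -> float:
    """Weight-averaged CPI across simpoints (assumes equal-length intervals)."""
    total_weight = sum(sp.weight for sp in simpoints)
    if total_weight <= 0:
        raise ValueError("simpoint weights must sum to a positive value")
    weighted = sum(sp.weight * sp.cycles / sp.instructions for sp in simpoints)
    return weighted / total_weight


def projection_error(projected: float, reference: float) -> float:
    """Relative error of a projection against a reference value."""
    return abs(projected - reference) / reference


if __name__ == "__main__":
    # Made-up phases: weights 0.5 / 0.3 / 0.2 with CPI 0.5 / 1.0 / 2.0.
    phases = [
        Simpoint(weight=0.5, instructions=1_000_000, cycles=500_000),
        Simpoint(weight=0.3, instructions=1_000_000, cycles=1_000_000),
        Simpoint(weight=0.2, instructions=1_000_000, cycles=2_000_000),
    ]
    cpi = projected_cpi(phases)
    print(f"projected CPI = {cpi:.3f}, projected IPC = {1.0 / cpi:.3f}")
    print("sampled intervals:", systematic_sample(num_intervals=10, period=4, offset=1))
```

Under these assumptions, shorter or fewer simpoints cut simulation time at the cost of a larger `projection_error` against a longer reference run, which is the accuracy-versus-speed tradeoff the tuning slide refers to.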