HOTCHIPS 2022

Neuro-CIM: A 310.4 TOPS/W Neuromorphic Computing-in-Memory Processor with Low WL/BL activity and Digital-Analog Mixed-mode Neuron Firing

Sangyeob Kim (1), Sangjin Kim (1), Soyeon Um (1), Soyeon Kim (1), Kwantae Kim (2), and Hoi-Jun Yoo (1)
(1) School of Electrical Engineering, KAIST
(2) Institute of Neuroinformatics, University of Zurich and ETH Zurich

Computing-in-Memory (CIM) Accelerator
- Multiple WLs are driven at the same time
- Low energy efficiency caused by the ADC (100 TOPS/W)
Pros:
- Memory access reduction
- Analog accumulation
Cons:
- 1 WL: multiple cells active
- 1 column: multiple WLs active
- ADC/DAC: large power, large area
[Figure: NPU architecture (input memory, weight memory, MAC array, output memory; the memory access is the bottleneck) next to the CIM architecture (digital input memory, DAC, weight cells W00 to WNM, ADC, digital output memory). Each CIM column accumulates Xi x Wij, followed by digitization.]

Limitation of Previous CIMs
1. A high-precision ADC is required for digital output activations.
2. Energy efficiency is low because sparsity is low in real conditions.
[Figure: power breakdown: ADC 59%, driver 21%, controller 20%.]
[Figure: energy efficiency (TOPS/W, axis 0 to 100) of ISSCC 21 and ISSCC 20 designs at peak efficiency, on CIFAR-10, and on ImageNet. Extracted bar labels: 35.6, 75.9, 6.2, 2.8, 5.8.]

Neuromorphic CIM Processor
- ADC and DAC are not necessary: power and area reduction
- Event-driven operation: input sparsity, but low weight sparsity
Pros:
- High input sparsity
- No ADC/DAC
Cons:
- 1 WL: multiple cells
- 1 column: multiple WLs
- Limited range of VBL
[Figure: conventional CIM architecture (digital input memory, DAC, weight cells, ADC, digital output memory) next to the neuromorphic CIM using binary spikes (input memory, weight cells, 1b comparator with threshold, output spike generation, output memory). Each column accumulates Si(t) & Wij.]

Limited VBL Range of Neuromorphic CIM
- VBL resolution and range are limited by the number of WLs (NWL)
- Range of VBL: 0 to NWL x ΔVBL, where ΔVBL is the voltage step contributed by one cell
- Resolution of VBL: VBL,MAX / NWL
[Figure: bit-line voltage (V, 0 to 0.8) versus output code for NWL word lines, ideal case versus actual. As N increases, the error increases. Step 1: the current I(t) of one cell is integrated on CBL over TWL, giving ΔVBL. Step 2: NWL cells accumulate on the bit line, so resolution = VBL,MAX / NWL.]

High Accuracy of Neuromorphic CIM
- Spiking-Neural-Network (SNN) conversion from a trained CNN
- After CNN training, the weights are transferred to the SNN
- Highly accurate SNN [1, 2]

Dataset     CNN Acc. (%)   SNN Acc. (%)
CIFAR-10    90.80          90.05 [1]
CIFAR-100   71.82          69.67 [2]
ImageNet    70.08          69.00 [2]

[1] J. Wu et al., "Progressive tandem", TPAMI 2021
[2] N. Rathi et al., "DIET-SNN", TNNLS 2021

[Figure: CNN versus SNN neuron. CNN: the input (9 4 1 / 1 1 1 / 1 2 1) is multiplied element-wise with the weight (0 2 1 / 4 1 0 / 1 0 1) and summed to give the output 16, followed by ReLU, Max(0, Output). SNN: an integrate-and-fire (IF) neuron with input spikes S0(t), S1(t), ..., Sn(t), weights W0, W1, ..., Wn, membrane potential V(t) and a threshold, updated as V(t+1) = V(t) + sum over i of (Si(t+1) & Wi).]
Ref) https://snntoolbox.readthedocs.io/en/latest/

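The integrate-and-fire update on the slide above, V(t+1) = V(t) + sum over i of (Si(t+1) & Wi), can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the chip's implementation: the names `if_neuron_step` and `run`, the example weights, and the reset-by-subtraction rule after firing are all introduced here and do not come from the slides.

```python
from typing import List, Tuple


def if_neuron_step(v: int, spikes: List[int], weights: List[int],
                   threshold: int) -> Tuple[int, int]:
    """One time step of an integrate-and-fire neuron with binary input spikes.

    Implements V(t+1) = V(t) + sum_i (S_i(t+1) & W_i): a weight is added only
    when its input spike is 1, so no multiplier is needed.
    Reset-by-subtraction after firing is an assumption, not from the slides.
    """
    v += sum(w for s, w in zip(spikes, weights) if s == 1)
    if v >= threshold:
        return v - threshold, 1   # fire, then subtract the threshold (assumed)
    return v, 0


def run(spike_train: List[List[int]], weights: List[int],
        threshold: int) -> List[int]:
    """Feed a sequence of input spike vectors and collect the output spikes."""
    v, out = 0, []
    for spikes in spike_train:
        v, fired = if_neuron_step(v, spikes, weights, threshold)
        out.append(fired)
    return out


if __name__ == "__main__":
    # Hypothetical weights and spike train, chosen only for illustration.
    print(run([[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 0]], [3, -1, 2], 4))
```

Because the inputs are binary spikes, the accumulation needs only additions, which is the property the neuromorphic CIM relies on to drop the DAC.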
Overall Architecture
- 16 macros with 16 banks each
- Each bank has a 64x10 8T cell array
- Cap-based adder, voltage folding logic, comparator array, PGA
[Figure: top level: IMEM (16KB) & AP generator, Macro 0 to Macro 15, OMEM (16KB). Each macro holds Bank 0 to Bank 15, each with peripheral logic (Peri) and a programmable gain amplifier (PGA). Each bank: 8T cell array (64 x 10) with RWL0 to RWL63, CWL0 to CWL63 and Sub-CWL0,0, plus cap-based adder and voltage folding.]

Overall Architecture
1. MSB Word Skipping w/ -1 flag
2. Early Stopping w/ sub-WL driver
3. Mixed-mode Firing w/ voltage folding
[Figure: Bank 0 of Macro 0: 8T SRAM cell array (64 x 10) with RWL0 to RWL63, CWL0 to CWL63 and Sub-CWL0,0; one -1 flag bit line and 4 nominal bit lines; cap-based adder, voltage folding, and programmable gain amplifier (PGA); 8T SRAM cell schematic with VDD, Sub-CWL and RWL.]

Characteristics of Weight Stored in CIM
- High ratio of negative sign-extended bits (1111xxxx) in the MSB part
- Weights have a Gaussian distribution
- Most MSBs are negative sign-extended bits
- 45% of the total computation power is consumed by processing the negative sign
[Figure: number of weights per layer (ResNet-18 for CIFAR-10) and the ratio (%) of weights whose 4b MSB is 1111 or 0000, annotated 95%. An 8-bit weight is split into a 4b MSB and a 4b LSB; the 8-bit weight value histogram between Min and Max marks the 8b 1111xxxx and 8b 0000xxxx regions.]

MSB Word Skipping (MWS) with -1 Flag (8-bit weights)
- MSB BL activity reduction
- The MSB -1 flag BL lets the BL voltage stay unswitched
- 4-bit negative sign-extension bits (1111) become a 5-bit "-1 flag + zeros" (10000)
[Figure: an 8-bit weight (11111010) passes through the peripheral logic into the data cells and the -1 flag cell. Labels: Main CWL, Sub-CWL 1, Sub-CWL 2, RWL, Sign, -1 Flag, Enable/Disable, Bit Conversion versus No Bit Conversion.]

MSB Word Skipping (MWS) with -1 Flag (4-bit weights)
- MSB BL activity reduction
- Two -1 flag BLs let the BL voltage stay unswitched
- 2-bit negative sign-extension bits (11) become a 3-bit "-1 flag + zeros" (100)
[Figure: two 4-bit weights (1101 and 1110) pass through the peripheral logic into the data cells and two -1 flag cells. Labels: Main CWL, Sub-CWL 1, Sub-CWL 2, RWL, Sign, Enable, Conversion versus No Conversion.]

MSB Word Skipping (MWS) with -1 Flag: Performance
- 38% power consumption reduction in the 8b weight mode
- 25% power consumption reduction in the 4b weight mode
[Figure: VBL (V) versus time (ns) with and without MWS, showing the reduction in VBL switching.]
[Figure: power consumption (mW) in 8b mode and 4b mode, with and without MWS, annotated with the 38% and 25% reductions.]

Motivation of Early Stopping (ES)
- Power reduction by eliminating redundant operations
- A neuron with a small membrane voltage (VMEM) produces no output spike
- If VMEM < VES (a hyperparameter) at TES, the neuron is stopped early
[Figure: presynaptic neurons Pre 0, Pre 1 and Pre 2 drive one neuron. Without ES, VMEM (the sum of SPre*WPos minus the sum of SPre*WNeg) stays very small and never reaches VTH, yet every step is processed. With ES, VMEM is checked against VES at TES: no spike is expected, so the next operations are stopped, which reduces power.]

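The bit conversion described on the MSB Word Skipping slides (1111 replaced by a -1 flag followed by 0000) rests on the two's complement identity 1111xxxx = -16 + xxxx. The Python sketch below checks that identity for every 8-bit weight. It is a numerical illustration only: the function names are hypothetical, and the assumption that the flag carries a weight of -16 is inferred from the identity rather than stated in the slides.

```python
from typing import Tuple


def to_signed8(bits: str) -> int:
    """Interpret an 8-character bit string as an 8-bit two's complement value."""
    raw = int(bits, 2)
    return raw - 256 if raw >= 128 else raw


def mws_encode(bits: str) -> Tuple[int, str, str]:
    """Return (flag, msb, lsb) for an 8-bit weight.

    If the upper nibble is the sign extension 1111, it is replaced by a
    -1 flag plus 0000 (the '1111 -> 10000' conversion). Otherwise the
    weight is stored unchanged with the flag cleared.
    """
    msb, lsb = bits[:4], bits[4:]
    if msb == "1111":
        return 1, "0000", lsb
    return 0, msb, lsb


def mws_decode(flag: int, msb: str, lsb: str) -> int:
    """Value of the stored word, assuming the flag carries a weight of -16."""
    upper = int(msb, 2)
    if upper >= 8:        # unconverted negative weight: sign bit still in the MSB nibble
        upper -= 16
    return -16 * flag + 16 * upper + int(lsb, 2)


if __name__ == "__main__":
    word = "11111010"     # the example weight shown on the slide
    print(mws_encode(word), mws_decode(*mws_encode(word)), to_signed8(word))
```

After conversion the four MSB cells hold zeros, so in this model only the single flag bit line is active for such a weight. The 4-bit mode follows the same pattern with 11 replaced by a flag of weight -4.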
(1) VPOS Generation
- Connecting the bridge capacitor and accumulating the 2's complement weight
- Only the BL voltages of the data bit lines go to the positive adder (flag and sign bits are not transferred)
- The positive adder generates the positive part of the sum of (W&S - Threshold)
[Figure: cell array with -1 flag, sign and data bit lines; binary-weighted capacitors 8C, 4C, 2C, 1C and a 16/15C bridge capacitor to GND forming the positive adder; control logic driven by Sub-WL EN; a row holding the negative value of the threshold; output VPOS.]

(2) VNEG Generation
- Connecting the bridge capacitor and accumulating the 2's complement weight
- The BL voltages of the -1 flag bits and the sign bit are transferred to the negative adder
- The negative adder generates the negative part of the sum of (W&S - Threshold)
[Figure: cell array with -1 flag, sign and data bit lines; binary-weighted capacitors 8C, 4C, 2C, 1C and a 16/15C bridge capacitor to GND forming the negative adder; control logic driven by Sub-WL EN; a row holding the negative value of the threshold; output |VNEG|.]

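The split into a positive and a negative adder can be mimicked numerically: the data bits of a two's complement weight form a non-negative part, the sign bit forms a magnitude to subtract, and a neuron fires when the accumulated positive part exceeds the accumulated negative part. The sketch below is a digital analogy only, with hypothetical names. It models neither the capacitor network nor the -1 flag, and it assumes the threshold is handled as one more always-active weight of value -threshold, which is one reading of the "negative value of threshold" label in the figures.

```python
from typing import List, Tuple


def split_weight(w: int, bits: int = 8) -> Tuple[int, int]:
    """Split a two's complement weight so that w == pos - neg.

    pos is the value of the data bits, neg is the magnitude carried by
    the sign bit; both are non-negative.
    """
    assert -(1 << (bits - 1)) <= w < (1 << (bits - 1))
    raw = w & ((1 << bits) - 1)              # two's complement bit pattern
    sign = raw >> (bits - 1)
    pos = raw & ((1 << (bits - 1)) - 1)      # contribution of the data bits
    neg = sign << (bits - 1)                 # contribution of the sign bit
    return pos, neg


def fires(spikes: List[int], weights: List[int], threshold: int) -> bool:
    """Decide firing by comparing two accumulated parts (V_POS > V_NEG)."""
    v_pos = v_neg = 0
    # Assumption: the threshold is one extra always-active weight, -threshold.
    for s, w in zip(spikes + [1], weights + [-threshold]):
        if s:
            p, n = split_weight(w)
            v_pos += p
            v_neg += n
    return v_pos > v_neg


if __name__ == "__main__":
    print(split_weight(-6), fires([1, 1, 0], [5, -3, 7], 1))
```

Since v_pos - v_neg equals the signed sum of the active weights minus the threshold, a single 1-bit comparison of the two parts gives the same decision as computing the signed membrane potential first.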
(3) Stopping before TES
- A 1-bit analog comparator generates the output spike
- When VPOS > VNEG at the comparator, an output spike fires
- A spike counter stores the number of output spikes for the output memory
[Figure: membrane voltage (V, 0 to 0.6) versus time (ns) showing VPOS, VNEG and VNEG + VTH. The condition VPOS > VNEG + VTH corresponds to a membrane potential above VTH, so the neuron fires. Block diagram: cells, control logic, pos/neg adder, 1b comparator, firing logic, spike counter.]

(4) Stopping at TES
- Predicting a non-firing neuron and stopping the neuron operation
- At TES, VDIFF is compared with the early stop voltage (VES)
- If VDIFF < VES, processing is stopped to reduce power
[Figure: voltage (V, 0 to 0.6) versus time (ns) showing the neuron threshold, VES and VDIFF. At TES (ESEN high), VDIFF < VES triggers early stop and drives Sub-WL EN low. Block diagram: cells, control logic, pos/neg adder, 1b comparator, ES logic comparing VDIFF = (VPOS - VNEG) with VES, firing logic gating, spike counter.]

Performance of Early Stopping
- Reducing power consumption by early stopping (CIFAR-10)
- 50 to 70% of the neurons in each layer are terminated early by prediction
- Early stopping reduces power consumption by 37.6%
[Figure: accuracy (%) and power (mW) for Base versus ES: power drops from 168.8 to 105.4 with a 1% accuracy loss. A second plot shows the early stopping ratio (%) per layer (ResNet-12).]

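The early stopping rule from the slides above (at TES, stop if VDIFF = VPOS - VNEG is below VES) can be illustrated with a small behavioural model. This is a sketch under assumptions, not the chip's logic: the function name, the integer voltages, the choice to check after `t_es` steps have been accumulated, and the reset by adding the threshold to the negative side are all hypothetical.

```python
from typing import List, Tuple


def run_neuron(inputs: List[List[int]], weights: List[int], v_th: int,
               v_es: int, t_es: int) -> Tuple[int, int]:
    """Return (spike_count, steps_processed) for one neuron.

    After t_es steps have been accumulated, V_DIFF = V_POS - V_NEG is
    compared with v_es; if it is smaller, the remaining steps are skipped.
    """
    v_pos = v_neg = 0
    spikes = 0
    for t, s_vec in enumerate(inputs):
        if t == t_es and (v_pos - v_neg) < v_es:
            return spikes, t                 # early stop: predicted not to fire
        for s, w in zip(s_vec, weights):
            if s:
                if w >= 0:
                    v_pos += w               # positive adder
                else:
                    v_neg += -w              # negative adder (magnitude)
        if v_pos > v_neg + v_th:             # 1-bit comparator decision
            spikes += 1
            v_neg += v_th                    # assumed reset: subtract the threshold
    return spikes, len(inputs)


if __name__ == "__main__":
    # A neuron driven only through a negative weight is stopped at t_es = 2.
    print(run_neuron([[0, 1]] * 6, [2, -3], v_th=4, v_es=0, t_es=2))
    # A neuron driven through a positive weight runs all steps and fires.
    print(run_neuron([[1, 0]] * 4, [2, -3], v_th=3, v_es=0, t_es=2))
```

In this model the saving comes from the skipped steps: the first neuron processes 2 of its 6 steps. How VES and TES trade power against the 1% accuracy loss reported on the slide depends on the network and is not captured here.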
Motivation of Voltage Folding
- Virtually increasing the VLSB of the membrane voltage (VMEM) by voltage folding
- VMEM becomes a folding count plus a residue voltage
- Amplifying the residue voltage increases the range and the VLSB of VMEM
[Figure: the accumulated bit-line voltage is limited to VBL,MAX. With folding logic after the pos/neg adder, the voltage wraps at VFOLD into Range 1, Range 2 and Range 3, each within VMAX (0.7V), so the voltage range increases x3. The original VMEM covers inputs 0 to 63 within VMAX; the target VMEM for multi-macro aggregation reaches 3VMAX.]

Voltage Folding (1)
- Virtually increasing the range of the membrane voltage (VMEM)
- VIN is folded at (V1+V2)/2, (V2+V3)/2 and (V3+V4)/2
- Two voltages (VX and VY) are generated with a phase difference of 180 degrees
[Figure: voltage folding circuit with VDD, VIN, reference voltages V1 to V4, transistors Q1 to Q8 and outputs VX and VY; transfer curves of VX and VY versus VIN with folding points at (V1+V2)/2, (V2+V3)/2 and (V3+V4)/2.]

Voltage Folding (2)
- Virtually increasing the range of the membrane voltage (VMEM)
- Selecting whichever of VX and VY has a positive slope in the folding circuit extends the range of VIN continuously
[Figure: the same folding circuit and transfer curves, with the positive-slope segments selected as the residue across Range x1, Range x2 and Range x3.]

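The decomposition of VMEM into a digital folding count plus an analog residue can be pictured with a short numerical sketch. It is an idealised model with hypothetical names: it assumes the folding step equals the 0.7 V VMAX shown on the motivation slide, and it models neither the transistor-level folding circuit nor the amplification of the residue.

```python
import math
from typing import Tuple


def fold(v_mem: float, v_fold: float = 0.7) -> Tuple[int, float]:
    """Split a voltage into (folding count, residue), 0 <= residue < v_fold."""
    count = math.floor(v_mem / v_fold)
    residue = v_mem - count * v_fold
    return count, residue


def unfold(count: int, residue: float, v_fold: float = 0.7) -> float:
    """Rebuild the virtual membrane voltage from its two parts."""
    return count * v_fold + residue


if __name__ == "__main__":
    for v in (0.5, 1.0, 1.9):
        count, residue = fold(v)
        print(v, count, round(residue, 3), round(unfold(count, residue), 3))
```

With a 2-bit count and a residue confined to one 0.7 V window, the representable range in this model is three windows wide, which is the sense in which the range is increased x3 without a high-precision ADC.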
Measured Waveforms of Voltage Folding
- Increasing the voltage range and generating the folding count
- VMEM becomes an analog folded output voltage (VSEL) plus a digital folding count
- The high virtual range enables multi-macro aggregation without a high-precision ADC
[Figure: voltage (V, 0.0 to 2.1) versus time (ns): the ideal slope for VPOS,SEL without folding rises through 0.7, 1.4 and 2.1 V, while VPOS,SEL with folding wraps at VFOLD. Count 00, Count 01 and Count 10 correspond to Range 0, Range 1 and Range 2, with VX or VY selected in each range.]
[Figure: high reconfigurability. Single-macro mode: each cell array takes ICH 0 to 63 and its own firing logic emits spikes for OCH0, OCH1 and OCH2. Multi-macro mode: cell arrays take ICH 0 to 63, ICH 64 to 127 and ICH 128 to 191; firing is disabled on the intermediate arrays and only the last one fires the spikes for OCH0.]

Chip Summary
- Chip photograph and performance
[Figure: Neuro-CIM chip photograph and the classification platform, with FPGA system and DC power analyzer.]

Technology: 28nm
Die Area: 3228 µm x 900 µm
Digital SRAM: 32 KB
Supply: 1.1V
Frequency: 200MHz
Single Macro Area: 0.048
System Energy Efficiency (TOPS/W): I=4b, W=1b: 310.4 (3); I=4b, W=4b: 124.2 (3); I=4b, W=8b: 62.1 (3)
Macro Power (mW): 15.8 (1), 36.2 (2)
System Power (mW): 105.4 (1), 241.4 (2)
Total CIM Storage: 32 KB
(1) w/ MWS & ES  (2) w/o MWS & ES  (3) CIFAR-10 with ResNet-18 (64x160x16)

Conclusion
- Neuro-CIM: an energy-efficient neuromorphic CIM + SNN processor
  - CIM: reducing memory access and multiple WL driving
  - SNN: generating input sparsity and eliminating the high-precision ADC
- For energy-efficient neuromorphic CIM processing:
  - MSB Word Skipping: reducing power consumption by 25 to 38%
  - Early Stopping: reducing power consumption by 37%
  - Mixed-mode Neuron Firing: increasing the voltage range x3
- A 310.4 TOPS/W, 1034.6 TOPS/W (Bank) neuromorphic CIM processor for energy-efficient neural network processing

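The power and efficiency figures quoted in the chip summary and in the early stopping results can be cross-checked with simple arithmetic. The snippet below uses only numbers that appear in the slides. The helper name `reduction_pct` is introduced here, and the combined reduction it prints is a derived value, not one reported by the authors.

```python
def reduction_pct(before: float, after: float) -> float:
    """Percentage reduction going from `before` to `after`."""
    return 100.0 * (before - after) / before


# Figures quoted in the slides (power in mW, efficiency in TOPS/W).
es_only = reduction_pct(168.8, 105.4)        # Base versus ES (early stopping slide)
system_total = reduction_pct(241.4, 105.4)   # system power without versus with MWS & ES
macro_total = reduction_pct(36.2, 15.8)      # macro power without versus with MWS & ES
cells = 16 * 16 * 64 * 10                    # macros x banks x rows x columns
eff_ratio_4b = 310.4 / 124.2                 # W=1b versus W=4b efficiency
eff_ratio_8b = 310.4 / 62.1                  # W=1b versus W=8b efficiency

print(round(es_only, 1), round(system_total, 1), round(macro_total, 1))
print(cells, cells == 64 * 160 * 16)
print(round(eff_ratio_4b, 2), round(eff_ratio_8b, 2))
```

The first value reproduces the 37.6% early stopping figure, and the cell count agrees with the 64x160x16 annotation in the summary table.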
Thank You! Questions?
Feel free to contact me!
E-mail: sangyeob.kim@kaist.ac.kr
LinkedIn: https:/
Meeting: https://us05web.zoom.us/j/3753663353?pwd=dzNlMFh3M0pIeWJiM0dVZmxLNVVoUT09

Acknowledgement
This work was supported by Samsung Electronics (IO201207-07799-01).