© 2023 SNIA. All Rights Reserved.
Virtual Conference, September 28-29, 2021

In-SRAM Computing For Lower Power LLMs
GSI Technology
George Williams, Head of Embedded AI

Generative AI In The News
- It Was The Best Of Times, It Was The Worst Of Times

Generative AI Impact
- McKinsey & Co, 2023
- It Was The Best Of Times

Energy Costs of Advanced Computing
- https://www.nnlabs.org/power-requirements-of-large-language-models
- It Was The Worst Of Times

Agenda
- Next Word Prediction
- Transformer Essentials
- Von-Neumann Architecture & Bottleneck
- New Paradigm: Adding Compute Into SRAM
- Associative Compute Grid Power
- Modular IP For Size and Power Budgets
- Token Rates
- Try It Out!
Next Word Prediction

Neural Language Modeling
- Next Token Predictor
- task: next token prediction; the idea dates back to the 70s
- 90s: RNNs, LSTMs, GRUs; nothing works well until the Transformer
- wait... just next token?
- inference: more "context" is better
- positional encoding: "it" 1st vs. "it" 7th
- attention: weighted focus (0.2, 0.4, 0.3, 0.02, 0.02, 0.04, 0.02)
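The positional-encoding bullet can be made concrete. The slides don't name a scheme, but the classic sinusoidal encoding from the original Transformer paper shows the idea: every position gets a distinct vector, so the 1st "it" and the 7th "it" look different to the model once the encodings are added to the token embeddings. A minimal NumPy sketch (function name is ours):

```python
import numpy as np

def sinusoidal_positions(n_pos, d_model):
    """Sinusoidal positional encodings ("Attention Is All You Need").

    Each row is a unique sine/cosine pattern for one position, so the
    model can distinguish the 1st "it" from the 7th "it".
    """
    pos = np.arange(n_pos)[:, None]            # (n_pos, 1)
    i = np.arange(d_model // 2)[None, :]       # (1, d_model/2)
    angles = pos / (10000 ** (2 * i / d_model))
    pe = np.zeros((n_pos, d_model))
    pe[:, 0::2] = np.sin(angles)               # even dims: sine
    pe[:, 1::2] = np.cos(angles)               # odd dims: cosine
    return pe

pe = sinusoidal_positions(8, 16)
# rows differ, so position 0 and position 6 get distinct encodings
```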
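The "weighted focus" numbers on the attention slide come from a softmax: each token's attention score is normalized so the weights sum to 1, then used to take a weighted average of the token representations. A minimal scaled dot-product sketch in NumPy (names are ours, not from the slides):

```python
import numpy as np

def attention(q, K, V):
    """Scaled dot-product attention for a single query vector.

    q: (d,) query, K: (n, d) keys, V: (n, d) values.
    The softmax turns scores into a "focus" distribution like the
    slide's 0.2, 0.4, 0.3, ... that sums to 1.
    """
    d = q.shape[-1]
    scores = K @ q / np.sqrt(d)          # similarity of query to each token
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()             # normalized attention weights
    return weights @ V                   # weighted average of the values

rng = np.random.default_rng(0)
K = rng.standard_normal((7, 16))
V = rng.standard_normal((7, 16))
out = attention(K[1], K, V)              # query resembling token 1
```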
Neural Language Modeling
- prompt phase: tokens can be processed in parallel (compute bound)
- completion phase: tokens generated one at a time (IO bound)

I Asked ChatGPT

Transformer Essentials

Transformer
- 2017: "Attention Is All You Need", Vaswani et al.
- 2023: ChatGPT4, Llama 2, PaLM 2, Claude 2
- OpenAI: $1 billion in revenue; Nvidia: 100% YoY revenue
- Decoder Is All You Need!
- Parameter Scaling!
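The two inference phases above can be sketched as a toy generation loop: the prompt is scored in one batched forward pass, then completion tokens are produced strictly one at a time, each step depending on the previous token. The `model` below is a hypothetical stand-in for a real Transformer, just so the control flow runs:

```python
import numpy as np

VOCAB = 100

def model(tokens):
    """Stand-in for a Transformer: next-token logits per position.
    (Deterministic toy, not a real LLM.)"""
    rng = np.random.default_rng(sum(tokens))
    return rng.standard_normal((len(tokens), VOCAB))

def generate(prompt, n_new):
    # Prompt phase: one pass scores every prompt token in parallel,
    # so throughput is limited by compute.
    logits = model(prompt)
    tokens = list(prompt)
    # Completion phase: strictly sequential -- each new token needs a
    # fresh pass over the weights, so throughput is limited by how
    # fast those weights stream from memory (IO bound).
    for _ in range(n_new):
        next_tok = int(np.argmax(logits[-1]))  # greedy next-token choice
        tokens.append(next_tok)
        logits = model(tokens)
    return tokens

out = generate([1, 2, 3], n_new=4)
```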
Example: ChatGPT3
- 96 layers
- 96 "attention heads"
- 175 billion parameters ("weights")
- Training from scratch requires weeks on 10s-100s of GPUs
- Next Token Prediction

Compute-In-Memory For Transformer

Typical Von-Neumann Architecture
- The dominant compute paradigm for 60 years!
- Intel Meteor Lake die: where is memory and where is compute?

Typical Von-Neumann Bottleneck
- L1 SRAM <-> Compute Core
- Is there a better way?

In-Memory-Computing Hardware Landscape
- Digital IMC
- Analog IMC
- Associative IMC

GSI Technology's Associative Processor: GSI APU (G1)
- Add processors into SRAM: the compute-in-memory paradigm
- A "typical" SRAM grid of SRAM cells... with interleaved processors
- Bit Processors (BPs) are fully parallel and programmable
- 20 microns (avg) between BP and SRAM

GSI APU (G1): Associative Processing
- Each BP is simple
- MxN BPs form a powerful compute grid (2M BPs, 48 Mb)
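The Von-Neumann bottleneck is easiest to see with back-of-the-envelope arithmetic: in the completion phase every weight must be read once per generated token, so memory bandwidth, not TOPS, caps the token rate. The bandwidth figures below are our illustrative assumptions, not GSI benchmarks:

```python
params = 175e9          # ChatGPT3-scale weight count
bytes_per_param = 1     # assuming INT8/FP8 quantization
traffic_per_token = params * bytes_per_param   # ~175 GB read per token

# At any assumed memory bandwidth, tokens/s = bandwidth / traffic:
for bw_gb_s in (100, 1000, 3000):
    tokens_per_s = bw_gb_s * 1e9 / traffic_per_token
    print(f"{bw_gb_s:>5} GB/s -> {tokens_per_s:.2f} tokens/s")
```

Even multi-TB/s bandwidth yields only a handful of tokens per second at this scale, which is why moving compute next to the memory cells is attractive.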
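To give a feel for how a grid of simple 1-bit processors can do arithmetic in place, here is a conceptual bit-serial addition sketch. This is our illustration of the general technique, not GSI's actual microcode: each step is a bulk Boolean operation applied to every SRAM row at once, so thousands of additions proceed in parallel even though each BP handles one bit per step:

```python
import numpy as np

def grid_add(a_bits, b_bits):
    """Ripple-carry add over bit-planes, full-adder style.

    a_bits, b_bits: (n_rows, n_bits) boolean arrays, LSB first.
    Each loop iteration is one parallel Boolean op across all rows.
    """
    n_rows, n_bits = a_bits.shape
    out = np.zeros((n_rows, n_bits + 1), dtype=bool)
    carry = np.zeros(n_rows, dtype=bool)
    for i in range(n_bits):                    # bit-serial: O(n_bits) steps
        a, b = a_bits[:, i], b_bits[:, i]
        out[:, i] = a ^ b ^ carry              # sum bit
        carry = (a & b) | (carry & (a ^ b))    # carry bit
    out[:, n_bits] = carry
    return out

def to_bits(x, n):
    return np.array([[(v >> i) & 1 for i in range(n)] for v in x], dtype=bool)

def to_int(bits):
    return (bits * (1 << np.arange(bits.shape[1]))).sum(axis=1)

s = to_int(grid_add(to_bits([3, 200, 77], 8), to_bits([5, 100, 200], 8)))
# s == [8, 300, 277]
```

The cost per operation scales with bit width rather than row count, which is the usual trade made by massively parallel bit-serial grids.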
GSI APU (G1): Associative Processing
- L1 is interleaved too (96 Mb)
- 100 microns (avg) between BP and L1

Example: ChatGPT3
- 96 layers, 96 "attention heads", 175 billion parameters ("weights")
- Most operations are MACs for matrix multiplication
- Next Token Prediction
- "It's full of stars... MatMul!"

Low Power LLM?

Modular IP For Reticle and Power Budgets
- Example: MatMul "Tiling" with 6 MB
- Memory "bank" architecture accommodates different size and power profiles:
- 3.1 TOPS (INT8), 2.4 TOPS (FP8), 2.7 W TDP
- 6.1 TOPS (INT8), 4.8 TOPS (FP8), 5 W TDP
- 12.2 TOPS (INT8), 9.5 TOPS (FP8), 10 W TDP

Llama2 Completion Phase Token Rates

Product Availability
- G1 w/ 2M BPs in PCIe: Now
- G2 w/ 10X L1 interleaved cache: Q4
- Microcode compiler for C/Python (OSS): Now
- Modular IP licensing: Q4
- Try It Out!

The End
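The MatMul "tiling" idea above can be sketched as a blocked matrix multiply: the operands are streamed through in tiles small enough to fit a fixed on-chip memory budget, accumulating partial products. The tile size here is illustrative, not the deck's 6 MB bank geometry:

```python
import numpy as np

def tiled_matmul(A, B, tile=64):
    """Blocked matmul: process (tile x tile) sub-blocks so the working
    set fits a fixed on-chip budget; the same scheme scales across
    different bank counts and power profiles."""
    m, k = A.shape
    k2, n = B.shape
    assert k == k2
    C = np.zeros((m, n))
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, k, tile):   # accumulate partial tiles
                C[i0:i0+tile, j0:j0+tile] += (
                    A[i0:i0+tile, k0:k0+tile] @ B[k0:k0+tile, j0:j0+tile])
    return C

rng = np.random.default_rng(0)
A, B = rng.standard_normal((96, 128)), rng.standard_normal((128, 80))
assert np.allclose(tiled_matmul(A, B), A @ B)   # matches a plain matmul
```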