Bridge the Gap Between Existing Public-Key Cryptography and Post-Quantum Cryptography
Adams Bridge Accelerator

Mojtaba Bisheh-Niasar, Senior Hardware Engineer, Microsoft
Bharat Pillilli, Principal Hardware Engineer, Microsoft
Bryan Kelly, Partner Software Engineer, Microsoft

SECURITY AND DATA PROTECTION

Outline
- Introduction
- NIST PQC standardization process
- Our Motivation
- Adams Bridge Accelerator
- Dilithium (ML-DSA) Background
- NTT Architecture
- Keccak and Samplers Design
- Side-Channel Consideration
- Performance
- Conclusion

[Figure: design trade-off diagram. Labels: Performance, Security, Silicon area, Energy, Power, Time, Frequency, Design, Costs, Design Time, Flexibility, SCA leakage]

Introduction
Importance of public-key cryptography
- Current public-key cryptosystems are based on:
  - Factoring large integers (RSA)
  - Discrete logarithms (ECC)
- These problems would be easy to solve on a quantum computer.
- Post-quantum cryptography (PQC) is public-key (asymmetric) cryptography that resists attacks using both classical and quantum computers.
- There are several quantum-safe approaches. Lattice-based cryptography is the most promising.
Why NOW?
- Record encrypted data now, decrypt it once you have a quantum computer!
What is included?
- Kyber (ML-KEM)
- Dilithium (ML-DSA)

NIST PQC Standardization Process
[Figure: timeline. Milestones: Start PQC Standardization Process (69 candidates); Round 1 (26 candidates); Round 2 (15 candidates); Round 3 (4+5 candidates); Initial Selection and Round 4. Dates marked: 2016, 2017, 2019, 2020, July 2022; Standardization draft, August 2023; FIPS documents, 2024; Quantum computer, 2030?]
PQC is necessary in a quantum-computing world, particularly on hardware platforms!

Motivation
- Develop a PQC accelerator that meets different performance-level requirements
- Pure hardware accelerator to enhance performance and SCA protection
- Commencing development today ensures preparedness for the future need for PQC
- Enhance CALIPTRA to be a quantum-resilient root-of-trust engine
Challenges
- PQC is NOT standardized yet
- PQC schemes differ significantly from current cryptosystems
- Existing designs are not suitable: they focus on performance, reference, or research use
- There is a gap in design trade-off exploration between resource utilization and performance

Adams Bridge Accelerator
- Dilithium (ML-DSA) Digital Signature Algorithm
- Two target performance levels:
  - Embedded architecture
  - High-speed architecture
- Supports all operations:
  - KeyGen
  - Signing
  - Verifying
- Hands-off interaction
- Embedded SCA countermeasures
[Figure: block diagram. Labels: Memory; Hashing; Samplers; NTT; PWM; Add/Sub; Rejection; Sample InBall; SIPO; Keccak; PISO; MakeHint/UseHint; Pack/Unpack; Encode/Decode; Comp./Decomp.; Rejection Bounded; Auxiliary; Polynomial Arithmetic; Adams Bridge Ctrl; API Register Map]

Dilithium Background
Learning With Errors (LWE)
- Generate a uniform K×L matrix $A$
- Generate a secret vector $s$ of length L
- Generate a noise vector $e$ of length K
- Compute $t = As + e$
- Given blue, find red (on the slide, color separates the public values from the secret ones)
[Figure: $t = As + e$. Easy! Computing $t$ from $A$, $s$, and $e$. Hard! Recovering $s$ from $A$ and $t$. A quantum-safe problem!]
Challenges:
- Needs several random polynomials: Keccak (SHA3) coupled with samplers
- Needs several polynomial multiplications: Number Theoretic Transform (NTT)

Number Theoretic Transform (NTT)
Polynomial on Ring
  $a(x) = \sum_{i=0}^{n-1} a_i x^i = a_0 + a_1 x + \dots + a_{n-1} x^{n-1}$, with $a = (a_0, a_1, \dots, a_{n-1})$
Number Theoretic Transform (NTT)
  $\hat{a}_i := \sum_{j=0}^{n-1} a_j \omega^{ij} \bmod q$
  $a_i := n^{-1} \sum_{j=0}^{n-1} \hat{a}_j \omega^{-ij} \bmod q$
Accelerated Polynomial Multiplication
  Polynomial multiplication: $O(n^2) \rightarrow O(n \log n)$

NTT Design
- Developed a reconfigurable butterfly core that supports NTT, INTT, and point-wise multiplication
- Developed a hardware-friendly reduction technique without any multiplication
- Merged NTT layers into a pipelined, parallel architecture
  - Reduced the complexity from $\frac{n}{2}\log n$ to $\frac{n}{8}\log n$
  - 75% performance improvement
- Enhanced memory bandwidth through a quadrupled-bandwidth approach
- Resolved memory conflict challenges
[Figure: merged-layer butterfly network. Signal labels: u00, u01, u10, u11, u20, u21, v00, v01, v10, v11, v20, v21; butterfly labels: 00, 01, 10, 11]

Keccak and Samplers
- Optimized sampler architecture tightly coupled with the Keccak core:
  - Rejection_q
  - RejBounded
  - ExpandMask
  - SampleInBall
- Balanced the throughput of Keccak and the samplers
- Matched the NTT throughput and access-pattern requirements
- Removed the memory cost between Keccak and the samplers

Side-Channel Consideration
- Timing and simple power analysis (SPA) attacks
  - Constant-time computation
  - No secret-dependent branching or memory access, by design
- Differential power analysis (DPA) attacks
  - Masked implementation, at the cost of more resource utilization
- Template attacks
  - Constant-time and control-flow countermeasures

Performance
- One signing round takes around 35,000 cycles: 87.5 µs at 400 MHz
- Signing rejection loop:
  - Average: 3.85 signing rounds, about 336 µs at 400 MHz
  - 99.99% success: 31 signing rounds, about 2.7 ms at 400 MHz
- Comparison with Secp384r1: 2.5 ms at 400 MHz
[Figure: Cumulative Success Rate for Signing Round. x-axis: Signing Round; y-axis: Success Rate, 0% to 100%]

Conclusion
- Urgency of implementing and optimizing post-quantum cryptography on hardware
- Proposing the Adams Bridge Accelerator
  - The first implementation of Dilithium for cloud infrastructure
  - Tackles the challenges of performance, complexity, and SCA protection
  - Highly parallel and pipelined NTT architecture
Next Step
- Embedded version of Adams Bridge on CALIPTRA 2.0

Thank you!
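The NTT slides state that polynomial multiplication drops from quadratic cost to O(n log n) when done through the NTT. The following is a minimal software sketch of that idea, not the accelerator's hardware design: it multiplies two polynomials modulo x^n + 1 with a textbook recursive NTT and checks the result against schoolbook multiplication. The parameters q = 8380417, n = 256, and the 512th root of unity 1753 are the ML-DSA values from FIPS 204; the function names are hypothetical, and the merged-layer butterfly and multiplier-free reduction described in the talk are not modeled.

```python
Q = 8380417   # ML-DSA modulus (FIPS 204)
N = 256       # polynomial degree bound
PSI = 1753    # primitive 512th root of unity mod Q, so PSI**N == -1 (mod Q)


def ntt(a, omega, q):
    """Recursive Cooley-Tukey cyclic NTT; omega is a primitive len(a)-th root of unity."""
    n = len(a)
    if n == 1:
        return list(a)
    omega_sq = omega * omega % q
    even = ntt(a[0::2], omega_sq, q)
    odd = ntt(a[1::2], omega_sq, q)
    out = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k] % q
        out[k] = (even[k] + t) % q            # butterfly: upper output
        out[k + n // 2] = (even[k] - t) % q   # butterfly: lower output
        w = w * omega % q
    return out


def negacyclic_mul(a, b, psi=PSI, q=Q):
    """Multiply a and b modulo (x^n + 1, q) using NTT, point-wise product, INTT."""
    n = len(a)
    omega = psi * psi % q
    psi_inv = pow(psi, -1, q)
    n_inv = pow(n, -1, q)
    # Pre-scale by powers of psi so a cyclic NTT computes a negacyclic product.
    a_s = [x * pow(psi, i, q) % q for i, x in enumerate(a)]
    b_s = [x * pow(psi, i, q) % q for i, x in enumerate(b)]
    a_hat = ntt(a_s, omega, q)
    b_hat = ntt(b_s, omega, q)
    c_hat = [x * y % q for x, y in zip(a_hat, b_hat)]   # point-wise multiplication
    c_s = ntt(c_hat, pow(omega, -1, q), q)               # inverse transform, unscaled
    return [x * n_inv % q * pow(psi_inv, i, q) % q for i, x in enumerate(c_s)]


def schoolbook_mul(a, b, q=Q):
    """Reference O(n^2) multiplication modulo (x^n + 1, q)."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                c[k] = (c[k] + a[i] * b[j]) % q
            else:
                c[k - n] = (c[k - n] - a[i] * b[j]) % q   # x^n wraps to -1
    return c
```

Calling `negacyclic_mul(a, b)` on two length-256 coefficient lists should agree with `schoolbook_mul(a, b)`; the NTT path performs on the order of n log n butterflies where the schoolbook path performs n^2 coefficient products.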
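The latency figures on the Performance slide can be cross-checked with a few lines of arithmetic. The sketch below converts cycle counts to time at 400 MHz and estimates how many signing rounds are needed for a target success rate. It assumes each signing round is accepted independently with the same probability 1/3.85 (a geometric model); that model and the function names are assumptions for illustration, not something stated in the talk.

```python
import math

CLOCK_MHZ = 400            # clock frequency quoted on the slide
CYCLES_PER_ROUND = 35_000  # approximate cycles for one signing round (from the slide)
AVG_ROUNDS = 3.85          # average number of signing rounds (from the slide)


def latency_us(cycles, clock_mhz=CLOCK_MHZ):
    """Convert a cycle count to microseconds: cycles / MHz gives microseconds."""
    return cycles / clock_mhz


def rounds_for_success(avg_rounds, target):
    """Smallest number of rounds whose cumulative success rate reaches `target`.

    Assumes a geometric model: every round succeeds independently with
    probability p = 1 / avg_rounds, so P(success within k rounds) = 1 - (1 - p)^k.
    """
    p = 1.0 / avg_rounds
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


one_round_us = latency_us(CYCLES_PER_ROUND)          # single signing round
average_us = AVG_ROUNDS * one_round_us               # average signing latency
worst_rounds = rounds_for_success(AVG_ROUNDS, 0.9999)
worst_case_ms = worst_rounds * one_round_us / 1000   # latency at the 99.99% point
```

Under this model the single-round latency, the average latency, and the 31-round figure for 99.99% success line up with the numbers on the slide, which suggests the slide's rounds were derived in a similar way.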