Bridge the Gap Between Existing Public-Key Cryptography and Post-Quantum Cryptography
Adams Bridge Accelerator

Mojtaba Bisheh-Niasar, Senior Hardware Engineer, Microsoft
Bharat Pillilli, Principal Hardware Engineer, Microsoft
Bryan Kelly, Partner Software Engineer, Microsoft

Security and Data Protection

Outline
- Introduction
- NIST PQC standardization process
- Our motivation
- Adams Bridge Accelerator
- Dilithium (ML-DSA) background
- NTT architecture
- Keccak and samplers design
- Side-channel consideration
- Performance
- Conclusion

[Figure: design trade-off diagram. Labels: Performance, Security, Silicon area, Energy, Power, Time, Frequency, Design, Costs, Design Time, Flexibility, SCA leakage]

Introduction
Importance of public-key cryptography
- Current public-key cryptosystems are based on:
  - Factoring large integers (RSA)
  - Discrete logarithms (ECC)
- These problems would be easy to solve on a quantum computer.
- Post-quantum cryptography (PQC) is public-key (asymmetric) cryptography that resists attacks by both classical and quantum computers.
- There are several quantum-safe approaches. Lattice-based cryptography is the most promising scheme.
- Why now? Record encrypted data now, decrypt it once you have a quantum computer!
- What is included? Kyber (ML-KEM) and Dilithium (ML-DSA)

NIST PQC Standardization Process
- 2016: Start of the PQC standardization process (69 candidates)
- 2017: Round 1 (26 candidates)
- 2019: Round 2 (15 candidates)
- 2020: Round 3 (4+5 candidates)
- July 2022: Initial selection, Round 4
- August 2023: Standardization draft
- 2024: FIPS documents
- 2030?: Quantum computer

Motivation
- Develop a PQC accelerator that meets different performance-level requirements
- Pure hardware accelerator to enhance performance and SCA protection
- Commencing development today ensures preparedness for the future need for PQC
- Enhance Caliptra to be a quantum-resilient root-of-trust engine

Challenges
- PQC is NOT standardized yet
- PQC schemes differ significantly from current cryptosystems
- Existing designs are not suitable: they focus on performance, reference, or research
- Gap in the design trade-off exploration related to resource utilization and performance

Adams Bridge Accelerator
- Dilithium (ML-DSA) digital signature algorithm
- Two performance-level targets:
  - Embedded architecture
  - High-speed architecture
- Supports all operations: KeyGen, Signing, Verifying
- Hands-off interaction
- Embedded SCA countermeasures

[Figure: Adams Bridge Accelerator block diagram. Labels: Memory, Hashing, Samplers, NTT, PWM, Add/Sub, Rejection, SampleInBall, SIPO, Keccak, PISO, MakeHint/UseHint, Pack/Unpack, Encode/Decode, Comp./Decomp., Rejection Bounded, Auxiliary, Polynomial Arithmetic, Adams Bridge Ctrl, API Register Map]

Dilithium Background
Learning With Errors (LWE)
- Generate a uniform matrix $A \in R_q^{K \times L}$
- Generate a secret vector $s \in R_q^{L}$
- Generate noise $e \in R_q^{K}$
- Compute $t = As + e$
- Given blue, find red: given $(A, t)$, find $s$
- Without noise, $As = t$ is easy to solve. With noise, $As + e = t$ is hard: a quantum-safe problem!
Challenges:
- Needs several random polynomials: Keccak (SHA3) coupled with samplers
- Needs several polynomial multiplications: Number Theoretic Transform (NTT)

Number Theoretic Transform (NTT): Accelerated Polynomial Multiplication
- Polynomial on ring: $a(x) = \sum_{i=0}^{n-1} a_i x^i = a_0 + a_1 x + \dots + a_{n-1} x^{n-1}$, with coefficients $a_0, a_1, \dots, a_{n-1} \in \mathbb{Z}_q$
- NTT: $\hat{a}_i := \sum_{j=0}^{n-1} a_j \omega^{ij} \bmod q$
- INTT: $a_i := n^{-1} \sum_{j=0}^{n-1} \hat{a}_j \omega^{-ij} \bmod q$
- Polynomial multiplication complexity: $O(n^2) \rightarrow O(n \log n)$
- Developed a reconfigurable butterfly core to support NTT, INTT, and point-wise multiplication
- Developed a hardware-friendly reduction technique without any multiplication
- Merged NTT layers to obtain a pipelined, parallel architecture
  - Reduced the complexity from $\frac{n}{2}\log n$ to $\frac{n}{8}\log n$: 75% performance improvement
- Enhanced memory bandwidth through a quadrupled-bandwidth approach
- Resolved memory conflict challenges
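
The NTT-based multiplication described above can be illustrated in software. The following is a minimal Python sketch, not the accelerator's design: it assumes the standard ML-DSA parameters (q = 8380417, n = 256, 512th root of unity 1753) and uses a plain recursive transform with pre- and post-scaling, whereas the hardware uses merged, pipelined butterfly layers. The function names are hypothetical. The result is checked against schoolbook multiplication modulo X^n + 1.

```python
import random

Q = 8380417   # ML-DSA modulus, 2**23 - 2**13 + 1
N = 256       # polynomial degree bound
PSI = 1753    # primitive 512th root of unity mod Q (so PSI**N == -1 mod Q)


def ntt(a, omega):
    """Recursive radix-2 transform: out[k] = sum_j a[j] * omega**(j*k) mod Q."""
    n = len(a)
    if n == 1:
        return a[:]
    omega_sq = omega * omega % Q
    even = ntt(a[0::2], omega_sq)
    odd = ntt(a[1::2], omega_sq)
    out = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k] % Q
        out[k] = (even[k] + t) % Q            # butterfly: upper output
        out[k + n // 2] = (even[k] - t) % Q   # butterfly: lower output
        w = w * omega % Q
    return out


def poly_mul_ntt(a, b):
    """Multiply a and b modulo (X^N + 1, Q) in O(N log N) operations."""
    omega = PSI * PSI % Q
    # Pre-scale by powers of PSI to turn cyclic into negacyclic convolution.
    a_hat = ntt([x * pow(PSI, i, Q) % Q for i, x in enumerate(a)], omega)
    b_hat = ntt([x * pow(PSI, i, Q) % Q for i, x in enumerate(b)], omega)
    c_hat = [x * y % Q for x, y in zip(a_hat, b_hat)]  # point-wise multiply
    c = ntt(c_hat, pow(omega, -1, Q))                   # inverse transform
    n_inv = pow(N, -1, Q)
    psi_inv = pow(PSI, -1, Q)
    return [x * n_inv % Q * pow(psi_inv, i, Q) % Q for i, x in enumerate(c)]


def poly_mul_schoolbook(a, b):
    """Reference O(N^2) multiplication modulo (X^N + 1, Q)."""
    c = [0] * N
    for i in range(N):
        for j in range(N):
            if i + j < N:
                c[i + j] += a[i] * b[j]
            else:
                c[i + j - N] -= a[i] * b[j]  # X^N wraps around to -1
    return [x % Q for x in c]


rng = random.Random(0)
a = [rng.randrange(Q) for _ in range(N)]
b = [rng.randrange(Q) for _ in range(N)]
assert poly_mul_ntt(a, b) == poly_mul_schoolbook(a, b)
```

The schoolbook routine performs n^2 coefficient multiplications, while the transform-based routine needs three transforms of roughly (n/2) log n butterflies each plus n point-wise products.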
[Figure: NTT design, butterfly network with node labels u00, u01, v00, v01, u10, u11, v10, v11, u20, u21, v20, v21]

Keccak and Samplers
- Optimized sampler architecture tightly coupled with the Keccak core:
  - Rejection_q
  - RejBounded
  - ExpandMask
  - SampleInBall
- Balanced the throughput of Keccak and the samplers
- Matched the NTT throughput and pattern requirements
- Removed the cost of memory between Keccak and the samplers

[Figure: Adams Bridge Accelerator block diagram, repeated]

Side-Channel Consideration
- Timing and simple power analysis (SPA) attacks
  - Constant-time computation
  - No secret-dependent branching or memory access, by design
- Differential power analysis (DPA) attacks
  - Masked implementation, at the cost of more resource utilization
- Template attacks
  - Constant-time and control-flow countermeasures

Performance
- A signing round takes around 35,000 cycles: 87.5 µs at 400 MHz
- Signing rejection loop:
  - Average: 3.85 signing rounds, 336 µs at 400 MHz
  - 99.99% success: 31 signing rounds, 2.7 ms at 400 MHz
- Comparison with secp384r1: 2.5 ms at 400 MHz

[Figure: Cumulative Success Rate for Signing Round. x-axis: Signing Round; y-axis: Success Rate (0% to 100%)]

Conclusion
- Urgency of implementing and optimizing post-quantum cryptography on hardware
- Proposing the Adams Bridge Accelerator
- The first implementation of Dilithium for cloud infrastructure
- Tackling the challenges of performance, complexity, and SCA protection
- Highly parallel and pipelined NTT architecture

Thank you!