RLChina 2023
Towards Responsible Decision and Control via Implicit Networks
Ye Shi (Assistant Professor, Researcher, Doctoral Supervisor)
2023-11-25, ShanghaiTech Responsible AI Lab

Decision and Control in the Real World
Applications: robotics, finance, health care, autonomous driving, smart-grid charging.
Responsible AI: safety, efficiency, privacy.

Explicit Models
Traditional deep learning models explicitly construct the relationship between input and output: an explicit layer is a differentiable parametric function, and deep neural networks are typically built by composing many explicit layers and training end-to-end via backpropagation.
Problems with explicit models: unreliable and memory-inefficient.

Implicit Models
Implicit models implicitly define the relationship between input and output; the relationship may be given by optimization problems, equations, etc. Training requires the implicit gradient flow of the solution-finding operation rather than plain backpropagation.

Advantages of implicit networks:
1. Powerful representations: they can represent complex operations such as integrating differential
equations, solving optimization problems, etc.
2. Memory efficiency: no need to backpropagate through intermediate components, thanks to the implicit function theorem.
3. Simplicity: ease and elegance of architecture design.

Implicit Networks
Neural ODE [1], Deep Equilibrium Model [2], and Differentiable Optimization Layer [3] are reliable, memory-efficient, and easy to constrain: a constant memory footprint, equivalent to infinitely many explicit layers.
[1] Chen R T Q, Rubanova Y, Bettencourt J, et al. Neural ordinary differential equations. Advances in Neural Information Processing Systems, 2018, 31.
[2] Bai S, Kolter J Z, Koltun V. Deep equilibrium models. Advances in Neural Information Processing Systems, 2019, 32.
[3] Amos B, Kolter J Z. OptNet: Differentiable optimization as a layer in neural networks. International Conference on Machine Learning. PMLR, 2017: 136-145.

Differentiable Optimization
Alt-Diff [Sun and Shi, 2023]: Alternating Differentiation for Optimization Layers.

A toy example of decision and control: energy generation scheduling. A power system operator must decide how much electricity generation to schedule for the next 24 hours based on historical electricity demand. We use the hourly electricity demand over the past 72 hours to predict the real power demand in the next 24 hours; the predicted demand is then fed into an optimization problem that schedules the power generation.

Predict-then-optimize is an end-to-end learning style that uses the optimization loss to guide the prediction, rather than the prediction loss as in standard learning.
Two-stage framework: the neural net (ML algorithm) is trained on a prediction loss; its prediction is then passed to the optimization problem to obtain the solution x*.
End-to-end framework: the optimization layer sits inside the network, and both the forward and the backward pass run through it, so the task loss trains the predictor directly.

Definition. A layer in a neural network is an optimization layer if its input is the parameters of an optimization problem and its output is the solution of that problem. The optimum is characterized by the KKT conditions, which raises the question: how do we back-propagate through it?

Implicit function theorem (slides credit: Duvenaud). Let F(theta, x) be continuously differentiable with F(theta, x*(theta)) = 0, and suppose the Jacobian dF/dx is invertible at (theta, x*(theta)). Then the derivative of the solution with respect to theta is

    dx*/dtheta = -(dF/dx)^(-1) dF/dtheta.

Applied to the full KKT system, the Jacobian to invert is too large for sizable problems.

ADMM (Alt-Diff). The alternating method is applied to reduce the computational complexity: the forward pass runs alternating updates, and the backward pass differentiates each update.
Haixiang Sun, Ye Shi*, Jingya Wang, Hoang Duong Tuan, H. Vincent Poor, Dacheng Tao, "Alternating Differentiation for Optimization Layers", ICLR 2023
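The implicit-function-theorem recipe above can be made concrete on a one-variable optimization layer. This is an illustrative sketch, not code from the talk: the objective exp(x) - theta*x and every name in it are assumptions for demonstration. The forward pass is a Newton solve of the stationarity condition F(theta, x) = exp(x) - theta = 0, and the backward pass applies dx*/dtheta = -(dF/dx)^(-1) dF/dtheta.

```python
import math

def solve(theta, x0=0.0, tol=1e-12):
    """Forward pass: minimize f(x) = exp(x) - theta*x (theta > 0) by Newton's
    method on the stationarity condition F(theta, x) = exp(x) - theta = 0."""
    x = x0
    for _ in range(100):
        F = math.exp(x) - theta
        if abs(F) < tol:
            break
        x -= F / math.exp(x)   # Newton step: F divided by dF/dx
    return x

def grad_ift(theta):
    """Backward pass via the implicit function theorem:
    dx*/dtheta = -(dF/dx)^(-1) * dF/dtheta = exp(-x*)."""
    x_star = solve(theta)
    dF_dx = math.exp(x_star)
    dF_dtheta = -1.0
    return -dF_dtheta / dF_dx

theta, eps = 2.0, 1e-6
fd = (solve(theta + eps) - solve(theta - eps)) / (2 * eps)  # finite difference
print(abs(grad_ift(theta) - fd) < 1e-5)        # IFT matches finite differences
print(abs(grad_ift(theta) - 1 / theta) < 1e-9)  # analytic dx*/dtheta = 1/theta
```

Note that the backward pass never unrolls the Newton iterations; it only needs the converged solution, which is exactly the memory advantage claimed for implicit layers.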
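Alt-Diff's alternating primal/slack/dual scheme can be sketched on a one-dimensional QP, min 0.5*x^2 subject to x >= theta. This is a hypothetical toy in the spirit of the paper, not its implementation: each ADMM-style update is differentiated with respect to theta inside the same loop, so truncating the iterations truncates the gradient computation as well.

```python
# Optimization layer: x*(theta) = argmin_x 0.5*x**2  s.t.  x >= theta,
# written with a slack s >= 0 via x - s = theta. Alt-Diff idea: run the
# primal/slack/dual updates and differentiate each one w.r.t. theta in
# the same loop (truncated after `iters` steps).

def altdiff_layer(theta, rho=1.0, iters=200):
    x = s = lam = 0.0          # primal, slack, dual variables
    dx = ds = dlam = 0.0       # their derivatives w.r.t. theta
    for _ in range(iters):
        # primal update: argmin_x 0.5*x^2 + lam*(x-s-theta) + rho/2*(x-s-theta)^2
        x = (rho * (s + theta) - lam) / (1.0 + rho)
        dx = (rho * (ds + 1.0) - dlam) / (1.0 + rho)
        # slack update: unconstrained minimizer projected onto s >= 0
        s_free = x - theta + lam / rho
        ds_free = dx - 1.0 + dlam / rho
        s, ds = (s_free, ds_free) if s_free > 0 else (0.0, 0.0)
        # dual update
        lam += rho * (x - s - theta)
        dlam += rho * (dx - ds - 1.0)
    return x, dx

x_star, dx_dtheta = altdiff_layer(1.0)
print(round(x_star, 6), round(dx_dtheta, 6))  # → 1.0 1.0 (x* = theta, slope 1)
```

When the constraint is active (theta = 1), the layer returns x* = theta with gradient 1; when it is inactive (e.g. theta = -2), both the solution and the gradient go to 0, matching the closed form x*(theta) = max(theta, 0).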
Pipeline of Alt-Diff. Run the primal, slack, and dual updates alternately until convergence (or truncation); the resulting optimization layer supports both the forward and the backward pass.

Truncated capability of Alt-Diff.
Theorem. Let the truncated solution be the iterate at the k-th iteration. Then the error between the gradient obtained by truncated Alt-Diff and the true gradient is bounded by the distance of the iterate from the optimum, up to a constant factor.
Corollary. The error of the gradient of the loss function R with respect to the parameters is bounded in the same way, up to a constant factor.

Comparison with existing solvers: in sparse quadratic cases and in a non-quadratic case, Alt-Diff runs faster with similar results. The same holds when training neural nets, both in predict-then-optimize tasks and in image classification.

Differentiable Optimization for Safe RL
Safe reinforcement learning usually treats soft, cumulative, inequality-only constraints; RL with hard constraints must satisfy hard, instantaneous constraints, both equalities and inequalities.
Existing works: 1. traditional safe RL cannot handle hard constraints; 2. existing RL methods for hard constraints can only solve specific problems.

Generalized Reduced Gradient (GRG). The problem formulation is general.
Reduced Policy Optimization (RPO): incorporate GRG into RL to handle general hard constraints.
Shutong Ding, Jingya Wang, Yali Du, Ye Shi*, "Reduced Policy Optimization
for Continuous Control with Hard Constraints", NeurIPS 2023.

Construction Stage for Equality Constraints
Divide the actions into basic and nonbasic actions; use the policy network to predict the basic actions, then solve for the nonbasic actions so that the equality constraints are satisfied. The gradient flow of this equation-solving operation is defined on the tangent space of the equality constraints.
Theoretical analysis: proof of the correctness of the gradient flow.

Projection Stage for Inequality Constraints
Perform GRG updates until all inequality constraints are satisfied; each GRG update descends on the summation of the constraint violations, and its correctness is proved.

Modified Lagrangian Relaxation
A modified Lagrangian relaxation provides better initial actions.

Training Procedure
RPO can be incorporated into any off-policy RL algorithm, such as DDPG or SAC; the mismatch between the behavior policy and the target policy is handled by the off-policy scheme.

Benchmarks
Three environments with hard constraints (each with its own dynamics and constraints) validate the algorithm: Safe CartPole, Spring Pendulum, and optimal power flow (OPF) with battery energy storage.
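The two RPO stages above can be sketched on a two-dimensional action: the construction stage solves the nonbasic action from the equality constraint, and the projection stage runs GRG-style updates until the inequalities hold. The constraints a1 + a2 = 1 and a2 >= 0.2 are hypothetical illustrations, not the benchmark constraints.

```python
# Toy action a = (a1, a2) with hard constraints
#   equality:   a1 + a2 = 1   (a1 "basic", predicted by the policy;
#                               a2 "nonbasic", solved from the equality)
#   inequality: a2 >= 0.2

def solve_nonbasic(a1):
    """Construction stage: solve the nonbasic action from the equality."""
    return 1.0 - a1

def violation(a2):
    """Summed inequality-constraint violation."""
    return max(0.0, 0.2 - a2)

def rpo_project(a1, lr=0.1, max_steps=500):
    """Projection stage: GRG-style updates on the basic action until all
    inequality constraints hold; the equality stays satisfied throughout."""
    a2 = solve_nonbasic(a1)
    for _ in range(max_steps):
        if violation(a2) == 0.0:
            break
        # reduced gradient of the violation w.r.t. a1 (using da2/da1 = -1)
        grad = 1.0 if a2 < 0.2 else 0.0
        a1 -= lr * grad
        a2 = solve_nonbasic(a1)
    return a1, a2

a1, a2 = rpo_project(0.95)            # raw policy output violates a2 >= 0.2
print(abs(a1 + a2 - 1.0) < 1e-12, a2 >= 0.2)  # → True True
```

The point of the reduced gradient is visible here: the update moves only the basic action, and re-solving the nonbasic action keeps the iterate on the equality manifold at every step.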
Comparison with safe RL algorithms: learning curves and mean evaluation performance of the different algorithms on the three benchmarks.

Deep Equilibrium Models & Neural ODEs
Neural ODE [Chen et al., NeurIPS 2018] (slides credit: Duvenaud)
- formulates the forward pass as an ODE-solving procedure;
- was previously viewed as a continuous version of ResNet;
- backpropagates with the adjoint method.

Deep Equilibrium Model [Bai et al., NeurIPS 2019] (slides credit: Kolter)
- formulates the forward pass as an equation-solving procedure;
- is equivalent to a deep model with infinitely many layers;
- backpropagates using the implicit function theorem.

Motivation. A DEQ solves its nonlinear equation with Newton's method or a Broyden solver. Could the same equation be solved with an ODE solver, like a Neural ODE? Homotopy continuation provides the bridge.

Homotopy Continuation
Homotopy continuation solves a nonlinear equation by following an ODE that deforms an easy initial equation into the equation to solve.

Connection via Homotopy
Derivation: add an auxiliary variable to ensure the solvability of the ODE; this yields two initial conditions for the Neural ODE and converts equation solving into ODE solving.
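The "two sides of the same coin" view can be checked numerically on a scalar equilibrium: solve z = tanh(w*z + x) once by DEQ-style fixed-point iteration and once by following a homotopy ODE. This is a hedged sketch; the scalar map, the particular homotopy h(z, t) = z - t*f(z), and the step counts are illustrative choices, not the paper's construction.

```python
import math

# Toy DEQ layer: find the equilibrium z* with z = f(z) = tanh(W*z + X).
# (W, X and the scalar setting are illustrative, not from the paper.)
W, X = 0.5, 1.0
f = lambda z: math.tanh(W * z + X)

# 1) DEQ view: solve the fixed point by plain iteration (a contraction here).
z_fp = 0.0
for _ in range(200):
    z_fp = f(z_fp)

# 2) Neural-ODE view via homotopy continuation: deform the easy equation
#    z = 0 (t = 0) into z = f(z) (t = 1) along h(z, t) = z - t*f(z) = 0,
#    following dz/dt = f(z) / (1 - t * df/dz) with Euler steps.
z, n = 0.0, 1000
for k in range(n):
    t = k / n
    dfdz = W * (1.0 - f(z) ** 2)          # derivative of tanh(W*z + X)
    z += (1.0 / n) * f(z) / (1.0 - t * dfdz)
for _ in range(5):                         # Newton corrector at t = 1
    g, dg = z - f(z), 1.0 - W * (1.0 - f(z) ** 2)
    z -= g / dg

print(abs(z - z_fp) < 1e-9, abs(z_fp - f(z_fp)) < 1e-9)  # → True True
```

Both routes reach the same equilibrium: the DEQ iterates an equation solver, while the homotopy integrates an ODE whose endpoint is that solution, which is exactly the correspondence the paper formalizes.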
Connection between DEQ and Neural ODE: treat the input as a condition, as in DEQ, and implicitly solve the equilibrium-point-finding problem using an ODE.
Shutong Ding, Tianyu Cui, Jingya Wang, Ye Shi*, "Two Sides of The Same Coin: Deep Equilibrium Models and Neural ODEs via Homotopy Continuation", NeurIPS 2023.

Acceleration
Accelerate with a learnable shared initial point: a good initial point reduces the number of iterations.
Results on the image classification task: faster inference, better performance, less memory consumption.

Responsible AI: efficiency, safety, privacy.

Privacy Protection: Personalized Federated Learning for Large (Transformer) Models
A Transformer block (Norm, Self-attention, Norm, MLP) is trained across Client 1, ..., Client N and a server. Since the attention maps of different blocks differ across clients, the self-attention projection matrices are learned-to-personalize: a hypernetwork generates each client's personalized parameters, while the remaining parameters are aggregated on the server.
Hongxia Li#, Zhongyi Cai#, Jingya Wang, Jiangnan Tang, Weiping Ding, Chin-Teng Lin, Ye Shi*, "FedTP: Federated Learning by Transformer Personalization", IEEE Transactions on Neural Networks and Learning Systems, 2023.

Generalization Bound
Theorem 1. Suppose the clients' empirical data distributions and the parameters learned from them are given, let the personalized hypothesis class have bounded VC-dimension, and let Assumptions 1 and 2 (described in the paper) hold. Then, with probability at least 1 - delta, the gap between each client's empirical risk and its risk under the real distribution (at the corresponding optimal parameters) is bounded in terms of the VC-dimension.

Privacy Protection: Federated Fuzzy Rule Evolutionary Learning
Leijie Zhang, Ye Shi*, Yu-Cheng Chang, Chin-Teng Lin*, "Federated Fuzzy Neural Network With Evolutionary Rule Learning", IEEE Transactions on Fuzzy Systems, 2023.

Privacy Protection: Federated Active Learning
Yu-Tong Cao, Ye Shi, Jingya Wang, Baosheng Yu, Dacheng Tao, "Knowledge-Aware Federated Active Learning with Non-IID Data", ICCV 2023.

Privacy Protection: Federated Offline-Online Cooperative Learning
Zhongyi Cai, Ye Shi*, Wei Huang, Jingya Wang, "Fed-CO2: Cooperation of Online and Offline Models for Severe Data Heterogeneity in Federated Learning", NeurIPS 2023.

Questions
ShanghaiTech Responsible AI Lab
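The learn-to-personalize mechanism in the federated part of the talk can be caricatured in a few lines. This is a hypothetical sketch: the sizes, the one-layer hypernetwork, and plain averaging are assumptions for illustration; FedTP's actual hypernetwork and aggregation are defined in the paper.

```python
import random

# Sketch of FedTP-style personalization: a server-side hypernetwork maps a
# learnable client embedding to that client's self-attention projection
# parameters, while the remaining Transformer parameters are aggregated by
# plain averaging (all names and sizes are illustrative).

random.seed(0)
DIM_EMB, DIM_PROJ, N_CLIENTS = 4, 6, 3

# hypernetwork: one linear layer, shared across clients
H = [[random.gauss(0, 1) for _ in range(DIM_EMB)] for _ in range(DIM_PROJ)]
client_emb = {c: [random.gauss(0, 1) for _ in range(DIM_EMB)]
              for c in range(N_CLIENTS)}

def personalized_projection(c):
    """Generate client c's attention-projection parameters from its embedding."""
    e = client_emb[c]
    return [sum(H[i][j] * e[j] for j in range(DIM_EMB)) for i in range(DIM_PROJ)]

def aggregate(shared_updates):
    """FedAvg-style averaging of the non-personalized parameters."""
    n = len(shared_updates)
    return [sum(u[k] for u in shared_updates) / n
            for k in range(len(shared_updates[0]))]

projs = [personalized_projection(c) for c in range(N_CLIENTS)]   # per client
shared = aggregate([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])         # averaged
print(len(projs), len(projs[0]), shared)  # → 3 6 [3.0, 4.0]
```

Because only the client embeddings and the shared hypernetwork are trained, clients obtain distinct projection matrices without ever shipping those personalized parameters through the averaging step.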