RLChina 2023
Towards Responsible Decision and Control via Implicit Networks
Ye Shi (Assistant Professor, Researcher, Doctoral Supervisor)
2023-11-25, ShanghaiTech Responsible AI Lab

Decision and Control in the Real World
Applications: robotics, finance, health care, autonomous driving, smart-grid charging.
Responsible AI: safety, efficiency, privacy.

Explicit Models
Traditional deep learning models explicitly construct the relationship between input and output: an explicit layer is a differentiable parametric function, and deep neural networks are typically built by composing many explicit layers and training end-to-end via backpropagation.
Problems with explicit models: unreliable and memory-inefficient.

Implicit Models
Implicit models implicitly define the relationship between input and output; the relationship may be given by optimization problems, equations, etc. Training requires the implicit gradient flow of the solution-finding operation rather than plain backpropagation.

Advantages of implicit networks:
1. Powerful representations: they can represent complex operations such as integrating differential
equations, solving optimization problems, etc.
2. Memory efficiency: no need to backpropagate through intermediate components, thanks to the implicit function theorem.
3. Simplicity: ease and elegance of architecture design.

Implicit Networks
Neural ODE [1], Deep Equilibrium Model [2], and Differentiable Optimization Layer [3] are reliable, memory-efficient, and easy to constrain: a constant memory footprint, equivalent to infinitely many explicit layers.
[1] Chen R T Q, Rubanova Y, Bettencourt J, et al. Neural ordinary differential equations. Advances in Neural Information Processing Systems, 2018, 31.
[2] Bai S, Kolter J Z, Koltun V. Deep equilibrium models. Advances in Neural Information Processing Systems, 2019, 32.
[3] Amos B, Kolter J Z. OptNet: Differentiable optimization as a layer in neural networks. International Conference on Machine Learning. PMLR, 2017: 136-145.

Differentiable Optimization
Alt-Diff [Sun and Shi, 2023]: Alternating Differentiation for Optimization Layers.

A toy example of decision and control: energy generation scheduling. A power system operator must decide how much electricity generation to schedule for the next 24 hours based on historical electricity demand. We use the hourly electricity demand over the past 72 hours to predict the real power demand in the next 24 hours; the predicted demand is then fed into an optimization problem that schedules the power generation.

Predict-then-optimize is an end-to-end learning style that uses the optimization loss to guide the prediction, rather than the prediction loss as in standard learning.
Two-stage framework: the neural net (ML algorithm) is trained on a prediction loss; its prediction is then passed to the optimization problem to obtain the solution x*.
End-to-end framework: the optimization layer sits inside the network, and both the forward and the backward pass run through it, so the task loss trains the predictor directly.

Definition. A layer in a neural network is an optimization layer if its input is the parameters of an optimization problem and its output is the solution of that problem. The optimum is characterized by the KKT conditions, which raises the question: how do we back-propagate through it?

Implicit function theorem (slides credit: Duvenaud). Let F(theta, x) be continuously differentiable with F(theta, x*(theta)) = 0, and suppose the Jacobian dF/dx is invertible at (theta, x*(theta)). Then the derivative of the solution with respect to theta is

    dx*/dtheta = -(dF/dx)^(-1) dF/dtheta.

Applied to the full KKT system, the Jacobian to invert is too large for sizable problems.

ADMM (Alt-Diff). The alternating method is applied to reduce the computational complexity: the forward pass runs alternating updates, and the backward pass differentiates each update.
Haixiang Sun, Ye Shi*, Jingya Wang, Hoang Duong Tuan, H. Vincent Poor, Dacheng Tao, "Alternating Differentiation for Optimization Layers", ICLR 2023
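The implicit-function-theorem recipe above can be made concrete on a one-variable optimization layer. This is an illustrative sketch, not code from the talk: the objective exp(x) - theta*x and every name in it are assumptions for demonstration. The forward pass is a Newton solve of the stationarity condition F(theta, x) = exp(x) - theta = 0, and the backward pass applies dx*/dtheta = -(dF/dx)^(-1) dF/dtheta.

```python
import math

def solve(theta, x0=0.0, tol=1e-12):
    """Forward pass: minimize f(x) = exp(x) - theta*x (theta > 0) by Newton's
    method on the stationarity condition F(theta, x) = exp(x) - theta = 0."""
    x = x0
    for _ in range(100):
        F = math.exp(x) - theta
        if abs(F) < tol:
            break
        x -= F / math.exp(x)   # Newton step: F divided by dF/dx
    return x

def grad_ift(theta):
    """Backward pass via the implicit function theorem:
    dx*/dtheta = -(dF/dx)^(-1) * dF/dtheta = exp(-x*)."""
    x_star = solve(theta)
    dF_dx = math.exp(x_star)
    dF_dtheta = -1.0
    return -dF_dtheta / dF_dx

theta, eps = 2.0, 1e-6
fd = (solve(theta + eps) - solve(theta - eps)) / (2 * eps)  # finite difference
print(abs(grad_ift(theta) - fd) < 1e-5)        # IFT matches finite differences
print(abs(grad_ift(theta) - 1 / theta) < 1e-9)  # analytic dx*/dtheta = 1/theta
```

Note that the backward pass never unrolls the Newton iterations; it only needs the converged solution, which is exactly the memory advantage claimed for implicit layers.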
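Alt-Diff's alternating primal/slack/dual scheme can be sketched on a one-dimensional QP, min 0.5*x^2 subject to x >= theta. This is a hypothetical toy in the spirit of the paper, not its implementation: each ADMM-style update is differentiated with respect to theta inside the same loop, so truncating the iterations truncates the gradient computation as well.

```python
# Optimization layer: x*(theta) = argmin_x 0.5*x**2  s.t.  x >= theta,
# written with a slack s >= 0 via x - s = theta. Alt-Diff idea: run the
# primal/slack/dual updates and differentiate each one w.r.t. theta in
# the same loop (truncated after `iters` steps).

def altdiff_layer(theta, rho=1.0, iters=200):
    x = s = lam = 0.0          # primal, slack, dual variables
    dx = ds = dlam = 0.0       # their derivatives w.r.t. theta
    for _ in range(iters):
        # primal update: argmin_x 0.5*x^2 + lam*(x-s-theta) + rho/2*(x-s-theta)^2
        x = (rho * (s + theta) - lam) / (1.0 + rho)
        dx = (rho * (ds + 1.0) - dlam) / (1.0 + rho)
        # slack update: unconstrained minimizer projected onto s >= 0
        s_free = x - theta + lam / rho
        ds_free = dx - 1.0 + dlam / rho
        s, ds = (s_free, ds_free) if s_free > 0 else (0.0, 0.0)
        # dual update
        lam += rho * (x - s - theta)
        dlam += rho * (dx - ds - 1.0)
    return x, dx

x_star, dx_dtheta = altdiff_layer(1.0)
print(round(x_star, 6), round(dx_dtheta, 6))  # → 1.0 1.0 (x* = theta, slope 1)
```

When the constraint is active (theta = 1), the layer returns x* = theta with gradient 1; when it is inactive (e.g. theta = -2), both the solution and the gradient go to 0, matching the closed form x*(theta) = max(theta, 0).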
Pipeline of Alt-Diff. Run the primal, slack, and dual updates alternately until convergence (or truncation); the resulting optimization layer supports both the forward and the backward pass.

Truncated capability of Alt-Diff.
Theorem. Let the truncated solution be the iterate at the k-th iteration. Then the error between the gradient obtained by truncated Alt-Diff and the true gradient is bounded by the distance of the iterate from the optimum, up to a constant factor.
Corollary. The error of the gradient of the loss function R with respect to the parameters is bounded in the same way, up to a constant factor.

Comparison with existing solvers: in sparse quadratic cases and in a non-quadratic case, Alt-Diff runs faster with similar results. The same holds when training neural nets, both in predict-then-optimize tasks and in image classification.

Differentiable Optimization for Safe RL
Safe reinforcement learning usually treats soft, cumulative, inequality-only constraints; RL with hard constraints must satisfy hard, instantaneous constraints, both equalities and inequalities.
Existing works: 1. traditional safe RL cannot handle hard constraints; 2. existing RL methods for hard constraints can only solve specific problems.

Generalized Reduced Gradient (GRG). The problem formulation is general.
Reduced Policy Optimization (RPO): incorporate GRG into RL to handle general hard constraints.
Shutong Ding, Jingya Wang, Yali Du, Ye Shi*, "Reduced Policy Optimization
for Continuous Control with Hard Constraints", NeurIPS 2023.

Construction Stage for Equality Constraints
Divide the actions into basic and nonbasic actions; use the policy network to predict the basic actions, then solve for the nonbasic actions so that the equality constraints are satisfied. The gradient flow of this equation-solving operation is defined on the tangent space of the equality constraints.
Theoretical analysis: proof of the correctness of the gradient flow.

Projection Stage for Inequality Constraints
Perform GRG updates until all inequality constraints are satisfied; each GRG update descends on the summation of the constraint violations, and its correctness is proved.

Modified Lagrangian Relaxation
A modified Lagrangian relaxation provides better initial actions.

Training Procedure
RPO can be incorporated into any off-policy RL algorithm, such as DDPG or SAC; the mismatch between the behavior policy and the target policy is handled by the off-policy scheme.

Benchmarks
Three environments with hard constraints (each with its own dynamics and constraints) validate the algorithm: Safe CartPole, Spring Pendulum, and optimal power flow (OPF) with battery energy storage.
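The two RPO stages above can be sketched on a two-dimensional action: the construction stage solves the nonbasic action from the equality constraint, and the projection stage runs GRG-style updates until the inequalities hold. The constraints a1 + a2 = 1 and a2 >= 0.2 are hypothetical illustrations, not the benchmark constraints.

```python
# Toy action a = (a1, a2) with hard constraints
#   equality:   a1 + a2 = 1   (a1 "basic", predicted by the policy;
#                               a2 "nonbasic", solved from the equality)
#   inequality: a2 >= 0.2

def solve_nonbasic(a1):
    """Construction stage: solve the nonbasic action from the equality."""
    return 1.0 - a1

def violation(a2):
    """Summed inequality-constraint violation."""
    return max(0.0, 0.2 - a2)

def rpo_project(a1, lr=0.1, max_steps=500):
    """Projection stage: GRG-style updates on the basic action until all
    inequality constraints hold; the equality stays satisfied throughout."""
    a2 = solve_nonbasic(a1)
    for _ in range(max_steps):
        if violation(a2) == 0.0:
            break
        # reduced gradient of the violation w.r.t. a1 (using da2/da1 = -1)
        grad = 1.0 if a2 < 0.2 else 0.0
        a1 -= lr * grad
        a2 = solve_nonbasic(a1)
    return a1, a2

a1, a2 = rpo_project(0.95)            # raw policy output violates a2 >= 0.2
print(abs(a1 + a2 - 1.0) < 1e-12, a2 >= 0.2)  # → True True
```

The point of the reduced gradient is visible here: the update moves only the basic action, and re-solving the nonbasic action keeps the iterate on the equality manifold at every step.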
Comparison with safe RL algorithms: learning curves and mean evaluation performance of the different algorithms on the three benchmarks.

Deep Equilibrium Models & Neural ODEs
Neural ODE [Chen et al., NeurIPS 2018] (slides credit: Duvenaud)
- formulates the forward pass as an ODE-solving procedure;
- was previously viewed as a continuous version of ResNet;
- backpropagates with the adjoint method.

Deep Equilibrium Model [Bai et al., NeurIPS 2019] (slides credit: Kolter)
- formulates the forward pass as an equation-solving procedure;
- is equivalent to a deep model with infinitely many layers;
- backpropagates using the implicit function theorem.

Motivation. A DEQ solves its nonlinear equation with Newton's method or a Broyden solver. Could the same equation be solved with an ODE solver, like a Neural ODE? Homotopy continuation provides the bridge.

Homotopy Continuation
Homotopy continuation solves a nonlinear equation by following an ODE that deforms an easy initial equation into the equation to solve.

Connection via Homotopy
Derivation: add an auxiliary variable to ensure the solvability of the ODE; this yields two initial conditions for the Neural ODE and converts equation solving into ODE solving.
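The "two sides of the same coin" view can be checked numerically on a scalar equilibrium: solve z = tanh(w*z + x) once by DEQ-style fixed-point iteration and once by following a homotopy ODE. This is a hedged sketch; the scalar map, the particular homotopy h(z, t) = z - t*f(z), and the step counts are illustrative choices, not the paper's construction.

```python
import math

# Toy DEQ layer: find the equilibrium z* with z = f(z) = tanh(W*z + X).
# (W, X and the scalar setting are illustrative, not from the paper.)
W, X = 0.5, 1.0
f = lambda z: math.tanh(W * z + X)

# 1) DEQ view: solve the fixed point by plain iteration (a contraction here).
z_fp = 0.0
for _ in range(200):
    z_fp = f(z_fp)

# 2) Neural-ODE view via homotopy continuation: deform the easy equation
#    z = 0 (t = 0) into z = f(z) (t = 1) along h(z, t) = z - t*f(z) = 0,
#    following dz/dt = f(z) / (1 - t * df/dz) with Euler steps.
z, n = 0.0, 1000
for k in range(n):
    t = k / n
    dfdz = W * (1.0 - f(z) ** 2)          # derivative of tanh(W*z + X)
    z += (1.0 / n) * f(z) / (1.0 - t * dfdz)
for _ in range(5):                         # Newton corrector at t = 1
    g, dg = z - f(z), 1.0 - W * (1.0 - f(z) ** 2)
    z -= g / dg

print(abs(z - z_fp) < 1e-9, abs(z_fp - f(z_fp)) < 1e-9)  # → True True
```

Both routes reach the same equilibrium: the DEQ iterates an equation solver, while the homotopy integrates an ODE whose endpoint is that solution, which is exactly the correspondence the paper formalizes.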
Connection between DEQ and Neural ODE: treat the input as a condition, as in DEQ, and implicitly solve the equilibrium-point-finding problem using an ODE.
Shutong Ding, Tianyu Cui, Jingya Wang, Ye Shi*, "Two Sides of The Same Coin: Deep Equilibrium Models and Neural ODEs via Homotopy Continuation", NeurIPS 2023.

Acceleration
Accelerate with a learnable shared initial point: a good initial point reduces the number of iterations.
Results on the image classification task: faster inference, better performance, less memory consumption.

Responsible AI: efficiency, safety, privacy.

Privacy Protection: Personalized Federated Learning for Large (Transformer) Models
A Transformer block (Norm, Self-attention, Norm, MLP) is trained across Client 1, ..., Client N and a server. Since the attention maps of different blocks differ across clients, the self-attention projection matrices are learned-to-personalize: a hypernetwork generates each client's personalized parameters, while the remaining parameters are aggregated on the server.
Hongxia Li#, Zhongyi Cai#, Jingya Wang, Jiangnan Tang, Weiping Ding, Chin-Teng Lin, Ye Shi*, "FedTP: Federated Learning by Transformer Personalization", IEEE Transactions on Neural Networks and Learning Systems, 2023.

Generalization Bound
Theorem 1. Suppose the clients' empirical data distributions and the parameters learned from them are given, let the personalized hypothesis class have bounded VC-dimension, and let Assumptions 1 and 2 (described in the paper) hold. Then, with probability at least 1 - delta, the gap between each client's empirical risk and its risk under the real distribution (at the corresponding optimal parameters) is bounded in terms of the VC-dimension.

Privacy Protection: Federated Fuzzy Rule Evolutionary Learning
Leijie Zhang, Ye Shi*, Yu-Cheng Chang, Chin-Teng Lin*, "Federated Fuzzy Neural Network With Evolutionary Rule Learning", IEEE Transactions on Fuzzy Systems, 2023.

Privacy Protection: Federated Active Learning
Yu-Tong Cao, Ye Shi, Jingya Wang, Baosheng Yu, Dacheng Tao, "Knowledge-Aware Federated Active Learning with Non-IID Data", ICCV 2023.

Privacy Protection: Federated Offline-Online Cooperative Learning
Zhongyi Cai, Ye Shi*, Wei Huang, Jingya Wang, "Fed-CO2: Cooperation of Online and Offline Models for Severe Data Heterogeneity in Federated Learning", NeurIPS 2023.

Questions
ShanghaiTech Responsible AI Lab
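The learn-to-personalize mechanism in the federated part of the talk can be caricatured in a few lines. This is a hypothetical sketch: the sizes, the one-layer hypernetwork, and plain averaging are assumptions for illustration; FedTP's actual hypernetwork and aggregation are defined in the paper.

```python
import random

# Sketch of FedTP-style personalization: a server-side hypernetwork maps a
# learnable client embedding to that client's self-attention projection
# parameters, while the remaining Transformer parameters are aggregated by
# plain averaging (all names and sizes are illustrative).

random.seed(0)
DIM_EMB, DIM_PROJ, N_CLIENTS = 4, 6, 3

# hypernetwork: one linear layer, shared across clients
H = [[random.gauss(0, 1) for _ in range(DIM_EMB)] for _ in range(DIM_PROJ)]
client_emb = {c: [random.gauss(0, 1) for _ in range(DIM_EMB)]
              for c in range(N_CLIENTS)}

def personalized_projection(c):
    """Generate client c's attention-projection parameters from its embedding."""
    e = client_emb[c]
    return [sum(H[i][j] * e[j] for j in range(DIM_EMB)) for i in range(DIM_PROJ)]

def aggregate(shared_updates):
    """FedAvg-style averaging of the non-personalized parameters."""
    n = len(shared_updates)
    return [sum(u[k] for u in shared_updates) / n
            for k in range(len(shared_updates[0]))]

projs = [personalized_projection(c) for c in range(N_CLIENTS)]   # per client
shared = aggregate([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])         # averaged
print(len(projs), len(projs[0]), shared)  # → 3 6 [3.0, 4.0]
```

Because only the client embeddings and the shared hypernetwork are trained, clients obtain distinct projection matrices without ever shipping those personalized parameters through the averaging step.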