Increasing Energy Efficiency of Server Cooling Over Traditional Methods with a Deep Reinforcement Learning Agents running on an OCP Compliant BMC platforms.pdf



An AI-ML model for dynamic server fan speed control achieves better energy efficiency than traditional fan control methods. The model runs on the ML engine of a BMC chip.

Increasing Energy Efficiency of Server Cooling Over Traditional Methods with a Deep Reinforcement Learning Agents Running on an OCP Compliant BMC Platforms

Raghu Kondapalli, Chief Technology Officer, Axiado Corporation
Sundaram Arumugasundaram, Principal Security Architect, Axiado Corporation
Zhichao Zhang, Principal Machine Learning Architect, Axiado Corporation

DC Sustainability

Trusted Control/Compute Unit (TCU) Overview

The TCU chip consists of the following components:
1. App processors: cores for running apps such as BMC, host vulnerability management, and extended detection and response (XDR) agents
2. Programmable AI engine to run ML models such as server thermal management
3. Smart-NIC for control/management plane traffic such as BMC traffic
4. Hardware root of trust (HRoT) and TPM (Trusted Platform Module) to enhance server security

Axiado offers Smart-SCM, which is compliant with the Open Compute Project (OCP) datacenter-ready secure control module (DC-SCM) standard.

Next-Gen Thermal Management: The Power of ML on TCU/BMC Overview

AI-Powered Dynamic Thermal Management (DTM) from BMC: The BMC is ideal for server thermal management due to its existing role in various server management functions, including power control.
Faster Thermal Prediction and Calibration: The TCU collects sensor data directly, bypassing the host OS, enabling faster thermal prediction and fan speed calibration.
Rich Dataset for Decision Making: As an OCP DC-SCM compliant BMC, the TCU gathers comprehensive data from all chassis components (CPUs, GPUs, etc.) via diverse connections (I2C, eSPI, USB, PCIe), providing a rich dataset for optimal fan control decisions.
Dedicated ML Engine for the DTM-ML Model: Because the TCU's ML engine runs only the DTM-ML model, it offers timely inference and fan speed control.
Hardware-Based Security: Leveraging confidential computing and other security features, the TCU protects the DTM-ML model from potential vulnerabilities on the host OS, offering a more secure solution.
Proactive Management with PMC Data: The TCU uses CPU and GPU Performance Monitoring Counters (PMC) to proactively manage thermals based on workload demands.

Integration of Axiado's DTM-ML with open standards such as OpenBMC and DMTF's Redfish is a work in progress.

ML-DTM Result: Significant Energy Savings

[Chart: 1-year energy cost, $0 to $150, for the DRL agent, medium fan speed, and high fan speed]

Fan Energy Savings: Energy savings up to 50%
Annual savings per server: $70
Annual savings for 100K servers: $7 million

Configuration            Fan cycles   Kilowatt-hour per hour   1 year ($0.10 per kWh)
TCU DTM-ML               Optimized    0.076                    $67
Medium fan speed         65.5%        0.131                    $115
High fan speed           80%          0.160                    $140

DTM-ML Model Training and Deployment Details

Data Collection: Data was pulled from sensors every 5 seconds for six months.
Training: The model was trained on random workloads of diverse intensity with a massive data set.
Analysis and Prediction ML type: DRL (Deep Reinforcement Learning)
Continuous Learning: Improves energy efficiency over time.
Surpasses PID Fan Controllers: Delivers superior results to PID controllers through broader dataset correlation. Unlike reactive PID controllers, it proactively adjusts fan speeds based on anticipated workload demands.

Benefits of DRL (Deep Reinforcement Learning)

DRL is a revolutionary AI methodology that combines reinforcement learning and deep neural networks. By iteratively interacting with an environment and making choices that maximize cumulative rewards, it enables agents to learn sophisticated strategies and to learn rules directly from sensory inputs, making use of deep learning's ability to extract complex features from unstructured data (https://www.geeksforgeeks.org/what-is-reinforcement-learning/).

DTM-DRL learns from the environment on its own, continuously improves the balance between temperature and energy usage, and proactively anticipates cooling needs based on workloads and other dynamic factors.

Reinforcement Learning Solving Many Complex Problems

AlphaGo Zero: Mastering the game of Go without human knowledge. Reinforcement learning learns the meaning of states from the environment.
AI for Google data center cooling success story: Rules don't get better over time, but AI does. (https://blog.google/inside-google/infrastructure/safety-first-ai-autonomous-data-center-cooling-and-industrial-control/)
Dynamic Thermal Management with Proactive Fan Speed Control Through Reinforcement Learning: One rule can't fit them all. To get the best savings, policies need to be learned from different environments, hardware, and workloads.

AI-Powered Dynamic Thermal Management (DTM)

Reinforcement learning redefines DTM by replacing heuristic human input with self-optimizing AI agents.
Human vs. AI Agency: From manually tuned protocols to AI-driven, Q-Learning-based autonomous agents.
AI Superiority: The RL agent's predictive management cuts fan power by 40%.
Outcome: Autonomous agents offer continual learning, precision, and efficiency, redefining DTM in data-centric environments.

Diverse Policies Learned by Deep Reinforcement Learning

[Figure: temperature and fan speed]

1. Precision Control: The RL model develops fine-tuned cooling algorithms, directly improving energy management.
2. Intelligent Adaptation: It swiftly adjusts to fluctuations, ensuring consistent performance under varying load conditions.
3. Sustainable Operations: It forecasts and adjusts to future demands, significantly reducing the carbon footprint and operating costs.

Server Room Temperature Monitoring

From the real-time temperature data manually collected in the Axiado HQ server room, it is evident that server temperatures vary significantly not only by rack position but also by time of day. For instance, the consistent decrease in temperatures from 10 PM to 7 AM across most servers suggests that ambient factors, possibly lower night-time room temperatures or reduced server activity, greatly influence server temperatures. This data can inform efficient cooling, power usage, cost reduction, and other server optimization strategies.

Summary: Redefining Data Centers with AI-Driven DTM

TCU BMC Integration: A power-efficient, OCP-compliant BMC controller equipped with an on-chip NPU, requiring only 0.5 TOPS for this application. This integration is not just a step towards modernization but a leap towards cost-effective and green computing, given the necessity of BMC controllers in modern data centers.
Smart Scaling: Tailored AI dynamically adapts to diverse server configurations, ensuring optimal performance across any data center layout.
Operational Excellence Reimagined: Transition from traditional, labor-intensive methods to AI-driven strategies. Our real-world deployments demonstrate how integrating AI with real-time sensor data and machine learning not only enhances system reliability but also significantly reduces operational costs.
Energy Efficiency & Sustainability: Leveraging AI for real-time control of cooling systems results in up to 40% savings on cooling energy costs. This approach not only cuts energy bills but also substantially reduces the carbon footprint, contributing to greener data center operations.

AI-Driven DTM PUE Impact

Achieving up to 18.6% PUE Improvement
Reallocating fan power changes the PUE from 1.09 to 1.61.
With a 5% reduction in fan power, the new PUE is 1.57.
With a 50% reduction in fan power, the new PUE is 1.31.

Air cooling at the server level is widely used in data centers, particularly for configurations up to 10 kW per rack. While alternative cooling methods are gaining traction for higher-density setups, server-level air cooling remains a common, cost-effective choice.

Call for Action

Problem to Solve: Let's collaborate to create ML-based fan speed control as part of OCP, OpenBMC, and DMTF, and save energy.
How to get involved in the Project: By piloting the deployment of the DTM-ML model in your data center.
Timeline for Contribution Availability: From now to end of 2025
Timeline for Product Availability: From now to end of 2025
Where to find additional information (URL links): Work In Progress

Thank you!
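
The annual cost figures in the energy savings table can be cross-checked with a few lines of arithmetic. The sketch below is only a sanity check: the kWh-per-hour values and the $0.10 per kWh price come from the slide, while the variable names, the 8,760-hour year, and the choice of the high-fan-speed row as the baseline are assumptions made here for illustration.

```python
HOURS_PER_YEAR = 24 * 365   # 8760 hours, leap years ignored (assumption)
PRICE_PER_KWH = 0.10        # USD per kWh, as stated in the table header

# Average fan energy draw in kWh per hour, copied from the results table.
FAN_KWH_PER_HOUR = {
    "TCU DTM-ML (optimized)": 0.076,
    "Medium fan speed (65.5%)": 0.131,
    "High fan speed (80%)": 0.160,
}


def annual_cost(kwh_per_hour, price=PRICE_PER_KWH):
    """Yearly energy cost in USD for a constant average draw."""
    return kwh_per_hour * HOURS_PER_YEAR * price


costs = {name: annual_cost(kwh) for name, kwh in FAN_KWH_PER_HOUR.items()}

# Baseline choice is an assumption: compare against the high-fan-speed row.
baseline = costs["High fan speed (80%)"]
optimized = costs["TCU DTM-ML (optimized)"]
saving_per_server = baseline - optimized
saving_fraction = saving_per_server / baseline
fleet_saving = saving_per_server * 100_000  # 100K servers, as on the slide

if __name__ == "__main__":
    for name, cost in costs.items():
        print(f"{name}: ${cost:.2f} per year")
    print(f"Saving per server: ${saving_per_server:.2f} ({saving_fraction:.1%})")
    print(f"Saving for 100K servers: ${fleet_saving:,.0f}")
```

Under these assumptions the three rows come out at about $67, $115, and $140 per year, and the gap between the high-speed and optimized rows is a little over $70 per server, which is consistent with the rounded "$70" and "$7 million" figures quoted on the slide.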
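
The slides mention Q-Learning-based agents that trade temperature against fan energy. The following is a minimal sketch of that idea, not Axiado's DTM-ML model: it swaps the deep network for a plain Q-table, and the thermal model, the temperature limit, the duty-cycle choices, the reward weights, and all function names are hypothetical values chosen only to make the example small and runnable.

```python
import random

DUTIES = [0.2, 0.4, 0.6, 0.8, 1.0]        # candidate fan duty cycles (hypothetical)
T_MIN, T_MAX, T_LIMIT = 30.0, 90.0, 75.0  # toy temperature range and limit, deg C
BUCKET = 5.0                              # width of one temperature bucket, deg C
N_STATES = int((T_MAX - T_MIN) / BUCKET)  # 12 discrete temperature states


def state_of(temp):
    """Map a temperature to a discrete state index."""
    return min(int((temp - T_MIN) // BUCKET), N_STATES - 1)


def step(temp, duty, load):
    """Toy thermal model (assumption): load heats the server, fan duty cools it."""
    nxt = max(T_MIN, min(T_MAX, temp + 6.0 * load - 8.0 * duty))
    # Fan power grows roughly with the cube of fan speed; exceeding the
    # temperature limit is penalized heavily (penalty weight is arbitrary).
    reward = -(duty ** 3) - (10.0 if nxt > T_LIMIT else 0.0)
    return nxt, reward


def train(episodes=6000, steps=50, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    q = [[0.0] * len(DUTIES) for _ in range(N_STATES)]
    for _ in range(episodes):
        temp = rng.uniform(T_MIN, T_MAX)  # random start so every state is visited
        for _ in range(steps):
            s = state_of(temp)
            if rng.random() < epsilon:
                a = rng.randrange(len(DUTIES))   # explore
            else:
                a = q[s].index(max(q[s]))        # exploit
            temp, r = step(temp, DUTIES[a], rng.uniform(0.2, 1.0))
            s2 = state_of(temp)
            # Standard Q-learning update toward the bootstrapped target.
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
    return q


def greedy_policy(q):
    """Fan duty chosen in each temperature bucket, coldest bucket first."""
    return [DUTIES[row.index(max(row))] for row in q]


q_table = train()
policy = greedy_policy(q_table)

if __name__ == "__main__":
    for i, duty in enumerate(policy):
        low = T_MIN + i * BUCKET
        print(f"{low:.0f}-{low + BUCKET:.0f} C -> fan duty {duty:.0%}")
```

In this toy setting the learned policy keeps the fans slow while the server is cool and speeds them up as the temperature approaches the limit. A production agent would differ in the ways the slides describe, for example by using a deep network over many sensor and PMC inputs instead of a single temperature bucket.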
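
The "up to 18.6%" headline on the PUE slide appears to follow from the quoted PUE values. The worked calculation below assumes the improvement is measured relative to the reallocated PUE of 1.61; the slides do not state the baseline explicitly, so treat this as a plausibility check.

```latex
\[
\frac{\mathrm{PUE}_{\text{before}} - \mathrm{PUE}_{\text{after}}}{\mathrm{PUE}_{\text{before}}}
= \frac{1.61 - 1.31}{1.61} \approx 0.186 = 18.6\%
\qquad \text{(50\% fan power reduction)}
\]
\[
\frac{1.61 - 1.57}{1.61} \approx 0.025 = 2.5\%
\qquad \text{(5\% fan power reduction)}
\]
```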
