On the Physical Foundation of AI Agents
Jun Wang, UCL
Jun.wang@cs.ucl.ac.uk

Table of contents
- What is an AI agent
- AI agent as a system
- Maxwell's demon and energy bound for AI agents
- A simple example
- Conclusions

Learning agent
- In biology, learning means a change of behaviour as a result of experience.
- In classical conditioning [1], animals learn to identify a useful pattern in the environment by associating one stimulus with another: after repeatedly being given a bell ring O followed by food X, a dog starts to salivate (anticipating the food) when the bell rings again.
- Learned behaviours are adaptive and thus essential for animals to survive in a changing environment; e.g., they may learn not to eat certain foods if they have ever become ill after eating them.
- More learned behaviours, more intelligent.
[1] Ivan Petrovitch Pavlov and William Gantt. "Lectures on conditioned reflexes: Twenty-five years of objective study of the higher nervous activity (behaviour) of animals." (1928).

AI agent as a system
[Diagram: Agent and World linked by Perception and Actuator channels; mutual information I(X, O)]

A systematic view
- The definition of an AI agent depends on the existence of boundaries.
- Closed system: the agent system exchanges energy with its surroundings.
- Open system: the agent system exchanges energy and matter with its surroundings.

The second law of thermodynamics
Closed systems:
$W_{\mathrm{ext}} \le -\Delta F = -\Delta E + T\,\Delta S$
$W_{\mathrm{ext}}$: extracted work; $F$: free energy; $E$: energy; $T$: temperature; $\Delta S$: entropy change
Bejan, Adrian. Advanced engineering thermodynamics. John Wiley & Sons, 2016.

Maxwell's demon
- By detecting the positions and velocities of gas molecules in two neighbouring chambers and using that information to control the door, the intelligent being could create a temperature difference across the chambers.
- Thanks to the intelligent agent, the total entropy of the system decreases without any work being applied.
[Figure: the intelligent being]
https://en.wikipedia.org/wiki/Maxwell%27s_demon
Maxwell, J.C., 1872. Theory of Heat. Astronomical Register, vol. 10, pp. 107-107.

The second law with mutual information
Closed systems:
$W_{\mathrm{ext}} \le -\Delta F + k_B T I = -\Delta E + T\,\Delta S + k_B T I$
$W_{\mathrm{ext}}$: extracted work; $F$: free energy; $E$: energy; $T$: temperature; $\Delta S$: entropy change; $I$: mutual information
Sagawa, Takahiro, and Masahito Ueda. "Nonequilibrium thermodynamics of feedback control." Physical Review E 85.2 (2012): 021104.

Maxwell's demon
- Building up the world model consumes energy, so the second law of thermodynamics still holds.
- The maximum amount of energy that can be obtained from one learned bit of information is $k_B T \ln 2$, where $k_B$ is Boltzmann's constant and $T$ is the temperature of the gas.
$W = k_B T\,(H(X) - H(X \mid O)) = k_B T \ln 2$
[Diagram: the intelligent being, with X, O and A]

AI agent as a system
[Diagram: Agent and World linked by Perception and Actuator channels; mutual information I(X, O)]
$W = k_B T\, I(X, O) = k_B T\,(H(X) - H(X \mid O))$

Master equation: Energy = Intelligence?
[Diagram: Agent and World linked by Perception and Actuator channels]
$W = k_B T\,(H(X) - H(X \mid O)) = k_B T \cdot \mathrm{Intelligence}$
$\mathrm{Intelligence} := H(X) - H(X \mid O)$

Szilard engine with an AI agent
- State: suppose the Szilard engine has two states, x = 0 or x = 1.
- Observation: the state is observed as o = 0 or o = 1, with probabilities $P(o=0 \mid x=0) = P(o=1 \mid x=1) = 1 - \varepsilon$ and $P(o=0 \mid x=1) = P(o=1 \mid x=0) = \varepsilon$.
- Action: move the barrier (isothermally)
  - If o = 1, move the barrier so that the space has size $v_1$, where $v_1 \in [0, 1]$.
  - If o = 0, move the barrier so that the space has size $v_0$, where $v_0 \in [0, 1]$.
[Diagram: x, o, action]
Sagawa, Takahiro, and Masahito Ueda. "Nonequilibrium thermodynamics of feedback control." Physical Review E 85.2 (2012): 021104.

$\mathbb{E}[W] = k_B T \left( \ln 2 + \frac{1-\varepsilon}{2} \ln v_0 + \frac{\varepsilon}{2} \ln(1 - v_0) + \frac{\varepsilon}{2} \ln(1 - v_1) + \frac{1-\varepsilon}{2} \ln v_1 \right)$
Maximising $\mathbb{E}[W]$ gives $v_0 = v_1 = 1 - \varepsilon$ (with $\varepsilon$ the observation error probability), thus
$W_{\mathrm{ext}} = k_B T \left( \ln 2 + (1-\varepsilon) \ln(1-\varepsilon) + \varepsilon \ln \varepsilon \right)$
We know that $\mathbb{E}[I] = \ln 2 + (1-\varepsilon) \ln(1-\varepsilon) + \varepsilon \ln \varepsilon$, thus $W_{\mathrm{ext}} = k_B T\, \mathbb{E}[I]$.

Conclusions
- A simple agent must contain a perception-decision loop as a minimum.
- One should consider an AI agent system as a whole.
- An AI agent can reduce the entropy of the external world.
- Consequently, 1) the AI agent needs to maintain a world model p(x|o) and learn an optimal policy p(a | ·); 2) the world model reduces the entropy of the agent itself and requires energy (this is what makes the second law hold).
- We discuss the possible connections between Energy and Intelligence.
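
The Szilard engine example can be checked numerically. The sketch below is a minimal illustration, not the author's code: the function names and the symbols `eps` (observation error probability) and `v0`, `v1` (barrier targets after observing o = 0 or o = 1) are assumptions made here for readability. Work is expressed in units of k_B T with natural logarithms, and the search assumes v0 = v1 by symmetry.

```python
import math


def expected_work(eps, v0, v1):
    """Expected extracted work in units of k_B*T.

    eps: probability that the observation o disagrees with the state x.
    v0, v1: barrier targets in (0, 1) chosen after observing o=0 / o=1.
    """
    return (math.log(2)
            + 0.5 * ((1 - eps) * math.log(v0) + eps * math.log(1 - v0))
            + 0.5 * (eps * math.log(1 - v1) + (1 - eps) * math.log(v1)))


def mutual_information(eps):
    """I(X;O) in nats for a uniform binary state seen through a
    symmetric noisy observation with error probability eps."""
    entropy = 0.0
    for p in (eps, 1 - eps):
        if p > 0:  # treat 0 * log(0) as 0
            entropy -= p * math.log(p)
    return math.log(2) - entropy


def best_barrier(eps, steps=999):
    """Grid search over v0 = v1 = v in (0, 1) for the largest expected work."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    return max(grid, key=lambda v: expected_work(eps, v, v))


if __name__ == "__main__":
    eps = 0.1  # illustrative error rate, not taken from the slides
    v = best_barrier(eps)
    print("best v:", v)
    print("max expected work / k_B T:", expected_work(eps, v, v))
    print("mutual information (nats):", mutual_information(eps))
```

For eps = 0.1 the grid search lands on v = 0.9 = 1 - eps, and the maximal expected work equals the mutual information (about 0.368 in units of k_B T), which is the relation between energy and information that the slides arrive at.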