The architecture of a future AI system will look like this. The machine will really be composed of two modules. The agent generates actions that it throws into the world, and the world then gives it back percepts, or observations, from which the agent infers the state of the world.
The agent is trying to optimize an objective. The objective is a sort of immutable module inside it that determines its morality, if you want: it determines what the agent lives to do. So the goal of the agent is basically to keep itself happy, which means minimizing the output of the objective over the long run. What it has to do is figure out what sequence of actions will produce the proper process in the world, one that puts it in the state that minimizes its objective, which is to say maximizes its happiness, if you want. So what will drive the machine to do something or not do something is the design of this objective, as well as how good the machine is at optimizing it, and then, obviously, how cooperative the world is.
The ability to predict is the essence of intelligence. So what does the agent have to do to be able to act intelligently? Internal to the agent, there have to be a couple of modules that allow the agent to imagine, or plan, what a sequence of actions is going to cause in the world. So internally to the agent there has to be some sort of world simulator, a model of the world that allows the agent to figure out what is going to happen when I take this action or this sequence of actions, or what is going to happen if I just watch the world do what it wants. I think this kind of model of the world is an essential piece of intelligence, and that is why I said before that the ability to predict is the essence of intelligence. Internally there is an actor that generates action proposals, which we can run through the world simulator; that is a way of imagining what is going to happen if we take an action. But there also has to be a critic, a term used in the context of reinforcement learning. It is basically a module that attempts to predict what the value of the objective is going to be for a particular sequence of events, and this is where the machine can design substitute objectives for the real objective.
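To make the three pieces concrete, here is a minimal sketch of an actor, a world simulator and a critic working together. Everything in it is an assumption made for illustration: the world is a position on a line, the objective is the distance to a goal, the actor simply enumerates every short action sequence, and the critic evaluates the objective exactly on the imagined states instead of learning to predict it. None of the names come from the talk.

```python
import itertools

def world_model(state, action):
    """Toy world simulator (assumed): a position on a line, action is -1, 0 or +1."""
    return state + action

def objective(state, goal=3):
    """Fixed objective (assumed): distance to a goal position. Lower is better."""
    return abs(goal - state)

def critic(imagined_states):
    """Score an imagined sequence of events by summing the objective over it.
    A learned critic would only approximate this value."""
    return sum(objective(s) for s in imagined_states)

def actor(horizon, actions=(-1, 0, 1)):
    """Propose candidate action sequences. Here: every sequence of the given length."""
    return itertools.product(actions, repeat=horizon)

def plan(state, horizon=3):
    """Imagine each proposal with the world model and keep the one the critic likes best."""
    best_plan, best_cost = None, float("inf")
    for proposal in actor(horizon):
        imagined, s = [], state
        for a in proposal:
            s = world_model(s, a)      # imagine the next state, without acting
            imagined.append(s)
        cost = critic(imagined)
        if cost < best_cost:
            best_plan, best_cost = proposal, cost
    return best_plan, best_cost

if __name__ == "__main__":
    print(plan(0))   # best action sequence and its predicted cost, starting from 0
```

Nothing is executed in the world while planning: the agent only commits to an action sequence after imagining its consequences.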
Let's imagine a situation with animals or humans, where the real objective is really to survive and reproduce. There are several sub-objectives to that. Some are hard-wired, like eating, but some are not necessarily hard-wired. For humans, it is nowhere hard-wired that we should go to school or make money or things like this, but a lot of people build these as a substitute objective function. We again have examples of people maximizing money in the political world today. They have substitute objective functions that they want to optimize as a way to optimize the ultimate one. So those are things that are developed on the way to learning or producing optimal behaviour. Without getting technical, there would be, internal to the system, again, the world emulator running an estimation of the world, the actor producing a proposed sequence of actions, and the critic figuring out: is that going to be good for me?
The reason we don't know how to build a world simulator is that the world has the bad idea of being not entirely predictable. Let's say that we want to teach a machine to predict the state of the world in half a second. For example, there are two little videos where I put a pen on the table and, ready, go, it falls. The machine has maybe been trained on thousands of other small videos, and it happens to predict the video at the bottom left, where the pen falls to the left and to the back. This is the prediction of a generator, essentially a model of the world that predicts the future from the past, and perhaps from some sort of random variable as well. But let's say that in this particular video the world actually disagrees and produces the one on the right, where the pen falls to the back and to the right. So the prediction of the model was quantitatively wrong, because the result is different, but qualitatively right, in the sense that it predicted the pen was going to fall, maybe not in the right direction, because that was essentially unpredictable given the limits of the perception of the system. So how do we train the machine in situations like this, where the prediction is fuzzy and uncertain? This is something that humans and animals deal with every day. We want to tell the machine: OK, you got it wrong, but really you got it right qualitatively. Essentially, the possible futures are represented by this sort of ribbon here, the surface of possible futures, and the real future is one point on that ribbon.
This is solved using adversarial training, which involves training two different learning machines, two different neural nets, against each other. One is a predictor, the generator, and the other is an assessor, sometimes called a discriminator, which basically tries to figure out whether the prediction it is observing comes from the real world, real data, or comes from the generator. The generator tries to fool it: it tries to generate points that look as much as possible like the real world, so that the discriminator can't tell the difference.
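The two opposing goals can be written as a pair of loss functions. The sketch below assumes the common cross-entropy formulation of adversarial training, which the talk does not spell out, and assumes the discriminator outputs a probability, strictly between 0 and 1, that its input is real. The function names are hypothetical.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Loss for the assessor (assumed binary cross-entropy form).
    d_real: its score for a real sample, which it wants close to 1.
    d_fake: its score for a generated sample, which it wants close to 0."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def generator_loss(d_fake):
    """Loss for the predictor: it wants the assessor to score its output as real."""
    return -math.log(d_fake)

if __name__ == "__main__":
    # The assessor easily spots the fake: it is happy, the generator is not.
    print(discriminator_loss(0.9, 0.1), generator_loss(0.1))
    # The assessor is fooled: the roles are reversed.
    print(discriminator_loss(0.5, 0.9), generator_loss(0.9))
```

Training alternates between lowering the first loss by adjusting the discriminator and lowering the second by adjusting the generator, so each network improves against the other.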
If you're not careful, if you train with a classical algorithm that doesn't use this adversarial training, what you get is a very blurry prediction, like the one at the top here. The system can do nothing but predict an average of all the possible futures, and that ends up being a blurry image that is the average of all the possible things that can happen. With adversarial training you get fairly sharp predictions, which may be wrong but look possible or probable.
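The averaging effect is easy to reproduce with numbers. The sketch below assumes that the "classical algorithm" minimizes mean squared error, and it reduces the falling pen to a single value: -1 when it falls to the left, +1 when it falls to the right, each half of the time. The data and names are made up for illustration.

```python
def mse(prediction, outcomes):
    """Mean squared error of one fixed prediction against all observed outcomes."""
    return sum((prediction - y) ** 2 for y in outcomes) / len(outcomes)

# Assumed toy data: the pen falls left (-1.0) or right (+1.0) equally often.
outcomes = [-1.0, 1.0] * 50

# Try every prediction from -1.0 to 1.0 in steps of 0.1 and keep the best one.
candidates = [x / 10 for x in range(-10, 11)]
best = min(candidates, key=lambda p: mse(p, outcomes))

if __name__ == "__main__":
    print(best, mse(best, outcomes))   # the error-minimizing prediction
    print(mse(1.0, outcomes))          # error of committing to "falls right"
```

The prediction with the lowest error is 0.0, a pen that falls in neither direction, which never actually happens. That is the one-dimensional version of the blurry image: committing to either real outcome costs more squared error than predicting the average.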
One way to make sure that AI systems will have a sort of moral fiber, if you want, or basic drives that are aligned with human morality, is to design this objective I was talking about earlier to do the right thing, but also to be immutable, so that the machine cannot modify it. That is basically the way we are built. Those basic drives, we can't really modify them; we can build on top of them, but we can't really modify them. So that would be one way of preventing nefarious AI: have good designs for those objectives, and safeguards built into those objectives. But what can possibly go wrong? Well, we can wrongly design the objective, we can build incompetent agents, or we can make them live in a world that is trolling them and turning them into bad robots. And that happens; in fact, that is probably something that has consequences in the short term. There is the famous example of the Tay dialogue bot that Microsoft deployed recently, which got trolled immediately, and they had to shut it down in 24 hours.
Then there is the question: what if someone designs an AI system to be purposely nefarious and releases it into the world? Then it becomes an AI cyberwar, basically, a question of whether my AI is stronger than your AI. And here is an interesting thing: I think defensive AI will win. If you have two AI systems with the same amount of resources, where one is a general AI and therefore potentially dangerous, and the other is a very, very narrow AI whose only purpose is to destroy the first one, the second one will win. It is the same way a bacterium or a virus can kill you: it doesn't have to be that smart, but it has to be specialized. So there is protection against rogue AI, and it is basically specialized AI working for us.
The emergence of human-level AI will not be an event. It will be progressive, and it will take multiple decades. People disagree on how many it will take. It is definitely not going to happen in the next 10 or 20 years; some people say a century, perhaps. The problem, as always in AI, is that we see the obstacle in front of us, and in my opinion unsupervised learning is one of them. We see this big mountain we have to climb, but we don't see all the mountains behind it. That makes us a little optimistic, because we think that by the time we are past this first mountain we will have solved the problem.
Part of intelligence is description, being able to describe or explain things. Where does that fit within your worldview of AI?
I think that's part of human intelligence. We know that an essential characteristic of humans, as social animals, is language. But a lot of animals are pretty smart and can't really describe things very well. Octopuses, for example, are pretty smart. The mother dies when the babies are born, so they are never trained and they are not social animals, but they are pretty smart: they can solve problems, and they can open jars to get crabs.
There are different forms of intelligence, some that require interaction and some that don't. The necessity for explanation or description is something that comes with language and with being social animals, but there are plenty of animals that are very smart and not particularly social. So I think, again, it is one of those examples where we think human characteristics are necessary for intelligence, but really they aren't. Language, for example, is not necessary for intelligence.
It is a matter of how you design this objective. In the objective there are very low-level primary objectives, like making your paper clips or whatever, but we can have other terms in the objective function that say: don't turn the whole universe into paper clips. Limit the amount of resources you use, for example, or put golden-rule-like objectives in there. These are relatively simple things that can be implemented and that will make these systems safe. Also, test the system thoroughly without actually giving it real power, and have all kinds of safeguards if you are really scared about it. That is the way we build airplanes; that is the way we build just about anything that we use.
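As a rough illustration of adding a safety term to an objective, the sketch below combines a primary term (more paper clips is better) with a penalty for using more than a fixed budget of resources. The budget, the penalty weight and the two example plans are all hypothetical numbers chosen for the example, and the agent is assumed to minimize the result, as described earlier.

```python
def objective(paperclips, resources_used, resource_budget=100.0, penalty_weight=10.0):
    """Cost to minimize (all constants are assumed, illustrative values).
    Primary term: each paper clip lowers the cost by one.
    Safety term: every unit of resources beyond the budget is penalized."""
    task_cost = -paperclips
    overuse = max(0.0, resources_used - resource_budget)
    return task_cost + penalty_weight * overuse

if __name__ == "__main__":
    modest = objective(paperclips=50, resources_used=50)     # stays within budget
    greedy = objective(paperclips=500, resources_used=500)   # consumes far too much
    print(modest, greedy)
```

With these numbers the modest plan has the lower cost, so an agent minimizing this objective prefers it even though the greedy plan makes ten times as many paper clips. With a penalty weight below 1.25 the greedy plan would win again (500 - 400w > 50 when w < 1.125, and it stays close up to that point), which is one reason to test such a design thoroughly before giving the system any real power.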