A Path to AI | Yann LeCun

Slides: https://goo.gl/photos/k5DPx7RW14qRJyKN7


Sequential Hierarchical Models vs. LSTM for the Future of AI

Hierarchical hidden Markov models, the sequential model, unlike LSTMs don’t do long-term associations; they do local features. But that is actually what the brain is able to do. I wrote a book, How to Create a Mind, about the evidence we have from the neuroscience field, and also from observing the human brain in action, as to why the human neocortex is organized this way. For example, the European brain reverse-engineering project has identified modules of about a hundred neurons each that are basically repeated over and over again throughout the neocortex, and there is no plasticity within those hundred-neuron modules; the plasticity is the rewiring between the modules. A different project has actually found how the modules connect to each other: the vertical and horizontal connections are already there, pre-made, and they are activated when the neocortex decides to connect one module to another.
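As a thought experiment in code (Python, since the notes specify no language), here is a toy rendering of that claim: identical, frozen hundred-neuron modules, with all "learning" confined to activating connections between modules. Everything here (the module function, the routing matrix, the sizes) is my own illustrative construction, not anything from the talk or the brain projects.

```python
import numpy as np

rng = np.random.default_rng(0)

MODULE_SIZE = 100      # neurons per module; no plasticity inside a module
N_MODULES = 8

# One shared, frozen module: a fixed random projection plus a nonlinearity,
# reused verbatim everywhere in the "neocortex".
W_module = rng.normal(size=(MODULE_SIZE, MODULE_SIZE)) / np.sqrt(MODULE_SIZE)

def module(x):
    """A frozen pattern detector; identical weights repeated in every module."""
    return np.tanh(W_module @ x)

# Plasticity lives only in the routing between modules: a 0/1 matrix saying
# which module's output feeds which other module.
routing = np.zeros((N_MODULES, N_MODULES))

def connect(src, dst):
    """Learning = activating a pre-existing connection between two modules."""
    routing[dst, src] = 1.0

def forward(inputs):
    """Run every module once, mixing in outputs routed from other modules."""
    outputs = [module(x) for x in inputs]
    mixed = []
    for d in range(N_MODULES):
        routed = sum(routing[d, s] * outputs[s] for s in range(N_MODULES))
        mixed.append(module(inputs[d] + routed))
    return mixed

connect(0, 1)  # rewiring between modules, never changing W_module itself
result = forward([rng.normal(size=MODULE_SIZE) for _ in range(N_MODULES)])
print(len(result), result[0].shape)
```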

We actually have twice as many of those connections as a newborn as we do as an adult; the ones we never use die out. We can also observe brains in action. These models are sequential and ordered: we can recite the alphabet, but if I ask you to recite the alphabet backwards, you can’t do it; you would have to learn that as a new sequence. That’s one of the reasons we could use computers to make up for these fairly simple weaknesses of human intelligence. It can be hard to find some information: you might not be able to remember a particular name, because you have to actually be on the right branch of the right tree to retrieve information; the brain is not one big associative memory. If I give you a hint that puts you on the right branch, then maybe you can remember it. Little hints like this help.

I make the case in How to Create a Mind that the neocortex is basically a big hierarchy of sequential models. The neocortex emerged 200 million years ago; it is able to learn new skills, and it can learn from a small number of examples, as I mentioned. That’s one of its benefits. Then something happened two million years ago. Remember, two million years ago we were walking around without these big foreheads; we had a slanted brow, as primates do. Evolution then created the frontal cortex and we got these big foreheads. Up until recently it was supposed that this new cortex must be qualitatively different, must be organized differently, because it does different things. It turns out that it is really just an additional quantity of neocortex. It has also been shown in these brain reverse-engineering projects that the layer with the connections between the modules has somewhat more connections there, but otherwise it is the same topology and organization of neurons, and it is basically learning a sequential model with local patterns but not long-term patterns; the long-term relationships are learned by the hierarchy itself. So we got an additional quantity of neocortex. What did we do with it? Well, we were already doing a very good job of being primates, so we put it at the top of the neocortical hierarchy. As you go up the hierarchy, things get more abstract and more interesting: at the bottom of the hierarchy I can tell that’s a straight line; at the top of the hierarchy I can tell that something is funny or beautiful. That additional neocortex was the enabling factor for us to invent language, art, science, conferences, music. Every human culture we’ve ever discovered has music.

No other animal has music (we can argue about whale song), but basically music came with that additional neocortex, and the same with humor. Now, that was a one-shot deal. We couldn’t continue to increase the size of the neocortex; it already made childbirth challenging. Maybe there were a few humans that had more neocortex, but their offspring couldn’t be born; their skulls were too big. So it was a one-shot deal, but it put us over the threshold so that we could invent, for example, technology, and that formed its own accelerating exponential process. So we’re going to do it again, because of the law of accelerating returns: this computer is billions of times more powerful per dollar than earlier computers. But that’s actually not the most interesting thing about it. I can expand its capabilities a thousandfold; I can connect to a thousand times more computation by connecting wirelessly to the cloud, and to millions of times more in terms of accessing information.

We can’t do that yet directly from the neocortex; we do it indirectly through these devices. But in the future, once we develop strong AI (and I’m going to come back to when I think that will happen), it will be in the cloud, and we will connect our neocortex to synthetic neocortex in the cloud. Just as this device multiplies itself thousands or millions fold by connecting wirelessly to computation in the cloud, we will do that with the neocortex, connecting to simulated neocortical modules. Say I’m walking along in 2035 and I look up and see Max Tegmark coming my way. I’d better think of something clever to say, and I’ve got two seconds. My 300 million neocortical modules aren’t going to cut it; I need a billion, ten billion, for two seconds, and I’ll be able to access that, just the way your phone does when it needs to translate something, or do a search, or needs additional computation. So just as we did two million years ago, we will extend our neocortex with an additional quantity of neocortex, and just as we did two million years ago, we will put it at the top of the neocortical hierarchy. We will be more musical, we will be funnier, we will basically exemplify the things that we value. But unlike two million years ago, it won’t be a one-shot deal. The cloud is subject to the law of accelerating returns; it is doubling in power every year as we speak, and that will continue. So ultimately the bulk of our thinking will be a hybrid of biological and non-biological thinking, and the non-biological part, being subject to the law of accelerating returns, will expand without limit and ultimately predominate. So let me come back and talk about when I think we will achieve human-level intelligence.

At Google I’ve been implementing my ideas of hierarchical sequential models. We’re preparing some papers, and Jory can talk in more detail about the actual results, but we are finding that it’s orders of magnitude faster than LSTM, it can learn from far fewer examples, and it can explain itself. This is the other weakness we found with LSTM: it can only make certain semantic distinctions. I think if you can do language you can do everything; I believe Turing was right in basing his basic test of human intelligence on language. I think language is Turing-complete. We find that LSTM can’t quite get there; it does make some semantic distinctions but not others. We’re finding more success with sequential hierarchical models, and so we’re going to continue to move in that direction.
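To make the idea concrete, here is a minimal sketch (Python, entirely my own construction; the actual architecture is not described in these notes) of a hierarchy of sequential pattern recognizers: each level names short local patterns with symbols, and the levels above it capture progressively longer-range structure, so long-term relationships live in the hierarchy rather than in any single model.

```python
class Level:
    """Recognizes fixed-width local patterns and names each one with a symbol."""
    def __init__(self, width=2):
        self.width = width
        self.vocab = {}          # pattern tuple -> symbol id

    def encode(self, stream):
        out = []
        for i in range(0, len(stream) - self.width + 1, self.width):
            pattern = tuple(stream[i:i + self.width])
            if pattern not in self.vocab:   # "learning": allocate a new unit
                self.vocab[pattern] = len(self.vocab)
            out.append(self.vocab[pattern])
        return out

levels = [Level(), Level(), Level()]
stream = list("abcdabcdabcdabcd")   # repeating long-range structure
for lvl in levels:
    stream = lvl.encode(stream)

# Level 1 only ever sees the local bigrams ('a','b') and ('c','d'); by level 3
# each symbol spans eight characters, so the whole repeat is one recurring unit.
print(stream, [len(lvl.vocab) for lvl in levels])
```

The design point: no level needs long-term memory; stacking local recognizers is what recovers the long-range pattern.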

So in terms of the future, I think there’s general agreement that we are close to the hardware requirements of strong AI. There are already attempts, but not at a reasonable cost; I think we’ll reach human-brain equivalence for about a thousand dollars in the early 2020s. My view is that it takes 10^14 calculations per second (I’ve got derivations of that in my book, and Moravec comes to the same conclusion using a whole different approach; other estimates run to 10^15), but the hardware problem is well in hand. The software problem is more complex. One of the points I make is that software progress is also exponential, and there’s a lot of evidence for that. There was actually a study done by the White House Science Advisory Board on this issue, asking whether we have gotten more benefit from hardware or from software, and on a range of about twenty engineering problems they found that software actually contributed more. That kind of algorithmic improvement comes in fits and starts; it’s not quite as steady as the law of accelerating returns applied to hardware. But I make the case that we will get there by 2029. I’ve had that prediction going back to 1989 in my first book, The Age of Intelligent Machines, which had that date within a decade; I said specifically 2029 in 1999, in The Age of Spiritual Machines. We then had a conference about this startling prediction. The consensus of AI experts (we didn’t have online polls; it was a show of hands) was 500 years; that was the median, and some people said it would never happen. Recently I’ve seen different analyses and polls of AI experts that range between 30 years and 50 years. Take 50 years: the reason for bringing in the date is the law of accelerating returns, but in my view it’s not that people are really internalizing exponential growth; they’re looking at the current rate of progress. In 1999, people looked at the then-current rate and said 500 years; now people look at the current rate and think that at this rate of progress we’ll get there in 50 years. I agree with that, except that we’re going to accelerate that rate of progress, and that is not being factored in: we’re going to make 50 years of progress at today’s rate in about 13 years. People are stuck in linear thinking; linear extrapolation into the future is really hard-wired.
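The "50 years of progress in about 13 years" claim can be checked with a quick back-of-envelope calculation. The four-year doubling time below is my own assumed figure (the talk gives no number); the point is only that it makes the arithmetic come out consistently: if the rate of progress doubles every d years, cumulative progress after t years, measured in years-of-progress-at-today's-rate, is the integral of 2^(tau/d) from 0 to t.

```python
import math

def years_of_todays_progress(t, doubling=4.0):
    """Cumulative progress after t calendar years, in units of 'years of
    progress at today's rate', when the rate of progress itself doubles
    every `doubling` years (assumed): integral of 2**(tau/doubling)."""
    return doubling / math.log(2) * (2 ** (t / doubling) - 1)

# Linear extrapolation says 50 years of progress takes 50 calendar years.
# With the rate of progress doubling every ~4 years, it takes about 13:
print(round(years_of_todays_progress(13.1), 1))  # ≈ 50.1
```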

It’s not obvious that hierarchical sequential models, which are basically pattern-recognition models, would apply to things like planning and reasoning. I maintain that they do: a plan is a sequence of goals, and these each break down into sub-goals, and we’re showing that we can actually do that kind of goal tracking and pursuit with an appropriate hierarchical model. I believe that’s how we think. There was actually very little neuroscience evidence for this 50 years ago; there’s actually quite a bit now. We see these modules, with actions coming in from other modules, and the neocortex is basically organized as a hierarchy of these modules doing sequential pattern recognition. I talk about it in the book at greater length, but basically plans are hierarchies: we want to accomplish certain things, each of these is composed of sub-goals, and it all breaks down into a hierarchy of goals. Ultimately we try to match reality to goals, see where it differs, and then work out what we need to do to realize each sub-goal.

He actually did try multiple layers; he just didn’t have the learning algorithms for them. There are several papers by him and his team with perceptrons of, you know, four, five, six layers, followed by a trainable layer, because they didn’t have backpropagation; they had the perceptron algorithm, but not backprop. Backpropagation in control theory was figured out in the sixties; it popped up in machine learning in the seventies, but nobody paid attention; it was popularized in the eighties, and there were nets with multiple layers. Convolutional nets in 1989 had, you know, seven or eight layers, and recurrent nets of course involve a lot of layers, because they unfold in time. So what happened in 2010, ’11, ’12 was not a mathematical breakthrough but a practical breakthrough: people realized they could train things at a much bigger scale on GPUs with large training sets. There were a few conceptual ideas, like using ReLUs instead of the sigmoid, but that doesn’t make the cost function simpler at all; it’s still very non-convex. People are now working on mathematical theories showing that the non-convexity is not a problem, that you don’t actually get trapped in local minima. So it just kind of looks like a historical correction.

Well, we could debate that. In my view there was a mathematical issue that prevented us from going beyond just a few layers, and once we solved it we could really scale up the number of layers. It was not just GPUs; I think there is a mathematical issue that has to do with the error surface, but that’s probably not of great interest to the group.

I just want to add something to that very interesting comment. I enjoyed greatly your reminiscence of meeting Minsky back then. He is often credited with killing neural-network research because he wrote this book in 1969 about the limitations of shallow networks, you know, one-layer perceptrons. But that was four years after the fact, because there was the famous mathematician from the Ukraine, Alexey Ivakhnenko, who had deep learning networks in 1965, with a student, Lapa: multilayer perceptrons, eight layers, and they learned, not by backprop but by incremental layer-by-layer learning, and it worked, and it remained in use until the 2000s.

Well, I wrote in The Age of Spiritual Machines that the theorem in Minsky and Papert’s book Perceptrons, in 1969, only applied to single-layer neural nets: basically, they couldn’t solve the connectedness problem. And Minsky was quite bitterly opposed; I think he didn’t like the hype around the perceptron, and he and Papert at that time actually took out of the book a lot of visceral ranting against neural nets. Later on, when I talked to him more recently, he regretted the success of that book in killing a lot of funding for 20 years; he had become actually quite enthusiastic because of the tremendous progress that’s been made in this field.
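As a footnote to this exchange, here is a minimal backpropagation example (Python, entirely my own illustration, not any speaker's code): a two-layer network with ReLU hidden units, trained by the chain rule, learns XOR, the classic function that Minsky and Papert's single-layer perceptron theorems showed a one-layer net cannot represent.

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
y = np.array([[0], [1], [1], [0]], float)   # XOR truth table

W1 = rng.normal(size=(2, 16)); b1 = np.zeros(16)
W2 = rng.normal(size=(16, 1)); b2 = np.zeros(1)
lr = 0.05

for _ in range(5000):
    h = np.maximum(0, X @ W1 + b1)       # forward: ReLU hidden layer
    out = h @ W2 + b2                    # forward: linear output
    err = out - y                        # gradient of 0.5 * squared error
    # Backward pass: apply the chain rule layer by layer.
    gW2, gb2 = h.T @ err, err.sum(0)
    dh = (err @ W2.T) * (h > 0)          # ReLU derivative gates the error
    gW1, gb1 = X.T @ dh, dh.sum(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

pred = (np.maximum(0, X @ W1 + b1) @ W2 + b2 > 0.5).astype(int).ravel()
print(pred)   # the learned XOR truth table
```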

Inside Libratus, the Poker AI

  • Libratus, for one, did not use neural networks. Mainly, it relied on a form of AI known as reinforcement learning, a method of extreme trial-and-error. In essence, it played game after game against itself.
  • Google’s DeepMind lab used reinforcement learning in building AlphaGo, the system that cracked the ancient game of Go ten years ahead of schedule, but there’s a key difference between the two systems. AlphaGo learned the game by analyzing 30 million Go moves from human players before refining its skills by playing against itself. By contrast, Libratus learned from scratch.
  • Libratus relied on three different systems that worked together, a reminder that modern AI is driven not by one technology but many.
    1. Through an algorithm called counterfactual regret minimization, it began by playing at random, and eventually, after several months of training and trillions of hands of poker, it too reached a level where it could not just challenge the best humans but play in ways they couldn’t—playing a much wider range of bets and randomizing these bets, so that rivals have more trouble guessing what cards it holds.
    2. But that was just the first stage. During the games in Pittsburgh, a second system would analyze the state of play and focus the attention of the first. With help from the second (an “end-game solver” detailed in a research paper Sandholm and Brown published late Monday), the first system didn’t have to run through all the possible scenarios it had explored in the past. It could run through just some of them. Libratus didn’t just learn before the match; it learned while it was playing.
    3. These two systems alone would have been effective. But Kim and the other players could still find patterns in the machine’s play and exploit them. That’s why Brown and Sandholm built a third system. Each evening, Brown would run an algorithm that could identify those patterns and remove them. “It could compute this overnight and have everything in place the next day,” he says.
  • Poker has been one of the hardest games for AI to crack, because you see only partial information about the game state.
  • There is no single optimal move. Instead, an AI player has to randomize its actions so as to make opponents uncertain when it is bluffing.
  • A financial trader could work the same way. So could a diplomat. It’s a powerful and rather unsettling proposition: a machine that can out-bluff a human.
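The self-play idea behind counterfactual regret minimization can be sketched with plain regret matching on rock-paper-scissors (Python; a toy stand-in of mine, vastly simpler than Libratus): each player shifts probability toward actions it regrets not having played, and the average strategy drifts toward the randomized equilibrium the notes describe.

```python
import numpy as np

rng = np.random.default_rng(0)

# payoff[a, b]: row player's payoff; 0 = rock, 1 = paper, 2 = scissors
payoff = np.array([[ 0, -1,  1],
                   [ 1,  0, -1],
                   [-1,  1,  0]], float)

def strategy(regret):
    """Regret matching: play each action in proportion to its positive regret."""
    pos = np.maximum(regret, 0.0)
    total = pos.sum()
    return pos / total if total > 0 else np.full(3, 1 / 3)

regret = [np.zeros(3), np.zeros(3)]
avg = [np.zeros(3), np.zeros(3)]

for _ in range(20000):
    s = [strategy(regret[0]), strategy(regret[1])]
    a = [rng.choice(3, p=s[0]), rng.choice(3, p=s[1])]
    # Regret of an action = what it would have earned minus what we earned.
    u0 = payoff[:, a[1]]            # row player's payoff for each action
    u1 = -payoff[a[0], :]           # column player's payoff (zero-sum)
    regret[0] += u0 - u0[a[0]]
    regret[1] += u1 - u1[a[1]]
    avg[0] += s[0]
    avg[1] += s[1]

avg_strategy = avg[0] / avg[0].sum()
print(np.round(avg_strategy, 2))    # close to the mixed equilibrium [1/3, 1/3, 1/3]
```

A pure strategy here is exploitable; the algorithm discovers on its own that randomizing is optimal, which is the same reason Libratus randomizes its bets.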

ref: https://www.wired.com/2017/02/libratus/