In December 2013, DeepMind released a ground-breaking paper called “Playing Atari with Deep Reinforcement Learning”. Just a little over a month later, Google announced that it had bought DeepMind for a very large sum of money. Since then, reinforcement learning has been one of the most talked-about topics in AI. In January 2016, Google announced that the appropriately named AlphaGo had beaten Fan Hui, the reigning European Go champion, and in March 2016 AlphaGo went on to defeat Lee Sedol, one of the strongest players in the world.
The story of reinforcement learning goes back to ideas from AI, animal psychology, and control theory. At its heart, it involves an autonomous agent, such as a person, animal, robot, or deep net, learning to navigate an uncertain environment with the goal of maximizing a numerical reward.
Sports are a great example of this. Just think of what our autonomous agent would have to deal with in a tennis match. The agent would have to consider its actions, like its serves, returns, and volleys. These actions change the state of the game: the score in the current set, which player is leading, and so on. And every action is performed with a reward in mind: winning a point, and ultimately the game, set, and match.
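To make the vocabulary of states, actions, and rewards concrete, here is a minimal sketch of the agent-environment loop: at each step the agent observes a state, picks an action, and the environment returns a reward and the next state. The `RallyEnv` class, its two actions, and all of its probabilities are hypothetical, a drastically simplified stand-in for a tennis rally rather than a real simulator or any library's API. A policy here is just a function from state to action.

```python
import random

class RallyEnv:
    """Hypothetical toy stand-in for a tennis rally (illustration only).

    State:   how many shots the agent has returned so far (0..max_shots).
    Actions: "safe" return or "aggressive" shot.
    Reward:  +1 for winning the point, -1 for losing it, 0 while the rally continues.
    """

    ACTIONS = ("safe", "aggressive")

    def __init__(self, max_shots=5, seed=None):
        self.max_shots = max_shots
        self.rng = random.Random(seed)
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        """Apply an action and return (next_state, reward, done)."""
        if action == "aggressive":
            # Assumed odds: 40% chance of a winner, otherwise an error.
            return self.state, (1.0 if self.rng.random() < 0.4 else -1.0), True
        if self.rng.random() < 0.1:
            return self.state, -1.0, True  # assumed 10% chance a safe return goes out
        self.state += 1
        if self.state >= self.max_shots:
            return self.state, 1.0, True   # assume the opponent errs after max_shots
        return self.state, 0.0, False


def run_episode(env, policy):
    """Play one point with `policy` (a function state -> action); return the total reward."""
    state, total, done = env.reset(), 0.0, False
    while not done:
        action = policy(state)
        state, reward, done = env.step(action)
        total += reward
    return total


env = RallyEnv(seed=0)
episodes = 10_000
for name, policy in [("always safe", lambda s: "safe"),
                     ("always aggressive", lambda s: "aggressive")]:
    avg = sum(run_episode(env, policy) for _ in range(episodes)) / episodes
    print(f"{name}: average reward {avg:+.3f}")
```

Under these made-up odds, always playing safe wins the point with probability 0.9^5 ≈ 0.59, for an average reward of roughly +0.18, while always going aggressive averages about -0.2, so the better policy in this toy is to play safe.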
Our agent needs to follow a policy, or a set of rules and strategies, in order to maximize the final score. But if you were building an autonomous agent, how would you actually model this? We know that the agent’s actions will change the state of the environment. So a model would need to take a state and an action as input and output the maximum expected reward. But since a single action only gets you to the next state, the model has to account for the total reward expected over all future actions, from the current state until the end state, not just the immediate one.
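One standard way to formalize this is Q-learning. It keeps an estimate Q(s, a) of the total future reward for taking action a in state s, and repeatedly nudges each estimate toward the immediate reward plus the discounted value of the best next action: Q(s, a) ← Q(s, a) + α·[r + γ·max over a′ of Q(s′, a′) − Q(s, a)]. The sketch below runs tabular Q-learning on a hypothetical five-cell corridor; the corridor, the discount factor γ = 0.9, the learning rate α = 0.5, and the episode count are all illustrative assumptions. Actions are chosen uniformly at random here for simplicity; because Q-learning is off-policy, it still learns the values of the best actions. A deep Q-network replaces the lookup table with a neural network trained toward the same kind of target.

```python
import random

# Hypothetical corridor: cells 0..4; reaching cell 4 ends the episode with reward +1.
GOAL = 4
ACTIONS = (-1, +1)  # step left or step right

def step(state, action):
    """Return (next_state, reward, done) for the toy corridor."""
    next_state = min(max(state + action, 0), GOAL)
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done

def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning with a uniformly random behaviour policy."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(GOAL + 1) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.choice(ACTIONS)  # explore at random (Q-learning is off-policy)
            next_state, reward, done = step(state, action)
            # Target: immediate reward plus the discounted value of the best next action.
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            target = reward + gamma * best_next
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q

q = q_learning()
for s in range(GOAL):
    print(s, {a: round(q[(s, a)], 3) for a in ACTIONS})
```

After training, the greedy action in every cell is +1 (move toward the goal), and the value of stepping right from cell 0 approaches 0.9³ = 0.729, because the +1 reward arrives on the fourth move and is discounted three times.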
The details differ for every application, and unsurprisingly, building a tennis agent is different from building an Atari agent. The researchers at DeepMind fed a series of Atari screen frames into a convolutional neural network, with a couple of tweaks. The output wasn’t a class; instead, the network produced one number per possible action, an estimate of the maximum future reward achievable after taking that action. So the network was solving a regression problem, not a classification problem. They also didn’t use pooling layers: unlike in image recognition, where pooling helps make a prediction insensitive to where an object appears, the exact positions of game objects such as the player matter and shouldn’t be discarded.
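As a rough illustration, the sketch below traces the tensor shapes through the network described in the 2013 paper: an 84×84×4 stack of preprocessed frames, a convolution with 16 filters of size 8×8 and stride 4, a convolution with 32 filters of size 4×4 and stride 2, a fully connected hidden layer of 256 units, and a linear output layer with one value per action. It uses the standard output-size formula for an unpadded convolution. There is no pooling step anywhere, and the final layer is a vector of real-valued estimates rather than class probabilities. The helper names are hypothetical, and the action count of 4 is only an example, since it depends on the game.

```python
def conv_output_size(size, kernel, stride):
    """Spatial output size of a 'valid' (unpadded) convolution."""
    return (size - kernel) // stride + 1

def dqn_2013_shapes(n_actions, frame_size=84, history=4):
    """Trace layer output shapes through the 2013 Atari DQN layout (no pooling)."""
    shapes = [("input", (frame_size, frame_size, history))]
    size = conv_output_size(frame_size, kernel=8, stride=4)  # 16 filters, 8x8, stride 4
    shapes.append(("conv1", (size, size, 16)))
    size = conv_output_size(size, kernel=4, stride=2)        # 32 filters, 4x4, stride 2
    shapes.append(("conv2", (size, size, 32)))
    shapes.append(("flatten", (size * size * 32,)))
    shapes.append(("fc_hidden", (256,)))
    # Linear output layer: one estimated value per action (regression, no softmax).
    shapes.append(("q_values", (n_actions,)))
    return shapes

for name, shape in dqn_2013_shapes(n_actions=4):
    print(f"{name:10s} {shape}")
```

The spatial size shrinks only through the strided convolutions (84 to 20 to 9), so every position in the final 9×9 feature maps still corresponds to a specific region of the screen.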
A recurrent net could have been used too, as long as the output layer was tailored for regression and the input at each time step included the action and the environment state. DeepMind’s network is known as a Deep Q-Network, or DQN for short, because it approximates the Q-function from Q-learning: the maximum expected future reward for taking a given action in a given state.
It was patented by Google, and it has since seen several improvements, such as prioritized experience replay and the dueling network architecture; experience replay itself was already part of the original DQN.
Reinforcement learning isn’t just a fancy, smart-sounding way to say supervised learning. Supervised learning is all about making sense of the environment based on historical examples. But that isn’t always the best way to do things.
Imagine trying to drive a car in heavy traffic based on the road patterns you observed the week before, when the roads were clear. That’s about as effective as driving while looking only at the rear-view mirror. Reinforcement learning, on the other hand, is all about reward. You get points for your actions, like staying in your lane, driving under the speed limit, and signaling when you’re supposed to. But you can also lose points for dangerous actions like tailgating and speeding.
Your objective is to get the maximum number of points possible given the current state of the traffic on the road around you. Reinforcement learning emphasizes that each action changes the state, which in turn shapes the rewards that follow; a supervised learning model, whose predictions don’t affect the examples it sees next, doesn’t focus on this.
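The points system described for driving can be written as a reward function that scores each time step; the agent’s objective is then to maximize the total over a trip. Everything in this sketch is hypothetical: the `StepEvents` fields, the 20 m safe-gap threshold, and the point values are made up for illustration, and the state of the traffic enters only through the speed limit and the following distance.

```python
from dataclasses import dataclass

@dataclass
class StepEvents:
    """Hypothetical summary of what happened during one time step of driving."""
    in_lane: bool
    speed_kmh: float
    speed_limit_kmh: float
    signaled_if_needed: bool
    following_distance_m: float

def driving_reward(ev: StepEvents, safe_gap_m: float = 20.0) -> float:
    """Illustrative reward: points for safe behaviour, penalties for risky behaviour.

    All point values and the safe_gap_m threshold are made-up assumptions.
    """
    reward = 0.0
    reward += 1.0 if ev.in_lane else -1.0
    if ev.speed_kmh <= ev.speed_limit_kmh:
        reward += 1.0
    else:
        reward -= 2.0  # speeding penalty
    reward += 0.5 if ev.signaled_if_needed else -0.5
    if ev.following_distance_m < safe_gap_m:
        reward -= 2.0  # tailgating penalty
    return reward

trip = [
    StepEvents(in_lane=True, speed_kmh=48.0, speed_limit_kmh=50.0,
               signaled_if_needed=True, following_distance_m=30.0),
    StepEvents(in_lane=True, speed_kmh=62.0, speed_limit_kmh=50.0,
               signaled_if_needed=True, following_distance_m=12.0),
]
for ev in trip:
    print(driving_reward(ev))
print("trip total:", sum(driving_reward(ev) for ev in trip))
```

The first step earns 2.5 points; the second, where the car is speeding and tailgating, scores -2.5, wiping out the gain. The relative size of penalties and rewards is a design choice that strongly shapes what behaviour a learning agent ends up preferring.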
With reinforcement learning, an agent must balance exploration (trying actions to learn more about the environment) against exploitation (using what it already knows), and choose the path to the maximum expected reward. More broadly, reinforcement learning sits under the umbrella of artificial intelligence. It touches on topics like goal setting, planning, and perception, and it can even form a bridge between AI and engineering disciplines such as control theory. Reinforcement learning is conceptually simple yet powerful, and given recent advances, it has the potential to become a major force in deep learning.
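A common, simple way to strike the exploration/exploitation balance is ε-greedy action selection: with probability ε the agent takes a uniformly random action (exploration), otherwise it takes the action with the highest current value estimate (exploitation). The 2013 DQN paper used this scheme, annealing ε linearly from 1.0 to 0.1 over the first million frames. In the sketch below, the function names, the action names, and the dictionary of value estimates are illustrative assumptions.

```python
import random

def epsilon_greedy(values, epsilon, rng=random):
    """Pick an action from a dict of {action: estimated value}.

    With probability epsilon, explore with a uniformly random action;
    otherwise exploit the action with the highest current estimate.
    """
    actions = list(values)
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=values.get)

def linear_epsilon(step, start=1.0, end=0.1, decay_steps=1_000_000):
    """Linearly anneal epsilon from `start` to `end`, then hold it at `end`."""
    fraction = min(step / decay_steps, 1.0)
    return start + fraction * (end - start)

rng = random.Random(0)
estimates = {"left": 0.2, "stay": 0.5, "right": 0.1}
print(epsilon_greedy(estimates, epsilon=0.0, rng=rng))  # pure exploitation: prints "stay"
print(linear_epsilon(500_000))                          # halfway between 1.0 and 0.1
```

Starting with ε near 1 lets the agent gather varied experience while its estimates are still poor; shrinking ε over time shifts it toward exploiting what it has learned, while a small floor keeps some exploration going.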