Introduction to Reinforcement Learning

by Nimish Sanghi

Reinforcement Learning (RL) is a sub topic under Machine Learning. It is one of the fastest growing disciplines helping make AI real. Combining Deep Learning with Reinforcement Learning has led to many significant advances that are increasingly getting machines closer to act the way humans do. All intelligent beings start with a small knowledge. However, as they interact with the world and gain experiences, they learn to adapt to the environment becoming better at doing things. The modern concept of Reinforcement Learning is combination of two different threads through their individual development. 

First is the concept of optimal control especially the discipline of Dynamic Programming introduced by Richard Bellman in 1950. It is all about planning through the space of various options using Bellman recursive equations. The second thread is learning by trial and error which finds its origin in psychology of animal training. Edward Thorndike was the first one to express the concept of “trial and
error” in clear terms. In his words: In 1980’s these two fields merged together to give rise to the field of modern Reinforcement Learning. In last decade, with the emergence of powerful deep learning methods, reinforcement learning when combined with deep learning is giving rise to very powerful algorithms that could make Artificial Intelligence real in coming times.

Machine Learning Branches
Machine Learning involves learning from data presented to the system so that system can perform a specified task. The system is not explicitly told how to do the task. Rather, it is presented with the data and system learns to carry out some task based on a defined objective.

Supervised Learning
In Supervised Learning, the system is presented with the labelled data and the objective is to generalize knowledge so that new unlabeled data can be labelled. Consider that images of cats and dogs are presented to the system along with labels of which image shows a cat or a dog. The input data is represented as a set of data , where are the pixel values of individual images and are the labels of the respective images say value of 0 for an image of cat and value of 1 for image of a dog. The system takes this input and learns a mapping from image to label . Once trained, the system is presented with a new image to get a prediction of the label = 0 or 1 depending on whether the image is that of a cat or a dog. This is a classification problem where the system learns to classify an input into correct class. We have similar setup for regression where we want to predict a continuous output based on the vector of input values.

Unsupervised Learning
The second branch is Unsupervised Learning. Unsupervised Learning has no labels. It only has the inputs and no labels. The system uses this data to learn the hidden structure of the data so that it can cluster/categorize the data into some broad categories. Post learning, when the system is presented with a new data point , it can match the new data point to one of the learnt clusters. Unlike Supervised Learning, there is no well-defined meaning to each category. Once the data is clustered into category, based on most common attributes within a cluster we could assign some meaning to it. The other use of Unsupervised Learning is to use leverage underlying input data to learn the data distribution so that the system can be subsequently queried to produce a new synthetic data point.

Reinforcement Learning (RL)
While supervised Learning is learning with a teacher - the labelled data telling the system what the mapping form input to output is, RL is more like learning with a critic. The Critic gives feedback to the learner (the model) on how good or bad his knowledge is. The learner uses this feedback to incrementally improve its knowledge. In Reinforcement Learning, the agent does not have prior knowledge of the system. It gathers feedback and uses that feedback to plan/learn actions to maximize a specific objective. As it does not have enough information about the environment initially, it must explore to gather insights. Once it gathers “enough” knowledge, it needs to exploit that knowledge to start adjusting its behavior in order to maximize the objective it is chasing. The difficult part is that there is no way to know when the exploration is “enough. If the agent continues to explore even after it has obtained perfect knowledge, it is wasting resources trying to gather new information of which there is none left. On the other hand, if the agent prematurely assumes that it has gathered enough knowledge it may land up optimizing based on incomplete information and
may perform poorly. This dilemma of when to explore and when to exploit is the core recurring theme of Reinforcement Learning algorithms.

To motivate further, let us look at an example of how RL is being used today with recommendation systems. Today we see recommendation systems everywhere. Video sharing/hosting applications such as YouTube and Facebook suggest us the videos that
we would like to watch based on our viewing history. All such recommendation engines are increasing getting driven by Reinforcement Learning based systems. These systems continually learn from the way users respond to the suggestions presented by the engine. A user acting on the recommendation reinforces these actions as good actions given the context.

Reinforcement Learning is seeing significant advances. There is more to the basic RL which I cover in my book's last chapter. There are evolving disciplines like Imitation and Inverse Learning, Derivative free methods, Transfer and Multi Task Learning as well as Meta Learning. RL is finding increasing use in very diverse applications ranging from Health Care, Autonomous Vehicles, Robots, Finance and e-commerce as well as various other fields. In this blog I have tried to introduce the field of RL, which I go much further into in my new Apress book Deep Reinforcement Learning with Python.

About the Author

Nimish Sanghi is a passionate technical leader who brings to table extreme focus on use of technology for solving customer problems. He has over 25 years of work experience in the Software and Consulting. Nimish has held leadership roles with P&L responsibilities at PwC, IBM and Oracle. In 2006 he set out on his entrepreneurial journey in Software consulting at SOAIS with offices in Boston, Chicago and Bangalore. Today the firm provides Automation and Digital Transformation services to Fortune 100 companies helping them make the transition from on-premise applications to the cloud. He is also an angel investor in the space of AI and Automation driven startups. He has co-founded Paybooks, a SaaS HR and Payroll platform for Indian market. He has also cofounded a Boston based startup which offers ZipperAgent and ZipperHQ, a suite of AI driven workflow and video marketing automation platforms. He currently hold the position as CTO and Chief Data Scientist for both these platforms. Nimish has an MBA from Indian Institute of Management in Ahmedabad, India and a BS in Electrical Engineering from Indian Institute of Technology in Kanpur, India. He also holds multiple certifications in AI and Deep Learning.

This article was contributed by Nimish Sanghi, co-author of Deep Reinforcement Learning with Python.