So far in this tutorial series on building an autonomous self-driving car with CARLA and Python, we have mostly worked with predefined logic and packages to set up the car itself.
In this tutorial, we will begin working on the artificial intelligence aspect of the project that will make our car live up to its name: autonomous and self-driving.
This will be achieved using one of the three basic machine learning paradigms: reinforcement learning.
But first, let's learn a bit about reinforcement learning and what it entails.
What is Reinforcement Learning?
Reinforcement Learning is one of the three basic machine learning paradigms. It is concerned with how machine learning models, known as agents, should behave in an environment.
To achieve this, a reinforcement learning agent is trained to take a sequence of decisions and receives either a reward or a penalty for the actions it performs in an environment.
The goal of a reinforcement learning agent is to maximize the total reward without getting any hints or suggestions on how to behave or solve the problem.
This means the agent must figure out for itself how to perform the given task to maximize the total reward, starting with random trials and gradually developing sophisticated tactics and extraordinary skills. The reward signal itself can be positive or negative, depending on what behavior the designer intends to encourage.
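In practice, "total reward" usually means the discounted sum of per-step rewards, so that nearer rewards count for more than distant ones. Here is a minimal sketch; the discount factor gamma=0.9 is just an illustrative choice:

```python
def discounted_return(rewards, gamma=0.99):
    """Compute the discounted sum of a sequence of per-step rewards."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total

# A short episode: a small reward, a penalty, then a larger final reward.
print(discounted_return([1.0, -0.5, 2.0], gamma=0.9))  # -> 2.17
```

Maximizing this quantity over many episodes is what drives the agent from random behavior toward a useful strategy.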
If a reinforcement learning agent runs on sufficiently powerful computing infrastructure, it can gather experience from thousands of parallel simulation runs.
Common Terms Used in Reinforcement Learning
Below are some terms that are commonly used when working with Reinforcement Learning:
Agent: an object or entity that performs actions in an environment in exchange for a positive or negative reward.
Environment: the scene or setting in which an agent operates and carries out its tasks.
Reward: This is the compensation given to an agent when a specific action or task is performed. It can be negative or positive.
State: State refers to the current setting of the environment the agent is in.
Policy: This is a strategy which the agent generates and applies to decide the next action based on the current state.
Value: This is a long-term benefit an agent expects, as compared to the short-term reward.
Value Function: a function that specifies the amount of reward an agent should expect from a given state.
Model of the environment: This mimics the behavior of the environment to make inferences and determine how the environment will behave.
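To see how these terms fit together, here is a minimal sketch of the standard agent-environment interaction loop. The toy LineEnvironment and the random policy below are hypothetical stand-ins for illustration, not part of CARLA:

```python
import random

class LineEnvironment:
    """Toy environment: the agent walks along a line toward a goal position."""
    def __init__(self, goal=3):
        self.goal = goal
        self.state = 0  # State: the agent's current position.

    def step(self, action):
        """Apply an action (-1 or +1); return (next_state, reward, done)."""
        self.state = max(0, self.state + action)  # Can't walk past the start.
        done = self.state == self.goal
        reward = 1.0 if done else -0.1  # Reward at the goal, small penalty per step.
        return self.state, reward, done

def random_policy(state):
    """Trivial policy: choose a random action regardless of the state."""
    return random.choice([-1, 1])

env = LineEnvironment(goal=3)
total_reward, done = 0.0, False
while not done:
    action = random_policy(env.state)       # The policy maps state -> action.
    state, reward, done = env.step(action)  # The environment returns reward and next state.
    total_reward += reward
```

A real agent would replace the random policy with one learned from the rewards it accumulates, but the loop itself, policy picks an action, environment returns a reward and a new state, stays the same.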
When Should You Use Reinforcement Learning?
When the problem you are trying to solve cannot be described as a supervised learning (SL) problem but can be described as a reinforcement learning (RL) problem.
When the environment in which you are training the agent has a natural reward signal (games, for example) and actions that can be taken in said environment.
When your problem is not of a predictive nature but an actionable nature.
When you have to learn a policy that maps an agent's state to actions in order to reach the desired goal.
When Should You Not Use Reinforcement Learning?
When you do not have an adequate understanding of the reinforcement learning agent or the environment.
When you already have a sufficient amount of valuable data to solve a problem with a supervised learning method.
When you do not have enough computing power to handle running the model training due to the heavy computing demand and time-consuming operations of reinforcement learning.
Notable Example of Reinforcement Learning
If you’re an ardent follower of the innovations in Artificial Intelligence, you may have seen headlines or heard about Google's AlphaGo Zero that made quite the buzz in 2017.
Go is a popular Chinese board game invented more than 2,500 years ago.
AlphaGo was a bot developed by Google's DeepMind that leveraged reinforcement learning to become the first computer program to defeat a professional Go player without handicaps, and later the first to defeat a world champion.
The next version of the bot, AlphaGo Master, defeated the world's top-ranked player, Ke Jie.
AlphaGo Zero, the version that made headlines in 2017, went further still: it learned the game entirely through self-play reinforcement learning, with no human game data, and surpassed its predecessors.
How Reinforcement Learning is Used in Autonomous Self-Driving Cars
Training the models or agents that control autonomous cars is an excellent example of a potential application of reinforcement learning.
In self-driving cars, there are various aspects to consider, such as speed limits at various places, drivable zones, road safety rules, traffic rules, avoiding collisions, traffic jams, etc.
Under normal circumstances, when training an autonomous self-driving car using reinforcement learning, the computer, or reinforcement learning agent, should not be given instructions on how to drive the car.
This allows the machine to learn from its own errors while the programmer or designer regulates this using the reward function.
To throw more light on this: under reasonable circumstances, we would require an autonomous self-driving car to prioritize safety, regulate its speed, minimize ride time, reduce pollution, offer passengers comfort, and obey the rules of the road.
However, not all cars are designed for ordinary public roads, so our earlier policy may not be generally applicable.
Take an autonomous race car, for instance: speed will take much more precedence over the driver's comfort, and in some cases, even over safety.
It is implausible to think that a programmer can predict everything that could happen on the road.
Trust me, a lot happens on the road, especially in countries with little or no regard for rational driving behavior or countries that lack good road infrastructure.
As a result, instead of building an unnecessarily lengthy set of "if-then" instructions for an autonomous self-driving car, the programmer prepares the reinforcement learning agent to learn from a system of rewards and penalties.
The agent gets rewards for reaching specific goals or penalties for failing a task.
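As a sketch of what such a system of rewards and penalties might look like for a driving agent, here is a hypothetical reward function. The inputs and the weights are illustrative choices, not values from CARLA or from later tutorials in this series:

```python
def driving_reward(speed, speed_limit, collided, reached_waypoint):
    """A hypothetical reward function for a self-driving agent.

    Rewards progress, penalizes collisions and speeding; all weights
    here are illustrative, not tuned values.
    """
    reward = 0.0
    if collided:
        reward -= 100.0  # Large penalty: safety comes first.
    if reached_waypoint:
        reward += 10.0   # Reward for making progress along the route.
    if speed > speed_limit:
        reward -= (speed - speed_limit) * 0.5  # Penalty grows with excess speed.
    return reward

print(driving_reward(speed=60, speed_limit=50, collided=False, reached_waypoint=True))  # -> 5.0
```

Shaping this function is how the designer expresses priorities: for a race car, the speed term might become a reward rather than a penalty.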
Some of the tasks of an autonomous self-driving car where reinforcement learning could play a major role include trajectory optimization, motion planning, dynamic pathing, controller optimization, and scenario-based learning policies for highways.
For example, parking can be achieved by learning automatic parking policies.
Lane changing can be achieved using Q-Learning while overtaking can be implemented by learning an overtaking policy while avoiding collision and maintaining a steady speed thereafter.
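As a rough illustration of the Q-Learning mentioned above, here is the tabular update rule in Python. The lane-change states and actions are hypothetical abstractions for the sketch, not CARLA objects:

```python
from collections import defaultdict

ACTIONS = ["keep_lane", "change_left", "change_right"]
Q = defaultdict(float)   # Maps (state, action) -> estimated long-term value.
alpha, gamma = 0.1, 0.9  # Learning rate and discount factor (illustrative values).

def q_update(state, action, reward, next_state):
    """One Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

# One hypothetical transition: a safe lane change that earned a positive reward.
q_update(state="car_ahead", action="change_left", reward=1.0, next_state="clear_road")
```

Repeated over many transitions, these updates let the agent rank lane-change actions by their expected long-term value rather than their immediate reward.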
Challenges of Reinforcement Learning
Here are the major challenges you will face while doing reinforcement learning:
Too much reinforcement may lead to an overload of states, which can diminish the results.
Realistic environments can be partially observable and non-stationary.
Parameter choices, such as the learning rate, can strongly affect the speed at which the agent learns.
Now that you've learned the fundamentals of reinforcement learning, you are ready to move on.
In the next tutorial, we will write some code to set up our environment for reinforcement learning and begin training our autonomous self-driving car.