Archangel Macsika

Reinforcement Learning Agent for CARLA Autonomous Self-Driving Car Using Python

Tutorial 7 of 9 | 15 minutes read

In the last tutorial of our autonomous self-driving car using CARLA and Python, we developed the class needed for our reinforcement learning environment.

We also set up all the nitty-gritty needed for our reinforcement learning agent to thrive and interact with the reinforcement learning environment.

In this tutorial, we are going to be working on our reinforcement learning agent.

It is worth knowing that every step taken by a reinforcement learning agent is based on a prediction.

This is tied to the ability of the reinforcement learning agent to choose between exploration and exploitation when making decisions.

Let me explain this concept in the simplest terms.

Daily, most people are confronted with the same dilemma: should I keep doing what I do, or should I try something else?

For instance: should I go to my favorite eatery or try a new one? Should I continue visiting my preferred spa or find a new one? And so on.

In reinforcement learning, this type of decision-making is called exploitation when you keep doing what you were doing, and exploration when you try something new.

For this reason, the question of how much to exploit and how much to explore naturally arises.
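
A common way to balance the two is an epsilon-greedy policy, which is what the epsilon constant we define later in this tutorial is for. Here is a minimal sketch (the q_values list and the choose_action name are illustrative, not part of the tutorial's code):

```python
import random

def choose_action(q_values, epsilon, num_actions=3):
    # Explore with probability epsilon, otherwise exploit the best-known action.
    if random.random() < epsilon:
        return random.randrange(num_actions)                       # exploration
    return max(range(num_actions), key=lambda a: q_values[a])      # exploitation
```

With epsilon = 1 (our starting value), every action is random; as epsilon decays toward 0, the agent increasingly picks the action with the highest predicted Q value.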

This means that our reinforcement learning agent will be training and predicting at the same time, and it must get as many frames per second as possible.

This poses a problem that is inherent in reinforcement learning: we will be receiving and processing a large amount of data from the image module due to calculations and weight comparisons, and as a result, our reinforcement learning model will be large.

Also, we want the program to run in real-time.

We would also like our training process to go as quickly as possible. To achieve this, we can use either multiprocessing or threading. These are common programming concepts you should know.

To make things simple, we will stick with threading.

Introducing Keras and Tensorflow

To create our reinforcement learning agent, we have to think about the model to be used. For that reason, we will be using Keras and Tensorflow.

If you already have Keras and Tensorflow installed, please skip this part of the tutorial. Otherwise, use the commands below to install them.

How to Install and Verify Tensorflow on macOS

Use the command below to install Tensorflow on macOS, or visit the official Tensorflow site for other OS platforms.

pip install --upgrade tensorflow

Run the code below to verify your Tensorflow installation.

python -c "import tensorflow as tf;print(tf.reduce_sum(tf.random.normal([1000, 1000])))"

How to Install and Verify Keras on macOS

Use the commands below to install Keras on macOS, or visit the official Keras site for other OS platforms.

First, let’s install a few Python dependencies:

pip install numpy scipy
pip install scikit-learn
pip install pillow
pip install h5py

Followed by installing Keras itself:

pip install keras

That’s it! Keras and Tensorflow are now installed on your system!

Set Up the Reinforcement Learning Agent

To explain the concept, we will have the main network, which is constantly evolving, and a target network, which is updated every n steps or episodes, where n is whatever interval you choose.

We will begin by adding some important CONSTANTS in the file from the last tutorial.

We may not go into details of how they work now, you will learn about each CONSTANT as we use them.

epsilon = 1 # Starting exploration rate
REPLAY_MEMORY_SIZE = 5_000 # Maximum number of previous transitions to remember
MIN_REPLAY_MEMORY_SIZE = 1_000 # Minimum number of transitions in memory before training starts
MINIBATCH_SIZE = 16 # Example size of the minibatch sampled from replay memory
PREDICTION_BATCH_SIZE = 1 # Example batch size used when predicting
TRAINING_BATCH_SIZE = MINIBATCH_SIZE // 4 # The double forward slash means that we do not want any remainder
UPDATE_TARGET_EVERY = 5 # Number of episodes before the prediction model is updated
DISCOUNT = 0.99 # Example discount factor for future rewards
MODEL_NAME = "Xception" # This is the name of the model we will use from Keras
MEMORY_FRACTION = 0.8 # How much GPU memory to allocate, to prevent the program from taking more GPU than we can spare

Next, we will create a class named CarAgent in the file from the last tutorial. This will hold our code for the reinforcement learning agent.

class CarAgent:

Next, we will add an __init__ method to the class along with some code.
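
The idea can be sketched like this, with a stand-in FakeModel object in place of the real Keras models we build later, purely to illustrate the two-model setup:

```python
class FakeModel:
    """Stand-in for a Keras model, used only to illustrate the idea."""
    def __init__(self):
        self.weights = [0.0]
    def get_weights(self):
        return list(self.weights)
    def set_weights(self, weights):
        self.weights = list(weights)

class CarAgentSketch:
    def __init__(self):
        # Training model: updated constantly as we train.
        self.model = FakeModel()
        # Prediction (target) model: starts as a copy of the training model
        # and is only synchronized every UPDATE_TARGET_EVERY episodes.
        self.target_model = FakeModel()
        self.target_model.set_weights(self.model.get_weights())
```

In the real CarAgent, both attributes hold the Keras model we create later in this tutorial; only the synchronization pattern matters here.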

What we are essentially trying to achieve in the __init__ method is to create two models. One of the models is the training model and the other is the prediction model, which will be used to compare the result of our training.

One way to use the two models would be to constantly update the prediction model as we train but doing that might result in getting volatile results.

Hence, what we will do instead is constantly train with the training model while holding the prediction model from updating.

After a certain number of episodes or training turns, we will update the prediction model.

Next, still in the __init__ method, we will write in some codes that enable us to get a memory of previous actions as we train. This will help with stabilizing our training process between both models.

First, we add the necessary import.

from collections import deque

Next, we add the code in the __init__ method.

self.replay_memory = deque(maxlen=REPLAY_MEMORY_SIZE)

The constant REPLAY_MEMORY_SIZE specified above is used here. In the value we specified, the underscore between the 5 and the three zeros serves as a separator, like a comma, so it can be read as 5,000.
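
You can verify this in Python directly; the underscore in a numeric literal is purely visual:

```python
REPLAY_MEMORY_SIZE = 5_000  # the underscore is only a visual separator

print(REPLAY_MEMORY_SIZE == 5000)  # → True
```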

Next, we will modify the TensorBoard for speed and storage purposes, and wrap up our __init__ method by setting some final values.

    self.tensorboard = ModifiedTensorBoard(log_dir=f"logs/{MODEL_NAME}-{int(time.time())}")
    self.target_update_counter = 0 
    self.graph = tf.get_default_graph()
    self.terminate = False
    self.last_logged_episode = 0
    self.training_initialized = False

The first line prevents the TensorBoard from growing out of control; we want to make sure the speed and use of storage are kept in check.

The second line serves as a counter that is updated after each episode runs. The third line will be useful since we carry out prediction and training in different threads. The fourth line is one of the flags we will use for the threads.

The fifth line helps us keep track of the TensorBoard step, since some of the things we do will be asynchronous. We are going to use the last line, self.training_initialized, to track our progress when we start running simulations.

Final code for the CarAgent class up to this point.

Creating a Model for Reinforcement Learning Agent Using Keras

Here, we are just going to make use of the prebuilt Xception model from Tensorflow, but you could use a different prebuilt model or import your own.

We are also adding GlobalAveragePooling to our output layer, as well as the three neuron output that represents each possible action that the reinforcement learning agent can take.

Let's start by making some required imports for this section. Add the code below at the top of the file where we have other imports.

from keras.applications.xception import Xception
from keras.layers import Dense, GlobalAveragePooling2D
from keras.optimizers import Adam
from keras.models import Model

Now let's create our model:
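
The original code listing is not shown here, but based on the description that follows, a create_model method might look like this sketch. IM_HEIGHT and IM_WIDTH are assumed to be the camera image dimensions defined in the environment tutorial; the mse loss and the Adam learning rate are my assumptions, not the tutorial's stated choices:

```python
def create_model(self):
    # Imports kept inside the method so this sketch stays self-contained.
    from keras.applications.xception import Xception
    from keras.layers import Dense, GlobalAveragePooling2D
    from keras.optimizers import Adam
    from keras.models import Model

    # Prebuilt Xception base; IM_HEIGHT/IM_WIDTH are assumed to be the
    # camera image dimensions from the environment tutorial.
    base_model = Xception(weights=None, include_top=False,
                          input_shape=(IM_HEIGHT, IM_WIDTH, 3))

    x = base_model.output
    x = GlobalAveragePooling2D()(x)  # pool the convolutional features

    # Three outputs: one Q value per possible action (left, straight, right).
    predictions = Dense(3, activation="linear")(x)

    model = Model(inputs=base_model.input, outputs=predictions)
    # mse loss is a common choice for Q-value regression (an assumption here).
    model.compile(loss="mse", optimizer=Adam(lr=0.001), metrics=["accuracy"])
    return model
```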

The base_model specifies the prebuilt Xception model from Tensorflow as our default model. Whenever we use a prebuilt model from Tensorflow, we have to change the input and output layer. This is done by specifying what to use as input and output.

We take the output of the base_model and add the GlobalAveragePooling and Dense layers to it.

The "3" specified in the predictions indicates the three possible turns (turn left, go straight, turn right). Later on, we set the inputs for the model and the outputs as predictions.

Next, we will add a quick method in our CarAgent for updating the replay memory.
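
This method can be as simple as appending the transition tuple to the deque; a minimal, self-contained sketch (CarAgentSketch stands in for our CarAgent):

```python
from collections import deque

REPLAY_MEMORY_SIZE = 5_000

class CarAgentSketch:
    def __init__(self):
        self.replay_memory = deque(maxlen=REPLAY_MEMORY_SIZE)

    def update_replay_memory(self, transition):
        # transition = (current_state, action, reward, new_state, done)
        self.replay_memory.append(transition)
```

Because the deque has a maxlen, the oldest transitions fall off automatically once memory is full.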

Creating the train method for our Reinforcement Learning Agent

Next, we will work on the train() method.

To begin, it is important to know that we only want to train if we have a bare minimum of samples in replay memory.

Earlier on, we specified the MIN_REPLAY_MEMORY_SIZE constant as 1_000 (1000).
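
The guard at the top of train() can be sketched as a stand-alone function like this (the boolean return value is only for illustration):

```python
from collections import deque

MIN_REPLAY_MEMORY_SIZE = 1_000

def train(replay_memory):
    # Do nothing until we have gathered enough samples to learn from.
    if len(replay_memory) < MIN_REPLAY_MEMORY_SIZE:
        return False  # training skipped
    # ... actual training would happen here ...
    return True
```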


The if statement above ends the training early because we do not want to do anything unless we have enough replay memory samples (at least 1,000) to work with.

However, if we do have enough replay memory (1000 and above), then we will begin our training.

To do that, we first need to grab a random minibatch.

minibatch = random.sample(self.replay_memory, MINIBATCH_SIZE)

As soon as we have our minibatch, we want to calculate our current and future states so we can do the training operation.

The current state uses self.model to predict, while the future state uses self.target_model.

        current_states = np.array([transition[0] for transition in minibatch])/255
        with self.graph.as_default():
            current_list = self.model.predict(current_states, PREDICTION_BATCH_SIZE)

        future_states = np.array([transition[3] for transition in minibatch])/255
        with self.graph.as_default():
            future_list = self.target_model.predict(future_states, PREDICTION_BATCH_SIZE)

Next, we want to enumerate over the batches and do our training.

We create our inputs (X) and outputs (y):

        X = []
        y = []

        for index, (current_state, action, reward, new_state, done) in enumerate(minibatch):
            if not done:
                max_future_q = np.max(future_list[index])
                new_q = reward + DISCOUNT * max_future_q
            else:
                new_q = reward

            current_qs = current_list[index]
            current_qs[action] = new_q

            X.append(current_state)
            y.append(current_qs)
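
To see what the update computes, take a single hypothetical transition with a reward of 1.0, a DISCOUNT of 0.99, and a best future Q value of 2.0:

```python
DISCOUNT = 0.99  # example discount factor

reward = 1.0
max_future_q = 2.0
done = False

# Same update as in the loop above: discounted future value unless the
# episode ended on this transition.
new_q = reward + DISCOUNT * max_future_q if not done else reward
print(new_q)  # → 2.98
```

The discount makes rewards that are further in the future worth slightly less than immediate ones.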


Next, we will need to set up a way to keep track of our training logs.

        log_this_step = False
        if self.tensorboard.step > self.last_logged_episode:
            log_this_step = True
            self.last_logged_episode = self.tensorboard.step

Next, we will carry out the fitment. We will be setting the tensorboard callback only if log_this_step is true. If it's false, then we will still carry out fitment, but won't log to TensorBoard.

        with self.graph.as_default():
            self.model.fit(np.array(X)/255, np.array(y), batch_size=TRAINING_BATCH_SIZE, verbose=0, shuffle=False, callbacks=[self.tensorboard] if log_this_step else None)

Next, we want to continue tracking for logging:

if log_this_step:
    self.target_update_counter += 1

Finally, we'll check to see if it's time to update our target_model:

if self.target_update_counter > UPDATE_TARGET_EVERY:
    self.target_model.set_weights(self.model.get_weights())
    self.target_update_counter = 0

That concludes the train method.

Now, we need a method to get q values (basically to make a prediction).
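
A get_qs method typically normalizes the image, adds a batch dimension of 1, and returns the single row of predicted Q values. In this sketch, FakeModel stands in for self.model (the real Keras model), purely so the reshaping logic is visible:

```python
import numpy as np

class FakeModel:
    """Stand-in predictor: returns one row of three Q values per input."""
    def predict(self, batch):
        return np.tile([0.1, 0.9, 0.2], (len(batch), 1))

class CarAgentSketch:
    def __init__(self):
        self.model = FakeModel()

    def get_qs(self, state):
        # Normalize the image, add a batch dimension of 1, then take the
        # single row of Q values out of the returned batch.
        return self.model.predict(np.array(state).reshape(-1, *state.shape) / 255)[0]
```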

Finally, we create a method that handles the training and prediction parts of the reinforcement learning agent.

To begin, we will use some random data to initialize the model, then start an infinite loop. An infinite loop is best here because it prevents reinitializing everything each time we train, which speeds things up a bit.
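
The loop structure can be sketched like this; the FakeAgent class and the stop_after parameter exist only so the sketch is self-contained, and the one-off warm-up fit on random data is summarized in the comment:

```python
import time

def train_in_loop(agent):
    # Warm-up: in the real agent, we would first fit the model once on
    # random data so TensorFlow finishes building the graph before the
    # simulation starts, then flip the flag the main thread is waiting on.
    agent.training_initialized = True

    # Infinite loop: keep training until the main thread asks us to stop.
    while True:
        if agent.terminate:
            return
        agent.train()
        time.sleep(0.01)

class FakeAgent:
    """Illustrative stand-in for CarAgent, used to demonstrate the flags."""
    def __init__(self, stop_after):
        self.training_initialized = False
        self.terminate = False
        self.calls = 0
        self.stop_after = stop_after
    def train(self):
        self.calls += 1
        if self.calls >= self.stop_after:
            self.terminate = True
```

In the real program, this function runs in its own thread while the main thread drives the simulation and sets terminate when it is time to shut down.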

With that, we are done with the reinforcement learning agent of our autonomous self-driving car using CARLA.

Full Code of Reinforcement Learning Agent.

Wrap Off

If you run into errors or are unable to complete this tutorial, feel free to contact us anytime, and we will instantly resolve it. You can also request clarification, download this tutorial as a pdf, or report bugs using the buttons below.

Enjoy this Tutorial? Please don't forget to share.