Archangel Macsika

How to Integrate the Reinforcement Learning Agent in the Environment

Tutorial 8 of 9 | 13-minute read

In the last two tutorials of this series, we developed the reinforcement learning environment and the reinforcement learning agent of our autonomous self-driving car using CARLA.

In this tutorial, we will wrap up the development of our autonomous self-driving car using CARLA and Python by putting together the pieces that tie the reinforcement learning environment to the reinforcement learning agent.

Setting Up a Modified TensorBoard Class

To begin, add these imports below to the list of imports in the autonomous-car-tutorial2.py file from the last tutorial.



from keras.callbacks import TensorBoard
import tensorflow as tf
import keras.backend.tensorflow_backend as backend
from threading import Thread
from tqdm import tqdm



Copy and paste the modified TensorBoard class below near the top of our file, above the CarEnvironment class. This class overrides the default TensorBoard callback, which is what we will be working with.
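
Since the class itself is not reproduced here, below is a minimal sketch of what the modified TensorBoard callback can look like, based on the method overrides explained next. The class name ModifiedTensorBoard and the TF1-style tf.summary.FileWriter calls are assumptions drawn from the imports above, so adapt them to your own version if it differs.


class ModifiedTensorBoard(TensorBoard):

    # Set the initial step and create one shared writer (one log file for all .fit() calls)
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.step = 1
        self.writer = tf.summary.FileWriter(self.log_dir)

    # Stop creating the default log writer
    def set_model(self, model):
        pass

    # Save logs with our own step number
    # (otherwise every .fit() would start writing from the 0th step)
    def on_epoch_end(self, epoch, logs=None):
        self.update_stats(**(logs or {}))

    # We train on one batch at a time, so there is nothing to save at the end of a batch
    def on_batch_end(self, batch, logs=None):
        pass

    # Stop the writer from being closed on its own when training ends
    def on_train_end(self, _):
        pass

    # Custom method for saving our own metrics:
    # writes each metric to the log at the current step, then flushes the writer
    def update_stats(self, **stats):
        for name, value in stats.items():
            summary = tf.Summary(value=[tf.Summary.Value(tag=name, simple_value=value)])
            self.writer.add_summary(summary, self.step)
        self.writer.flush()


In the agent class from the last tutorial, this callback would typically be created once, for example as agent.tensorboard = ModifiedTensorBoard(log_dir=f"logs/{MODEL_NAME}-{int(time.time())}"), and then reused for every .fit() call.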

Let's explain the purpose of each method override.

The purpose of the code above is to reduce the amount of logging that TensorFlow/TensorBoard does.

Without it, a new log file is written for every .fit() call, with a data point per step, which complicates things and becomes very difficult to manage in reinforcement learning, where .fit() is called constantly.

We override the __init__() method to set the initial step and writer because we want one log file for all .fit() calls that are made.

We override the set_model() method to stop creating the default log writer.

We override the on_epoch_end() method to save logs with our own step number; otherwise, every .fit() call would start writing from the 0th step.

We override the on_batch_end() method because we train on one batch at a time, so there is no need to save anything at the end of each batch.

We override the on_train_end() method to stop the writer from closing on its own.

The update_stats() method is a custom method for saving our own metrics: it takes our metrics, writes them to the log at the current step, and flushes the writer.

Next, add the following code at the bottom of our script:

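The block itself is not shown here, so below is a minimal sketch of what it can look like. MEMORY_FRACTION, the epsilon-related constants, and the DQNAgent class name are assumptions carried over from the earlier tutorials; CarEnvironment is the environment class we built earlier in this series, and random, os, and np are assumed to be imported at the top of the script.


if __name__ == '__main__':
    FPS = 60  # frames per second used to pace the random actions below
    ep_rewards = [-200]  # running list of episode rewards for the stats section

    # For more repeatable results, seed every source of randomness
    random.seed(1)
    np.random.seed(1)
    tf.set_random_seed(1)

    # Memory fraction, used mostly when training multiple agents on the same GPU
    gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=MEMORY_FRACTION)
    backend.set_session(tf.Session(config=tf.ConfigProto(gpu_options=gpu_options)))

    # Create the models directory if it doesn't exist yet; this is where our models will go
    if not os.path.isdir('models'):
        os.makedirs('models')

    # Create the reinforcement learning agent and the reinforcement learning environment
    agent = DQNAgent()        # agent class from the last tutorial (name assumed)
    env = CarEnvironment()    # environment class from the earlier tutorial
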
In the code above, we first set an FPS (frames per second) value. When we begin training, we will have a high epsilon value.

This means that there is a high probability that we will randomly choose an action, instead of predicting it with our neural network.

Using a random choice will be quicker than predicting an action.

We also add some code for more repeatable results (seeding the random number generators), set the GPU memory fraction used when training multiple agents, and create the models directory if it doesn't exist yet; this is where our saved models will go. The last block simply creates the reinforcement learning agent and the reinforcement learning environment.

Next, we will create the code that starts the training thread.



    # Start the training thread and wait until training is initialized
    trainer_thread = Thread(target=agent.train_in_loop, daemon=True)
    trainer_thread.start()
    while not agent.training_initialized:
        time.sleep(0.01)


The code above starts the training thread and waits for training to be initialized.

We also want to initialize predictions.



    # Warm up the model with a dummy prediction on an all-ones image
    agent.get_qs(np.ones((env.img_height, env.img_width, 3)))


The first prediction is usually slow, so it is more efficient to make it before we start iterating over the episodes.

Now, we will start iterating over the episodes.
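
The loop is sketched below; EPISODES is assumed to be a constant defined near the top of the script, and agent.tensorboard is assumed to be the ModifiedTensorBoard instance the agent creates.


    # Iterate over episodes, with a tqdm progress bar
    for episode in tqdm(range(1, EPISODES + 1), ascii=True, unit='episodes'):

        # Update the tensorboard step at the start of every episode
        agent.tensorboard.step = episode

        # Restart the episode - reset the episode reward and step number
        episode_reward = 0
        step = 1

        # Reset the environment and get its initial state
        current_state = env.reset()

        # Reset the flag and start iterating until the episode ends
        done = False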

The code above iterates over the episodes using a for loop, wrapped in tqdm so we get a progress bar. On each episode, it updates the TensorBoard step.

Then it restarts the episode, which resets the episode reward and step number. Afterward, it resets the reinforcement learning environment and gets its initial state.

Finally, it resets the done flag and starts iterating until the episode ends.

We've set some initial values for our reinforcement learning environment, and now we're ready to run.

A reinforcement learning environment will run until it's done, so we can use a while True loop and break on our done flag.

As we run the program, we either want to take a random action or figure out our current action based on the model of our reinforcement learning agent.



    while True:
        if np.random.random() > epsilon:
            # Exploit: pick the action with the highest predicted Q-value
            action = np.argmax(agent.get_qs(current_state))
        else:
            # Explore: pick one of the 3 actions at random
            action = np.random.randint(0, 3)
            # A random choice is nearly instant, so sleep to roughly match the prediction time
            time.sleep(1/FPS)


In the code above, we choose between exploiting the model and exploring at random based on epsilon. The if block gets an action from the agent's predicted Q-values, that is, the action we predict ourselves.

In the else block, it picks a random action instead. We then add a sleep() call, since choosing a random action doesn't take nearly as much time as a prediction, and the short sleep keeps both branches running at roughly the same pace.

Next, we will get our information from our reinforcement learning environment's .step() method, which takes our action as a parameter.



    # Step the environment with the chosen action and count the reward
    new_state, reward, done, _ = env.step(action)
    episode_reward += reward
    # Update the replay memory with this transition at every step
    agent.update_replay_memory((current_state, action, reward, new_state, done))
    current_state = new_state
    step += 1
    if done:
        break


The code above steps the environment with the chosen action and adds the reward to the episode reward. At every step, it updates the replay memory with the transition, makes the new state the current state, increments the step counter, and breaks out of the loop once the episode is done.

Wrapping Up the Reinforcement Learning Agent of our Autonomous Self-driving Car

When we are done with the training and predicting for an episode, we need to carry out some wrap-up actions.

First, we need to delete the actors from our reinforcement learning environment.



# End of episode - destroy every actor the environment spawned
for actor in env.actor_list:
    actor.destroy()


Next, we will write the code to handle some stats and save the models that return a good reward.

You can also use this opportunity to add any other rule you decide to set as an if statement.

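The block is sketched below; AGGREGATE_STATS_EVERY and MIN_REWARD are assumed to be constants defined near the top of the script, and the model filename follows the same format as the final save at the end of this tutorial.


        # Append the episode reward to the list and log aggregate stats every few episodes
        ep_rewards.append(episode_reward)
        if not episode % AGGREGATE_STATS_EVERY or episode == 1:
            recent_rewards = ep_rewards[-AGGREGATE_STATS_EVERY:]
            average_reward = sum(recent_rewards) / len(recent_rewards)
            min_reward = min(recent_rewards)
            max_reward = max(recent_rewards)
            agent.tensorboard.update_stats(reward_avg=average_reward,
                                           reward_min=min_reward,
                                           reward_max=max_reward,
                                           epsilon=epsilon)

            # Save the model, but only when the minimum reward is greater than or equal to MIN_REWARD
            if min_reward >= MIN_REWARD:
                agent.model.save(f'models/{MODEL_NAME}__{max_reward:_>7.2f}max_'
                                 f'{average_reward:_>7.2f}avg_{min_reward:_>7.2f}min__{int(time.time())}.model')
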
In the code above, we append the episode reward to a list and log the aggregate stats after every given number of episodes.

Then we save the model only when the minimum reward is greater than or equal to a set value.

Next, we will decay epsilon.



            # Decay epsilon, but never let it drop below MIN_EPSILON
            if epsilon > MIN_EPSILON:
                epsilon *= EPSILON_DECAY
                epsilon = max(MIN_EPSILON, epsilon)


Finally, if we have successfully iterated through all of our target episodes, we can exit the training and prediction by setting a termination flag for the training thread and waiting for it to finish before exiting.



    # Set the termination flag for the training thread and wait for it to finish
    agent.terminate = True
    trainer_thread.join()
    # Save the final model, encoding its reward stats and a timestamp in the filename
    agent.model.save(f'models/{MODEL_NAME}__{max_reward:_>7.2f}max_{average_reward:_>7.2f}avg_{min_reward:_>7.2f}min__{int(time.time())}.model')



And that concludes the development of the reinforcement learning environment and reinforcement learning agent for our autonomous self-driving car using CARLA and Python.

Full Code of Reinforcement Learning Agent and Reinforcement Learning Environment.

Wrap Off

In this tutorial, we finished the development of our autonomous self-driving car using CARLA and Python by putting together the pieces that tie the reinforcement learning environment to the reinforcement learning agent.

You can proceed to run the program and test things out for yourself. However, we will be running the program in the next tutorial.

If you run into errors or are unable to complete this tutorial, feel free to contact us anytime, and we will instantly resolve it. You can also request clarification, download this tutorial as a PDF, or report bugs using the buttons below.


Enjoy this Tutorial? Please don't forget to share.