Train our robot
Install the imitation library:
pip install imitation
Download a copy of the imitation learning script from the Godot RL Repository.
Run training using the arguments below:
sb3_imitation.py --env_path="path_to_ILTutorial_executable" --bc_epochs=100 --gail_timesteps=1450000 --demo_files "path_to_expert_demos.json" --n_parallel=4 --speedup=20 --onnx_export_path=model.onnx --experiment_name=ILTutorial
Set the env path to the exported game, and the demo files path to the recorded demos. If you have multiple demo files, add them separated by spaces, e.g. --demo_files demos.json demos2.json.
You can also set a large number of timesteps for --gail_timesteps and then stop training manually with CTRL+C. I used this method to stop training when the reward started to approach 3, which was at around total_timesteps = 1.38e+06. That took ~41 minutes, and the BC pre-training took ~5.5 minutes on my PC, using the CPU for training.
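If you are curious what the --bc_epochs and --gail_timesteps arguments roughly correspond to, below is a simplified sketch of a two-stage BC + GAIL pipeline using the imitation library's public API. This is not the tutorial script itself: to keep it self-contained it runs on CartPole-v1 with demonstrations rolled out from a quickly trained PPO "expert", whereas sb3_imitation.py wraps the exported Godot executable as the environment and loads your recorded JSON demos.

```python
# Simplified sketch of the two training stages, assuming the imitation and
# stable-baselines3 packages are installed. CartPole-v1 and the synthetic
# "expert" below are stand-ins for the exported Godot env and your recorded demos.
import numpy as np
from imitation.algorithms import bc
from imitation.algorithms.adversarial.gail import GAIL
from imitation.data import rollout
from imitation.rewards.reward_nets import BasicRewardNet
from imitation.util.networks import RunningNorm
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

rng = np.random.default_rng(0)
venv = make_vec_env("CartPole-v1", n_envs=4)

# Stand-in for the recorded expert demos (--demo_files in the tutorial script):
# roll out trajectories from a quickly trained PPO policy.
expert = PPO("MlpPolicy", venv, verbose=0).learn(10_000)
trajectories = rollout.rollout(
    expert,
    venv,
    rollout.make_sample_until(min_episodes=50),
    rng=rng,
    unwrap=False,  # the env is not wrapped with RolloutInfoWrapper here
)

learner = PPO("MlpPolicy", venv, verbose=0)

# Stage 1: behavioral cloning pre-training (--bc_epochs).
bc_trainer = bc.BC(
    observation_space=venv.observation_space,
    action_space=venv.action_space,
    demonstrations=trajectories,
    policy=learner.policy,  # pre-train the same policy that GAIL will fine-tune
    rng=rng,
)
bc_trainer.train(n_epochs=10)

# Stage 2: adversarial imitation with GAIL (--gail_timesteps). The discriminator's
# learned reward is what appears as ep_rew_wrapped_mean in the training logs.
reward_net = BasicRewardNet(
    venv.observation_space, venv.action_space, normalize_input_layer=RunningNorm
)
gail_trainer = GAIL(
    demonstrations=trajectories,
    demo_batch_size=256,
    venv=venv,
    gen_algo=learner,
    reward_net=reward_net,
)
gail_trainer.train(total_timesteps=100_000)  # stop early (CTRL+C) if reward plateaus
```

The tutorial script additionally exports the final policy to ONNX (--onnx_export_path) once training finishes.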
To observe the environment while training, add the --viz argument. For the duration of the BC training, the env will be frozen, as this stage doesn’t use the env except to get some information about the observation and action spaces. During the GAIL training stage, the env rendering will update.
Here are the ep_rew_mean and ep_rew_wrapped_mean stats from the logs, displayed using TensorBoard; we can see that they match closely in this case:

Even though setting the env rewards is not necessary and they are not used for training here, a simple sparse reward was implemented to track success. Falling outside the map, into water, or into a trap sets reward += -1, while activating the lever, collecting the key, and opening the chest each set reward += 1. If ep_rew_mean approaches 3, we are getting a good result. ep_rew_wrapped_mean is the reward from the GAIL discriminator, which does not directly tell us how successful the agent is at solving the environment.
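The reward bookkeeping itself lives in the Godot project's GDScript, but as a rough illustration (the event names below are made up, not actual node or signal names from the project), the sparse reward is just a sum of fixed deltas over the events that happen in an episode:

```python
# Illustrative only: the real logic is GDScript in the Godot project.
# Event names are hypothetical.
REWARD_DELTAS = {
    "fell_off_map": -1.0,
    "fell_in_water": -1.0,
    "hit_trap": -1.0,
    "lever_activated": 1.0,
    "key_collected": 1.0,
    "chest_opened": 1.0,
}

def episode_reward(events):
    """Sum the sparse reward over the events recorded in one episode."""
    return sum(REWARD_DELTAS.get(event, 0.0) for event in events)

# A fully successful episode (lever -> key -> chest, no falls) scores 3.0,
# which is why ep_rew_mean approaching 3 indicates the task is being solved.
print(episode_reward(["lever_activated", "key_collected", "chest_opened"]))  # 3.0
```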
Let’s test the trained agent
After training, you’ll find a model.onnx file in the folder you started the training script from (you can also find the full path to the .onnx file in the training log in the console, near the end). Copy it to the Godot game project folder.
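Optionally, before importing the file into Godot, you can sanity-check the export and inspect its input/output signature with onnxruntime (an extra package, assumed here to be installed via pip install onnxruntime; it is not required by the tutorial itself):

```python
# Quick sanity check of the exported model outside of Godot.
import onnxruntime as ort

sess = ort.InferenceSession("model.onnx")
print("inputs: ", [(i.name, i.shape, i.type) for i in sess.get_inputs()])
print("outputs:", [(o.name, o.shape, o.type) for o in sess.get_outputs()])
```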
Open the onnx inference scene
This scene, like the demo record scene, uses only one copy of the level. It also has its Sync node mode set to Onnx Inference.
Click on the Sync node and set the Onnx Model Path property to model.onnx.

Press F6 to start the scene and let’s see what the agent has learned!
Video of the trained agent:
It seems the agent is capable of collecting the key from both positions (left platform or right platform) and replicates the recorded behavior well. If you’re getting similar results, well done, you’ve successfully completed this tutorial! 🏆👏
If your results are significantly different, note that the amount and quality of recorded demos can affect the results, and adjusting the number of steps for BC/GAIL stages as well as modifying the hyper-parameters in the Python script can potentially help. There’s also some run-to-run variation, so sometimes the results can be slightly different even with the same settings.