PPO Agent playing AntBulletEnv-v0

This is a trained model of a PPO agent playing AntBulletEnv-v0 using the stable-baselines3 library.

Usage (with Stable-baselines3)

from stable_baselines3 import ...
from huggingface_sb3 import load_from_hub

...

# Model
# Note: nn below is torch.nn (from torch import nn).
model = PPO(
    policy="MlpPolicy",
    env=env,
    batch_size=256,
    clip_range=0.4,
    ent_coef=0.0,
    gae_lambda=0.92,
    gamma=0.99,
    learning_rate=3.0e-05,
    max_grad_norm=0.5,
    n_epochs=30,
    n_steps=512,
    policy_kwargs=dict(
        log_std_init=-2,
        ortho_init=False,
        activation_fn=nn.ReLU,
        net_arch=[dict(pi=[256, 256], vf=[256, 256])],
    ),
    use_sde=True,
    sde_sample_freq=4,
    vf_coef=0.5,
    tensorboard_log="./tensorboard",
    verbose=1,
)

model.learn(1_000_000)
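
As a rough sanity check on the hyperparameters above, the sketch below works out how much data each PPO update consumes and how many updates the training run performs. It assumes a single environment (`n_envs = 1`), which this card does not state; with a vectorized environment the rollout size scales by the number of environments. The variable names are illustrative only.

```python
import math

# Values taken from the PPO configuration above.
n_steps = 512
batch_size = 256
n_epochs = 30
total_timesteps = 1_000_000

# Assumption: one environment; the card does not say how `env` was built.
n_envs = 1

# Transitions collected before each policy update.
rollout_size = n_steps * n_envs
# Minibatches drawn from the rollout in each epoch.
minibatches_per_epoch = rollout_size // batch_size
# Gradient steps taken per policy update.
gradient_steps_per_update = minibatches_per_epoch * n_epochs
# Rollouts needed to reach the requested number of timesteps.
num_updates = math.ceil(total_timesteps / rollout_size)

print(rollout_size, minibatches_per_epoch, gradient_steps_per_update, num_updates)
```

Under that assumption, each update collects 512 transitions, splits them into 2 minibatches of 256, and takes 60 gradient steps; the full run amounts to 1954 updates.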

Reinforcement Learning

Evaluation results

mean_reward on AntBulletEnv-v0 (self-reported): 2447.40 +/- 23.14
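
The figure above has the form mean +/- standard deviation of per-episode returns, which is the summary that Stable-Baselines3's `evaluate_policy` helper returns. The card does not say how the number was produced, so the following is only a minimal sketch of that kind of summary. The episode returns are hypothetical values chosen for illustration, not the actual evaluation data, and the population standard deviation is an assumption.

```python
from statistics import fmean, pstdev

# Hypothetical per-episode returns, for illustration only (not real results).
episode_returns = [2400.0, 2450.0, 2500.0]

# Mean and population standard deviation over evaluation episodes.
mean_reward = fmean(episode_returns)
std_reward = pstdev(episode_returns)

print(f"{mean_reward:.2f} +/- {std_reward:.2f}")
```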
