HAPPO on smacv2_zerg_5_vs_5
10 M environment steps · 16.85 h wall-clock · seed 1
This is a trained HAPPO (Heterogeneous-Agent Proximal Policy Optimization) agent playing smacv2_zerg_5_vs_5. The model was produced with the open-source marl-ppo-suite training code.
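For context, HAPPO updates agents sequentially rather than jointly. The sketch below is a minimal PyTorch illustration of that sequential update as described in the HAPPO paper, not the marl-ppo-suite implementation; the policies, optimizers, and batch objects and their log_prob API are assumptions made here for illustration only.

import torch

def happo_update(policies, optimizers, batch, advantages, clip_param=0.05):
    # Agents update one at a time in a random order; each weights the shared
    # advantage by the probability ratios of the agents that already updated.
    m = torch.ones_like(advantages)  # compounding ratio factor M
    for i in torch.randperm(len(policies)).tolist():
        obs, actions, old_logp = batch[i]          # assumed batch layout
        logp = policies[i].log_prob(obs, actions)  # assumed policy API
        ratio = torch.exp(logp - old_logp)
        weighted_adv = m * advantages
        # standard PPO clipped surrogate, applied per agent
        loss = -torch.min(
            ratio * weighted_adv,
            torch.clamp(ratio, 1.0 - clip_param, 1.0 + clip_param) * weighted_adv,
        ).mean()
        optimizers[i].zero_grad()
        loss.backward()
        optimizers[i].step()
        # fold this agent's post-update ratio into M for the next agent
        with torch.no_grad():
            m = m * torch.exp(policies[i].log_prob(obs, actions) - old_logp)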
Usage – quick evaluation / replay
# 1. install the codebase (directly from GitHub)
pip install "marl-ppo-suite @ git+https://github.com/legalaspro/marl-ppo-suite"
# 2. get the weights & config from HF
wget https://huggingface.co/<repo-id>/resolve/main/final-torch.model
wget https://huggingface.co/<repo-id>/resolve/main/config.json
# 3-a. Generate a StarCraft II replay file (1 episode, written to the StarCraft II replay folder)
marl-train --mode render --model final-torch.model --config config.json --render_episodes 1
# 3-b. Additionally render a video from the captured frames
marl-train --mode render --model final-torch.model --config config.json --render_episodes 1 --render_mode rgb_array
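To sanity-check the downloaded weights before rendering, you can inspect the checkpoint from Python. torch.load is standard PyTorch; the payload layout is whatever marl-ppo-suite saved, so treat the printout below as an exploratory sketch rather than a documented format.

import torch

# On PyTorch >= 2.6 you may need weights_only=False if the file pickles objects.
ckpt = torch.load("final-torch.model", map_location="cpu")
print(type(ckpt))
if isinstance(ckpt, dict):
    for key, value in ckpt.items():
        # prints the tensor shape if value is a tensor, else its type
        print(key, getattr(value, "shape", type(value)))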
Files
- final-torch.model – PyTorch checkpoint
- replay.mp4 – gameplay of the final policy
- config.json – training config
- tensorboard/ – full training logs
Hyper-parameters
{
"clip_param": 0.05,
"data_chunk_length": 10,
"entropy_coef": 0.01,
"fc_layers": 2,
"gae_lambda": 0.95,
"gamma": 0.99,
"hidden_size": 64,
"lr": 0.0005,
"n_steps": 200,
"num_mini_batch": 1,
"ppo_epoch": 5,
"reward_norm_type": "efficient",
"seed": 1,
"state_type": "AS",
"use_reward_norm": true,
"use_rnn": true,
"use_value_norm": false,
"value_norm_type": "welford"
}
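As a worked illustration of three of these values: gamma and gae_lambda parameterize Generalized Advantage Estimation, and clip_param bounds the PPO probability ratio during updates. The snippet below is a generic textbook sketch using the values above, not the marl-ppo-suite code.

import torch

gamma, gae_lambda, clip_param = 0.99, 0.95, 0.05

def gae(rewards, values, dones):
    # Generalized Advantage Estimation over one rollout of length T;
    # values has T + 1 entries (bootstrap value appended at the end).
    T = rewards.shape[0]
    adv = torch.zeros(T)
    last = 0.0
    for t in reversed(range(T)):
        nonterminal = 1.0 - dones[t]
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        last = delta + gamma * gae_lambda * nonterminal * last
        adv[t] = last
    return adv

# clip_param = 0.05 trusts per-step probability ratios only inside [0.95, 1.05]
ratio = torch.tensor([0.8, 1.0, 1.3])
print(torch.clamp(ratio, 1.0 - clip_param, 1.0 + clip_param))  # tensor([0.9500, 1.0000, 1.0500])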
Evaluation results
- win-rate on zerg_5_vs_5 (self-reported): 0.397
- mean-reward on zerg_5_vs_5 (self-reported): 13.770
- mean-ep-length on zerg_5_vs_5 (self-reported): 26.800