---
license: mit
library_name: marl-ppo-suite
tags:
- reinforcement-learning
- starcraft-mac
- smacv2
- happo
- protoss_5_vs_5
- smacv2_protoss_5_vs_5
model-index:
- name: HAPPO on smacv2_protoss_5_vs_5
  results:
  - task:
      type: reinforcement-learning
      name: StarCraft Multi-Agent Challenge v2
    dataset:
      name: protoss_5_vs_5
      type: smacv2
    metrics:
    - name: win-rate
      type: win_rate
      value: 0.353
    - name: mean-reward
      type: mean_reward
      value: 16.92
    - name: mean-ep-length
      type: mean_episode_length
      value: 57.1
---
# HAPPO on smacv2_protoss_5_vs_5

**10 M environment steps · 28.89 h wall-clock · seed 1**
This is a trained model of a **HAPPO** agent playing **smacv2_protoss_5_vs_5**.
The model was produced with the open-source [marl-ppo-suite](https://github.com/legalaspro/marl-ppo-suite) training code.
## Usage – quick evaluation / replay
```bash
# 1. Install the codebase (directly from GitHub)
pip install "marl-ppo-suite @ git+https://github.com/legalaspro/marl-ppo-suite"

# 2. Get the weights & config from the Hugging Face Hub
wget https://huggingface.co/<repo-id>/resolve/main/final-torch.model
wget https://huggingface.co/<repo-id>/resolve/main/config.json

# 3-a. Generate a StarCraft II replay file (1 episode, written to the StarCraft II replay folder)
marl-train --mode render --model final-torch.model --config config.json --render_episodes 1

# 3-b. Additionally render a video from the captured frames
marl-train --mode render --model final-torch.model --config config.json --render_episodes 1 --render_mode rgb_array
```
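If you prefer to work with the checkpoint directly, here is a minimal sketch (not part of marl-ppo-suite's documented API) that downloads the weights with `huggingface_hub` and inspects them with plain PyTorch; `<repo-id>` is a placeholder for this repository's id.

```python
# Hedged sketch: download the checkpoint and print its top-level structure.
from huggingface_hub import hf_hub_download
import torch

# "<repo-id>" is a placeholder — substitute the actual Hugging Face repo id.
ckpt_path = hf_hub_download(repo_id="<repo-id>", filename="final-torch.model")

# Load on CPU; the exact key layout of the checkpoint is not documented here,
# so we only list the top-level keys rather than rebuilding the networks.
checkpoint = torch.load(ckpt_path, map_location="cpu")
if isinstance(checkpoint, dict):
    for key, value in checkpoint.items():
        print(key, type(value).__name__)
else:
    print(type(checkpoint).__name__)
```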
## Files
- `final-torch.model` – PyTorch checkpoint
- `replay.mp4` – gameplay of the final policy
- `config.json` – training config
- `tensorboard/` – full training logs (viewable with TensorBoard, as shown below)
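To browse the training curves locally, point TensorBoard at the bundled log directory (assuming `tensorboard` is installed and the folder was downloaded alongside the checkpoint):

```bash
tensorboard --logdir tensorboard/
```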
## Hyper-parameters
```json
{
  "clip_param": 0.1,
  "data_chunk_length": 10,
  "entropy_coef": 0.01,
  "fc_layers": 2,
  "gae_lambda": 0.95,
  "gamma": 0.99,
  "hidden_size": 64,
  "lr": 0.0005,
  "n_steps": 400,
  "num_mini_batch": 1,
  "ppo_epoch": 5,
  "reward_norm_type": "efficient",
  "seed": 1,
  "state_type": "AS",
  "use_reward_norm": true,
  "use_rnn": true,
  "use_value_norm": false,
  "value_norm_type": "welford"
}
```
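These hyper-parameters presumably come from the same `config.json` shipped with the model. A minimal sketch for inspecting or tweaking it before a new run, assuming the file keeps the flat key/value layout shown above (an assumption, not documented behaviour):

```python
# Hedged sketch: read the shipped training config and write a modified copy.
import json

with open("config.json") as f:
    cfg = json.load(f)

print(f"lr={cfg['lr']}, gamma={cfg['gamma']}, ppo_epoch={cfg['ppo_epoch']}")

# Example tweak: lower the learning rate, then save under a new name so the
# original config that produced this checkpoint stays untouched.
cfg["lr"] = 1e-4
with open("config_finetune.json", "w") as f:
    json.dump(cfg, f, indent=2)
```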