🤖 Soft Actor-Critic (SAC) on Ant-v5 — Modernized OpenAI Spinning Up

This repository presents a fully trained Soft Actor-Critic (SAC) agent on the Ant-v5 environment using a modernized PyTorch-based version of OpenAI's Spinning Up in Deep RL educational framework.

Developed, trained, and maintained by MoniGarr — a self-directed AI researcher focused on NLP, multimodal systems, and RL control frameworks.

Project Mission

This work contributes to the revitalization of OpenAI’s highly respected Spinning Up in Deep RL codebase. The original repo does not support Python 3.8+, recent MuJoCo releases, or gymnasium. This project patches those limitations and showcases a reproducible, high-performing SAC agent for the modern Ant-v5 benchmark.

It also supports my broader mission: to demonstrate technical excellence and creativity in deep reinforcement learning and AI research while advancing open and inclusive access to intelligent systems. Some of my online students and clients use my demos for learning purposes.

Model Details

| Attribute | Value |
| --- | --- |
| Algorithm | Soft Actor-Critic (SAC) |
| Framework | PyTorch (Modernized Spinning Up) |
| Environment | `Ant-v5` via `gymnasium[mujoco]` |
| Epochs | 250 |
| Action Space | Continuous (Box) |
| Observation Space | Continuous (Box) |
| Command Used | `python -m spinup.run sac --env Ant-v5 --epochs 250 --exp_name experiment_sac_antv5_july_20_2025` |

Training Metrics Summary

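| Metric | Description |
| --- | --- |
| `AverageEpRet` | Average return per training episode, reported per epoch |
| `StdEpRet` | Standard deviation of training episode returns within an epoch |
| `MaxEpRet` | Maximum training episode return within an epoch |
| `MinEpRet` | Minimum training episode return within an epoch |
| `AverageTestEpRet` | Average return over test episodes, reported per epoch |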
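
As a rough illustration of how the training metrics above relate to one another, the sketch below computes them from a list of episode returns collected in a single epoch. The function name `summarize_returns` and the sample numbers are hypothetical, and the use of the population standard deviation is an assumption about how the logger aggregates values.

```python
from statistics import fmean, pstdev


def summarize_returns(episode_returns):
    """Summarize one epoch's episode returns using the metric names from the table."""
    if not episode_returns:
        raise ValueError("need at least one episode return")
    return {
        "AverageEpRet": fmean(episode_returns),
        # Assumption: population standard deviation (divide by n, not n - 1).
        "StdEpRet": pstdev(episode_returns),
        "MaxEpRet": max(episode_returns),
        "MinEpRet": min(episode_returns),
    }


# Made-up returns for illustration only; these are not values from this run.
stats = summarize_returns([10.0, 20.0, 30.0])
print(stats)
```

`AverageTestEpRet` would be computed the same way as `AverageEpRet`, only over the returns of the test episodes.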

Full logs: https://github.com/monigarr/spinningup/tree/monigarr-dev/data/experiment_sac_antv5_july_20_2025/progress.txt
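
One possible way to inspect the log is sketched below. It assumes `progress.txt` is a tab-separated file with a header row and purely numeric columns, which is how Spinning Up loggers usually write it. The helper `read_progress` and the sample rows are hypothetical, and the numbers are made up rather than taken from this run.

```python
import csv
import io


def read_progress(handle):
    """Parse a tab-separated progress log into a list of {column: float} rows."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [{key: float(value) for key, value in row.items()} for row in reader]


# Hypothetical sample with made-up numbers; the real file has more columns and rows.
sample = io.StringIO(
    "Epoch\tAverageEpRet\tAverageTestEpRet\n"
    "1\t-50.0\t-40.0\n"
    "2\t120.5\t150.0\n"
)
rows = read_progress(sample)

# Find the epoch with the highest average test return.
best = max(rows, key=lambda row: row["AverageTestEpRet"])
print(f"best test return {best['AverageTestEpRet']} at epoch {int(best['Epoch'])}")
```

To read a downloaded log instead of the sample, pass an open file, for example `read_progress(open("progress.txt", newline=""))`.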

🔍 Research Observations

- Policy performance stabilized after ~200 epochs
- Reward-to-noise ratio improved with tuned entropy coefficient (α = 0.2)
- Robust gait developed for complex terrain and perturbations
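
To make the role of the entropy coefficient α concrete, here is a minimal scalar sketch of the entropy-regularized target that SAC critics are typically trained toward. It is not taken from this repository's code: the function name `sac_target` is hypothetical, `gamma=0.99` is an assumed discount factor, and the inputs are made-up numbers.

```python
def sac_target(reward, done, q1_next, q2_next, logp_next, gamma=0.99, alpha=0.2):
    """Entropy-regularized Bellman target for a single transition (scalar sketch).

    q1_next, q2_next: the two target critics' estimates for the next state-action pair.
    logp_next: log-probability of the next action under the current policy.
    """
    # Clipped double-Q estimate plus an entropy bonus weighted by alpha.
    soft_value = min(q1_next, q2_next) - alpha * logp_next
    # No bootstrapping past a terminal state (done = 1.0).
    return reward + gamma * (1.0 - done) * soft_value


# Made-up inputs for illustration only.
y = sac_target(reward=1.0, done=0.0, q1_next=10.0, q2_next=12.0, logp_next=-2.0)
print(y)
```

A larger α gives more weight to the `-alpha * logp_next` term, which rewards the policy for keeping its action distribution spread out; α = 0 reduces the expression to an ordinary clipped double-Q target.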

🧪 Research Context

This experiment is part of a broader initiative to:

- Modernize and benchmark deep RL frameworks
- Create reproducible SAC baselines for MuJoCo control tasks
- Prepare high-quality artifacts for hybrid/remote AI research roles (RL, multimodal AI, language models)

I am currently pursuing research roles, residencies, and collaborations with a focus on intelligent control systems and language-grounded agents. I bring 30+ years of technical experience (previously lead mobile software architect, engineer, and developer; XR producer; 3D technical artist), speak Kanien’kéha (Mohawk language) dialects, and have a long-standing record of building ethical, useful, and inclusive AI.

🚀 Quickstart — Run the Model

# Install required libraries
pip install torch "gymnasium[mujoco]"

# Clone this repo (or download model + config)
git clone https://huggingface.co/MoniGarr/sac-antv5-modernized
cd sac-antv5-modernized

# Launch the SAC agent (interactive render)
python run_agent.py --env Ant-v5 --model_path ./pyt_save/model.pt
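
The contents of `run_agent.py` are not shown here, so the following is only a sketch of the kind of evaluation loop such a script might contain. It follows the Gymnasium convention that `reset()` returns `(obs, info)` and `step()` returns `(obs, reward, terminated, truncated, info)`. The names `run_episode` and `CountdownEnv` are hypothetical, and the stub environment stands in for Ant-v5 so the example runs without MuJoCo installed.

```python
def run_episode(env, policy, max_steps=1000):
    """Roll out one episode with the Gymnasium reset/step API; return (total_reward, steps)."""
    obs, _info = env.reset()
    total_reward, steps = 0.0, 0
    for _ in range(max_steps):
        obs, reward, terminated, truncated, _info = env.step(policy(obs))
        total_reward += float(reward)
        steps += 1
        if terminated or truncated:
            break
    return total_reward, steps


class CountdownEnv:
    """Hypothetical stand-in environment: pays 1.0 per step and ends after `horizon` steps."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self, seed=None, options=None):
        self.t = 0
        return [0.0], {}

    def step(self, action):
        self.t += 1
        terminated = self.t >= self.horizon
        return [float(self.t)], 1.0, terminated, False, {}


# The policy here ignores the observation; a trained agent would map obs to an action.
total, steps = run_episode(CountdownEnv(horizon=5), policy=lambda obs: 0)
print(total, steps)
```

With the libraries from the install step available, an environment created by `gymnasium.make("Ant-v5", render_mode="human")` could be passed in place of the stub, with `policy` wrapping whatever model object is loaded from `./pyt_save/model.pt`.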


Author & Contact
MoniGarr

- AI Researcher — NLP · RL · Multimodal AI
- Based in Akwesasne / Massena, New York
- [email protected] | github.com/monigarr

I’m looking to collaborate with ethical AI teams, remote research labs, and mission-driven builders of intelligent systems.
