🤖 Soft Actor-Critic (SAC) on Ant-v5: Modernized OpenAI Spinning Up

This repository presents a fully trained Soft Actor-Critic (SAC) agent on the Ant-v5 environment using a modernized PyTorch-based version of OpenAI's Spinning Up in Deep RL educational framework.

Developed, trained, and maintained by MoniGarr, a self-directed AI researcher focused on NLP, multimodal systems, and RL control frameworks.


Project Mission

This work contributes to the revitalization of OpenAI’s highly respected Spinning Up in Deep RL codebase. The original repo does not support Python 3.8+, the latest MuJoCo, or gymnasium. This project patches those limitations and showcases a reproducible, high-performing SAC agent for the modern Ant-v5 benchmark.

It also supports my broader mission: to demonstrate technical excellence and creativity in deep reinforcement learning and AI research while advancing open and inclusive access to intelligent systems. Some of my online students and clients use my demos for learning purposes.


Model Details

| Attribute | Value |
| --- | --- |
| Algorithm | Soft Actor-Critic (SAC) |
| Framework | PyTorch (Modernized Spinning Up) |
| Environment | Ant-v5 via gymnasium[mujoco] |
| Epochs | 250 |
| Action Space | Continuous (Box) |
| Observation Space | Continuous (Box) |
| Command Used | python -m spinup.run sac --env Ant-v5 --epochs 250 --exp_name experiment_sac_antv5_july_20_2025 |

Training Metrics Summary

| Metric | Description |
| --- | --- |
| AverageEpRet | Average return per episode (training) |
| StdEpRet | Standard deviation of episode return |
| MaxEpRet | Maximum episode return in this run |
| MinEpRet | Minimum episode return in this run |
| AverageTestEpRet | Average return on test episodes |
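The four return statistics in the table above can be reproduced from a plain list of per-episode returns. The sketch below is illustrative only: the function name `summarize_returns` is hypothetical, and it assumes a population standard deviation (dividing by N), which is how the original Spinning Up logger computes it; this fork may differ.

```python
from statistics import fmean, pstdev


def summarize_returns(ep_returns, suffix="EpRet"):
    """Return Average/Std/Max/Min statistics for a list of episode returns."""
    if not ep_returns:
        raise ValueError("need at least one episode return")
    return {
        f"Average{suffix}": fmean(ep_returns),
        # Assumption: population std (divide by N), not the sample std.
        f"Std{suffix}": pstdev(ep_returns),
        f"Max{suffix}": max(ep_returns),
        f"Min{suffix}": min(ep_returns),
    }
```

Calling `summarize_returns(returns, suffix="TestEpRet")` would give the matching test-episode keys, of which only the average appears in the table.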

Full logs: https://github.com/monigarr/spinningup/tree/monigarr-dev/data/experiment_sac_antv5_july_20_2025/progress.txt
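To inspect the log locally, a small parser is usually enough. This sketch assumes progress.txt keeps the original Spinning Up layout (one tab-separated header row followed by one numeric row per epoch); the helper names `read_progress`, `best_epoch`, and `tail_mean` are hypothetical and not part of the repository.

```python
import csv
import io


def read_progress(text):
    """Parse tab-separated progress text into a list of {column: float} rows."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [{key: float(value) for key, value in row.items()} for row in reader]


def best_epoch(rows, key="AverageTestEpRet"):
    """Return the row with the highest value in the given column."""
    return max(rows, key=lambda row: row[key])


def tail_mean(rows, key, last_n):
    """Average a column over the final last_n epochs, e.g. to check for a plateau."""
    tail = rows[-last_n:]
    return sum(row[key] for row in tail) / len(tail)
```

With the file downloaded, something like `rows = read_progress(open("progress.txt").read())` followed by `tail_mean(rows, "AverageEpRet", 50)` gives the average training return over the last 50 epochs.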


🔍 Research Observations

  • Policy performance stabilized after ~200 epochs
  • Reward-to-noise ratio improved with tuned entropy coefficient (α = 0.2)
  • Robust gait developed for complex terrain and perturbations
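For readers unfamiliar with where the entropy coefficient enters SAC, the standard soft Bellman backup is sketched below on scalar inputs. It is a textbook illustration rather than this repository's code: the function name `sac_target` is hypothetical, and the discount `gamma=0.99` is an assumed default that the source does not state.

```python
def sac_target(reward, done, q1_next, q2_next, logp_next, gamma=0.99, alpha=0.2):
    """Entropy-regularized target for both Q-functions in SAC.

    q1_next, q2_next: target-network Q-values at (s', a'), with a' sampled
                      from the current policy.
    logp_next:        log-probability of a' under the current policy.
    alpha:            entropy coefficient (0.2 in this run).
    """
    # Clipped double-Q: use the smaller of the two target estimates.
    min_q = min(q1_next, q2_next)
    # Subtracting alpha * log pi(a'|s') adds an entropy bonus to the value.
    soft_value = min_q - alpha * logp_next
    # No bootstrapping past a terminal transition (done = 1.0).
    return reward + gamma * (1.0 - done) * soft_value
```

A larger `alpha` weights the entropy bonus more heavily and so favors exploration; `alpha = 0` recovers an ordinary clipped double-Q target.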

🧪 Research Context

This experiment is part of a broader initiative to:

  • Modernize and benchmark deep RL frameworks
  • Create reproducible SAC baselines for MuJoCo control tasks
  • Prepare high-quality artifacts for hybrid/remote AI research roles (RL, multimodal AI, language models)

I am currently pursuing research roles, residencies, and collaborations with a focus on intelligent control systems and language-grounded agents. I bring 30+ years of technical experience (previously lead mobile software architect, engineer, and developer; XR producer; 3D technical artist), speak Kanien’kéha (Mohawk language) dialects, and have a long-standing record of building ethical, useful, and inclusive AI.


🚀 Quickstart: Run the Model

# Install required libraries
pip install torch "gymnasium[mujoco]"

# Clone this repo (or download model + config)
git clone https://huggingface.co/MoniGarr/sac-antv5-modernized
cd sac-antv5-modernized

# Launch the SAC agent (interactive render)
python run_agent.py --env Ant-v5 --model_path ./pyt_save/model.pt
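The contents of run_agent.py are not shown here, so the following is only a rough sketch of what such a script might do. It assumes the checkpoint is a whole actor-critic module pickled with torch.save that exposes an `act(obs)` method, as in the original Spinning Up PyTorch code; the function names `run_episode`, `load_policy`, and `main` are hypothetical.

```python
def run_episode(env, get_action, max_steps=1000):
    """Roll out one episode with the Gymnasium step API; return (return, length)."""
    obs, _info = env.reset()
    ep_ret, ep_len = 0.0, 0
    for _ in range(max_steps):
        obs, reward, terminated, truncated, _info = env.step(get_action(obs))
        ep_ret += float(reward)
        ep_len += 1
        if terminated or truncated:
            break
    return ep_ret, ep_len


def load_policy(model_path):
    """Wrap a saved actor-critic as a plain obs -> action function."""
    import torch  # imported lazily so run_episode works without torch installed

    # Assumption: the file is a fully pickled module, so recent PyTorch needs
    # weights_only=False and the defining package must be importable.
    # Only load checkpoints you trust.
    model = torch.load(model_path, weights_only=False)

    def get_action(obs):
        with torch.no_grad():
            return model.act(torch.as_tensor(obs, dtype=torch.float32))

    return get_action


def main(model_path="./pyt_save/model.pt", env_id="Ant-v5", episodes=3):
    """Render a few episodes of the saved policy."""
    import gymnasium as gym

    env = gym.make(env_id, render_mode="human")
    get_action = load_policy(model_path)
    for i in range(episodes):
        ep_ret, ep_len = run_episode(env, get_action)
        print(f"episode {i}: return={ep_ret:.1f} length={ep_len}")
    env.close()
```

Calling `main()` from the repository root would open the MuJoCo viewer. Keeping `run_episode` free of torch and gymnasium imports makes it easy to check against a stub environment.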


Author & Contact
MoniGarr

- AI Researcher: NLP · RL · Multimodal AI
- Based in Akwesasne / Massena, New York
- [email protected] | github.com/monigarr

I’m looking to collaborate with ethical AI teams, remote research labs, and mission-driven builders of intelligent systems.