---
tags:
  - stable-baselines3
  - power-grid
  - ppo
  - lstm
  - electricity
  - reinforcement-learning
  - forecasting
  - tensorflow
  - gym
license: mit
---

# ⚡ Power Grid Optimization with LSTM + PPO

This repository combines a deep learning forecaster with a reinforcement learning controller for power grid optimization in Lauderdale County, AL. A weather-informed LSTM model forecasts demand, and a PPO-based agent is trained to keep the grid stable and minimize blackout risk under stressed operating conditions.

---

## 📈 Models

- **LSTM Demand Predictor**  
  A deep bidirectional LSTM with attention, trained on four years (2021–2024) of TVA demand data and regional weather data.

- **PPO Grid Policy**  
  Trained in a custom `PowerGridEnv` with generator output, transformer tap, and load shedding control.

---

## 🧠 Dataset Overview

- **Demand Data:**  
  Sourced from the U.S. EIA (TVA region, 2021–2024)  
  - Demand, Net Generation, Day-Ahead Forecasts, Interchange

- **Weather Data:**  
  Daily minimum/maximum temperatures and precipitation  
  - Recorded at 5 major TVA-region airports, sourced from NOAA

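One way to combine the two sources, sketched below in plain Python, is to average each day's readings across the weather stations and attach the result to that day's demand record. The station codes, field names, and numbers are placeholders for illustration, not the project's actual schema or data.

```python
from collections import defaultdict
from statistics import mean


def regional_daily_weather(station_rows):
    """Average tmin/tmax/precip across all stations for each date."""
    by_date = defaultdict(list)
    for row in station_rows:
        by_date[row["date"]].append(row)
    return {
        date: {
            "tmin": mean(r["tmin"] for r in rows),
            "tmax": mean(r["tmax"] for r in rows),
            "precip": mean(r["precip"] for r in rows),
        }
        for date, rows in by_date.items()
    }


def join_demand_weather(demand_rows, weather_by_date):
    """Inner-join demand records with regional weather on the date key."""
    return [
        {**d, **weather_by_date[d["date"]]}
        for d in demand_rows
        if d["date"] in weather_by_date
    ]


# Placeholder rows: hypothetical station codes and made-up values.
stations = [
    {"date": "2021-01-01", "station": "AAA", "tmin": 30.0, "tmax": 50.0, "precip": 0.0},
    {"date": "2021-01-01", "station": "BBB", "tmin": 34.0, "tmax": 54.0, "precip": 0.2},
    {"date": "2021-01-02", "station": "AAA", "tmin": 28.0, "tmax": 44.0, "precip": 0.0},
]
demand = [
    {"date": "2021-01-01", "demand_mwh": 400000.0},
    {"date": "2021-01-02", "demand_mwh": 420000.0},
    {"date": "2021-01-03", "demand_mwh": 410000.0},  # no weather row, so dropped
]
merged = join_demand_weather(demand, regional_daily_weather(stations))
```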
---

## 🧮 LSTM Model

- **Architecture:**  
  2-layer bidirectional LSTM + attention, followed by global pooling and dense layers.

- **Key Features:**  
  - Rolling temperature windows, demand lags  
  - Weekly mean demand, change rate  
  - Temp volatility, extreme flags

- **Metrics:**  
  | Metric             | Value                        |
  |--------------------|------------------------------|
  | R²                 | 0.911                        |
  | RMSE               | 19,565 MWh                   |
  | Mean Error         | 713 MWh (over-forecast bias) |
  | Beats TVA Forecast | 70.08% of days               |

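The metrics in the table above could be computed along these lines. This is a minimal sketch, not the project's evaluation code: it assumes mean error is defined as prediction minus actual (so a positive value means over-forecasting) and that "beats TVA forecast" means a smaller absolute error than the day-ahead forecast on a given day. The function name and sample numbers are illustrative only.

```python
from math import sqrt


def forecast_metrics(actual, predicted, baseline):
    """R^2, RMSE, mean error, and share of days beating a baseline forecast."""
    n = len(actual)
    errors = [p - a for p, a in zip(predicted, actual)]  # > 0 means over-forecast
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    # Count days where the model's absolute error is strictly smaller.
    wins = sum(
        abs(p - a) < abs(b - a)
        for p, a, b in zip(predicted, actual, baseline)
    )
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": sqrt(ss_res / n),
        "mean_error": sum(errors) / n,
        "beat_rate": wins / n,
    }


# Made-up numbers purely to exercise the function.
actual = [100.0, 110.0, 120.0, 130.0]
predicted = [102.0, 109.0, 123.0, 130.0]
tva_day_ahead = [105.0, 110.0, 118.0, 134.0]
metrics = forecast_metrics(actual, predicted, tva_day_ahead)
```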
---
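The feature families listed under Key Features could be derived roughly as follows. The sketch uses plain Python to stay dependency-free (with pandas, `rolling` and `shift` would be the usual tools). The window length, the extreme-temperature thresholds, and all feature names are assumptions, not values taken from the project.

```python
from statistics import mean, pstdev


def trailing(values, i, window):
    """Values in the trailing window that ends at index i (inclusive)."""
    return values[max(0, i - window + 1): i + 1]


def build_features(demand, tmax, window=7, hot=95.0, cold=32.0):
    """Build one feature dict per day from aligned daily demand and max-temperature lists.

    The first two days are skipped because the change rate needs two prior days.
    """
    rows = []
    for i in range(2, len(demand)):
        past = trailing(demand, i - 1, window)  # demand history, excludes day i
        temps = trailing(tmax, i, window)       # temperatures up to and including day i
        rows.append({
            "day": i,
            "demand_lag_1": demand[i - 1],
            "demand_weekly_mean": mean(past),
            "demand_change_rate": (demand[i - 1] - demand[i - 2]) / demand[i - 2],
            "temp_rolling_mean": mean(temps),
            "temp_volatility": pstdev(temps),
            "extreme_temp": int(tmax[i] >= hot or tmax[i] <= cold),
        })
    return rows


# Made-up series; a short window keeps the example easy to check by hand.
demand = [100.0, 110.0, 121.0, 115.0]
tmax = [80.0, 90.0, 100.0, 70.0]
features = build_features(demand, tmax, window=3)
```

Each row uses demand only from earlier days, so the features remain available at forecast time; same-day temperature is included on the assumption that a weather forecast exists for the target day.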

## 🤖 PPO DRL Agent

- **Environment:**  
  PyPSA-based Lauderdale County grid  
  - 6 generators (Nuclear, Hydro, CCGT)  
  - Load centers with realistic demand shares  
  - Thermal constraints, ramp limits, marginal costs

- **Action Space:**  
  - Generator control  
  - Transformer tap shift  
  - Load shedding (up to 20%)

- **Reward Design:**  
  ✅ Reward: balanced supply and demand, low thermal loading  
  ❌ Penalty: instability, overloads, excessive cost

- **Training:**  
  - Algorithm: PPO (SB3)  
  - Timesteps: 400,000  
  - Environment wrapped in `VecNormalize`; evaluated for 5 episodes every 2,048 steps

- **Metrics:**  
  | Metric             | Value      |
  |--------------------|------------|
  | Mean Reward        | ~1480      |
  | Explained Variance | Up to 0.85 |
  | Blackout Risk      | < 5%       |
  | Load Shedding      | < 3% avg   |

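To make the action space and reward design concrete, here is a minimal sketch of how a normalized policy output might be decoded into controls and then scored. It is not the actual `PowerGridEnv` logic: the [-1, 1] action layout, the tap range, the penalty weights, and the function names are all assumptions, and the reward is a penalty-only simplification. Only the six generators and the 20% load-shedding cap come from the description above.

```python
N_GENERATORS = 6   # from the environment description
MAX_SHED = 0.20    # load-shedding cap from the action space
MAX_TAP_STEP = 2   # assumed tap range: -2..+2 positions


def clip(x, lo, hi):
    return max(lo, min(hi, x))


def decode_action(action, p_min, p_max):
    """Map a flat action in [-1, 1] to generator setpoints, tap shift, and shed fraction."""
    gens = [
        lo + (clip(a, -1.0, 1.0) + 1.0) / 2.0 * (hi - lo)
        for a, lo, hi in zip(action[:N_GENERATORS], p_min, p_max)
    ]
    tap = round(clip(action[N_GENERATORS], -1.0, 1.0) * MAX_TAP_STEP)
    shed = (clip(action[N_GENERATORS + 1], -1.0, 1.0) + 1.0) / 2.0 * MAX_SHED
    return gens, tap, shed


def reward(gens, shed, demand, line_loading, marginal_cost,
           w_balance=1.0, w_overload=10.0, w_cost=0.001, w_shed=5.0):
    """Penalize supply/demand mismatch, thermal overload, generation cost, and shedding."""
    served = demand * (1.0 - shed)
    imbalance = abs(sum(gens) - served) / demand
    # Loading is a fraction of thermal rating; only the excess above 1.0 is penalized.
    overload = sum(max(0.0, x - 1.0) for x in line_loading)
    cost = sum(p * c for p, c in zip(gens, marginal_cost))
    return -(w_balance * imbalance + w_overload * overload
             + w_cost * cost + w_shed * shed)


# Made-up limits and state purely to exercise the functions.
p_min, p_max = [0.0] * 6, [100.0] * 6
gens, tap, shed = decode_action([0.0] * 6 + [0.5, -1.0], p_min, p_max)
r = reward(gens, shed, demand=300.0,
           line_loading=[0.8, 1.1], marginal_cost=[10.0] * 6)
```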
---
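Explained variance, as reported in the PPO metrics table, is the value-function diagnostic that Stable-Baselines3 logs during training: one minus the variance of the residual between returns and value predictions, divided by the variance of the returns. A value near 1 means the critic predicts returns well, while 0 is no better than predicting a constant. The pure-Python sketch below mirrors that formula with made-up numbers.

```python
from statistics import pvariance


def explained_variance(value_preds, returns):
    """1 - Var[returns - value_preds] / Var[returns]; NaN if returns are constant."""
    var_returns = pvariance(returns)
    if var_returns == 0:
        return float("nan")
    residuals = [r - v for r, v in zip(returns, value_preds)]
    return 1.0 - pvariance(residuals) / var_returns


returns = [1.0, 2.0, 3.0, 4.0]
perfect = explained_variance(returns, returns)       # critic matches returns exactly
constant = explained_variance([2.5] * 4, returns)    # critic always predicts the mean
```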