---
tags:
- stable-baselines3
- power-grid
- ppo
- lstm
- electricity
- reinforcement-learning
- forecasting
- tensorflow
- gym
license: mit
---

# ⚡ Power Grid Optimization with LSTM + PPO

This repository showcases a hybrid deep learning and reinforcement learning system for power grid optimization in Lauderdale County, Alabama. A weather-informed LSTM model forecasts electricity demand, and a PPO agent is trained to keep the grid stable and minimize blackout risk under stressed operating conditions.

---

## 📈 Models

- **LSTM Demand Predictor**

  A deep bidirectional LSTM with attention, trained on four years (2021–2024) of TVA demand and weather data.

- **PPO Grid Policy**

  Trained in a custom `PowerGridEnv` that controls generator output, transformer taps, and load shedding.

---

## 🧠 Dataset Overview

- **Demand Data:**

  Sourced from the U.S. EIA (TVA region, 2021–2024)

  - Fields: demand, net generation, day-ahead forecasts, and interchange

- **Weather Data:**

  Daily minimum/maximum temperature and precipitation

  - Collected from 5 major TVA-region airport stations via NOAA
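
A minimal sketch of how the two sources could be aligned into one daily table. It assumes the demand series is keyed by date and that each airport station reports its own daily min/max temperature and precipitation. Averaging the stations into a single regional reading is an illustrative assumption, and the `merge_daily` helper and station IDs are hypothetical rather than the repository's actual loader.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def merge_daily(demand, weather):
    """Join daily demand (MWh) with station-averaged weather (hypothetical helper).

    demand:  {date: demand_mwh}
    weather: list of (station_id, date, tmin, tmax, precip) tuples
    Returns one dict per date present in BOTH sources, sorted by date.
    """
    by_day = defaultdict(list)
    for _station, day, tmin, tmax, precip in weather:
        by_day[day].append((tmin, tmax, precip))

    rows = []
    for day in sorted(demand.keys() & by_day.keys()):
        readings = by_day[day]
        rows.append({
            "date": day,
            "demand_mwh": demand[day],
            # Regional weather = simple mean across the stations reporting that day.
            "tmin": mean(r[0] for r in readings),
            "tmax": mean(r[1] for r in readings),
            "precip": mean(r[2] for r in readings),
        })
    return rows


# Placeholder station IDs and values (not real NOAA identifiers or data).
demand = {date(2024, 1, 1): 600_000.0, date(2024, 1, 2): 620_000.0}
weather = [
    ("STATION_A", date(2024, 1, 1), 30.0, 50.0, 0.0),
    ("STATION_B", date(2024, 1, 1), 34.0, 54.0, 0.2),
    ("STATION_A", date(2024, 1, 2), 28.0, 45.0, 0.1),
]
rows = merge_daily(demand, weather)
print(rows[0]["tmin"], rows[0]["tmax"])
```

Days missing from either source are dropped by the key intersection. A production pipeline might prefer to impute them instead.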

---

## 🧮 LSTM Model

- **Architecture:**

  Two stacked bidirectional LSTM layers with attention, followed by global pooling and dense layers.

- **Key Features:**

  - Rolling temperature windows and lagged demand
  - Weekly mean demand and demand change rate
  - Temperature volatility and extreme-temperature flags

- **Metrics:**

  | Metric             | Value                            |
  |--------------------|----------------------------------|
  | R²                 | 0.911                            |
  | RMSE               | 19,565 MWh                       |
  | Mean Error (bias)  | +713 MWh (over-forecasts on average) |
  | Beats TVA Forecast | 70.08% of days                   |
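
The README does not say which attention variant sits on top of the BiLSTM. As an illustration, here is a dependency-free sketch of one common choice: scaled dot-product self-attention over the per-timestep BiLSTM outputs, followed by global average pooling. This produces the fixed-size vector the dense layers would consume. Using the hidden states directly as queries, keys, and values (no learned projections) is a simplifying assumption, and the function names are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(hidden):
    """Scaled dot-product self-attention over a T x D list of hidden states.

    The states serve as queries, keys and values directly (no learned
    projections), which simplifies a trained attention layer.
    """
    d = len(hidden[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in hidden:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in hidden]
        weights = softmax(scores)  # attention weights over all timesteps
        out.append([sum(w * v[j] for w, v in zip(weights, hidden)) for j in range(d)])
    return out


def global_average_pool(seq):
    """Average over the time axis: T x D -> D."""
    t = len(seq)
    return [sum(step[j] for step in seq) / t for j in range(len(seq[0]))]


# Toy BiLSTM output: 3 timesteps, 4 features (2 forward + 2 backward units).
h = [
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.1, 0.0, 0.2],
    [0.3, 0.3, 0.3, 0.3],
]
context = global_average_pool(self_attention(h))
print(context)
```

Each attended timestep is a convex combination of the inputs. Pooling then collapses the sequence so the dense head sees a fixed-length vector regardless of window length.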
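
The key features listed above can be illustrated with a small stdlib sketch. The window lengths (3-day temperature window, 1- and 7-day demand lags, 7-day mean), the use of the rolling standard deviation of daily max temperature as "volatility", and the °F extreme-temperature thresholds are all assumptions chosen for illustration, not values from the training code.

```python
from statistics import mean, pstdev


def build_features(demand, tmax, tmin, t, temp_window=3, hot_f=95.0, cold_f=20.0):
    """Illustrative features for day index t (all windows/thresholds are assumptions).

    demand, tmax, tmin: equal-length daily lists; temperatures assumed in °F.
    Demand features use only days before t, to avoid leaking the target.
    """
    if t < 7 or t < temp_window - 1:
        raise ValueError("not enough history for day t")
    temps = tmax[t - temp_window + 1 : t + 1]  # rolling window ending on day t
    return {
        "tmax_roll_mean": mean(temps),
        "tmax_volatility": pstdev(temps),  # rolling std. dev. as a volatility proxy
        "demand_lag_1": demand[t - 1],
        "demand_lag_7": demand[t - 7],
        "weekly_mean_demand": mean(demand[t - 7 : t]),  # the 7 days before t
        "demand_change_rate": (demand[t - 1] - demand[t - 2]) / demand[t - 2],
        "extreme_heat": int(tmax[t] >= hot_f),
        "extreme_cold": int(tmin[t] <= cold_f),
    }


demand = [100.0, 110.0, 105.0, 120.0, 130.0, 125.0, 140.0, 150.0, 160.0]
tmax = [70.0, 72.0, 75.0, 80.0, 85.0, 90.0, 96.0, 98.0, 99.0]
tmin = [50.0, 52.0, 55.0, 60.0, 62.0, 65.0, 70.0, 72.0, 75.0]
features = build_features(demand, tmax, tmin, t=8)
print(features)
```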
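
The four forecast metrics in the table can be computed along these lines. This sketch assumes daily series of actual demand, model forecasts, and TVA's forecasts. It takes mean error as forecast minus actual, so a positive value means over-forecasting, consistent with the reported bias. It counts a day as beating TVA when the model's absolute error is strictly smaller. Both conventions are assumptions about how the table was produced.

```python
import math


def forecast_metrics(actual, predicted, tva_forecast):
    """R², RMSE, mean error (bias) and % of days beating a baseline forecast."""
    n = len(actual)
    errors = [p - a for p, a in zip(predicted, actual)]
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    # A "win" is a day where the model's absolute error beats the baseline's.
    wins = sum(
        abs(p - a) < abs(b - a)
        for p, a, b in zip(predicted, actual, tva_forecast)
    )
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mean_error": sum(errors) / n,  # > 0 means the model over-forecasts
        "beats_baseline_pct": 100.0 * wins / n,
    }


# Toy values for illustration only (not the project's data).
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 410.0]
tva = [120.0, 195.0, 330.0, 405.0]
metrics = forecast_metrics(actual, predicted, tva)
print(metrics)
```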

---

## 🤖 PPO DRL Agent

- **Environment:**

  PyPSA-based model of the Lauderdale County grid

  - 6 generators (nuclear, hydro, CCGT)
  - Load centers with realistic demand shares
  - Thermal constraints, ramp limits, and marginal costs

- **Action Space:**

  - Generator output control
  - Transformer tap shifts
  - Load shedding (up to 20%)

- **Reward Design:**

  - ✅ Rewards demand/supply balance and low thermal overload
  - ❌ Penalizes instability, overloads, and excessive cost

- **Training:**

  - Algorithm: PPO (Stable-Baselines3)
  - Timesteps: 400,000
  - `VecNormalize` wrapper; 5 evaluation episodes every 2,048 steps

- **Metrics:**

  | Metric                              | Value        |
  |-------------------------------------|--------------|
  | Mean Reward                         | ~1480        |
  | Explained Variance (value function) | Up to 0.85   |
  | Blackout Risk                       | < 5%         |
  | Load Shedding                       | < 3% average |
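
To make the environment constraints and action space concrete, here is a hypothetical decoder. It maps a policy output in a symmetric [-1, 1] range (as commonly recommended for continuous PPO actions) to three things: generator setpoint changes bounded by each unit's ramp limit and capacity, a one-step transformer tap move, and a load-shedding fraction capped at 20%. The action layout, tap range, and generator numbers are illustrative assumptions, not the actual `PowerGridEnv` interface. The power-flow solve that the PyPSA-based environment performs afterwards is out of scope here.

```python
from dataclasses import dataclass


@dataclass
class Generator:
    name: str
    p_min: float   # MW
    p_max: float   # MW
    ramp: float    # max MW change per step
    output: float  # current MW


MAX_SHED = 0.20           # load shedding cap stated in the README
TAP_MIN, TAP_MAX = -8, 8  # hypothetical transformer tap range


def clip(x, lo, hi):
    return max(lo, min(hi, x))


def apply_action(action, gens, tap):
    """Decode a flat action vector in [-1, 1] (hypothetical layout).

    Layout: one entry per generator (fraction of its ramp limit), then one
    for the tap, then one for load shedding. Generator outputs are updated
    in place; returns (new_tap, shed_fraction).
    """
    if len(action) != len(gens) + 2:
        raise ValueError("unexpected action length")
    action = [clip(a, -1.0, 1.0) for a in action]
    for gen, a in zip(gens, action):
        target = gen.output + a * gen.ramp  # |a| <= 1 keeps the move within the ramp limit
        gen.output = clip(target, gen.p_min, gen.p_max)
    tap_step = round(action[len(gens)])  # -1, 0 or +1 tap position
    new_tap = clip(tap + tap_step, TAP_MIN, TAP_MAX)
    shed = (action[len(gens) + 1] + 1.0) / 2.0 * MAX_SHED  # [-1, 1] -> [0, 0.2]
    return new_tap, shed


# Illustrative units only; real capacities and ramp rates are not given here.
gens = [
    Generator("nuclear_1", 900.0, 1100.0, 20.0, 1000.0),
    Generator("hydro_1", 0.0, 300.0, 150.0, 100.0),
]
tap, shed = apply_action([1.0, -2.0, 0.7, 0.0], gens, tap=0)
print([g.output for g in gens], tap, shed)
```

Scaling each generator's command by its own ramp limit means the policy never has to learn the limits. Any output in range is already feasible with respect to ramping.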
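
A minimal sketch of a reward with the shape described under Reward Design. It has a positive term for demand/supply balance and penalties for thermal overload, instability, and generation cost. The weights, the 100% line-loading threshold, and the use of frequency deviation as the instability signal are hypothetical, because the README does not give the actual formula.

```python
def grid_reward(supply_mw, demand_mw, line_loadings, freq_dev_hz, gen_cost,
                w_balance=1.0, w_overload=5.0, w_instability=2.0, w_cost=1e-4):
    """Illustrative reward; all weights and thresholds are hypothetical.

    line_loadings: per-line flow as a fraction of thermal rating (1.0 = 100%).
    freq_dev_hz:   frequency deviation, used here as an instability proxy.
    gen_cost:      total generation cost for the step.
    """
    imbalance = abs(supply_mw - demand_mw) / max(demand_mw, 1e-6)
    balance_term = w_balance * (1.0 - min(imbalance, 1.0))  # 1.0 when perfectly balanced
    overload = sum(max(0.0, x - 1.0) for x in line_loadings)  # only loading above 100% counts
    return (
        balance_term
        - w_overload * overload
        - w_instability * abs(freq_dev_hz)
        - w_cost * gen_cost
    )


r_good = grid_reward(1000.0, 1000.0, [0.6, 0.8], 0.0, 0.0)
r_bad = grid_reward(900.0, 1000.0, [1.2, 0.8], 0.1, 0.0)
print(r_good, r_bad)
```

Because overload is penalized only above the thermal rating, lines running at moderate loading cost nothing. This matches the "low thermal overload" goal rather than pushing all flows toward zero.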
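
`VecNormalize` rescales observations (and optionally rewards) using running statistics gathered during training. The stdlib sketch below illustrates that idea for observations. It tracks a running mean/variance that is merged batch by batch with the parallel-variance formula, then normalizes and clips to ±10, which matches SB3's default `clip_obs=10.0`. This is a simplified illustration of the technique, not SB3's implementation. At evaluation time the saved statistics should be frozen (SB3's `training=False`) so the policy sees the same scaling it was trained on.

```python
import math


class RunningMeanStd:
    """Per-dimension running mean/variance, merged batch by batch."""

    def __init__(self, dim, epsilon=1e-4):
        self.mean = [0.0] * dim
        self.var = [1.0] * dim
        self.count = epsilon  # small prior count avoids division by zero

    def update(self, batch):
        n = len(batch)
        dim = len(self.mean)
        b_mean = [sum(x[j] for x in batch) / n for j in range(dim)]
        b_var = [sum((x[j] - b_mean[j]) ** 2 for x in batch) / n for j in range(dim)]
        total = self.count + n
        for j in range(dim):
            delta = b_mean[j] - self.mean[j]
            # Combine sums of squared deviations of the old stats and the new batch.
            m2 = (self.var[j] * self.count + b_var[j] * n
                  + delta ** 2 * self.count * n / total)
            self.mean[j] += delta * n / total
            self.var[j] = m2 / total
        self.count = total


def normalize(obs, rms, clip_obs=10.0, eps=1e-8):
    """(obs - mean) / sqrt(var + eps), clipped to [-clip_obs, clip_obs]."""
    return [
        max(-clip_obs, min(clip_obs, (o - m) / math.sqrt(v + eps)))
        for o, m, v in zip(obs, rms.mean, rms.var)
    ]


rms = RunningMeanStd(dim=2)
# Hypothetical observations: [load in MW, per-unit line loading].
rms.update([[1000.0, 0.5], [1200.0, 0.7], [1100.0, 0.6]])
z = normalize([1100.0, 0.6], rms)
print(z)
```

Normalization matters here because grid observations mix very different scales (hundreds of MW alongside per-unit loadings). Without it, the largest-magnitude features would dominate the network's inputs.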
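
The agent metrics can be derived from logged data along these lines. Explained variance follows the standard definition SB3 logs for the value function, 1 - Var(returns - values) / Var(returns). A value of 1.0 means the critic predicts returns perfectly; values near 0 mean it does no better than a constant. The sketch computes blackout risk as the fraction of episodes containing a blackout and load shedding as the mean shed fraction across all steps. These episode-level definitions and the log format are assumptions, since the README does not define them.

```python
from statistics import pvariance


def explained_variance(values, returns):
    """1 - Var(returns - values) / Var(returns); NaN if returns are constant."""
    var_y = pvariance(returns)
    if var_y == 0:
        return float("nan")
    residuals = [r - v for r, v in zip(returns, values)]
    return 1.0 - pvariance(residuals) / var_y


def episode_stats(episodes):
    """Return (blackout_rate, mean_shed_fraction) from hypothetical episode logs.

    Each episode is a dict with 'blackout' (bool) and 'shed' (per-step fractions).
    """
    blackout_rate = sum(ep["blackout"] for ep in episodes) / len(episodes)
    steps = [s for ep in episodes for s in ep["shed"]]
    return blackout_rate, sum(steps) / len(steps)


episodes = [
    {"blackout": False, "shed": [0.0, 0.02]},
    {"blackout": True, "shed": [0.04, 0.06]},
]
rate, mean_shed = episode_stats(episodes)
print(rate, mean_shed, explained_variance([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
```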

---