# Stock Trading RL Agent - Advanced PPO Implementation
A state-of-the-art reinforcement learning agent for algorithmic stock trading using Proximal Policy Optimization (PPO)
[Quick Start](#quick-start) • [Performance](#performance-metrics) • [Usage](#usage-examples) • [Technical Details](#technical-details)
## Model Overview
This model represents a sophisticated reinforcement learning trading agent trained using the Proximal Policy Optimization (PPO) algorithm. The agent learns to make optimal trading decisions across multiple stocks by analyzing technical indicators, market patterns, and portfolio states.
### Key Highlights
- **Algorithm:** PPO with a multi-layer perceptron (MLP) policy
- **Action Space:** hybrid discrete/continuous (action type + position sizing; see the sketch below)
- **Observation Space:** 60-day lookback window of technical indicators
- **Training:** 500,000 timesteps across 5 major stocks
- **Performance:** up to 7,243% backtested return (MSFT), with built-in risk management
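The environment code itself ships separately with this repository; purely as an illustrative sketch (assuming a Gymnasium-style environment and that the hybrid action is flattened into one continuous vector, which is what the prediction code below suggests), the action space could be defined like this:

```python
# Hypothetical sketch of the hybrid action space described above, assuming a
# Gymnasium-style environment; this is NOT the exact training code.
import numpy as np
from gymnasium import spaces

# One continuous vector of length 2: the first entry is rounded to an action
# type (0 = Hold, 1 = Buy, 2 = Sell), the second is the position size as a
# fraction of available capital in [0, 1].
action_space = spaces.Box(
    low=np.array([0.0, 0.0], dtype=np.float32),
    high=np.array([2.0, 1.0], dtype=np.float32),
    dtype=np.float32,
)

sample = action_space.sample()
action_type = int(round(float(sample[0])))  # 0, 1, or 2
position_size = float(sample[1])            # fraction of capital to deploy
```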
## Quick Start
### Installation
```bash
pip install stable-baselines3 yfinance pandas numpy scikit-learn
```
For data and environment preparation, you can use the enhanced environment and stock-data-processor classes provided in the Python files in this repository.
### Load and Use the Model
```python
from stable_baselines3 import PPO
import pickle
import numpy as np

# Load the trained model
model = PPO.load("best_model.zip")

# Load the data scaler
with open("scaler.pkl", "rb") as f:
    scaler = pickle.load(f)

# Example prediction
obs = your_observation_data  # shape: (n_features,), built from scaled features
action, _states = model.predict(obs, deterministic=True)

# Interpret action
action_type = int(action[0])  # 0: Hold, 1: Buy, 2: Sell
position_size = action[1]     # 0-1: fraction of available capital
```
## Performance Metrics
### Evaluation Results
| Stock | Total Return | Sharpe Ratio | Max Drawdown | Win Rate | Status |
|---|---|---|---|---|---|
| MSFT | 7,243.44% | 0.56 | 164.60% | 52.11% | Best Overall |
| AMZN | 162.87% | 0.74 | 187.11% | 6.72% | Best Risk-Adjusted |
| TSLA | 109.91% | -0.22 | 145.29% | 44.76% | Volatile |
| AAPL | -74.02% | 0.65 | 157.07% | 7.01% | Underperformed |
| GOOGL | 0.00% | 0.00 | 0.00% | 0.00% | No Activity |
### Key Performance Indicators
- **Maximum Return:** 7,243.44% (MSFT)
- **Best Risk-Adjusted Return:** 0.74 Sharpe ratio (AMZN)
- **Highest Win Rate:** 52.11% (MSFT)
- **Lowest Max Drawdown:** 145.29% (TSLA)
- **Portfolio Coverage:** 5 major stocks
## Technical Details
### Model Architecture
```
Algorithm:         PPO (Proximal Policy Optimization)
Policy Network:    Multi-Layer Perceptron (MlpPolicy)
Action Space:
  - Action Type:   Discrete(3) [Hold, Buy, Sell]
  - Position Size: Continuous [0, 1]
Observation Space: Technical indicators + portfolio state
Training Steps:    500,000
Batch Size:        64
Learning Rate:     0.0003
```
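The hidden-layer sizes of the MLP policy are not documented in this card; as an assumed example, a custom architecture can be passed to Stable-Baselines3 through `policy_kwargs` (here `env` stands in for the trading environment instance):

```python
# Assumed example only: the hidden-layer sizes below are illustrative, not
# values documented for this model.
from stable_baselines3 import PPO

policy_kwargs = dict(net_arch=dict(pi=[64, 64], vf=[64, 64]))  # actor / critic MLPs

model = PPO(
    "MlpPolicy",
    env,                      # placeholder: your trading environment
    policy_kwargs=policy_kwargs,
    learning_rate=3e-4,
    batch_size=64,
    verbose=1,
)
```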
### Data Configuration
```json
{
  "tickers": ["AAPL", "MSFT", "GOOGL", "AMZN", "TSLA"],
  "period": "5y",
  "interval": "1d",
  "use_sp500": false,
  "lookback_window": 60
}
```
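For reference, fetching the same raw data with `yfinance` looks roughly like this (indicator computation and scaling still have to be applied afterwards):

```python
# Fetch the same tickers/period/interval as in the data configuration above.
import yfinance as yf

tickers = ["AAPL", "MSFT", "GOOGL", "AMZN", "TSLA"]
raw_data = {ticker: yf.download(ticker, period="5y", interval="1d") for ticker in tickers}

print(raw_data["MSFT"].tail())  # OHLCV rows used to build 60-day observation windows
```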
### Environment Setup
```json
{
  "initial_balance": 10000,
  "transaction_cost": 0.001,
  "max_position_size": 1.0,
  "reward_type": "return",
  "risk_adjustment": true
}
```
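As a quick sanity check of these settings (assuming the fee is a flat percentage of trade value, which is the usual interpretation of `transaction_cost`), a full-balance round trip costs about 0.2% of capital:

```python
# Rough cost illustration for the settings above (assumed fee model: a flat
# percentage of trade value charged on every transaction).
initial_balance = 10_000
transaction_cost = 0.001                 # 0.1% per trade
max_position_size = 1.0                  # full balance may be deployed

trade_value = initial_balance * max_position_size
round_trip_fee = 2 * trade_value * transaction_cost   # buy fee + sell fee

print(f"Approximate round-trip cost: {round_trip_fee:.2f} USD "
      f"({round_trip_fee / initial_balance:.2%} of starting capital)")
```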
### Training Configuration
```json
{
  "algorithm": "PPO",
  "total_timesteps": 500000,
  "learning_rate": 0.0003,
  "batch_size": 64,
  "n_epochs": 10,
  "gamma": 0.99,
  "eval_freq": 1000,
  "n_eval_episodes": 5,
  "save_freq": 10000,
  "seed": 42
}
```
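A training run with this configuration could be reproduced roughly as follows; `StockTradingEnv` is a placeholder for the environment class provided in this repository's Python files, and the callback wiring mirrors the `eval_freq`, `n_eval_episodes`, and `save_freq` values above:

```python
# Rough training sketch matching the configuration above. StockTradingEnv is a
# placeholder name; adapt it to the environment class shipped in this repository.
from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import CheckpointCallback, EvalCallback

train_env = StockTradingEnv(...)   # training environment (placeholder)
eval_env = StockTradingEnv(...)    # separate evaluation environment (placeholder)

eval_callback = EvalCallback(
    eval_env,
    eval_freq=1_000,
    n_eval_episodes=5,
    best_model_save_path="./",     # writes best_model.zip
)
checkpoint_callback = CheckpointCallback(save_freq=10_000, save_path="./checkpoints/")

model = PPO(
    "MlpPolicy",
    train_env,
    learning_rate=3e-4,
    batch_size=64,
    n_epochs=10,
    gamma=0.99,
    seed=42,
    verbose=1,
)
model.learn(total_timesteps=500_000, callback=[eval_callback, checkpoint_callback])
model.save("final_model")
```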
## State Space & Features
### Technical Indicators
The agent observes the following features for each stock (an illustrative pandas computation follows the list):
- **Trend Indicators:** SMA (20, 50), EMA (12, 26)
- **Momentum:** RSI, MACD, MACD Signal, MACD Histogram
- **Volatility:** Bollinger Bands (Upper, Lower, %B)
- **Price/Volume:** Open, High, Low, Close, Volume
- **Portfolio State:** Balance, Position, Net Worth, Returns
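The exact preprocessing ships with the repository's data-processor class; the sketch below only illustrates how such indicators are commonly computed with pandas (column names and window parameters are assumptions):

```python
# Illustrative indicator computation with pandas; not the repository's exact pipeline.
import pandas as pd

def add_indicators(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    close = out["Close"]

    # Trend
    out["SMA_20"] = close.rolling(20).mean()
    out["SMA_50"] = close.rolling(50).mean()
    out["EMA_12"] = close.ewm(span=12, adjust=False).mean()
    out["EMA_26"] = close.ewm(span=26, adjust=False).mean()

    # Momentum: RSI (14) and MACD
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(14).mean()
    loss = (-delta.clip(upper=0)).rolling(14).mean()
    out["RSI"] = 100 - 100 / (1 + gain / loss)
    out["MACD"] = out["EMA_12"] - out["EMA_26"]
    out["MACD_signal"] = out["MACD"].ewm(span=9, adjust=False).mean()
    out["MACD_hist"] = out["MACD"] - out["MACD_signal"]

    # Volatility: Bollinger Bands (20, 2) and %B
    std_20 = close.rolling(20).std()
    out["BB_upper"] = out["SMA_20"] + 2 * std_20
    out["BB_lower"] = out["SMA_20"] - 2 * std_20
    out["BB_pctB"] = (close - out["BB_lower"]) / (out["BB_upper"] - out["BB_lower"])

    return out.dropna()
```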
### Action Space
The agent outputs a 2-dimensional action:
**Action Type (discrete):**
- `0`: Hold position
- `1`: Buy signal
- `2`: Sell signal

**Position Size (continuous):**
- Range: `[0, 1]`
- Represents the fraction of available capital to use
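For example, a raw action returned by `model.predict` could be decoded with a small helper like this (the rounding and clipping are assumptions about how out-of-range values are treated):

```python
# Minimal helper for decoding a raw 2-dimensional action.
import numpy as np

ACTION_NAMES = ("HOLD", "BUY", "SELL")

def decode_action(action: np.ndarray) -> tuple[str, float]:
    action_type = int(np.clip(round(float(action[0])), 0, 2))
    position_size = float(np.clip(action[1], 0.0, 1.0))
    return ACTION_NAMES[action_type], position_size

# decode_action(np.array([1.0, 0.25])) -> ("BUY", 0.25)
```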
## Usage Examples
### Basic Trading Loop
```python
import pickle

import yfinance as yf
import pandas as pd
from stable_baselines3 import PPO

# Load model and scaler
model = PPO.load("best_model.zip")
with open("scaler.pkl", "rb") as f:
    scaler = pickle.load(f)

# Get recent market data
ticker = "AAPL"
data = yf.download(ticker, period="3mo", interval="1d")

# Prepare observation (implement your feature engineering)
obs = prepare_observation(data, scaler)  # your preprocessing function

# Get trading decision
action, _states = model.predict(obs, deterministic=True)
action_type = ["HOLD", "BUY", "SELL"][int(action[0])]
position_size = action[1]

print(f"Action: {action_type}, Size: {position_size:.2%}")
```
### Backtesting Framework
```python
def backtest_strategy(model, data, scaler, initial_balance=10000):
    """Backtest the trained model on historical OHLCV data."""
    balance = initial_balance
    position = 0.0

    for i in range(60, len(data)):  # warm-up: a full 60-day lookback window is needed first
        obs = prepare_observation(data.iloc[: i + 1], scaler)
        action, _ = model.predict(obs, deterministic=True)

        # Execute trading logic
        action_type = int(action[0])
        position_size = float(action[1])
        price = data.iloc[i]["Close"]

        if action_type == 1:  # Buy
            shares_to_buy = (balance * position_size) // price
            position += shares_to_buy
            balance -= shares_to_buy * price
        elif action_type == 2:  # Sell
            shares_to_sell = position * position_size
            position -= shares_to_sell
            balance += shares_to_sell * price

    # Mark any open position to market at the final close
    return balance + position * data.iloc[-1]["Close"]
```
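Combined with the loading code above, an illustrative run might look like:

```python
# Illustrative usage of the backtest helper on a single ticker.
import yfinance as yf

data = yf.download("MSFT", period="5y", interval="1d")
final_value = backtest_strategy(model, data, scaler, initial_balance=10_000)
print(f"Final portfolio value: {final_value:,.2f} USD "
      f"({final_value / 10_000 - 1:.2%} total return)")
```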
## Model Files
| File | Description | Size |
|---|---|---|
| `best_model.zip` | Best performing model checkpoint | ~2.5 MB |
| `final_model.zip` | Final trained model | ~2.5 MB |
| `scaler.pkl` | Data preprocessing scaler | ~50 KB |
| `config.json` | Complete training configuration | ~5 KB |
| `evaluation_results.json` | Detailed evaluation metrics | ~10 KB |
| `training_summary.json` | Training statistics | ~8 KB |
## Training Details
### Training Process
- **Evaluation Frequency:** every 1,000 steps
- **Checkpoint Saving:** every 10,000 steps
- **Random Seed:** 42 (reproducible results)
- **Training Time:** ~6 hours on a modern GPU
- **Convergence:** reached after ~400,000 steps
### Performance During Training
The model showed consistent improvement during training:
- Early Stage (0-100k steps): Learning basic market patterns
- Mid Stage (100k-300k steps): Developing risk management
- Late Stage (300k-500k steps): Fine-tuning position sizing
## Important Disclaimers
**Risk Warning:** This model is for educational and research purposes only. Past performance does not guarantee future results. Stock trading involves substantial risk of loss.

**Data Limitations:** The model was trained on historical data from 2019-2024. Market conditions may change, affecting model performance.

**Technical Limitations:** The model requires proper preprocessing and feature engineering to work effectively in live trading environments.
## Advanced Usage
### Custom Environment Integration
```python
# Create a custom trading environment
from stable_baselines3 import PPO
from stable_baselines3.common.env_checker import check_env
from your_trading_env import StockTradingEnv

env = StockTradingEnv(
    tickers=["AAPL", "MSFT", "GOOGL"],
    initial_balance=10000,
    transaction_cost=0.001,
)

# Verify the environment follows the expected interface
check_env(env)

# Load and test the model
model = PPO.load("best_model.zip")
obs, info = env.reset()  # Gymnasium API: reset() returns (obs, info)
action, _states = model.predict(obs)
```
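`your_trading_env` refers to the environment code shipped with this project. For orientation only, a stripped-down Gymnasium environment with the same observation/action layout might be structured like this (feature count, shapes, and reward logic are assumptions):

```python
# Bare-bones sketch of a compatible trading environment (assumed shapes and
# reward, not the repository's actual implementation).
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class MinimalTradingEnv(gym.Env):
    def __init__(self, prices: np.ndarray, n_features: int = 21, lookback: int = 60):
        super().__init__()
        self.prices = prices
        self.lookback = lookback
        # Flattened 60-day window of features (feature count assumed for illustration)
        self.observation_space = spaces.Box(
            -np.inf, np.inf, shape=(lookback * n_features,), dtype=np.float32
        )
        # [action type in 0..2, position size in 0..1]
        self.action_space = spaces.Box(
            low=np.array([0.0, 0.0], dtype=np.float32),
            high=np.array([2.0, 1.0], dtype=np.float32),
        )
        self.t = lookback

    def _obs(self):
        return np.zeros(self.observation_space.shape, dtype=np.float32)  # placeholder features

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.t = self.lookback
        return self._obs(), {}

    def step(self, action):
        self.t += 1
        reward = 0.0  # placeholder: e.g. change in net worth
        terminated = self.t >= len(self.prices) - 1
        return self._obs(), reward, terminated, False, {}
```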
### Real-time Trading Integration
```python
import asyncio

from stable_baselines3 import PPO

model = PPO.load("best_model.zip")

async def live_trading_loop():
    """Example live trading skeleton.

    get_market_data, prepare_observation, and execute_trade are placeholders
    for your own data feed, preprocessing, and broker API.
    """
    while True:
        # Get real-time market data
        market_data = await get_market_data()

        # Prepare observation
        obs = prepare_observation(market_data)

        # Get model prediction
        action, _ = model.predict(obs)

        # Execute trade (implement your broker API)
        if int(action[0]) != 0:  # not a hold
            await execute_trade(action)

        await asyncio.sleep(60)  # wait one minute
```
## Contributing
We welcome contributions! Please feel free to:
- Report bugs and issues
- Suggest new features
- Improve documentation
- Submit pull requests
## License
This project is licensed under the MIT License - see the LICENSE file for details.
π Links & Resources
- **Hugging Face Model:** [Adilbai/stock-trading-rl-20250704-171446](https://huggingface.co/Adilbai/stock-trading-rl-20250704-171446)
- **Stable-Baselines3:** [Documentation](https://stable-baselines3.readthedocs.io/)
- **Yahoo Finance Data:** [yfinance documentation](https://github.com/ranaroussi/yfinance)
- **PPO Paper:** [Proximal Policy Optimization Algorithms](https://arxiv.org/abs/1707.06347)
## Citation
If you use this model in your research, please cite:
```bibtex
@misc{stock-trading-rl-2025,
  title={Stock Trading RL Agent using PPO},
  author={Adilbai},
  year={2025},
  url={https://huggingface.co/Adilbai/stock-trading-rl-20250704-171446}
}
```
**Ready to revolutionize your trading strategy?**

[Get Started](#quick-start) • [View Performance](#performance-metrics) • [Technical Details](#technical-details)
Generated on: 2025-07-04 17:14:46 UTC