πŸš€ Stock Trading RL Agent - Advanced PPO Implementation


A state-of-the-art reinforcement learning agent for algorithmic stock trading using Proximal Policy Optimization (PPO)

πŸ”₯ Quick Start β€’ πŸ“Š Performance β€’ πŸ’‘ Usage β€’ πŸ› οΈ Technical Details

πŸ“ˆ Model Overview

This model is a reinforcement learning trading agent trained with the Proximal Policy Optimization (PPO) algorithm. The agent learns to make trading decisions across multiple stocks by analyzing technical indicators, market patterns, and its own portfolio state.

🎯 Key Highlights

  • 🧠 Algorithm: PPO with Multi-Layer Perceptron policy
  • πŸ’° Action Space: Hybrid continuous/discrete (Action Type + Position Sizing)
  • πŸ“Š Observation Space: 60-day lookback window with technical indicators
  • πŸ† Training: 500,000 timesteps across 5 major stocks
  • ⚑ Performance: Up to 7,243.44% backtest return (MSFT); see Performance Metrics below

πŸš€ Quick Start

Installation

pip install stable-baselines3 yfinance pandas numpy scikit-learn

For data preparation, you can use the Enhanced Environment and Stock Data Processor classes in the Python files provided in this repository; they automate data and environment setup.

Load and Use the Model

from stable_baselines3 import PPO
import pickle
import numpy as np

# Load the trained model
model = PPO.load("best_model.zip")

# Load the data scaler
with open("scaler.pkl", "rb") as f:
    scaler = pickle.load(f)

# Example prediction
obs = your_observation_data  # Shape: (n_features,)
action, _states = model.predict(obs, deterministic=True)

# Interpret action
action_type = int(action[0])  # 0: Hold, 1: Buy, 2: Sell
position_size = action[1]     # 0-1: Fraction of available capital

πŸ“Š Performance Metrics

πŸ“ˆ Evaluation Results

| Stock | Total Return | Sharpe Ratio | Max Drawdown | Win Rate | Status |
|-------|--------------|--------------|--------------|----------|--------|
| MSFT  | 7,243.44%    | 0.56         | 164.60%      | 52.11%   | πŸ† Best Overall |
| AMZN  | 162.87%      | 0.74         | 187.11%      | 6.72%    | πŸ† Best Risk-Adj. |
| TSLA  | 109.91%      | -0.22        | 145.29%      | 44.76%   | ⚑ Volatile |
| AAPL  | -74.02%      | 0.65         | 157.07%      | 7.01%    | ⚠️ Underperform |
| GOOGL | 0.00%        | 0.00         | 0.00%        | 0.00%    | πŸ”„ No Activity |

🎯 Key Performance Indicators

  • πŸ“Š Maximum Return: 7,243.44% (MSFT)
  • βš–οΈ Best Risk-Adjusted Return: 0.74 Sharpe Ratio (AMZN)
  • 🎯 Highest Win Rate: 52.11% (MSFT)
  • πŸ“‰ Lowest Max Drawdown: 145.29% (TSLA, among stocks that traded; GOOGL made no trades)
  • πŸ’Ό Portfolio Coverage: 5 major stocks

πŸ› οΈ Technical Details

πŸ”§ Model Architecture

Algorithm: PPO (Proximal Policy Optimization)
Policy Network: Multi-Layer Perceptron
Action Space: 
  - Action Type: Discrete(3) [Hold, Buy, Sell]
  - Position Size: Continuous[0,1]
Observation Space: Technical indicators + Portfolio state
Training Steps: 500,000
Batch Size: 64
Learning Rate: 0.0003
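
The code snippets on this card cast action[0] to an int, which suggests the hybrid action is encoded as a single continuous Box. Below is a minimal Gymnasium sketch of the implied spaces; the feature counts and bounds are assumptions, not taken from the repo:

import numpy as np
from gymnasium import spaces

n_features = 16      # assumed number of per-day features
portfolio_dims = 4   # assumed: balance, position, net worth, returns

# Hybrid action as one Box: dim 0 is rounded to {0, 1, 2}, dim 1 is the capital fraction
action_space = spaces.Box(low=np.array([0.0, 0.0]),
                          high=np.array([2.0, 1.0]),
                          dtype=np.float32)

# 60-day lookback window flattened together with the portfolio state
observation_space = spaces.Box(low=-np.inf, high=np.inf,
                               shape=(60 * n_features + portfolio_dims,),
                               dtype=np.float32)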

πŸ“Š Data Configuration

{
  "tickers": ["AAPL", "MSFT", "GOOGL", "AMZN", "TSLA"],
  "period": "5y",
  "interval": "1d",
  "use_sp500": false,
  "lookback_window": 60
}
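
As a rough illustration, the raw OHLCV data described by this config can be pulled with yfinance (the repo's Stock Data Processor presumably wraps something similar):

import yfinance as yf

tickers = ["AAPL", "MSFT", "GOOGL", "AMZN", "TSLA"]
# 5 years of daily bars, grouped by ticker
data = yf.download(tickers, period="5y", interval="1d", group_by="ticker")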

🌊 Environment Setup

{
  "initial_balance": 10000,
  "transaction_cost": 0.001,
  "max_position_size": 1.0,
  "reward_type": "return",
  "risk_adjustment": true
}
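
To make transaction_cost concrete: 0.001 means each trade pays 0.1% of its notional value. A hedged sketch of how the environment might charge it (the actual logic lives in the repo's environment files):

# Hypothetical fee handling inside the environment's step()
balance = 10_000.0
price, shares = 150.0, 10        # current close and shares traded this step
notional = shares * price
fee = notional * 0.001           # transaction_cost = 0.001 -> 0.1% per trade
balance -= notional + fee        # buying pays notional plus fee
# selling would credit: balance += notional - fee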

πŸŽ“ Training Configuration

{
  "algorithm": "PPO",
  "total_timesteps": 500000,
  "learning_rate": 0.0003,
  "batch_size": 64,
  "n_epochs": 10,
  "gamma": 0.99,
  "eval_freq": 1000,
  "n_eval_episodes": 5,
  "save_freq": 10000,
  "seed": 42
}
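
With this config, the Stable-Baselines3 training call would look roughly like the following (the environment instances and save paths are placeholders):

from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import CheckpointCallback, EvalCallback

model = PPO(
    "MlpPolicy", env,                 # env: your StockTradingEnv instance
    learning_rate=3e-4, batch_size=64,
    n_epochs=10, gamma=0.99,
    seed=42, verbose=1,
)

callbacks = [
    EvalCallback(eval_env, eval_freq=1000, n_eval_episodes=5,
                 best_model_save_path="./"),          # writes best_model.zip
    CheckpointCallback(save_freq=10000, save_path="./checkpoints/"),
]
model.learn(total_timesteps=500_000, callback=callbacks)
model.save("final_model")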

πŸ“‹ State Space & Features

πŸ“Š Technical Indicators

The agent observes the following features for each stock (a pandas sketch of the indicator computations follows the list):

  • πŸ“ˆ Trend Indicators: SMA (20, 50), EMA (12, 26)
  • πŸ“Š Momentum: RSI, MACD, MACD Signal, MACD Histogram
  • 🎯 Volatility: Bollinger Bands (Upper, Lower, %B)
  • πŸ’Ή Price/Volume: Open, High, Low, Close, Volume
  • πŸ’° Portfolio State: Balance, Position, Net Worth, Returns
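
A pandas sketch of these computations, assuming standard parameter choices (RSI-14, MACD 12/26/9, Bollinger 20/2) since the card does not state them:

import pandas as pd

def add_indicators(df: pd.DataFrame) -> pd.DataFrame:
    # Trend
    df["sma_20"] = df["Close"].rolling(20).mean()
    df["sma_50"] = df["Close"].rolling(50).mean()
    df["ema_12"] = df["Close"].ewm(span=12, adjust=False).mean()
    df["ema_26"] = df["Close"].ewm(span=26, adjust=False).mean()
    # Momentum: RSI(14) and MACD(12, 26, 9)
    delta = df["Close"].diff()
    gain = delta.clip(lower=0).rolling(14).mean()
    loss = (-delta.clip(upper=0)).rolling(14).mean()
    df["rsi"] = 100 - 100 / (1 + gain / loss)
    df["macd"] = df["ema_12"] - df["ema_26"]
    df["macd_signal"] = df["macd"].ewm(span=9, adjust=False).mean()
    df["macd_hist"] = df["macd"] - df["macd_signal"]
    # Volatility: Bollinger Bands(20, 2) and %B
    mid = df["Close"].rolling(20).mean()
    std = df["Close"].rolling(20).std()
    df["bb_upper"] = mid + 2 * std
    df["bb_lower"] = mid - 2 * std
    df["bb_pct_b"] = (df["Close"] - df["bb_lower"]) / (df["bb_upper"] - df["bb_lower"])
    return df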

πŸ”„ Action Space

The agent outputs a 2-dimensional action (see the decoding sketch after this list):

  1. Action Type (Discrete):
    • 0: Hold position
    • 1: Buy signal
    • 2: Sell signal
  2. Position Size (Continuous):
    • Range: [0, 1]
    • Fraction of available capital to use
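
A small decoding helper, mirroring how the card's snippets interpret the raw output (rounding and clipping are defensive assumptions about the Box encoding):

import numpy as np

def decode_action(action):
    """Map the raw 2-D model output to (action_type, position_size)."""
    action_type = int(round(float(action[0])))            # 0 Hold, 1 Buy, 2 Sell
    position_size = float(np.clip(action[1], 0.0, 1.0))   # fraction of capital
    return action_type, position_size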

🎯 Usage Examples

πŸ“ˆ Basic Trading Loop

import pickle

import yfinance as yf
import pandas as pd
from stable_baselines3 import PPO

# Load model and scaler
model = PPO.load("best_model.zip")
with open("scaler.pkl", "rb") as f:
    scaler = pickle.load(f)

# Get live data
ticker = "AAPL"
data = yf.download(ticker, period="3mo", interval="1d")

# Prepare observation (implement your feature engineering)
obs = prepare_observation(data, scaler)  # Your preprocessing function

# Get trading decision
action, _states = model.predict(obs, deterministic=True)
action_type = ["HOLD", "BUY", "SELL"][int(action[0])]
position_size = action[1]

print(f"Action: {action_type}, Size: {position_size:.2%}")

πŸ”„ Backtesting Framework

def backtest_strategy(model, data, scaler, initial_balance=10000):
    """
    Backtest the trained model on historical data.
    Transaction costs are omitted here for brevity.
    """
    balance = initial_balance
    position = 0.0

    for i in range(60, len(data)):  # skip the 60-day lookback warm-up
        obs = prepare_observation(data.iloc[:i + 1], scaler)
        action, _ = model.predict(obs, deterministic=True)

        action_type = int(action[0])
        position_size = float(action[1])
        price = data.iloc[i]["Close"]

        if action_type == 1:  # Buy: spend a fraction of available cash
            shares_to_buy = (balance * position_size) // price
            position += shares_to_buy
            balance -= shares_to_buy * price
        elif action_type == 2:  # Sell: liquidate a fraction of the position
            shares_to_sell = position * position_size
            position -= shares_to_sell
            balance += shares_to_sell * price

    # Mark remaining shares to market at the final close
    return balance + position * data.iloc[-1]["Close"]
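
For example, running the backtest on the AAPL data downloaded in the basic trading loop above:

final_value = backtest_strategy(model, data, scaler)
print(f"Final value: ${final_value:,.2f} ({final_value / 10000 - 1:.1%} return)")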

πŸ“ Model Files

| File | Description | Size |
|------|-------------|------|
| best_model.zip | πŸ† Best performing model checkpoint | ~2.5 MB |
| final_model.zip | 🎯 Final trained model | ~2.5 MB |
| scaler.pkl | πŸ”§ Data preprocessing scaler | ~50 KB |
| config.json | βš™οΈ Complete training configuration | ~5 KB |
| evaluation_results.json | πŸ“Š Detailed evaluation metrics | ~10 KB |
| training_summary.json | πŸ“ˆ Training statistics | ~8 KB |
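
The JSON artifacts can be inspected directly; their exact schemas are not documented here, so check the keys before relying on any field:

import json

with open("evaluation_results.json") as f:
    results = json.load(f)
print(sorted(results.keys()))  # inspect the schema before use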

πŸŽ“ Training Details

πŸ”„ Training Process

  • 🎯 Evaluation Frequency: Every 1,000 steps
  • πŸ’Ύ Checkpoint Saving: Every 10,000 steps
  • 🎲 Random Seed: 42 (reproducible results)
  • ⏱️ Training Time: ~6 hours on a modern GPU
  • πŸ“Š Convergence: Achieved after ~400,000 steps

πŸ“ˆ Performance During Training

The model showed consistent improvement during training:

  • Early Stage (0-100k steps): Learning basic market patterns
  • Mid Stage (100k-300k steps): Developing risk management
  • Late Stage (300k-500k steps): Fine-tuning position sizing

⚠️ Important Disclaimers

🚨 Risk Warning: This model is for educational and research purposes only. Past performance does not guarantee future results. Stock trading involves substantial risk of loss.

πŸ“Š Data Limitations: The model was trained on historical data from 2019-2024. Market conditions may change, affecting model performance.

πŸ”§ Technical Limitations: The model requires proper preprocessing and feature engineering to work effectively in live trading environments.

πŸš€ Advanced Usage

🎯 Custom Environment Integration

# Create custom trading environment
from stable_baselines3.common.env_checker import check_env
from your_trading_env import StockTradingEnv

env = StockTradingEnv(
    tickers=["AAPL", "MSFT", "GOOGL"],
    initial_balance=10000,
    transaction_cost=0.001
)

# Verify environment
check_env(env)

# Load and test model
model = PPO.load("best_model.zip")
obs, info = env.reset()  # Gymnasium-style reset returns (obs, info)
action, _states = model.predict(obs)
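
From there, a full evaluation episode can be rolled out; this assumes the environment follows the Gymnasium step API:

obs, info = env.reset()
done, total_reward = False, 0.0
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    done = terminated or truncated
print(f"Episode reward: {total_reward:.2f}")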

πŸ“Š Real-time Trading Integration

import asyncio

from stable_baselines3 import PPO

model = PPO.load("best_model.zip")

async def live_trading_loop():
    """
    Example live trading implementation. get_market_data and
    execute_trade are placeholders for your data feed and broker API.
    """
    while True:
        # Get real-time market data
        market_data = await get_market_data()

        # Prepare observation
        obs = prepare_observation(market_data, scaler)

        # Get model prediction
        action, _ = model.predict(obs, deterministic=True)

        # Execute trade (implement your broker API)
        if int(action[0]) != 0:  # 0 = hold
            await execute_trade(action)

        await asyncio.sleep(60)  # Wait 1 minute

🀝 Contributing

We welcome contributions! Please feel free to:

  • πŸ› Report bugs and issues
  • πŸ’‘ Suggest new features
  • πŸ“ Improve documentation
  • πŸ”§ Submit pull requests

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ”— Links & Resources

πŸ“Š Citation

If you use this model in your research, please cite:

@misc{stock-trading-rl-2025,
  title={Stock Trading RL Agent using PPO},
  author={Adilbai},
  year={2025},
  url={https://huggingface.co/Adilbai/stock-trading-rl-20250704-171446}
}

πŸš€ Ready to revolutionize your trading strategy?

Get Started β€’ View Performance β€’ Technical Details

Generated on: 2025-07-04 17:14:46 UTC
