agent-2048-game-qwen-7b-2k-ds

This model is a specialized game-playing AI for the 2048 puzzle game. It was fine-tuned from Qwen-7B-Instruct with reinforcement learning (Group Relative Policy Optimization, GRPO) and shows strategic planning and spatial reasoning on the 4x4 board.

Model Description

  • Base Model: Qwen-7B-Instruct
  • Training Approach: Group Relative Policy Optimization (GRPO)
  • Training Dataset: 2,000 carefully curated game states
  • Hardware Used: Single RTX 4090 (24GB)
  • Training Time: ~10 hours
  • Framework: Implemented with the TRL library and accelerated by Unsloth
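
GRPO samples several completions for the same prompt and scores each one relative to the rest of its group, rather than against a learned value function. The snippet below is a minimal sketch of that normalization step only. The function name, the `eps` constant, and the example rewards are illustrative assumptions, and the choice of population standard deviation is a simplification; it is not the TRL implementation.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-4):
    """Normalize rewards within one group of completions for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    # eps (an assumed constant) avoids division by zero when all rewards match.
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical rewards for four sampled moves on the same board.
advantages = group_relative_advantages([1.0, 0.5, 0.0, 0.5])
```

Completions that beat the group average get a positive advantage and are reinforced; those below it get a negative one.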

Training Configuration

  • Learning Rate: 4e-5 (optimized after extensive testing)
  • LoRA Rank: 16
  • Max Sequence Length: 1000 tokens
  • Batch Size: 1 (with 4 gradient accumulation steps, for an effective batch size of 4)
  • Optimizer: paged_adamw_8bit
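
For reference, the settings above can be gathered into one structure. This is only a sketch: the key names follow common Hugging Face training-argument naming, and treating them as the exact arguments used for this model is an assumption.

```python
# Hyperparameters from the list above, gathered in one place.
# Key names are assumptions modeled on common Hugging Face argument names.
training_config = {
    "learning_rate": 4e-5,
    "lora_rank": 16,
    "max_seq_length": 1000,
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 4,
    "optim": "paged_adamw_8bit",
}

# Effective batch size on a single GPU = per-device batch * accumulation steps.
effective_batch_size = (
    training_config["per_device_train_batch_size"]
    * training_config["gradient_accumulation_steps"]
)
print(effective_batch_size)  # 4
```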

Intended Use

This model is designed to play the 2048 game by:

  1. Analyzing the current board state
  2. Planning strategic moves
  3. Maximizing score and achieving high-value tiles
  4. Maintaining efficient board organization

Training Data

The training data was generated by a pipeline with the following components:

  • Simulated gameplay for realistic board states
  • Custom difficulty scoring system
  • 5-level difficulty classification
  • Balanced sampling across difficulty levels
  • Parallel processing for efficient generation
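
The components above could be sketched roughly as follows. Everything here is an assumption made for illustration: the random-play simulator, the difficulty score (based only on how full the board is), and all function names are hypothetical, and the actual pipeline is not described in enough detail to reproduce. Parallel generation is omitted.

```python
import random

SIZE = 4

def slide_row_left(row):
    """Slide and merge one row to the left, following 2048 rules."""
    tiles = [v for v in row if v]
    merged, i = [], 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            merged.append(tiles[i] * 2)  # each tile merges at most once per move
            i += 2
        else:
            merged.append(tiles[i])
            i += 1
    return merged + [0] * (len(row) - len(merged))

def move(board, direction):
    """Return the board after a move; direction is up, down, left, or right."""
    if direction in ("up", "down"):
        board = [list(col) for col in zip(*board)]  # transpose
    if direction in ("right", "down"):
        board = [slide_row_left(row[::-1])[::-1] for row in board]
    else:
        board = [slide_row_left(row) for row in board]
    if direction in ("up", "down"):
        board = [list(col) for col in zip(*board)]  # transpose back
    return board

def spawn_tile(board, rng):
    """Place a 2 (90%) or a 4 (10%) on a random empty cell."""
    empty = [(r, c) for r in range(SIZE) for c in range(SIZE) if board[r][c] == 0]
    r, c = rng.choice(empty)
    board[r][c] = 4 if rng.random() < 0.1 else 2

def random_game_states(rng, max_moves=200):
    """Play random legal moves and yield a copy of every intermediate board."""
    board = [[0] * SIZE for _ in range(SIZE)]
    spawn_tile(board, rng)
    spawn_tile(board, rng)
    for _ in range(max_moves):
        legal = [d for d in ("up", "down", "left", "right") if move(board, d) != board]
        if not legal:
            break
        board = move(board, rng.choice(legal))
        spawn_tile(board, rng)
        yield [row[:] for row in board]

def difficulty_level(board):
    """Hypothetical 5-level difficulty: fuller boards count as harder (1..5)."""
    filled = sum(1 for row in board for v in row if v)
    return 1 + filled * 5 // (SIZE * SIZE + 1)

def balanced_sample(states, per_level, rng):
    """Draw up to per_level boards from each difficulty level."""
    buckets = {level: [] for level in range(1, 6)}
    for board in states:
        buckets[difficulty_level(board)].append(board)
    sample = []
    for boards in buckets.values():
        sample.extend(rng.sample(boards, min(per_level, len(boards))))
    return sample

rng = random.Random(0)
states = [s for _ in range(50) for s in random_game_states(rng)]
dataset = balanced_sample(states, per_level=10, rng=rng)
```

Bucketing before sampling keeps rare late-game boards from being swamped by the many easy early-game positions that random play produces.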

Training Approach

Reward System

The model was trained using multiple reward components:

  1. Density Reward: Encourages efficient tile merging and space utilization
  2. Highest Tile Reward: Incentivizes creation of high-value tiles
  3. Survival Reward: Promotes moves that maintain game continuity
  4. Format Compliance: Ensures proper response structure
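
As a rough illustration of how four such components might be computed and combined, consider the sketch below. The formulas, the normalization by 11 (since 2048 = 2^11), the weights, and the assumption that a valid response is a single move word are all hypothetical; the actual reward definitions are not given here.

```python
import math

MOVES = ("up", "down", "left", "right")

def density_reward(board):
    """Hypothetical: higher when value is concentrated in fewer tiles."""
    tiles = [v for row in board for v in row if v]
    if not tiles:
        return 0.0
    return math.log2(sum(tiles) / len(tiles)) / 11

def highest_tile_reward(board):
    """Hypothetical: log2 of the largest tile, scaled so that 2048 maps to 1.0."""
    top = max(v for row in board for v in row)
    return math.log2(top) / 11 if top else 0.0

def survival_reward(board):
    """1.0 if any move is still possible (empty cell or equal neighbors)."""
    n = len(board)
    for r in range(n):
        for c in range(n):
            v = board[r][c]
            if v == 0:
                return 1.0
            if c + 1 < n and board[r][c + 1] == v:
                return 1.0
            if r + 1 < n and board[r + 1][c] == v:
                return 1.0
    return 0.0

def format_reward(response):
    """1.0 if the response is exactly one move word (assumed output format)."""
    return 1.0 if response.strip().lower() in MOVES else 0.0

def total_reward(board_after_move, response, weights=(1.0, 1.0, 1.0, 0.5)):
    """Weighted sum of the four components; the weights are illustrative."""
    parts = (
        density_reward(board_after_move),
        highest_tile_reward(board_after_move),
        survival_reward(board_after_move),
        format_reward(response),
    )
    return sum(w * p for w, p in zip(weights, parts))

board = [[2, 4, 8, 16], [0, 0, 2, 4], [0, 0, 0, 2], [0, 0, 0, 0]]
score = total_reward(board, "left")
```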

Optimization

  • Unsloth for roughly 2x faster fine-tuning
  • 4-bit quantization to reduce memory use during training
  • LoRA adaptation (rank 16) instead of full fine-tuning
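
To see why LoRA keeps training cheap, it helps to count the parameters it adds. The sketch below uses a hypothetical 4096 x 4096 weight matrix, which is an assumption for illustration and not a layer shape taken from this model; only the rank of 16 comes from the configuration above.

```python
def lora_extra_params(d_out, d_in, rank):
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    LoRA learns an update B @ A with B of shape (d_out, rank) and
    A of shape (rank, d_in), so it adds rank * (d_out + d_in) parameters.
    """
    return rank * (d_out + d_in)

# Hypothetical 4096 x 4096 projection; not the actual layer shape of this model.
full = 4096 * 4096
extra = lora_extra_params(4096, 4096, rank=16)
print(extra, f"{extra / full:.2%}")  # 131072 0.78%
```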

Performance and Limitations

Strengths

  • Strong strategic planning capabilities
  • Efficient tile merging and space management
  • Consistent high-score achievement
  • Structured decision-making process

Limitations

  • Performance may vary with random seeds
  • Success is not guaranteed because new tiles spawn at random
  • Model requires the specific board format shown under Example Usage

Example Usage

# Format your 4x4 game board as a string
board_state = """
2 | 4 | 8 | 16
. | . | 2 | 4
. | . | . | 2
. | . | . | .
"""

# Model will output one of: up, down, left, right
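
When wiring the model into a game loop, two small helpers are useful: one to turn the text board into numbers and one to pull the move out of the model's reply. The following is a sketch under stated assumptions: `parse_board` and `extract_move` are hypothetical helper names, and taking the last move word found in the reply is an illustrative choice, not a documented output contract.

```python
import re

def parse_board(board_state):
    """Turn the '|'-separated text board into a 4x4 list of ints ('.' -> 0)."""
    rows = []
    for line in board_state.strip().splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        rows.append([0 if cell == "." else int(cell) for cell in cells])
    if len(rows) != 4 or any(len(row) != 4 for row in rows):
        raise ValueError("expected a 4x4 board")
    return rows

def extract_move(response):
    """Return the last of up/down/left/right found in the response, or None."""
    found = re.findall(r"\b(up|down|left|right)\b", response.lower())
    return found[-1] if found else None

board_state = """
2 | 4 | 8 | 16
. | . | 2 | 4
. | . | . | 2
. | . | . | .
"""
board = parse_board(board_state)
```

Returning `None` for an unparseable reply lets the caller decide whether to retry or fall back to a default move.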

Citation

@misc{dalal2024agent2048blog,
    author = {Dalal, Hrishbh},
    title = {Agent 2048: Forging Strategic Gameplay in an AI Through Data, Rewards, and RL},
    year = {2024},
    month = {March},
    url = {https://yourwebsite.com/blog/ai-agent-plays-2048},
    note = {[Blog post] Accessed: March 30, 2024}
}

Author

Hrishbh Dalal

Acknowledgments

Special thanks to the research community on Twitter/X for valuable feedback on data generation strategies and training approaches.

License

This model is released under the Apache 2.0 license.

Model size: 7.62B params (Safetensors, tensor type BF16)