Chess GPT-4.5M

Overview

Chess GPT-4.5M is a generative language model trained specifically to generate chess moves and analyze chess games. The model is based on the GPT architecture and was trained with a custom 32-token vocabulary covering the symbols used in chess move notation.

Model Details

  • Architecture: GPT-based language model (GPT2LMHeadModel)
  • Parameters: Approximately 4.5M parameters
  • Layers: 8 transformer layers
  • Heads: 4 attention heads per layer
  • Embedding Dimension: 256
  • Training Sequence Length: 1024 tokens per chess game
  • Vocabulary: 32 tokens (custom vocabulary)
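For reference, here is a minimal sketch of the architecture above expressed as a Hugging Face GPT2Config. The field mapping (n_layer, n_head, n_embd) is standard, but treat this as an illustration rather than the exact configuration used.

```python
from transformers import GPT2Config

# Sketch of the model card's architecture in GPT2Config terms.
config = GPT2Config(
    vocab_size=32,     # custom 32-token chess vocabulary
    n_positions=1024,  # training sequence length
    n_embd=256,        # embedding dimension
    n_layer=8,         # transformer layers
    n_head=4,          # attention heads per layer
)
```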

Training Data

The model was trained on tokenized chess game data prepared from the Lichess dataset. The preparation process involved the following steps (sketched below):

  • Tokenizing chess games using a custom 32-token vocabulary.
  • Creating binary training files (train.bin and val.bin).
  • Saving vocabulary information to meta.pkl.
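A minimal sketch of this preparation step, in the style of nanoGPT's prepare.py. The symbol set, helper names, and placeholder corpus are assumptions; the actual script and vocabulary may differ.

```python
import pickle
import numpy as np

# Assumed 32-symbol character vocabulary for chess transcripts.
chars = sorted(set(";0123456789.abcdefgh KQRBNxO+#=-"))
stoi = {ch: i for i, ch in enumerate(chars)}

def encode(game: str) -> list[int]:
    return [stoi[ch] for ch in game]

games = [";1.e4 e5 2.Nf3 Nc6 "]  # placeholder corpus
ids = np.array([i for g in games for i in encode(g)], dtype=np.uint16)

# Split into binary training files.
split = int(0.9 * len(ids))
ids[:split].tofile("train.bin")
ids[split:].tofile("val.bin")

# Save vocabulary information to meta.pkl.
with open("meta.pkl", "wb") as f:
    pickle.dump({"vocab_size": len(chars), "stoi": stoi,
                 "itos": {i: ch for ch, i in stoi.items()}}, f)
```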

Training Configuration

The training configuration, found in config/mac_chess_gpt.py, includes the settings below (a sketch of the file follows the list):

  • Dataset: lichess_hf_dataset
  • Batch Size: 2 (kept small to fit Mac memory constraints)
  • Block Size: 1023 (each 1024-token game yields a 1023-token input and a 1023-token shifted target for next-token prediction)
  • Learning Rate: 3e-4
  • Max Iterations: 140,000
  • Device: 'mps' (Apple's Metal Performance Shaders backend)
  • Other Settings: dropout disabled and compilation turned off for Mac compatibility
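A hypothetical reconstruction of config/mac_chess_gpt.py in nanoGPT's plain-variable config style; the variable names follow nanoGPT conventions and the file's actual contents may differ.

```python
# Sketch of config/mac_chess_gpt.py (names assumed from nanoGPT conventions).
out_dir = "out-chess-mac"
dataset = "lichess_hf_dataset"

batch_size = 2        # small batch for Mac memory constraints
block_size = 1023     # context length; games are 1024-token sequences
learning_rate = 3e-4
max_iters = 140_000

n_layer = 8           # model dimensions from the Model Details section
n_head = 4
n_embd = 256
dropout = 0.0         # no dropout

device = "mps"        # Apple Metal backend
compile = False       # torch.compile off for Mac compatibility
```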

How to Use

Generating Chess Moves

After training, use the generation script to sample chess moves. Example commands:

Sample from the model without a provided prompt:

```bash
python sample.py --out_dir=out-chess-mac
```

Generate a chess game sequence starting with a custom prompt:

```bash
python sample.py --out_dir=out-chess-mac --start=";1.e4"
```

Loading the Model in Transformers

Once the model card and converted model files are pushed to the Hugging Face Hub, you can load the model using:

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("your-hf-username/chess-gpt-4.5M")
tokenizer = GPT2Tokenizer.from_pretrained("your-hf-username/chess-gpt-4.5M")
```

Note: The tokenizer uses a custom vocabulary provided in vocab.json.
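Once loaded, the model can be sampled like any other causal language model in Transformers. A small, hypothetical usage example (the leading ';' in the prompt mirrors the sampling command above):

```python
import torch

# Continue a game that opens with 1.e4.
inputs = tokenizer(";1.e4", return_tensors="pt")
with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=50,  # number of chess tokens to sample
        do_sample=True,
        temperature=0.8,
    )
print(tokenizer.decode(output[0]))
```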

Intended Use

The model is intended for:

  • Generating chess move sequences.
  • Assisting in automated chess analysis.
  • Educational purposes in understanding language model training on specialized domains.

Limitations

  • The model is a relatively small (4.5M-parameter) architecture and may not capture deep or highly complex chess strategy.
  • It is specialized for chess move generation and may not generalize to standard language tasks.

Training Process Summary

  1. Data Preparation: Tokenized the Lichess chess game dataset using a 32-token vocabulary.
  2. Model Training: Used custom training configurations specified in config/mac_chess_gpt.py.
  3. Model Conversion: Converted the trained checkpoint out-chess-mac/ckpt.pt into a Hugging Face-compatible format with convert_to_hf.py.
  4. Repository Setup: Pushed the converted model files (including custom tokenizer vocab) to the Hugging Face Hub with Git LFS handling large files.
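As a rough illustration of the final step, the converted files can also be pushed with the huggingface_hub client. The repo id and folder path below are assumptions, and the original workflow used Git LFS directly.

```python
from huggingface_hub import HfApi

# Hypothetical upload of the converted files (weights, config, vocab.json).
api = HfApi()
api.upload_folder(
    folder_path="converted_model",  # assumed output directory of convert_to_hf.py
    repo_id="your-hf-username/chess-gpt-4.5M",
    repo_type="model",
)
```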

Acknowledgements

This model was inspired by GPT-2 and adapted for the chess domain.

