Uploaded model

  • Developed by: jayasuryajsk
  • License: apache-2.0
  • Finetuned from model: unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit

Chess Reasoner Model

This model is a fine-tuned version of unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit, trained with Unsloth to play chess. It also performs reasonably well on math reasoning problems.

Model Description

  • Base Model: unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit
  • Training: Fine-tuned using GRPO (Group Relative Policy Optimization)
  • Task: Chess playing with reasoning
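The card names GRPO but does not describe it. In GRPO, several completions are sampled for each prompt, each is scored by a reward function, and each completion's advantage is its reward normalized against the rest of its group, so no separate value network is needed. Below is a minimal sketch of just that normalization step in plain Python; the function name, the epsilon value, and the example rewards are illustrative assumptions, not this model's training code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Normalize each completion's reward against the other completions
    sampled for the same prompt: A_i = (r_i - mean(r)) / (std(r) + eps).

    Completions that beat their group's average get a positive advantage,
    those below it get a negative one. eps (an assumed value) avoids division
    by zero when every completion in the group received the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to the same chess position.
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
print(advantages)
```

The best answer in the group is pushed up and the worst pushed down, while answers at the group average receive zero advantage.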

Usage

The model expects input in this format:

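import chess  # python-chess; `tokenizer` is this model's Hugging Face tokenizer

board = chess.Board()  # example: the starting position
text = tokenizer.apply_chat_template([
    {"role": "system", "content": SYSTEM_PROMPT},  # training-time system prompt (not given in this card)
    {"role": "user", "content": f"Analyze this chess position: {board.fen()}"}
], tokenize=False, add_generation_prompt=True)  # add_generation_prompt opens the assistant turn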

The model will respond with:

<think>
[Analysis of the position]
</think>
<move>
[move in UCI notation, e.g. e2e4]
</move>
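Because the answer is wrapped in fixed tags, it can be extracted with regular expressions. The sketch below is a hypothetical helper (`parse_response` is not part of any library or of this model's release) that pulls out the analysis and the move and rejects text that is not syntactically valid UCI. It does not check legality; that requires the current position, for example from a python-chess `Board` like the one the usage snippet appears to rely on.

```python
import re
from typing import Optional, Tuple

# Tag names follow the response format above; the regexes are an illustrative sketch.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
MOVE_RE = re.compile(r"<move>\s*(.*?)\s*</move>", re.DOTALL)
# UCI move: source square, target square, optional promotion piece (e.g. e7e8q).
UCI_RE = re.compile(r"[a-h][1-8][a-h][1-8][qrbn]?")


def parse_response(text: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (analysis, move) from a model response.

    Either element is None if its tag is missing; the move is also None
    if it is not syntactically valid UCI. Legality is NOT checked here.
    """
    think = THINK_RE.search(text)
    move = MOVE_RE.search(text)
    analysis = think.group(1).strip() if think else None
    uci = move.group(1).strip() if move else None
    if uci is not None and not UCI_RE.fullmatch(uci):
        uci = None  # e.g. SAN like "Nf3" or free text instead of UCI
    return analysis, uci
```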


Limitations

  • The model might occasionally suggest illegal moves
  • Position evaluation might not be as strong as dedicated chess engines
  • Reasoning can sometimes include incorrect assessments
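One way to guard against illegal moves is to accept the model's move only if it appears in the list of legal moves, and otherwise fall back. This is a hypothetical sketch: the function name is illustrative, the random fallback is a placeholder policy, and the legal-move list is assumed to come from a chess library such as python-chess (`[m.uci() for m in board.legal_moves]`).

```python
import random
from typing import Optional, Sequence


def choose_legal_move(
    proposed: Optional[str],
    legal_moves: Sequence[str],
    rng: Optional[random.Random] = None,
) -> str:
    """Return the model's UCI move if it is legal, otherwise a random legal move.

    The random fallback is only a placeholder; a real application might
    re-prompt the model or ask an engine instead.
    """
    if not legal_moves:
        raise ValueError("no legal moves: the game is over")
    if proposed is not None and proposed in legal_moves:
        return proposed
    if rng is None:
        rng = random.Random()
    return rng.choice(list(legal_moves))
```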

Training Details

  • Trained using Stockfish for move evaluation
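The card does not say how Stockfish evaluations were turned into rewards. One hypothetical scheme, shown only to illustrate how an engine score can feed a GRPO reward function, scores a move by its centipawn loss relative to the engine's best move. Obtaining the two evaluations (for example by querying Stockfish) is outside this sketch, and the function name and the 300-centipawn cap are assumptions.

```python
def centipawn_loss_reward(best_eval_cp: float, move_eval_cp: float, cap_cp: float = 300.0) -> float:
    """HYPOTHETICAL reward: 1.0 when the move matches the engine's best
    evaluation, falling linearly to 0.0 once it loses `cap_cp` centipawns.

    Both evaluations are in centipawns from the side to move's point of view:
    the engine's score for its best move and its score after the model's move.
    The 300-centipawn cap is an arbitrary illustrative choice.
    """
    loss = max(0.0, best_eval_cp - move_eval_cp)  # a move can't beat the best move
    return max(0.0, 1.0 - loss / cap_cp)
```

For example, a move that drops 150 centipawns relative to the best move would earn a reward of 0.5 under this scheme.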

It also reasons surprisingly well in other domains. Here is an example.

  1. Question: Little's law can be applied to any part of the store, such as a particular department or the checkout lines. The store owner determines that, during business hours, approximately 84 shoppers per hour make a purchase and each of these shoppers spend an average of 5 minutes in the checkout line. At any time during business hours, about how many shoppers, on average, are waiting in the checkout line to make a purchase at the Good Deals Store?

    Response:

    Reasoning

    To solve this problem using Little's Law, we first need to understand that Little's Law states that the average number of items in a system (L) is equal to the average arrival rate (λ) multiplied by the average time spent in the system (W). Mathematically, it can be expressed as:

    \[ L = \lambda \times W \]

    Where:

    • \(L\) is the average number of shoppers in the checkout line.
    • \(\lambda\) is the average arrival rate (shoppers per unit time).
    • \(W\) is the average time spent in the checkout line (in units of time).

    In this scenario:

    The average arrival rate \(\lambda\) is 84 shoppers per hour. The average time spent in the checkout line, \(W\), is 5 minutes, which we need to convert to hours to match the units. Since there are 60 minutes in an hour, 5 minutes is \(\frac{5}{60} = \frac{1}{12}\) hours. Now we can calculate the average number of shoppers in the checkout line (\(L\)):

    \[ L = \lambda \times W = 84 \times \frac{1}{12} = 7 \]

    Thus, on average, there are 7 shoppers waiting in the checkout line.

    Answer: 7
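The arithmetic in the answer above can be checked with a few lines of Python. The function name is illustrative; the only subtlety is converting \(W\) from minutes to hours so it matches the per-hour arrival rate.

```python
def littles_law(arrival_rate_per_hour: float, time_in_system_minutes: float) -> float:
    """Average number in the system, L = lambda * W.

    W is given in minutes, so it is divided by 60 to match the per-hour
    arrival rate.
    """
    return arrival_rate_per_hour * time_in_system_minutes / 60


print(littles_law(84, 5))  # → 7.0
```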

This Qwen2 model was trained 2x faster with Unsloth and Hugging Face's TRL library.

Weights: Safetensors format, 3.09B parameters, BF16 tensor type.