
E-Model-Reasoner-Math-V1

This is a fine-tuned version of Qwen3-0.6B specialized for mathematical reasoning tasks, trained on the NVIDIA OpenMathReasoning dataset. The model incorporates advanced reasoning capabilities with a "thinking" mechanism to provide step-by-step mathematical problem solving.

Model Details

Model Description

E-Model-Reasoner-Math-V1 is a mathematical reasoning model built on the Qwen/Qwen3-0.6B architecture. It has been fine-tuned specifically for mathematical problem-solving and features an integrated thinking process that lets users see the model's reasoning steps before it arrives at the final answer. This transparency makes it particularly valuable for educational applications and mathematical tutoring.

  • Developed by: ErenalpCet
  • Model type: Causal Language Model (Fine-tuned for Mathematical Reasoning)
  • Language(s) (NLP): English
  • License: MIT
  • Finetuned from model: Qwen/Qwen3-0.6B
  • Model ID: ErenalpCet/E-Model-Reasoner-Math-V1

Model Sources

  • Repository: https://huggingface.co/ErenalpCet/E-Model-Reasoner-Math-V1

Uses

Direct Use

E-Model-Reasoner-Math-V1 is designed for direct mathematical problem-solving applications. It excels at:

  • Solving algebraic equations and inequalities
  • Arithmetic calculations with detailed explanations
  • Mathematical word problems
  • Step-by-step problem breakdown and reasoning
  • Educational math assistance with transparent thinking process

The model's thinking mechanism allows users to understand not just the answer, but the complete reasoning process, making it ideal for learning environments.
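
For a quick look at this behavior, here is a minimal, hedged sketch (the repo id is taken from this card; the question and generation settings are purely illustrative, and the full script appears under "How to Get Started"). Asked a question, the model first emits its working inside a <think>...</think> block and then states the final answer:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Minimal sketch: load the model from the Hub and ask one question.
model_id = "ErenalpCet/E-Model-Reasoner-Math-V1"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Solve 3x + 7 = 22 for x."}],
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # expose the <think>...</think> reasoning block
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(output[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))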

Downstream Use

This model can be integrated into various applications including:

  • Educational platforms and tutoring systems
  • Math homework assistance tools
  • Interactive learning applications
  • Mathematical reasoning benchmarks
  • Research tools for mathematical problem-solving analysis

Out-of-Scope Use

This model is specifically optimized for mathematical reasoning and may not perform optimally for:

  • Non-mathematical domain questions
  • Creative writing or storytelling
  • Code generation outside of mathematical contexts

Bias, Risks, and Limitations

The model inherits potential biases from both its base model (Qwen3-0.6B) and the OpenMathReasoning training dataset. Key considerations include:

  • Mathematical problem types may be skewed toward certain domains represented in the training data
  • Performance may vary across different mathematical complexity levels
  • Cultural or linguistic biases may affect word problem interpretation
  • The model should not be used as the sole source for critical mathematical calculations

Recommendations

Users should:

  • Verify important mathematical results independently
  • Use the model as an educational aid rather than a definitive mathematical authority
  • Be aware that the model's reasoning process, while helpful, may not always reflect optimal problem-solving approaches
  • Test the model's performance on their specific use cases before deployment

How to Get Started with the Model

Use the code below to get started with E-Model-Reasoner-Math-V1:

import os

# Optional: redirect Hugging Face and Torch caches; the "E:\..." defaults below are the author's local paths.
os.environ["HF_HOME"]               = os.environ.get("HF_HOME", "E:\\hf_home")
os.environ["TRANSFORMERS_CACHE"]    = os.environ.get("TRANSFORMERS_CACHE", "E:\\cache\\transformers")
os.environ["HF_HUB_CACHE"]          = os.environ.get("HF_HUB_CACHE", "E:\\cache\\hub")
os.environ["HF_DATASETS_CACHE"]     = os.environ.get("HF_DATASETS_CACHE", "E:\\cache\\datasets")
os.environ["HF_METRICS_CACHE"]      = os.environ.get("HF_METRICS_CACHE", "E:\\cache\\metrics")
os.environ["HF_MODULES_CACHE"]      = os.environ.get("HF_MODULES_CACHE", "E:\\cache\\modules")
os.environ["TOKENIZERS_CACHE"]      = os.environ.get("TOKENIZERS_CACHE", "E:\\cache\\tokenizers")
os.environ["TORCH_EXTENSIONS_DIR"]  = os.environ.get("TORCH_EXTENSIONS_DIR", "E:\\cache\\torch_extensions")
cache_dirs = [
    os.environ["HF_HOME"], os.environ["TRANSFORMERS_CACHE"], os.environ["HF_HUB_CACHE"],
    os.environ["HF_DATASETS_CACHE"], os.environ["HF_METRICS_CACHE"], os.environ["HF_MODULES_CACHE"],
    os.environ["TOKENIZERS_CACHE"], os.environ["TORCH_EXTENSIONS_DIR"],
]
for d in cache_dirs:
    os.makedirs(d, exist_ok=True)

import torch
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer, TextStreamer
import gc

# Local checkpoint directory; a Hub repo id such as "ErenalpCet/E-Model-Reasoner-Math-V1" also works here.
MODEL_DIR = "E:\\qwen3-math-reasoning-final"
USE_SPECIFIC_GPU = True
GPU_ID = "0"

def setup_device():
    if USE_SPECIFIC_GPU and torch.cuda.is_available():
        # Note: CUDA_VISIBLE_DEVICES only takes effect if set before CUDA is first
        # initialized; the explicit torch.device selection below is what matters here.
        os.environ["CUDA_VISIBLE_DEVICES"] = GPU_ID
        device = torch.device(f"cuda:{GPU_ID}")
        print(f"Using specific GPU: {torch.cuda.get_device_name(device)}")
        print(f"Available VRAM: {torch.cuda.get_device_properties(device).total_memory / 1e9:.2f} GB")
    elif torch.cuda.is_available():
        device = torch.device("cuda")
        print(f"Using default CUDA device: {torch.cuda.get_device_name(0)}")
        print(f"Available VRAM: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
    else:
        device = torch.device("cpu")
        print("CUDA not available. Using CPU.")
    return device

def load_model_and_tokenizer(model_dir, device):
    best_checkpoint = model_dir
    print(f"\nLoading model and tokenizer from: {best_checkpoint}...")
    try:
        tokenizer = AutoTokenizer.from_pretrained(best_checkpoint, trust_remote_code=True)
        if tokenizer.pad_token is None:
            print("Warning: Tokenizer does not have a pad token. Setting pad_token = eos_token.")
            tokenizer.pad_token = tokenizer.eos_token
        print(f"Tokenizer loaded. Pad token: '{tokenizer.pad_token}' (ID: {tokenizer.pad_token_id})")
        
        # Load only the config (avoids instantiating the model twice) and enable the KV cache.
        model_config = AutoConfig.from_pretrained(best_checkpoint, trust_remote_code=True)
        model_config.use_cache = True

        model = AutoModelForCausalLM.from_pretrained(
            best_checkpoint,
            torch_dtype=torch.bfloat16, 
            trust_remote_code=True,
            config=model_config,
            device_map="auto"
        )
        if model.config.pad_token_id is None:
            model.config.pad_token_id = tokenizer.pad_token_id
        print("Model loaded successfully.")
        print(f"Model is on device: {model.device}")
        return model, tokenizer
    except Exception as e:
        print(f"Error loading model or tokenizer: {e}")
        raise

def generate_response(model, tokenizer, user_prompt, device):
    try:
        messages = [
            {"role": "user", "content": user_prompt}
        ]
        text = tokenizer.apply_chat_template(
            messages,
            tokenize=False,
            add_generation_prompt=True,
            enable_thinking=True
        )
        model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
        
        # Initialize streamer
        streamer = TextStreamer(tokenizer, skip_special_tokens=False)
        
        # Generate with streamer
        print("\nStreaming response:")
        print("-" * 50)
        
        generated_ids = model.generate(
            **model_inputs,
            temperature=0.6,
            top_p=0.95,
            top_k=20,
            min_p=0,
            max_new_tokens=32768,
            streamer=streamer
        )
        
        # Process the newly generated tokens (everything after the prompt)
        output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
        
        # Find thinking content and regular content
        thinking_content = ""
        content = ""
        
        # Token id of "</think>" in the Qwen3 tokenizer
        think_token_id = 151668
        try:
            # Find index of </think> token
            index = output_ids.index(think_token_id) + 1
        except ValueError:
            index = 0
            
        print("\n\nParsing response...")
        
        if index > 0:
            thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
            
        content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")
        
        return thinking_content, content
        
    except Exception as e:
        print(f"Error during response generation: {e}")
        return "Error in generation.", "Sorry, I encountered an error while generating the response."

def chat_loop(model, tokenizer, device):
    print("\n--- Chatbot CLI Started ---")
    print("Type your mathematical problem or 'quit'/'exit'/'q' to end.")
    while True:
        user_input = input("\nYou: ")
        if user_input.lower() in ["quit", "exit", "q"]:
            print("Exiting chatbot. Goodbye!")
            break
        if not user_input.strip():
            print("Please enter a problem.")
            continue
        
        print("Bot is thinking...")
        thinking_response, final_response = generate_response(model, tokenizer, user_input, device)
        
        if thinking_response:
            print("\nThinking content:")
            print("-" * 50)
            print(thinking_response)
            print("-" * 50)

        print("\nBot:")
        print("-" * 50)
        print(final_response)
        print("-" * 50)
        
        if device.type == "cuda":
            gc.collect()
            torch.cuda.empty_cache()

if __name__ == "__main__":
    selected_device = None
    model_instance = None
    tokenizer_instance = None
    try:
        selected_device = setup_device()
        if not os.path.exists(MODEL_DIR):
            print(f"Error: Model directory not found: {MODEL_DIR}")
            print("Please ensure the MODEL_DIR variable is set correctly.")
        else:
            model_instance, tokenizer_instance = load_model_and_tokenizer(MODEL_DIR, selected_device)
            chat_loop(model_instance, tokenizer_instance, selected_device)
    except RuntimeError as e:
        print(f"A runtime error occurred: {e}")
        if "CUDA out of memory" in str(e):
            print("If GPU memory is insufficient, consider running on CPU or using a smaller model.")
            print("For inference, ensure your GPU has enough VRAM for the model (Qwen3-0.6B needs a few GB).")
    except Exception as e:
        print(f"An unexpected error occurred: {e}")
    finally:
        print("\nCleaning up...")
        del model_instance
        del tokenizer_instance
        gc.collect()
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
        print("Cleanup complete. Exiting application.")

Training Details

Training Data

The model was fine-tuned on the NVIDIA OpenMathReasoning dataset, which contains a comprehensive collection of mathematical problems paired with detailed step-by-step solutions. This dataset covers various mathematical domains including algebra, arithmetic, geometry, and word problems.
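
To get a feel for the corpus, a record can be inspected directly with the datasets library. The sketch below is hedged: the Hub repo id nvidia/OpenMathReasoning, the split name, and the field layout are assumptions to verify against the dataset card.

from datasets import load_dataset

# Hedged sketch: stream one record from the training corpus without a full download.
# Repo id and split name are assumptions; check the dataset card for the exact layout.
ds = load_dataset("nvidia/OpenMathReasoning", split="cot", streaming=True)
record = next(iter(ds))
print(list(record.keys()))  # e.g. problem / solution-style fields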

Training Procedure

The fine-tuning process enhanced the base Qwen3-0.6B model's mathematical reasoning capabilities while preserving its general language understanding abilities.

Preprocessing

The training data was preprocessed to incorporate the thinking mechanism, allowing the model to generate internal reasoning steps before providing final answers.
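
The exact preprocessing recipe is not published in this card, but given Qwen3's chat format and the <think>...</think> convention used at inference time, a training example plausibly looks like the following sketch (the helper function, field names, and template details are illustrative assumptions):

# Illustrative only: one plausible rendering of a training example that pairs a
# problem with its reasoning and final answer using Qwen's chat markers.
def format_example(problem: str, reasoning: str, answer: str) -> str:
    return (
        f"<|im_start|>user\n{problem}<|im_end|>\n"
        f"<|im_start|>assistant\n<think>\n{reasoning}\n</think>\n\n{answer}<|im_end|>"
    )

print(format_example("What is 12 * 7?", "12 * 7 = 10 * 7 + 2 * 7 = 70 + 14 = 84.", "84"))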

Training Hyperparameters

  • Training regime: bfloat16 mixed precision
  • Base architecture: Qwen3-0.6B transformer
  • Optimization: Fine-tuning with mathematical reasoning focus
  • Special features: Thinking token integration (token ID: 151668)
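
The </think> id (151668) hardcoded in the inference script can be confirmed from the tokenizer rather than trusted blindly; a small check:

from transformers import AutoTokenizer

# Sketch: look up the </think> token id instead of hardcoding it.
tok = AutoTokenizer.from_pretrained("ErenalpCet/E-Model-Reasoner-Math-V1", trust_remote_code=True)
print(tok.convert_tokens_to_ids("</think>"))  # expected: 151668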

Speeds, Sizes, Times

  • Model parameters: ~600M (0.6B)
  • Inference memory: 2–4 GB VRAM recommended for optimal performance
  • Processing: Supports streaming generation for real-time responses

Evaluation

Testing Data, Factors & Metrics

Testing Data

The model should be evaluated on standard mathematical reasoning benchmarks such as GSM8K, the MATH dataset, and other mathematical problem-solving evaluation sets.
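
As a starting point, a hedged GSM8K spot check might look like the sketch below; the Hub repo id openai/gsm8k and the "#### answer" convention reflect GSM8K's public release rather than anything specific to this model.

from datasets import load_dataset

# Hedged sketch: pull one GSM8K test item and its gold final answer.
gsm8k = load_dataset("openai/gsm8k", "main", split="test")
item = gsm8k[0]
print(item["question"])
print("Gold answer:", item["answer"].split("####")[-1].strip())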

Factors

Evaluation considers:

  • Problem complexity levels (elementary to advanced)
  • Mathematical domain coverage (algebra, arithmetic, geometry, etc.)
  • Reasoning clarity and correctness
  • Step-by-step solution quality

Model Examination

The model's thinking mechanism provides interpretability by exposing the reasoning process. This feature allows users to:

  • Understand the model's problem-solving approach
  • Identify potential errors in reasoning
  • Learn from the step-by-step methodology
  • Verify the logical flow of solutions

Technical Specifications

Model Architecture and Objective

  • Architecture: Transformer-based causal language model
  • Parameters: ~600 million
  • Precision: bfloat16 for optimal performance/memory balance
  • Context Length: Extended context support up to 32,768 tokens
  • Special Tokens: Custom thinking token mechanism for reasoning transparency
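
These figures can be cross-checked against the published configuration; the attribute names below follow the standard Qwen3 config and are otherwise assumptions.

from transformers import AutoConfig

# Sketch: read architecture details straight from the model's config on the Hub.
cfg = AutoConfig.from_pretrained("ErenalpCet/E-Model-Reasoner-Math-V1", trust_remote_code=True)
print(cfg.model_type, cfg.max_position_embeddings)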

Compute Infrastructure

Hardware

  • Minimum Requirements: 4 GB VRAM for basic inference
  • Recommended: 8 GB+ VRAM for optimal performance
  • CPU Fallback: Supported but significantly slower
  • Multi-GPU: Automatic device mapping supported

Software

  • Framework: PyTorch with Transformers library
  • Python Version: 3.9+
  • Key Dependencies:
    • torch >= 2.0
    • transformers >= 4.51.0 (first release with Qwen3 support)
    • CUDA toolkit (for GPU acceleration)

Citation

BibTeX

@misc{e-model-reasoner-math-v1,
  title={E-Model-Reasoner-Math-V1: A Fine-tuned Mathematical Reasoning Model},
  author={ErenalpCet},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/ErenalpCet/E-Model-Reasoner-Math-V1}},
  note={Fine-tuned from Qwen3-0.6B on OpenMathReasoning dataset}
}

APA

ErenalpCet. (2025). E-Model-Reasoner-Math-V1: A Fine-tuned Mathematical Reasoning Model. Hugging Face. https://huggingface.co/ErenalpCet/E-Model-Reasoner-Math-V1

Glossary

  • Thinking Token: Special token (ID: 151668) that separates the model's internal reasoning from its final answer
  • Mathematical Reasoning: The process of logical thinking applied to solve mathematical problems
  • Fine-tuning: Process of adapting a pre-trained model to a specific task or domain
  • bfloat16: Brain floating-point format that provides memory efficiency while maintaining training stability

More Information

For additional technical details, usage examples, and community discussions, visit the model repository at https://huggingface.co/ErenalpCet/E-Model-Reasoner-Math-V1.

For questions about mathematical reasoning capabilities or specific use cases, please refer to the model's discussion section or create an issue in the repository.

Model Card Authors

ErenalpCet

Model Card Contact

For questions, feedback, or collaboration opportunities regarding E-Model-Reasoner-Math-V1, please contact through the Hugging Face platform or the model repository's discussion section.
