Hebrew Math Tutor

Hebrew Math Tutor is a specialized mathematical reasoning model that provides step-by-step solutions to math problems in Hebrew. Built on Qwen3-4B-Thinking-2507, this model bridges the gap between advanced AI mathematical capabilities and Hebrew-language education.

  • 🎯 Model ID: Intel/hebrew-math-tutor-v1
  • πŸ—οΈ Base Model: Qwen3-4B-Thinking-2507
  • πŸ›οΈ Architecture: Decoder-only causal language model (~4B parameters)
  • πŸ—£οΈ Primary Language: Hebrew (retains multilingual capabilities)
  • πŸ“„ License: Apache-2.0

Model Description

Hebrew Math Tutor is a supervised fine-tune of Qwen3-4B-Thinking-2507, specifically optimized to:

  • Provide detailed mathematical reasoning in Hebrew with clear step-by-step explanations
  • Maintain mathematical accuracy while adapting to Hebrew language patterns
  • Preserve multilingual capabilities for cross-language mathematical workflows
  • Support educational applications with natural Hebrew mathematical discourse

The model excels at translating complex mathematical concepts into clear, pedagogically sound Hebrew explanations while maintaining the computational precision of its base model.

Intended Use Cases

✅ Primary Applications

  • Educational Technology: Hebrew-language math tutoring systems and learning platforms.
  • Research Tools: Mathematical reasoning research in Hebrew educational contexts.
  • Prototype Development: Building Hebrew-first educational AI applications.
  • Accessibility: Providing advanced math AI assistance to Hebrew-speaking communities.

✅ Secondary Applications

  • Multilingual educational workflows requiring Hebrew mathematical explanations.
  • Cross-cultural mathematics education research.
  • Hebrew mathematical content generation for educational materials.

❌ Not Intended For

  • High-stakes assessments: Medical, legal, or financial decision-making.
  • Unsupervised grading: Certification or evaluation without human verification.
  • Production systems: Critical applications without proper validation and oversight.

Model Details

Specification       Details
Architecture        Decoder-only transformer (causal language model)
Parameters          ~4 billion
Context Length      Inherited from Qwen3-4B-Thinking-2507
Tokenizer           Qwen3-compatible tokenizer with Hebrew support
Training Type       Supervised Fine-Tuning (Hebrew SFT)
Base Model          Qwen3-4B-Thinking-2507
Fine-tuning Focus   Mathematical reasoning in Hebrew

Training Details

Dataset

  • Source: ~10,000 selected problems from OpenMathReasoning.
  • Translation Approach: Automated high-quality translation using internal LLMs.
  • Language Adaptation: Questions and final answers translated to Hebrew; reasoning chains preserved.
  • Mathematical Notation: Equations and formal math notation kept intact.
  • Internal Reasoning: The model's <think>...</think> blocks intentionally remain in English, representing internal reasoning processes (an illustrative record follows below).
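
The exact record schema has not been published; the following hypothetical example illustrates the format described above, with the question and final answer in Hebrew and the <think> reasoning chain left in English:

# Hypothetical training record (the released schema is not published).
# Question and final answer in Hebrew; <think> reasoning kept in English.
example = {
    "messages": [
        {
            "role": "user",
            # "What is the sum of the series 1 + 1/2 + 1/4 + ...?"
            "content": "מהו סכום הסדרה 1 + 1/2 + 1/4 + ...?",
        },
        {
            "role": "assistant",
            "content": (
                "<think>Geometric series with a = 1, r = 1/2, "
                "so the sum is a / (1 - r) = 2.</think>\n"
                # "This is a geometric series with ratio 1/2, so the sum is 2."
                "זוהי סדרה הנדסית עם מנה 1/2, ולכן הסכום הוא \\boxed{2}."
            ),
        },
    ]
}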

Training Configuration

  • Method: Supervised Fine-Tuning (Hebrew SFT)
  • Epochs: 3
  • Learning Rate: 5e-6
  • Warmup Ratio: 0.1 (10% of training steps)
  • Scheduler: Cosine learning rate decay
  • Objective: Maintain mathematical accuracy while adapting output to Hebrew (see the reproduction sketch below)
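
These hyperparameters map directly onto a standard SFT setup. A minimal reproduction sketch, assuming TRL's SFTTrainer (the actual training stack is not published; file and directory names are placeholders):

from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Placeholder path: a JSONL file of chat-formatted records like the example above.
dataset = load_dataset("json", data_files="hebrew_math_sft.jsonl", split="train")

config = SFTConfig(
    output_dir="hebrew-math-tutor-sft",
    num_train_epochs=3,          # as documented above
    learning_rate=5e-6,          # as documented above
    warmup_ratio=0.1,            # 10% warmup
    lr_scheduler_type="cosine",  # cosine decay
    bf16=True,
)

trainer = SFTTrainer(
    model="Qwen/Qwen3-4B-Thinking-2507",  # base model
    args=config,
    train_dataset=dataset,
)
trainer.train()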

Performance Evaluation

We evaluated Hebrew Math Tutor on three challenging mathematical benchmarks: MATH500, AIME24, and AIME25.

Evaluation Metrics

  • pass@16: Percentage of problems where at least one of 16 generated samples is correct.
  • maj@16: Majority-vote accuracy across 16 samples (both accuracy metrics are sketched after this list).
  • Hebrew Answers: Percentage of responses generated in Hebrew.
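
Concretely, both accuracy metrics are computed from the 16 samples drawn per problem; a minimal sketch, with is_correct standing in for the Math-verify equivalence check described under Evaluation Methodology:

from collections import Counter
from typing import Callable

def pass_at_k(samples: list[str], gold: str,
              is_correct: Callable[[str, str], bool]) -> bool:
    """pass@k: at least one of the k sampled answers is correct."""
    return any(is_correct(s, gold) for s in samples)

def maj_at_k(samples: list[str], gold: str,
             is_correct: Callable[[str, str], bool]) -> bool:
    """maj@k: the most frequent sampled answer is correct."""
    majority, _ = Counter(samples).most_common(1)[0]
    return is_correct(majority, gold)

# Example: 16 sampled final answers for one problem
samples = ["2"] * 10 + ["4"] * 6
exact = lambda a, g: a == g
print(pass_at_k(samples, "2", exact))  # True: at least one sample is "2"
print(maj_at_k(samples, "2", exact))   # True: "2" is the majority answer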

Hebrew Evaluation Results

Dataset   Metric           Base Model   Hebrew Math Tutor   Improvement
MATH500   pass@16          93%          95%                 +2%
MATH500   maj@16           88%          90%                 +2%
MATH500   Hebrew Answers   75%          100%                +25%
AIME24    pass@16          76.7%        80%                 +3.3%
AIME24    maj@16           76.7%        76.7%               No change
AIME24    Hebrew Answers   35.2%        96.7%               +61.5%
AIME25    pass@16          80%          83.3%               +3.3%
AIME25    maj@16           70%          60%                 -10%
AIME25    Hebrew Answers   36%          95.2%               +59.2%

English/Original Language Results

Dataset   Metric    Base Model   Hebrew Math Tutor   Change
MATH500   pass@16   99%          98%                 -1%
MATH500   maj@16    98%          98%                 No change
AIME24    pass@16   93.3%        90%                 -3.3%
AIME24    maj@16    86.7%        86.7%               No change
AIME25    pass@16   83.3%        90%                 +6.7%
AIME25    maj@16    73%          80%                 +7%

Key Findings

🎯 Dramatic Language Improvement: Hebrew answer generation increased by 25 to 61.5 percentage points across all benchmarks, reaching 95-100% Hebrew output.

📈 Maintained Technical Performance: Consistent improvements in pass@16 on Hebrew evaluations while preserving competitive English performance.

🔍 Mixed Majority Vote Results: Strong performance on MATH500, stable results on AIME24, and one notable decrease (-10 points) on AIME25 that warrants further investigation.

✅ Preserved Core Capabilities: The fine-tuning successfully adapted language output without sacrificing fundamental mathematical reasoning abilities.

Usage

Quick Start

from transformers import pipeline

model_id = "Intel/hebrew-math-tutor-v1"
pipe = pipeline("text-generation", model=model_id)

messages = [
    {
        "role": "system",
        "content": """You are a helpful AI assistant specialized in mathematics and problem-solving who can answer math questions with the correct answer.
Answer shortly, not more than 500 tokens, but outline the process step by step.
Answer ONLY in Hebrew!""",
    },
    # "What is the sum of the following series: 1 + 1/2 + 1/4 + 1/8 + ..."
    {"role": "user", "content": "מהו סכום הסדרה הבאה: 1 + 1/2 + 1/4 + 1/8 + ..."},
]

out = pipe(
    messages,
    return_full_text=False,
    max_new_tokens=1024,
    do_sample=True,  # required for temperature/top-p/top-k to take effect
    temperature=0.6,
    top_p=0.95,
    top_k=20,
)
print(out[0]["generated_text"])

Recommended Parameters

  • Temperature: 0.6 (balanced creativity and accuracy)
  • Top-p: 0.95 (diverse but focused sampling)
  • Top-k: 20 (controlled vocabulary selection)
  • Max tokens: 500-1024 (sufficient for detailed explanations)

Best Practices

  • Request explicit structure: Ask for step-by-step reasoning and clearly marked final answers.
  • Use Hebrew formatting cues: Include phrases like "תשובה סופית:" ("final answer:") or request \boxed{} formatting (see the extraction sketch after this list).
  • Specify language: Explicitly request Hebrew-only responses for consistent output.
  • Verify solutions: Always validate mathematical results, especially in educational contexts.
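
When responses follow these cues, the final answer can be extracted programmatically. A minimal sketch, assuming the output conventions above (and simple, non-nested \boxed{} contents):

import re

def extract_final_answer(response: str) -> str | None:
    """Extract the final answer from a model response."""
    # Drop the internal reasoning block (kept in English by design).
    visible = re.sub(r"<think>.*?</think>", "", response, flags=re.DOTALL)

    # Prefer \boxed{...}; this simple pattern assumes non-nested braces.
    boxed = re.search(r"\\boxed\{([^{}]*)\}", visible)
    if boxed:
        return boxed.group(1).strip()

    # Fall back to a trailing "final answer:" line in Hebrew.
    final = re.search(r"תשובה סופית:\s*(.+)", visible)
    return final.group(1).strip() if final else None

print(extract_final_answer("<think>a=1, r=1/2</think> הסכום הוא \\boxed{2}"))  # "2"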

Demo Interface


Example Streamlit interface showing Hebrew Math Tutor providing step-by-step reasoning. The detailed reasoning can be collapsed for cleaner presentation.
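
A collapsible layout like the one shown takes only a few lines. A hypothetical Streamlit sketch, assuming the <think> output format described in this card:

import streamlit as st
from transformers import pipeline

@st.cache_resource  # load the model once per server process
def load_pipe():
    return pipeline("text-generation", model="Intel/hebrew-math-tutor-v1")

question = st.text_input("שאלה במתמטיקה:")  # "A math question:"
if question:
    out = load_pipe()(
        [{"role": "user", "content": question}],
        return_full_text=False,
        max_new_tokens=1024,
    )
    text = out[0]["generated_text"]
    # Split the English reasoning block from the Hebrew answer.
    if "</think>" in text:
        reasoning, _, answer = text.partition("</think>")
        with st.expander("הצג נימוק מפורט"):  # "Show detailed reasoning"
            st.text(reasoning.replace("<think>", "").strip())
    else:
        answer = text
    st.markdown(answer.strip())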

Limitations & Considerations

Technical Limitations

  • Potential errors: May produce incorrect solutions or mathematical hallucinations.
  • Language mixing: Occasional mixing of Hebrew and English or inconsistent number formatting.
  • Training biases: May reflect biases present in the original training datasets.
  • Internal reasoning: <think>...</think> blocks remain in English due to training scope.

Usage Recommendations

  • Human verification required: Always validate outputs before use in educational settings.
  • Not a replacement for educators: Designed as an assistive tool, not a substitute for qualified instruction.
  • Appropriate context: Best suited for educational prototyping and research applications.

Ethical Guidelines

Responsible Deployment

  • Include clear disclaimers about AI-generated content in user-facing applications.
  • Implement human oversight for any educational or assessment applications.
  • Ensure compliance with relevant privacy laws when collecting user data.
  • Provide transparency about model capabilities and limitations.

Educational Impact

  • Designed to enhance, not replace, human mathematical instruction.
  • Intended to increase accessibility of advanced math AI for Hebrew speakers.
  • Should be used as part of comprehensive educational approaches with human guidance.

Technical Details

Evaluation Methodology

  • Correctness verification: Solutions validated using the Math-verify framework (see the sketch after this list).
  • Sampling robustness: Results based on 16 samples per problem for stable estimates.
  • Language detection: Automated classification of response language for the Hebrew Answers metric.
  • Benchmark diversity: Evaluation across competition mathematics (AIME) and curriculum-style problems (MATH500).
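
The correctness and language checks can be approximated as follows; a sketch assuming the open-source math-verify package and a simple Unicode-range heuristic (the exact evaluation harness is not published):

# Sketch only: the exact evaluation harness is not published.
from math_verify import parse, verify  # pip install math-verify

def is_correct(model_answer: str, gold_answer: str) -> bool:
    """Check mathematical equivalence rather than string equality."""
    return verify(parse(gold_answer), parse(model_answer))

def is_hebrew(text: str, threshold: float = 0.5) -> bool:
    """Count a response as Hebrew if most alphabetic characters
    fall in the Hebrew Unicode block (U+0590-U+05FF)."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return False
    hebrew = sum(1 for c in letters if "\u0590" <= c <= "\u05FF")
    return hebrew / len(letters) >= threshold

print(is_correct("$2$", "$\\frac{4}{2}$"))  # True: equivalent values
print(is_hebrew("הסכום הוא 2"))             # True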

Reproducibility

  • All evaluation protocols follow standard mathematical reasoning assessment practices.
  • Sampling parameters and evaluation metrics clearly documented.
  • Training configuration and hyperparameters provided for reproduction.

Attribution & Licensing

Citation

If you use Hebrew Math Tutor in your research or applications, please cite:

@misc{hebrew-math-tutor-v1,
  title={Hebrew Math Tutor: A Hebrew-focused Mathematical Reasoning Model},
  author={Intel Labs},
  year={2025},
  url={https://huggingface.co/Intel/hebrew-math-tutor-v1},
  note={Fine-tuned from Qwen3-4B-Thinking-2507}
}

Changelog

  • v1.0 β€” Initial public release with Hebrew mathematical reasoning capabilities.

Hebrew Math Tutor represents a step forward in making advanced mathematical AI accessible across languages. We encourage responsible use and welcome community feedback to improve multilingual mathematical reasoning capabilities.
