|
--- |
|
library_name: transformers |
|
tags: |
|
- mamba |
|
- deepseek |
|
- reasoning |
|
base_model: |
|
- tiiuae/Falcon3-Mamba-7B-Instruct |
|
pipeline_tag: text-generation |
|
--- |
|
|
|
# Falcon3-Mamba-R1-v0 |
|
<img src="https://i.ibb.co/8DbzbXXC/Untitled-design.png" width="360" height="360" /> |
|
|
|
## Model Details |
|
|
|
**Model Description:** |
|
|
|
This model is a fine-tuned version of Falcon3-Mamba-7B-Instruct, optimized to perform logical reasoning and structured problem-solving before generating its final response.
|
|
|
It leverages the Mamba architecture, which scales linearly with sequence length, making it a fast and efficient reasoning model while maintaining high response quality.
|
|
|
This release corresponds to an early checkpoint of the fine-tuning pipeline.
|
|
|
* **Developed by:** Hanzla Javaid |
|
* **Base Model:** tiiuae/Falcon3-Mamba-7B-Instruct |
|
* **Model Type:** Mamba-based causal decoder |
|
* **Model Release Date:** March 2025 |
|
|
|
## Intended Uses |
|
|
|
**Direct Use:** |
|
|
|
This model is designed for: |
|
|
|
* Reasoning-heavy tasks (math, logic, and structured problem-solving) |
|
* STEM-based question-answering |
|
* General-purpose text generation |
|
|
|
**Downstream Use:** |
|
|
|
* Fine-tuning for domain-specific applications such as finance, law, medicine, and research. |
|
* Integration into chatbots and virtual assistants that require advanced reasoning skills. |
|
* Enhancement of automated coding assistants with structured logic building. |
|
|
|
**Out-of-Scope Use:** |
|
|
|
* Misinformation or deceptive applications |
|
* Automated decision-making in high-risk fields (e.g., medical diagnosis without human oversight) |
|
* Bias-sensitive applications where fairness is critical but not explicitly controlled |
|
|
|
## Bias and Limitations |
|
|
|
**Known Biases:** |
|
|
|
* The model was trained primarily on English-language data, so performance on multilingual tasks may be weaker.
|
* Fine-tuning may introduce or amplify biases present in the training data, especially in areas like ethics, politics, and cultural perspectives. |
|
|
|
**Technical Limitations:** |
|
|
|
* Performance may degrade on long-form generation beyond 64K tokens. |
|
|
|
|
|
**Recommendations:** |
|
|
|
* Users should verify outputs for accuracy, especially in critical applications. |
|
* Regular bias evaluation should be conducted when deploying in production environments. |
|
|
|
## Getting Started |
|
|
|
To use this model, you can load it with transformers: |
|
|
|
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

repo_name = "hanzla/Falcon3-Mamba-R1-v0"

# Load the tokenizer and the model; device_map="auto" places the weights
# on the available GPU(s).
tokenizer = AutoTokenizer.from_pretrained(repo_name)

model = AutoModelForCausalLM.from_pretrained(
    repo_name,
    device_map="auto",
    torch_dtype=torch.float16,
)


def generate_text(prompt, generation_model, generation_tokenizer, max_tokens=1024):
    messages = [
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": prompt},
    ]
    # Build the chat-formatted prompt expected by the instruct model.
    input_text = generation_tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    print(input_text)  # echo the formatted prompt
    input_ids = generation_tokenizer(input_text, return_tensors="pt").input_ids.to(
        generation_model.device
    )
    outputs = generation_model.generate(input_ids, max_new_tokens=max_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    generated_tokens = outputs[0][len(input_ids[0]):]
    return generation_tokenizer.decode(generated_tokens, skip_special_tokens=True)
```
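
For example, with the model and tokenizer loaded above, a reasoning prompt can be run like this (the prompt text is only illustrative):

```python
# Example call using the helper defined above; the prompt is illustrative.
response = generate_text(
    "A train travels 120 km in 1.5 hours. What is its average speed in km/h?",
    model,
    tokenizer,
)
print(response)
```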
|
|
|
## Training Details |
|
|
|
|
|
**Training Procedure:** |
|
|
|
* **Pretrained Base Model:** Falcon3-Mamba-7B-Instruct |
|
* **Fine-tuning Data:** A subset of STEM problems from open-thoughts/OpenThoughts-114k |
|
* **Training Strategy:** GRPO (Group Relative Policy Optimization); a sketch of the setup is shown after this list
|
* **Training Hyperparameters:** |
|
* **Batch Size:** 4 |
|
* **Epochs:** 3 |
|
* **Precision:** Mixed (fp16 / bf16) |
|
* **Hardware:** 2xH100 GPUs |
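
As a rough illustration of the training strategy (not the exact training script), a GRPO run on STEM prompts could be set up with TRL's `GRPOTrainer` along the following lines. The placeholder prompts and the toy reward function are simplified assumptions; in practice the prompts come from the STEM subset of open-thoughts/OpenThoughts-114k and the reward checks solution quality.

```python
# Hypothetical GRPO sketch using TRL's GRPOTrainer; prompts and reward function
# below are simplified placeholders, not the exact recipe used for this model.
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Placeholder prompts; in practice these would be STEM problems drawn from
# open-thoughts/OpenThoughts-114k.
train_dataset = Dataset.from_dict({
    "prompt": [
        "Differentiate f(x) = x^3 * sin(x) and show your reasoning.",
        "A cube has surface area 150 cm^2. What is its volume? Explain step by step.",
    ]
})

def format_reward(completions, **kwargs):
    # Toy reward: favor completions that state a final answer explicitly.
    # A real reward would verify the correctness of the solution.
    return [1.0 if "answer" in completion.lower() else 0.0 for completion in completions]

training_args = GRPOConfig(
    output_dir="falcon3-mamba-grpo",
    per_device_train_batch_size=4,
    num_train_epochs=3,
    bf16=True,
    num_generations=4,
    max_completion_length=1024,
)

trainer = GRPOTrainer(
    model="tiiuae/Falcon3-Mamba-7B-Instruct",
    reward_funcs=format_reward,
    args=training_args,
    train_dataset=train_dataset,
)
trainer.train()
```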
|
|
|
## Evaluation |
|
|
|
**Testing Data and Metrics:** |
|
|
|
The fine-tuned model was evaluated on general-knowledge and math-reasoning benchmarks and compared against the base model. The table below summarizes the results:
|
|
|
| Category | Benchmark      | Falcon3-Mamba-R1-v0 | Base Falcon3-Mamba-7B-Instruct |
|----------|----------------|---------------------|--------------------------------|
| General  | MMLU (5-shot)  | 72.1                | 65.3                           |
| Math     | GSM8K (5-shot) | 89.5                | 65.2                           |
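
Scores of this kind can be reproduced with EleutherAI's `lm-evaluation-harness`; the snippet below is a minimal sketch, and the exact harness version and settings used for the table above are assumptions.

```python
# Minimal sketch using EleutherAI's lm-evaluation-harness (pip install lm-eval);
# the exact harness and settings behind the table above are not specified here.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=hanzla/Falcon3-Mamba-R1-v0,dtype=float16",
    tasks=["gsm8k", "mmlu"],
    num_fewshot=5,
    batch_size=4,
)
print(results["results"])
```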
|
|
|
|
|
## Technical Specifications |
|
|
|
**Model Architecture:** |
|
|
|
* **Mamba Blocks:** 64 |
|
* **Hidden Size:** 4096 |
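
These values can be checked directly from the published configuration, for example:

```python
# Inspect the architecture parameters from the model config.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("hanzla/Falcon3-Mamba-R1-v0")
print(config.num_hidden_layers)  # number of Mamba blocks
print(config.hidden_size)        # hidden size
```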
|
|
|
**Software Requirements:** |
|
|
|
* `transformers >= 4.38` |
|
* `torch >= 2.1` |
|
* `accelerate >= 0.25` |
|
* `mamba-ssm` |
|
* `causal-conv1d >= 1.4.0`