---
library_name: transformers
tags:
- mamba
- deepseek
- reasoning
base_model:
- tiiuae/Falcon3-Mamba-7B-Instruct
pipeline_tag: text-generation
---
# Falcon3-Mamba-R1-v0
<img src="https://i.ibb.co/8DbzbXXC/Untitled-design.png" width="360" height="360" />
## Model Details
**Model Description:**
This model is a fine-tuned version of Falcon3-Mamba-7B-Instruct, optimized to perform logical reasoning and structured problem-solving before generating a response.
It leverages the Mamba architecture, whose compute scales linearly with sequence length, making it an efficient and fast reasoning model while maintaining high response quality.
This release comes from an earlier checkpoint of the fine-tuning pipeline.
* **Developed by:** Hanzla Javaid
* **Base Model:** tiiuae/Falcon3-Mamba-7B-Instruct
* **Model Type:** Mamba-based causal decoder
* **Model Release Date:** March 2025
## Intended Uses
**Direct Use:**
This model is designed for:
* Reasoning-heavy tasks (math, logic, and structured problem-solving)
* STEM-based question-answering
* General-purpose text generation
**Downstream Use:**
* Fine-tuning for domain-specific applications such as finance, law, medicine, and research.
* Integration into chatbots and virtual assistants that require advanced reasoning skills.
* Enhancement of automated coding assistants with structured logic building.
**Out-of-Scope Use:**
* Misinformation or deceptive applications
* Automated decision-making in high-risk fields (e.g., medical diagnosis without human oversight)
* Bias-sensitive applications where fairness is critical but not explicitly controlled
## Bias and Limitations
**Known Biases:**
* The model was trained primarily on English-language data, so performance on multilingual tasks may be weaker.
* Fine-tuning may introduce or amplify biases present in the training data, especially in areas like ethics, politics, and cultural perspectives.
**Technical Limitations:**
* Performance may degrade on long-form generation beyond 64K tokens.
**Recommendations:**
* Users should verify outputs for accuracy, especially in critical applications.
* Regular bias evaluation should be conducted when deploying in production environments.
## Getting Started
To use this model, you can load it with transformers:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

repo_name = "hanzla/Falcon3-Mamba-R1-v0"

# Load the tokenizer and model; device_map="auto" places the weights on the available device(s)
tokenizer = AutoTokenizer.from_pretrained(repo_name)
model = AutoModelForCausalLM.from_pretrained(
    repo_name,
    device_map="auto",
    torch_dtype=torch.float16,
)

def generate_text(prompt, generation_model, generation_tokenizer, max_tokens=1024):
    messages = [
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": prompt},
    ]
    # Build the chat-formatted prompt expected by the instruct model
    input_text = generation_tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    input_ids = generation_tokenizer(input_text, return_tensors="pt").input_ids.to(generation_model.device)
    outputs = generation_model.generate(input_ids, max_new_tokens=max_tokens)
    # Strip the prompt tokens and decode only the newly generated text
    generated_tokens = outputs[0][len(input_ids[0]):]
    return generation_tokenizer.decode(generated_tokens, skip_special_tokens=True)
```
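Assuming the model and tokenizer above are loaded, the helper can be called like this (the prompt is just an example):
```python
# Example call to the generate_text helper defined above
response = generate_text(
    "A train travels 60 km in 45 minutes. What is its average speed in km/h?",
    model,
    tokenizer,
    max_tokens=512,
)
print(response)
```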
## Training Details
**Training Procedure:**
* **Pretrained Base Model:** Falcon3-Mamba-7B-Instruct
* **Fine-tuning Data:** A subset of STEM problems from open-thoughts/OpenThoughts-114k
* **Training Strategy:** GRPO (Group Relative Policy Optimization); an illustrative setup is sketched after this list
* **Training Hyperparameters:**
* **Batch Size:** 4
* **Epochs:** 3
* **Precision:** Mixed (fp16 / bf16)
* **Hardware:** 2xH100 GPUs
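The exact training script is not part of this card. As a rough sketch, a comparable GRPO run could be set up with the `trl` library as shown below; the reward function and the assumption that the dataset has been preprocessed into a `prompt` column are placeholders, not the actual recipe.
```python
# Illustrative GRPO setup with trl (not the original training script).
# Assumes the OpenThoughts data has been filtered to STEM problems and
# reformatted so each example exposes a "prompt" column of plain strings.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

dataset = load_dataset("open-thoughts/OpenThoughts-114k", split="train")

def my_reward_fn(completions, **kwargs):
    # Placeholder reward: a real run would score answer correctness and
    # the quality of the reasoning trace.
    return [min(len(c.split()) / 200.0, 1.0) for c in completions]

training_args = GRPOConfig(
    output_dir="falcon3-mamba-grpo",  # hypothetical output path
    per_device_train_batch_size=4,
    num_train_epochs=3,
    bf16=True,
)

trainer = GRPOTrainer(
    model="tiiuae/Falcon3-Mamba-7B-Instruct",
    reward_funcs=my_reward_fn,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```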
## Evaluation
**Testing Data and Metrics:**
The fine-tuned model's performance was evaluated on a variety of benchmarks to assess its reasoning abilities and overall capabilities. The table below presents a comparison between the fine-tuned model and the base model:
| Category | Benchmark | Falcon3-Mamba-R1-v0 | Base Falcon3-Mamba-7B-Instruct |
|---------------|--------------------------------|----------------------------------------|---------------------------------|
| General | MMLU (5-shot) | 72.1 | 65.3 |
| Math | GSM8K (5-shot) | 89.5 | 65.2 |
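The evaluation harness used for these numbers is not specified; as an illustrative sketch, similar few-shot scores can be obtained with EleutherAI's `lm-evaluation-harness`:
```python
# Illustrative evaluation with lm-evaluation-harness (pip install lm_eval);
# not necessarily the setup used to produce the table above.
import lm_eval
from lm_eval.models.huggingface import HFLM

lm = HFLM(pretrained="hanzla/Falcon3-Mamba-R1-v0", dtype="float16")
results = lm_eval.simple_evaluate(
    model=lm,
    tasks=["mmlu", "gsm8k"],
    num_fewshot=5,
)
print(results["results"])
```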
## Technical Specifications
**Model Architecture:**
* **Mamba Blocks:** 64
* **Hidden Size:** 4096
**Software Requirements:**
* `transformers >= 4.38`
* `torch >= 2.1`
* `accelerate >= 0.25`
* `mamba-ssm`
* `causal-conv1d >= 1.4.0`