Update README.md
README.md (changed)
@@ -148,20 +148,20 @@ The loss curve demonstrates stable convergence with the final training loss reac
 
 | **Benchmark** | **Metric** | **Base Gemma-3-27B-IT** | **LogicFlow-Gemma-3-27b-thinking** | **Improvement** |
 |---------------|------------|--------------------------|-------------------------------------|-----------------|
-| **
+| **Mathematical Reasoning** |
 | GSM8K | Exact Match | 82.6% | **89.5%** | **+6.9%** |
 | MATH | Accuracy | 50.0% | **76.8%** | **+26.8%** |
-| **
+| **Code Generation** |
 | MBPP | pass@1 | 65.6% | **69.0%** | **+3.4%** |
 | HumanEval | 0-shot | 48.8% | *Pending* | *TBD* |
-| **
+| **Instruction Following** |
 | IFEval | Prompt-level | *45.0%* | **40.0%** | **-5.0%** |
 | IFEval | Instruction-level | *58.0%* | **53.1%** | **-4.9%** |
-| **
+| **Advanced Mathematics** |
 | AIME25 | Problem Solving | ~8-12% | **13.3%** | **+1-5%** |
-| **
+| **Scientific Reasoning** |
 | GPQA Diamond | Science QA | ~30-35% | **45.96%** | **+11-16%** |
-| **
+| **Knowledge & Understanding** |
 | MMLU | Overall Accuracy | 78.6% | **75.3%** | **-3.3%** |
 | MMLU STEM | Sciences & Math | ~70.0% | **71.6%** | **+1.6%** |
 | MMLU Humanities | Arts & Literature | ~67.0% | **69.2%** | **+2.2%** |
@@ -422,10 +422,10 @@ The model showcases systematic thinking through:
 - Clear documentation of the reasoning process
 
 These examples demonstrate the model's ability to:
-- **
-- **
-- **
-- **
+- **Break down complex problems** into manageable steps
+- **Self-verify results** using multiple approaches
+- **Document reasoning chains** for transparency
+- **Maintain accuracy** while showing work
 
 ### Activating Chain-of-Thought Reasoning
 
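For context on what the completed rows refer to in practice, here is a minimal sketch of eliciting the model's step-by-step reasoning with Hugging Face `transformers`. It is an assumption-heavy illustration: the Hub repo id, the prompt, and the generation settings are placeholders, since this diff only names the model and lists its benchmark scores.

```python
# Hypothetical sketch: prompting the fine-tuned model for the kind of
# step-by-step reasoning described in the README. The repo id below is an
# assumption; only the name "LogicFlow-Gemma-3-27b-thinking" appears in the diff.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LogicFlow/LogicFlow-Gemma-3-27b-thinking"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

# A GSM8K-style question; the chat template supplies the Gemma turn markers.
messages = [{
    "role": "user",
    "content": "A train travels 60 km in 45 minutes. "
               "What is its average speed in km/h? Think step by step.",
}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512, do_sample=False)
# Print only the newly generated tokens (the reasoning chain and final answer).
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```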