Update README.md
README.md
@@ -149,8 +149,8 @@ The loss curve demonstrates stable convergence with the final training loss reac
 | **Benchmark** | **Metric** | **Base Gemma-3-27B-IT** | **LogicFlow-Gemma-3-27b-thinking** | **Improvement** |
 |---------------|------------|--------------------------|-------------------------------------|-----------------|
 | **Mathematical Reasoning** |
-| GSM8K |
-| MATH |
+| GSM8K | 5-shot | 82.6% | **89.5%** | **+6.9%** |
+| MATH | 5-shot | 50.0% | **76.8%** | **+26.8%** |
 | **Code Generation** |
 | MBPP | pass@1 | 65.6% | **69.0%** | **+3.4%** |
 | HumanEval | 0-shot | 48.8% | *Pending* | *TBD* |
@@ -158,9 +158,9 @@ The loss curve demonstrates stable convergence with the final training loss reac
 | IFEval | Prompt-level | *45.0%* | **40.0%** | **-5.0%** |
 | IFEval | Instruction-level | *58.0%* | **53.1%** | **-4.9%** |
 | **Advanced Mathematics** |
-| AIME25 |
+| AIME25 | 5-shot | ~8-12% | **13.3%** | **+1-5%** |
 | **Scientific Reasoning** |
-| GPQA Diamond |
+| GPQA Diamond | 5-shot | ~30-35% | **45.96%** | **+11-16%** |
 | **Knowledge & Understanding** |
 | MMLU | Overall Accuracy | 78.6% | **75.3%** | **-3.3%** |
 | MMLU STEM | Sciences & Math | ~70.0% | **71.6%** | **+1.6%** |
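The **Improvement** column in the table above appears to be the fine-tuned score minus the base score, in percentage points (not relative percent). The following is a minimal sketch for re-deriving that column, assuming that interpretation. The helper name `improvement_pp` is hypothetical and not part of the repository. Where the base score is given as an approximate range (for example `~8-12%`), the result is a range as well.

```python
def improvement_pp(tuned, base_low, base_high=None):
    """Return (low, high) improvement in percentage points.

    For a single base score, low == high. For a base range,
    the smallest gain is measured against the top of the range
    and the largest gain against the bottom.
    """
    if base_high is None:
        base_high = base_low
    low = round(tuned - base_high, 2)
    high = round(tuned - base_low, 2)
    return low, high


# Scores copied from the table: (fine-tuned, base low, base high or None).
rows = {
    "GSM8K": (89.5, 82.6, None),
    "MATH": (76.8, 50.0, None),
    "MBPP": (69.0, 65.6, None),
    "IFEval (prompt-level)": (40.0, 45.0, None),
    "IFEval (instruction-level)": (53.1, 58.0, None),
    "AIME25": (13.3, 8.0, 12.0),
    "GPQA Diamond": (45.96, 30.0, 35.0),
    "MMLU": (75.3, 78.6, None),
    "MMLU STEM": (71.6, 70.0, None),
}

for name, (tuned, base_low, base_high) in rows.items():
    low, high = improvement_pp(tuned, base_low, base_high)
    if low == high:
        print(f"{name}: {low:+.2f} pp")
    else:
        print(f"{name}: {low:+.2f} to {high:+.2f} pp")
```

Under this reading, the ranged entries come out as 1.3 to 5.3 points for AIME25 and 10.96 to 15.96 points for GPQA Diamond, which the table rounds to `+1-5%` and `+11-16%`.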