Update README.md
Correction of the BIG-Bench Hard score and removal of the Thinking Ablation table.
--- a/README.md
+++ b/README.md
@@ -262,7 +262,7 @@ By implementing this innovative prevention strategy, we can significantly reduce
 <td style="text-align:center; background-color: #DAE8FF; color: black;"> 55.88 </td>
 <td style="text-align:center; background-color: #DAE8FF; color: black;"> 18.4 </td>
 <td style="text-align:center; background-color: #DAE8FF; color: black;"> 58.97 </td>
-<td style="text-align:center; background-color: #DAE8FF; color: black;">
+<td style="text-align:center; background-color: #DAE8FF; color: black;"> 52.51 </td>
 <td style="text-align:center; background-color: #DAE8FF; color: black;"> 35.98 </td>
 <td style="text-align:center; background-color: #DAE8FF; color: black;"> 72.48 </td>
 <td style="text-align:center; background-color: #DAE8FF; color: black;"> 80.51 </td>
@@ -375,7 +375,7 @@ By implementing this innovative prevention strategy, we can significantly reduce
 <td style="text-align:center; background-color: #DAE8FF; color: black;"> 65.54 </td>
 <td style="text-align:center; background-color: #DAE8FF; color: black;"> 26.17 </td>
 <td style="text-align:center; background-color: #DAE8FF; color: black;"> 66.86 </td>
-<td style="text-align:center; background-color: #DAE8FF; color: black;">
+<td style="text-align:center; background-color: #DAE8FF; color: black;"> 59.01 </td>
 <td style="text-align:center; background-color: #DAE8FF; color: black;"> 41.53 </td>
 <td style="text-align:center; background-color: #DAE8FF; color: black;"> 80.89 </td>
 <td style="text-align:center; background-color: #DAE8FF; color: black;"> 89.73 </td>
@@ -428,7 +428,7 @@ By implementing this innovative prevention strategy, we can significantly reduce
 
 </tbody></table>
 
-<table>
+<!-- <table>
 <caption><b>Thinking Ablation</b></caption>
 <thead>
 <tr>
@@ -514,7 +514,7 @@ By implementing this innovative prevention strategy, we can significantly reduce
 <td style="text-align:center; background-color: #DAE8FF; color: black;">69.02</td>
 </tr>
 </table>
-<tbody>
+<tbody> -->
 
 **Training Data:**
 Overall, our training data is largely comprised of two key sources: (1) publicly available datasets with permissive license, (2) internal synthetically generated data targeted to enhance reasoning capabilites.