ranarag committed on
Commit
de39e7d
·
verified ·
1 Parent(s): 39dcd68

Update README.md

Added citation for OLMES.

Files changed (1)
  1. README.md +53 -57
README.md CHANGED
@@ -206,9 +206,8 @@ By implementing this innovative prevention strategy, we can significantly reduce
206
 
207
  **Evaluation Results:**
208
  <table>
209
-
210
  <thead>
211
- <caption style="text-align:center"><b>Comparison with different models over various benchmarks. Scores of AlpacaEval-2.0 and Arena-Hard are calculated with thinking=True</b></caption>
212
  <tr>
213
  <th style="text-align:left; background-color: #001d6c; color: white;">Models</th>
214
  <th style="text-align:center; background-color: #001d6c; color: white;">ArenaHard</th>
@@ -222,53 +221,53 @@ By implementing this innovative prevention strategy, we can significantly reduce
222
  <th style="text-align:center; background-color: #001d6c; color: white;">HumanEval</th>
223
  <th style="text-align:center; background-color: #001d6c; color: white;">HumanEval+</th>
224
  <th style="text-align:center; background-color: #001d6c; color: white;">IFEval</th>
225
- <th style="text-align:center; background-color: #001d6c; color: white;">Attaq</th>
226
  </tr></thead>
227
  <tbody>
228
  <tr>
229
- <td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.1-2B-Instruct</td>
230
- <td style="text-align:center; background-color: #DAE8FF; color: black;">23.3</td>
231
- <td style="text-align:center; background-color: #DAE8FF; color: black;">27.17</td>
232
- <td style="text-align:center; background-color: #DAE8FF; color: black;">57.11</td>
233
- <td style="text-align:center; background-color: #DAE8FF; color: black;">20.55</td>
234
- <td style="text-align:center; background-color: #DAE8FF; color: black;">59.79</td>
235
- <td style="text-align:center; background-color: #DAE8FF; color: black;">54.46</td>
236
- <td style="text-align:center; background-color: #DAE8FF; color: black;">18.68</td>
237
- <td style="text-align:center; background-color: #DAE8FF; color: black;">67.55</td>
238
- <td style="text-align:center; background-color: #DAE8FF; color: black;">79.45</td>
239
- <td style="text-align:center; background-color: #DAE8FF; color: black;">75.26</td>
240
- <td style="text-align:center; background-color: #DAE8FF; color: black;">63.59</td>
241
- <td style="text-align:center; background-color: #DAE8FF; color: black;">84.7</td>
242
  </tr>
243
  <tr>
244
- <td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.2-2B-Instruct</td>
245
- <td style="text-align:center; background-color: #DAE8FF; color: black;">24.86</td>
246
- <td style="text-align:center; background-color: #DAE8FF; color: black;">34.51</td>
247
- <td style="text-align:center; background-color: #DAE8FF; color: black;">57.18</td>
248
- <td style="text-align:center; background-color: #DAE8FF; color: black;">20.56</td>
249
- <td style="text-align:center; background-color: #DAE8FF; color: black;">59.8</td>
250
- <td style="text-align:center; background-color: #DAE8FF; color: black;">52.27</td>
251
- <td style="text-align:center; background-color: #DAE8FF; color: black;">21.12</td>
252
- <td style="text-align:center; background-color: #DAE8FF; color: black;">67.02</td>
253
- <td style="text-align:center; background-color: #DAE8FF; color: black;">80.13</td>
254
- <td style="text-align:center; background-color: #DAE8FF; color: black;">73.39</td>
255
- <td style="text-align:center; background-color: #DAE8FF; color: black;">61.55</td>
256
- <td style="text-align:center; background-color: #DAE8FF; color: black;">83.23</td>
257
  </tr>
258
  <tr>
259
- <td style="text-align:left; background-color: #DAE8FF; color: black;"><b>Granite-3.3-2B-Instruct</b></td>
260
- <td style="text-align:center; background-color: #DAE8FF; color: black;"> 28.86 </td>
261
- <td style="text-align:center; background-color: #DAE8FF; color: black;"> 43.45 </td>
262
- <td style="text-align:center; background-color: #DAE8FF; color: black;"> 55.88 </td>
263
- <td style="text-align:center; background-color: #DAE8FF; color: black;"> 18.4 </td>
264
- <td style="text-align:center; background-color: #DAE8FF; color: black;"> 58.97 </td>
265
- <td style="text-align:center; background-color: #DAE8FF; color: black;"> 52.51 </td>
266
- <td style="text-align:center; background-color: #DAE8FF; color: black;"> 35.98 </td>
267
- <td style="text-align:center; background-color: #DAE8FF; color: black;"> 72.48 </td>
268
- <td style="text-align:center; background-color: #DAE8FF; color: black;"> 80.51 </td>
269
- <td style="text-align:center; background-color: #DAE8FF; color: black;"> 75.68 </td>
270
- <td style="text-align:center; background-color: #DAE8FF; color: black;"> 65.8 </td>
271
- <td style="text-align:center; background-color: #DAE8FF; color: black;">87.47</td>
272
  </tr>
273
 
274
  <tr>
@@ -285,7 +284,6 @@ By implementing this innovative prevention strategy, we can significantly reduce
285
  <td style="text-align:center; background-color: #DAE8FF; color: black;">80.15</td>
286
  <td style="text-align:center; background-color: #DAE8FF; color: black;">79.10</td>
287
  <td style="text-align:center; background-color: #DAE8FF; color: black;">83.43</td>
288
-
289
  </tr>
290
 
291
  <tr>
@@ -335,7 +333,6 @@ By implementing this innovative prevention strategy, we can significantly reduce
335
  <td style="text-align:center; background-color: #DAE8FF; color: black;">59.10</td>
336
  <td style="text-align:center; background-color: #DAE8FF; color: black;">42.45</td>
337
  </tr>
338
-
339
  <tr>
340
  <td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.1-8B-Instruct</td>
341
  <td style="text-align:center; background-color: #DAE8FF; color: black;">37.58</td>
@@ -352,7 +349,6 @@ By implementing this innovative prevention strategy, we can significantly reduce
352
  <td style="text-align:center; background-color: #DAE8FF; color: black;">85.73</td>
353
  </tr>
354
 
355
-
356
  <tr>
357
  <td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.2-8B-Instruct</td>
358
  <td style="text-align:center; background-color: #DAE8FF; color: black;">55.25</td>
@@ -395,19 +391,19 @@ By implementing this innovative prevention strategy, we can significantly reduce
395
  </tr></thead>
396
  <tbody>
397
  <tr>
398
- <td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.1-2B-Instruct</td>
399
- <td style="text-align:center; background-color: #DAE8FF; color: black;"> 0.89 </td>
400
- <td style="text-align:center; background-color: #DAE8FF; color: black;"> 35.07 </td>
401
  </tr>
402
  <tr>
403
- <td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.2-2B-Instruct</td>
404
- <td style="text-align:center; background-color: #DAE8FF; color: black;"> 0.89 </td>
405
- <td style="text-align:center; background-color: #DAE8FF; color: black;"> 35.54 </td>
406
  </tr>
407
  <tr>
408
- <td style="text-align:left; background-color: #DAE8FF; color: black;"><b>Granite-3.3-2B-Instruct</b></td>
409
- <td style="text-align:center; background-color: #DAE8FF; color: black;"> 3.28 </td>
410
- <td style="text-align:center; background-color: #DAE8FF; color: black;"> 58.09 </td>
411
  </tr>
412
  <tr>
413
  <td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.1-8B-Instruct</td>
@@ -425,9 +421,6 @@ By implementing this innovative prevention strategy, we can significantly reduce
425
  <td style="text-align:center; background-color: #DAE8FF; color: black;"> 69.02 </td>
426
  </tr>
427
  </tbody></table>
428
-
429
- </tbody></table>
430
-
431
  <!-- <table>
432
  <caption><b>Thinking Ablation</b></caption>
433
  <thead>
@@ -532,6 +525,9 @@ Granite-3.3-2B-Instruct builds upon Granite-3.3-2B-Base, leveraging both permiss
532
  - 📄 Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/
533
  - 💡 Learn about the latest Granite learning resources: https://ibm.biz/granite-learning-resources
534
 
 
 
 
535
  <!-- ## Citation
536
  ```
537
  @misc{granite-models,
 
206
 
207
  **Evaluation Results:**
208
  <table>
 
209
  <thead>
210
+ <caption style="text-align:center"><b>Comparison with different models over various benchmarks. Scores of AlpacaEval-2.0 and Arena-Hard are calculated with thinking=True</b><sup id="fnref1"><a href="#fn1">1</a></sup></caption>
211
  <tr>
212
  <th style="text-align:left; background-color: #001d6c; color: white;">Models</th>
213
  <th style="text-align:center; background-color: #001d6c; color: white;">ArenaHard</th>
 
221
  <th style="text-align:center; background-color: #001d6c; color: white;">HumanEval</th>
222
  <th style="text-align:center; background-color: #001d6c; color: white;">HumanEval+</th>
223
  <th style="text-align:center; background-color: #001d6c; color: white;">IFEval</th>
224
+ <th style="text-align:center; background-color: #001d6c; color: white;">AttaQ</th>
225
  </tr></thead>
226
  <tbody>
227
  <tr>
228
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">Granite-3.1-2B-Instruct</td>
229
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">23.3</td>
230
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">27.17</td>
231
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">57.11</td>
232
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">20.55</td>
233
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">59.79</td>
234
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">54.46</td>
235
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">18.68</td>
236
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">67.55</td>
237
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">79.45</td>
238
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">75.26</td>
239
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">63.59</td>
240
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">84.7</td>
241
  </tr>
242
  <tr>
243
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">Granite-3.2-2B-Instruct</td>
244
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">24.86</td>
245
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">34.51</td>
246
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">57.18</td>
247
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">20.56</td>
248
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">59.8</td>
249
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">52.27</td>
250
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">21.12</td>
251
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">67.02</td>
252
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">80.13</td>
253
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">73.39</td>
254
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">61.55</td>
255
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">83.23</td>
256
  </tr>
257
  <tr>
258
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;"><b>Granite-3.3-2B-Instruct</b></td>
259
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;"> 28.86 </td>
260
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;"> 43.45 </td>
261
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;"> 55.88 </td>
262
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;"> 18.4 </td>
263
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;"> 58.97 </td>
264
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;"> 52.51 </td>
265
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;"> 35.98 </td>
266
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;"> 72.48 </td>
267
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;"> 80.51 </td>
268
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;"> 75.68 </td>
269
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;"> 65.8 </td>
270
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">87.47</td>
271
  </tr>
272
 
273
  <tr>
 
284
  <td style="text-align:center; background-color: #DAE8FF; color: black;">80.15</td>
285
  <td style="text-align:center; background-color: #DAE8FF; color: black;">79.10</td>
286
  <td style="text-align:center; background-color: #DAE8FF; color: black;">83.43</td>
 
287
  </tr>
288
 
289
  <tr>
 
333
  <td style="text-align:center; background-color: #DAE8FF; color: black;">59.10</td>
334
  <td style="text-align:center; background-color: #DAE8FF; color: black;">42.45</td>
335
  </tr>
 
336
  <tr>
337
  <td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.1-8B-Instruct</td>
338
  <td style="text-align:center; background-color: #DAE8FF; color: black;">37.58</td>
 
349
  <td style="text-align:center; background-color: #DAE8FF; color: black;">85.73</td>
350
  </tr>
351
 
 
352
  <tr>
353
  <td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.2-8B-Instruct</td>
354
  <td style="text-align:center; background-color: #DAE8FF; color: black;">55.25</td>
 
391
  </tr></thead>
392
  <tbody>
393
  <tr>
394
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">Granite-3.1-2B-Instruct</td>
395
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;"> 0.89 </td>
396
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;"> 35.07 </td>
397
  </tr>
398
  <tr>
399
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">Granite-3.2-2B-Instruct</td>
400
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;"> 0.89 </td>
401
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;"> 35.54 </td>
402
  </tr>
403
  <tr>
404
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;"><b>Granite-3.3-2B-Instruct</b></td>
405
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;"> 3.28 </td>
406
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;"> 58.09 </td>
407
  </tr>
408
  <tr>
409
  <td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.1-8B-Instruct</td>
 
421
  <td style="text-align:center; background-color: #DAE8FF; color: black;"> 69.02 </td>
422
  </tr>
423
  </tbody></table>
 
 
 
424
  <!-- <table>
425
  <caption><b>Thinking Ablation</b></caption>
426
  <thead>
 
525
  - 📄 Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/
526
  - 💡 Learn about the latest Granite learning resources: https://ibm.biz/granite-learning-resources
527
 
528
+
529
+
530
+ <p id="fn1"><a href="#fnref1" title="Jump back to reference">[1]</a> Evaluated using <a href="https://github.com/allenai/olmes">OLMES</a> (except the AttaQ scores)</p>
531
  <!-- ## Citation
532
  ```
533
  @misc{granite-models,