laineyyy committed · verified
Commit b26e5ea · Parent(s): 16b1368

Update README.md

Files changed (1)
  1. README.md +16 -3
README.md CHANGED
@@ -55,6 +55,8 @@ This model represents the SFT phase of post-training, using 1.4M instruction-fol
 - Top-rated conversations from OASST2 and Avoin Avustaja datasets (5K samples)
 - Translation samples from EuroParl (1K samples)
 
+We release the [Poro 2 instruction collection](https://huggingface.co/datasets/LumiOpen/poro2-instruction-collection).
+
 ## SFT Hyperparameters
 
 | Hyperparameter | Value |
@@ -71,14 +73,25 @@ This model represents the SFT phase of post-training, using 1.4M instruction-fol
 Poro 2 8B SFT shows substantial improvements in Finnish instruction-following capabilities compared to Llama 3.1 8B Instruct, while maintaining strong English performance. Note that the final Instruct model (with DPO) performs significantly better.
 
 ### Finnish Instruction Following
+
+| | Poro 2 8B SFT | Llama 3.1 8B Instruct | Poro 2 8B Instruct |
+|----------------|------------------|------------------------|--------------------|
+| IFEval Finnish | 64.69 | 47.31 | **66.54** |
+| MTBench Finnish | 5.92 | 4.10 | **6.75** |
+| AlpacaEval 2 Finnish | 16.80 | 2.05 | **28.89** |
+
+
 - IFEval (Finnish): 64.69 (vs 47.31 Llama 3.1 8B Instruct, vs 66.54 Poro 2 8B Instruct)
 - MTBench (Finnish): 5.92 (vs 4.1 Llama 3.1 8B Instruct, vs 6.75 Poro 2 8B Instruct)
 - AlpacaEval 2 (Finnish): 16.8 (vs 2.05 Llama 3.1 8B Instruct, vs 28.89 Poro 2 8B Instruct)
 
 ### English Instruction Following
-- IFEval: 79.66 (vs 79.48 Llama 3.1 8B Instruct, vs 79.29 Poro 2 8B Instruct)
-- MTBench: 7.07 (vs 7.7 Llama 3.1 8B Instruct, vs 7.33 Poro 2 8B Instruct)
-- AlpacaEval 2: 29.67 (vs 32.7 Llama 3.1 8B Instruct, vs 35.3 Poro 2 8B Instruct)
+| | Poro 2 8B SFT | Llama 3.1 8B Instruct | Poro 2 8B Instruct |
+|----------------|--------|------------------------|--------------------|
+| IFEval | **79.66** | 79.48 | 79.29 |
+| MTBench | 7.07 | **7.70** | 7.33 |
+| AlpacaEval 2 | 29.67 | 32.70 | **35.30** |
+
 
 **Overall**: ~16% average improvement in Finnish instruction-following benchmarks compared to Llama 3.1 8B Instruct, with maintained English performance. The additional DPO step in the Instruct model provides further improvements.
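The "~16% average improvement" figure quoted in the README can be sanity-checked against the Finnish table above. The averaging method below is an assumption (the commit does not state one): each benchmark's gain over Llama 3.1 8B Instruct is taken in percentage points, with MTBench's 0–10 score rescaled by 10× to a 0–100 range so the three benchmarks are comparable.

```python
# Sanity check of the "~16% average improvement" claim (method is an assumption).
# Finnish scores from the diff: Llama 3.1 8B Instruct vs Poro 2 8B SFT.
llama = {"IFEval": 47.31, "MTBench": 4.10, "AlpacaEval 2": 2.05}
sft = {"IFEval": 64.69, "MTBench": 5.92, "AlpacaEval 2": 16.80}

# Assumed rescaling: MTBench is on a 0-10 scale, the others on 0-100.
scale = {"IFEval": 1, "MTBench": 10, "AlpacaEval 2": 1}

# Gain per benchmark, in percentage points.
gains = {k: (sft[k] - llama[k]) * scale[k] for k in sft}
avg_gain = sum(gains.values()) / len(gains)

print(f"average gain: {avg_gain:.1f} points")  # prints "average gain: 16.8 points"
```

Under that reading the average works out to about 16.8 points, consistent with the "~16%" summary in the commit.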