laineyyy committed · verified
Commit b26e5ea · Parent(s): 16b1368

Update README.md

Files changed (1)
  1. README.md +16 -3
README.md CHANGED
@@ -55,6 +55,8 @@ This model represents the SFT phase of post-training, using 1.4M instruction-fol
 - Top-rated conversations from OASST2 and Avoin Avustaja datasets (5K samples)
 - Translation samples from EuroParl (1K samples)
 
+We release the [Poro 2 instruction collection](https://huggingface.co/datasets/LumiOpen/poro2-instruction-collection).
+
 ## SFT Hyperparameters
 
 | Hyperparameter | Value |
@@ -71,14 +73,25 @@ This model represents the SFT phase of post-training, using 1.4M instruction-fol
 Poro 2 8B SFT shows substantial improvements in Finnish instruction-following capabilities compared to Llama 3.1 8B Instruct, while maintaining strong English performance. Note that the final Instruct model (with DPO) performs significantly better.
 
 ### Finnish Instruction Following
+
+| | Poro 2 8B SFT | Llama 3.1 8B Instruct | Poro 2 8B Instruct |
+|----------------|------------------|------------------------|--------------------|
+| IFEval Finnish | 64.69 | 47.31 | **66.54** |
+| MTBench Finnish | 5.92 | 4.10 | **6.75** |
+| AlpacaEval 2 Finnish | 16.80 | 2.05 | **28.89** |
+
+
 - IFEval (Finnish): 64.69 (vs 47.31 Llama 3.1 8B Instruct, vs 66.54 Poro 2 8B Instruct)
 - MTBench (Finnish): 5.92 (vs 4.1 Llama 3.1 8B Instruct, vs 6.75 Poro 2 8B Instruct)
 - AlpacaEval 2 (Finnish): 16.8 (vs 2.05 Llama 3.1 8B Instruct, vs 28.89 Poro 2 8B Instruct)
 
 ### English Instruction Following
-- IFEval: 79.66 (vs 79.48 Llama 3.1 8B Instruct, vs 79.29 Poro 2 8B Instruct)
-- MTBench: 7.07 (vs 7.7 Llama 3.1 8B Instruct, vs 7.33 Poro 2 8B Instruct)
-- AlpacaEval 2: 29.67 (vs 32.7 Llama 3.1 8B Instruct, vs 35.3 Poro 2 8B Instruct)
+| | Poro 2 8B SFT | Llama 3.1 8B Instruct | Poro 2 8B Instruct |
+|----------------|--------|------------------------|--------------------|
+| IFEval | **79.66** | 79.48 | 79.29 |
+| MTBench | 7.07 | **7.70** | 7.33 |
+| AlpacaEval 2 | 29.67 | 32.70 | **35.30** |
+
 
 **Overall**: ~16% average improvement in Finnish instruction-following benchmarks compared to Llama 3.1 8B Instruct, with maintained English performance. The additional DPO step in the Instruct model provides further improvements.
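The "~16% average improvement" figure quoted in the README can be sanity-checked against the Finnish table above. The averaging method below is an assumption (the commit does not state one): each benchmark's gain over Llama 3.1 8B Instruct is taken in percentage points, with MTBench's 0–10 score rescaled by 10× to a 0–100 range so the three benchmarks are comparable.

```python
# Sanity check of the "~16% average improvement" claim (method is an assumption).
# Finnish scores from the diff: Llama 3.1 8B Instruct vs Poro 2 8B SFT.
llama = {"IFEval": 47.31, "MTBench": 4.10, "AlpacaEval 2": 2.05}
sft = {"IFEval": 64.69, "MTBench": 5.92, "AlpacaEval 2": 16.80}

# Assumed rescaling: MTBench is on a 0-10 scale, the others on 0-100.
scale = {"IFEval": 1, "MTBench": 10, "AlpacaEval 2": 1}

# Gain per benchmark, in percentage points.
gains = {k: (sft[k] - llama[k]) * scale[k] for k in sft}
avg_gain = sum(gains.values()) / len(gains)

print(f"average gain: {avg_gain:.1f} points")  # prints "average gain: 16.8 points"
```

Under that reading the average works out to about 16.8 points, consistent with the "~16%" summary in the commit.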