---
datasets:
- LumiOpen/poro2-instruction-collection
language:
- fi
- en
license: llama3.3
library_name: transformers
pipeline_tag: text-generation
---

# Poro 2 8B SFT Model Card

> **Note for most users**: This is an intermediate checkpoint from our post-training pipeline. Most users should use [Poro 2 8B Instruct](https://huggingface.co/LumiOpen/Llama-Poro-2-8B-Instruct) instead, which includes an additional round of Direct Preference Optimization (DPO) for improved response quality and alignment. This SFT-only model is primarily intended for researchers interested in studying the effects of different post-training techniques.

Poro 2 8B SFT is a supervised fine-tuned model created from the Poro 2 8B Base model. This model has been trained for instruction following and conversational AI applications in both Finnish and English, but has not undergone preference tuning. It represents the intermediate step before Direct Preference Optimization (DPO) in our post-training pipeline.

Poro 2 was created in a collaboration between [AMD Silo AI](https://www.amd.com/en/solutions/ai/silo-ai.html), the [TurkuNLP group](https://turkunlp.org/) of the University of Turku, and [High Performance Language Technologies](https://hplt-project.org/) (HPLT). Training was conducted on the [LUMI supercomputer](https://www.lumi-supercomputer.eu/), using compute resources generously provided by [CSC](https://csc.fi/) - IT Center for Science, Finland.

For more details on our training and data generation pipeline, check out our [Continued Pretraining Playbook](https://rocm.blogs.amd.com/artificial-intelligence/multilingual-continued-pretraining/README.html). 

## Poro 2 Model Family

The Poro 2 model family includes both 8B and 70B models, each released in three versions: a base model, an SFT-only post-training checkpoint, and the final instruct model, which is the SFT model plus a round of DPO.

| Model | Based on | Base Model | SFT | Instruct |
| :---: | :------: | :--------: | :---: | :------: |
| Poro 2 8B | Llama 3.1 8B | [Poro 2 8B Base](https://huggingface.co/LumiOpen/Llama-Poro-2-8B-base) | [Poro 2 8B SFT](https://huggingface.co/LumiOpen/Llama-Poro-2-8B-SFT) | [Poro 2 8B Instruct](https://huggingface.co/LumiOpen/Llama-Poro-2-8B-Instruct) |
| Poro 2 70B | Llama 3.1 70B | [Poro 2 70B Base](https://huggingface.co/LumiOpen/Llama-Poro-2-70B-base) | [Poro 2 70B SFT](https://huggingface.co/LumiOpen/Llama-Poro-2-70B-SFT) | [Poro 2 70B Instruct](https://huggingface.co/LumiOpen/Llama-Poro-2-70B-Instruct) |

_What does Poro mean?_ Poro is the Finnish word for Reindeer! 🦌 These animals are native to Finland and hold a significant and historical role in Finnish culture.

## Model Overview

Poro 2 8B SFT is based on the Llama 3.1 8B architecture and has been supervised fine-tuned for instruction following. The model supports both English and Finnish conversations but has not undergone preference tuning for response quality optimization.

| Hyperparameter | Value  |
| :------------- | :----: |
| n_parameters | 8.03B |
| n_layers | 32 |
| n_heads | 32 |
| n_kv_heads | 8 |
| d_model | 4096 |
| vocab_size | 128256 |
| max_sequence_length | 8192 |
| base_model | Llama-3.1-8B |
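
These values can be checked directly against the model's configuration. The sketch below is a quick way to do so; field names follow the standard `transformers` Llama config.

```python
# Quick check of the architecture values above via the model config.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("LumiOpen/Llama-Poro-2-8B-SFT")
print(config.num_hidden_layers)    # 32     (n_layers)
print(config.num_attention_heads)  # 32     (n_heads)
print(config.num_key_value_heads)  # 8      (n_kv_heads)
print(config.hidden_size)          # 4096   (d_model)
print(config.vocab_size)           # 128256 (vocab_size)
```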

## Training Process

### Continued Pretraining
The base Poro 2 8B model was created through continued pretraining on 165B tokens of Finnish, English, code, and math data.

### Supervised Fine-Tuning (SFT)
This model represents the SFT phase of post-training, using 1.4M instruction-following examples in English and Finnish, including:
- English and Finnish Tulu 3 prompts with Llama-3.3-70B-Instruct responses (1.35M samples)
- Multi-turn conversations generated using the Magpie method (14K samples)
- Top-rated conversations from OASST2 and Avoin Avustaja datasets (5K samples)
- Translation samples from EuroParl (1K samples)

We release the full SFT dataset as the [Poro 2 instruction collection](https://huggingface.co/datasets/LumiOpen/poro2-instruction-collection).
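
As a minimal sketch, the collection can be loaded with the `datasets` library; the `train` split name is an assumption, so check the dataset card for the actual schema.

```python
# Minimal sketch: load the released SFT data with the Hugging Face `datasets` library.
# The "train" split name and column layout are assumptions; see the dataset card.
from datasets import load_dataset

ds = load_dataset("LumiOpen/poro2-instruction-collection", split="train")
print(ds)     # number of examples and column names
print(ds[0])  # one instruction-following example
```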

## SFT Hyperparameters

| Hyperparameter | Value |
| :------------: | :---: |
| Epochs | 2 |
| Global batch size | 64 |
| Learning rate | 5e-6 |
| LR scheduler | linear |
| Warmup ratio | 0.03 |
| Max sequence length | 4,096 |
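
For illustration, the table roughly maps onto standard `transformers` `TrainingArguments` as below. The actual training framework and the per-device/accumulation split behind the global batch size of 64 are not specified here, so treat this as a hypothetical sketch.

```python
# Hypothetical sketch of the SFT hyperparameters above as transformers TrainingArguments.
# The 8 x 8 factoring of the global batch size (64) is an assumption, and the
# 4,096-token max sequence length is enforced by the tokenizer/collator, not here.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="poro2-8b-sft",
    num_train_epochs=2,             # Epochs
    per_device_train_batch_size=8,  # assumed: 8 per device x 8 devices = 64 global
    learning_rate=5e-6,             # Learning rate
    lr_scheduler_type="linear",     # LR scheduler
    warmup_ratio=0.03,              # Warmup ratio
    bf16=True,                      # assumed precision, matching the bfloat16 weights
)
```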

## Evaluation Results

Poro 2 8B SFT shows substantial improvements in Finnish instruction-following capabilities compared to Llama 3.1 8B Instruct, while maintaining strong English performance. Note that the final Instruct model (with DPO) performs significantly better.

### Finnish Instruction Following

| Benchmark | Poro 2 8B SFT | Llama 3.1 8B Instruct | Poro 2 8B Instruct |
| :-------- | :-----------: | :-------------------: | :----------------: |
| IFEval Finnish | 64.69 | 47.31 | **66.54** |
| MTBench Finnish | 5.92 | 4.10 | **6.75** |
| AlpacaEval 2 Finnish | 16.80 | 2.05 | **28.89** |


### English Instruction Following
| Benchmark | Poro 2 8B SFT | Llama 3.1 8B Instruct | Poro 2 8B Instruct |
| :-------- | :-----------: | :-------------------: | :----------------: |
| IFEval | **79.66** | 79.48 | 79.29 |
| MTBench | 7.07 | **7.70** | 7.33 |
| AlpacaEval 2 | 29.67 | 32.70 | **35.30** |


**Overall**: roughly a 16% average improvement on the Finnish instruction-following benchmarks compared to Llama 3.1 8B Instruct, while English performance is maintained. (For reference, if MTBench is scaled to a 0-100 range, the per-benchmark gains from the tables above are 17.4, 18.2, and 14.8 points, averaging about 16.8.) The additional DPO step in the Instruct model provides further improvements.

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "LumiOpen/Llama-Poro-2-8B-SFT"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Finnish conversation example ("Tell me about Finland's history.")
messages = [
    {"role": "user", "content": "Kerro minulle Suomen historiasta."}
]

# Build the prompt with the chat template and move it to the model's device
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=500,
    temperature=0.7,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id
)

# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
print(response)
```

## Research Applications

This SFT-only model is particularly useful for researchers studying:
- The effects of supervised fine-tuning vs. preference tuning
- Comparative analysis of different post-training techniques
- Ablation studies on instruction-following capabilities
- Cross-lingual transfer in instruction-following tasks
- The impact of DPO on model behavior and alignment

## Intended Use

Poro 2 8B SFT is primarily intended for:
- **Research purposes**: Studying post-training techniques and their effects
- **Comparative analysis**: Understanding the contribution of different training phases
- **Educational applications**: Learning about instruction-following model development
- **Development**: As a starting point for further preference tuning experiments

**For production use cases**, we recommend using [Poro 2 8B Instruct](https://huggingface.co/LumiOpen/Llama-Poro-2-8B-Instruct) instead.

## Ethical Considerations and Limitations

Poro 2 8B SFT is a research checkpoint optimized for English and Finnish instruction following. As this model has not undergone preference tuning, it may be more prone to generating responses that are misaligned with user expectations compared to the final Instruct model.

Key limitations:
- **No preference tuning**: May generate responses that are less aligned or of lower quality than the Instruct version
- Limited proficiency in languages other than English and Finnish
- May occasionally generate biased, inappropriate, or factually incorrect content
- Performance may vary significantly for specialized or technical domains
- Context window limited to 8,192 tokens
- May struggle with very recent events (knowledge cutoff limitations)

**Safety Considerations:**
- This model should primarily be used for research purposes
- Users should verify important factual claims independently
- The model should not be used for medical, legal, or financial advice without human oversight
- Responses should be reviewed for appropriateness in sensitive contexts
- Consider using the Instruct version for better alignment and response quality

## License

Built with Llama

Poro 2 8B SFT is released under the Llama 3.3 Community License. Please review the license terms before use.

## Citation

```bibtex
@misc{poro2_2025,
    title={Poro 2: Continued Pretraining for Language Acquisition},
    author={Elaine Zosa and Jouni Luoma and Kai Hakala and Antti Virtanen and Mika Koistinen and Risto Luukkonen and Akseli Reunamo and Sampo Pyysalo and Jonathan Burdge},
    year={2025},
    howpublished={LumiOpen}
}
```

## Acknowledgments

We thank CSC - IT Center for Science, Finland for providing access to the LUMI supercomputer. This work was supported by the High Performance Language Technologies (HPLT) project and conducted in collaboration with TurkuNLP from the University of Turku. This project has received funding from the European Union's Horizon Europe research and innovation programme under grant agreement No 101070350.