---
library_name: transformers
---

- Experiment results: [werty1248/s1.1-Ko-Native-result](https://huggingface.co/datasets/werty1248/s1.1-Ko-Native-result)
- Better than [werty1248/EXAONE-3.5-7.8B-s1-Ko-no-sample-packing](https://huggingface.co/werty1248/EXAONE-3.5-7.8B-s1-Ko-no-sample-packing), but scores barely differ from the original model

### Training Details

- Used the [official training code](https://github.com/simplescaling/s1)
- 8xA40, 2.5 hours
- Total batch size: reduced from 16 to 8
- block_size=16384
- gradient_checkpointing=True
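
The GPU count and total batch size listed above constrain the per-GPU settings. A minimal sketch of that arithmetic, assuming ordinary data parallelism across all 8 GPUs (the card does not state the parallelism setup) and using a hypothetical helper name, `per_gpu_micro_batches`:

```python
def per_gpu_micro_batches(total_batch_size: int, num_gpus: int) -> int:
    """Return per_device_batch_size * gradient_accumulation_steps.

    Assumes data parallelism, where
    total = num_gpus * per_device_batch_size * gradient_accumulation_steps.
    """
    if total_batch_size % num_gpus != 0:
        raise ValueError("total batch size must be divisible by the GPU count")
    return total_batch_size // num_gpus


# Values from the list above: 8 GPUs, total batch size 8.
product = per_gpu_micro_batches(total_batch_size=8, num_gpus=8)
print(product)  # 1: one sequence per GPU per optimizer step, no accumulation

# Upper bound on tokens per optimizer step if every sequence fills block_size.
max_tokens_per_step = 8 * 16384
print(max_tokens_per_step)  # 131072
```

Under the same assumption, a total batch size of 16 on 8 GPUs would need a product of 2, for example a per-device batch size of 1 with 2 accumulation steps.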

### Others

- VRAM was barely sufficient (block_size=20000 and gradient_accumulation_steps=2 both ran into CUDA OOM)
- The chronic issue of "once the reasoning goes wrong, the model says *잠깐만요* (wait a moment) and then keeps repeating the same mistake" remains unresolved
- Possibly a trait of EXAONE, of small models, or ~of the translated data~
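
One rough way to look for the repeated-mistake pattern in generated outputs is to split each output at the marker and compare the pieces. The sketch below is only a heuristic, not the evaluation method used for this model: the helper name `repeated_after_marker` and the 0.8 similarity threshold are hypothetical choices, and the default marker string is taken from the note above.

```python
from difflib import SequenceMatcher


def repeated_after_marker(text: str, marker: str = "잠깐만요", threshold: float = 0.8):
    """Split `text` at each `marker` and flag segments that nearly repeat an earlier one.

    Returns a list of (earlier_index, later_index, similarity) tuples.
    The threshold is an arbitrary illustrative value.
    """
    segments = [s.strip() for s in text.split(marker)]
    segments = [s for s in segments if s]
    repeats = []
    for i in range(1, len(segments)):
        for j in range(i):
            # Character-level similarity between an earlier and a later segment.
            ratio = SequenceMatcher(None, segments[j], segments[i]).ratio()
            if ratio >= threshold:
                repeats.append((j, i, round(ratio, 2)))
                break  # one match per later segment is enough
    return repeats


# Toy example with an ASCII marker: the text after the marker repeats the text before it.
print(repeated_after_marker("a b c WAIT a b c", marker="WAIT"))
```

Character-level similarity is crude, so segments that differ only in a corrected number can still be flagged; treat any hits as candidates for manual inspection.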