---
library_name: transformers
---

- Experiment results: [werty1248/s1.1-Ko-Native-result](https://huggingface.co/datasets/werty1248/s1.1-Ko-Native-result)
- Better than [werty1248/EXAONE-3.5-7.8B-s1-Ko-no-sample-packing](https://huggingface.co/werty1248/EXAONE-3.5-7.8B-s1-Ko-no-sample-packing), but the scores are almost the same as the original model's
### Training Details
- Uses the [official training code](https://github.com/simplescaling/s1)
- 8xA40, 2.5 hours
- Total batch size: reduced from 16 to 8
- block_size=16384
- gradient_checkpointing=True
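
As a rough illustration of the batch-size change above, the total batch size in data-parallel training is the product of GPU count, per-device batch size, and gradient accumulation steps. The sketch below assumes a per-device batch size of 1 and assumes that the drop from 16 to 8 corresponds to lowering `gradient_accumulation_steps` from 2 to 1 (the Others section notes that `gradient_accumulation_steps=2` ran out of memory). The helper name `total_batch_size` is hypothetical, and neither assumption is confirmed by the training logs.

```python
def total_batch_size(num_gpus: int, per_device_batch_size: int, grad_accum_steps: int) -> int:
    """Effective number of sequences consumed per optimizer step."""
    return num_gpus * per_device_batch_size * grad_accum_steps


NUM_GPUS = 8          # 8xA40, from the list above
PER_DEVICE_BATCH = 1  # assumption: not stated in this card

# Assumed original setting: accumulate over 2 steps -> total batch size 16
original = total_batch_size(NUM_GPUS, PER_DEVICE_BATCH, grad_accum_steps=2)

# Assumed setting for this run: no accumulation -> total batch size 8
reduced = total_batch_size(NUM_GPUS, PER_DEVICE_BATCH, grad_accum_steps=1)

print(original, reduced)
```

Halving the total batch size doubles the number of optimizer steps per epoch, so results may not be directly comparable to a run at batch size 16 without adjusting the learning rate or schedule.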
### Others
- VRAM was barely sufficient (block_size=20000 and gradient_accumulation_steps=2 both hit CUDA OOM)
- The chronic problem where the model, once it reasons incorrectly, says *Wait* (*잠깐만요*) yet keeps repeating the same mistake remains unresolved
- Possibly a trait of EXAONE, or of small models, or ~of the translated data~