---
library_name: transformers
---
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6629154d55d7c289634b8c5d/oe76LTbEo_qDZPTe3VtuW.png)
- Evaluation results: [werty1248/s1.1-Ko-Native-result](https://huggingface.co/datasets/werty1248/s1.1-Ko-Native-result)
- Better than [werty1248/EXAONE-3.5-7.8B-s1-Ko-no-sample-packing](https://huggingface.co/werty1248/EXAONE-3.5-7.8B-s1-Ko-no-sample-packing), but the scores are nearly identical to the original model
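The model follows the standard transformers chat interface. The snippet below is a minimal, untested usage sketch: the model ID is a placeholder for this repository, and EXAONE-3.5 checkpoints ship custom modeling code, so `trust_remote_code=True` is assumed to be required.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "werty1248/this-model"  # placeholder: replace with this repository's actual model ID

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,  # EXAONE uses a custom architecture outside core transformers
)

# Korean prompt, since the model is tuned for native-Korean reasoning traces
messages = [{"role": "user", "content": "ํ•œ ๋ณ€์˜ ๊ธธ์ด๊ฐ€ 7์ธ ์ •์œก๊ฐํ˜•์˜ ๋„“์ด๋ฅผ ๊ตฌํ•˜์„ธ์š”."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generous token budget so the long reasoning trace fits before the final answer
output = model.generate(input_ids, max_new_tokens=4096)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```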
### Training Details
- [๊ณต์‹ ํ•™์Šต ์ฝ”๋“œ](https://github.com/simplescaling/s1) ์‚ฌ์šฉ
- 8xA40, 2.5 hours
- Total batch size: 16 -> 8
- block_size=16384
- gradient_checkpointing=True
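For reference, the settings above map roughly onto the following transformers `TrainingArguments`. This is an illustrative sketch only, not the actual configuration: the real run used the s1 repository's own training script, and the output path and precision flag here are assumptions.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="exaone-3.5-7.8b-s1-ko",  # hypothetical output path
    per_device_train_batch_size=1,       # 8 x A40 GPUs x 1 sample = total batch size 8
    gradient_accumulation_steps=1,       # raising this to 2 ran out of VRAM (see Others)
    gradient_checkpointing=True,         # needed to fit block_size=16384 on 48 GB A40s
    bf16=True,                           # assumption; precision is not stated in this card
)
# Each sample is truncated to block_size=16384 tokens at tokenization time
# (no sample packing); the resulting dataset is then passed to a standard Trainer.
```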
### Others
- VRAM was right at the limit (both block_size=20000 and gradient_accumulation_steps=2 ran into CUDA OOM)
- The chronic failure mode where the model, after one wrong reasoning step, says *wait a moment* yet keeps repeating the same mistake is still not fixed
- Possibly a trait of EXAONE, or of small models, or ~~of the translated data~~