werty1248/EXAONE-3.5-7.8B-s1.1-Ko-Native


	---
	library_name: transformers
	---

	![image/png](https://cdn-uploads.huggingface.co/production/uploads/6629154d55d7c289634b8c5d/oe76LTbEo_qDZPTe3VtuW.png)

	- Run results: [werty1248/s1.1-Ko-Native-result](https://huggingface.co/datasets/werty1248/s1.1-Ko-Native-result)
	- Better than [werty1248/EXAONE-3.5-7.8B-s1-Ko-no-sample-packing](https://huggingface.co/werty1248/EXAONE-3.5-7.8B-s1-Ko-no-sample-packing), but scores are almost identical to the original model

	### Training Details

	- Used the [official training code](https://github.com/simplescaling/s1)
	- 8xA40, 2.5 hours
	- Total batch size: 16 -> 8
	- block_size=16384
	- gradient_checkpointing=True
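
As a rough sanity check on the batch size numbers above, here is a minimal sketch of the usual data-parallel arithmetic. It assumes a per-device batch size of 1 and that the total of 16 would have come from `gradient_accumulation_steps=2`; neither is stated in this card, and `total_batch_size` is a hypothetical helper, not part of the training code.

```python
def total_batch_size(num_gpus: int, per_device_batch_size: int, grad_accum_steps: int) -> int:
    """Effective (global) batch size for plain data-parallel training."""
    return num_gpus * per_device_batch_size * grad_accum_steps


# 8 GPUs comes from the card; per-device batch size 1 is an assumption.
print(total_batch_size(8, 1, 1))  # 8
print(total_batch_size(8, 1, 2))  # 16
```

Under these assumptions, halving gradient accumulation from 2 to 1 would account for the drop from 16 to 8.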

	### Others

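	- VRAM is very tight (block_size=20000 and gradient_accumulation_steps=2 both hit CUDA OOM)
	- The chronic problem remains unresolved: once the model reasons its way into a mistake, it says "잠깐만요" ("wait") and then keeps repeating the same mistake
	- This may be a trait of EXAONE, a trait of small models, or ~a trait of translated data~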