segopecelus committed on
Commit 2275fc9 · verified · 1 Parent(s): d1f52e3

End of training
README.md ADDED
@@ -0,0 +1,69 @@
---
base_model: JackFram/llama-160m
library_name: transformers
model_name: 7bbcd666-35c9-4b85-a369-15b783e08b25
tags:
- generated_from_trainer
- axolotl
- trl
- grpo
licence: license
---

# Model Card for 7bbcd666-35c9-4b85-a369-15b783e08b25

This model is a fine-tuned version of [JackFram/llama-160m](https://huggingface.co/JackFram/llama-160m).
It has been trained using [TRL](https://github.com/huggingface/trl).

## Quick start

```python
from transformers import pipeline

question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
generator = pipeline("text-generation", model="segopecelus/7bbcd666-35c9-4b85-a369-15b783e08b25", device="cuda")
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
```

## Training procedure

[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/apriasmoro-abcstudio/Gradients-On-Demand/runs/av7z68hj)

This model was trained with GRPO, a method introduced in [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://huggingface.co/papers/2402.03300).

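At its core, GRPO samples a group of completions per prompt and normalizes each completion's reward against the group's own mean and standard deviation, so no learned value model is needed. A minimal illustrative sketch of that normalization (not TRL's actual implementation; the function name and epsilon are hypothetical, and population std is used for simplicity):

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-4):
    """GRPO-style advantage: normalize each completion's reward
    against the mean and std of its sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std of the group's rewards
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four completions sampled for the same prompt, with scalar rewards
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
print([round(a, 2) for a in advantages])  # → [1.41, -1.41, 0.0, 0.0]
```

Completions rewarded above the group mean get positive advantages and are reinforced; those below get negative ones, all on a scale that is comparable across prompts.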
### Framework versions

- TRL: 0.17.0
- Transformers: 4.51.3
- Pytorch: 2.5.1+cu124
- Datasets: 3.5.1
- Tokenizers: 0.21.1

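To reproduce this environment, the versions above can be pinned with pip. This is a sketch: the `+cu124` PyTorch build is typically pulled from the matching CUDA wheel index, shown here as an assumption about your setup.

```shell
# Pin the library versions listed above
pip install "trl==0.17.0" "transformers==4.51.3" "datasets==3.5.1" "tokenizers==0.21.1"
# CUDA 12.4 build of PyTorch from the dedicated wheel index
pip install "torch==2.5.1" --index-url https://download.pytorch.org/whl/cu124
```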
## Citations

Cite GRPO as:

```bibtex
@article{zhihong2024deepseekmath,
    title        = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
    author       = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
    year         = 2024,
    eprint       = {arXiv:2402.03300},
}
```

Cite TRL as:

```bibtex
@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}
```
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:d1a404e3d06de8f128cf1faf5ac904780f56a79c6ac42c8ea6b99e64facd480f
+ oid sha256:e159f21147862ebe6538c17d8a8cba643e5eafd80337cd774c2a724bedfcce7b
  size 6804608
runs/Jun29_21-42-34_8a57041aaacc/events.out.tfevents.1751233355.8a57041aaacc.281.0 CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:226faedf4e5f2a90934edf3f2154f207485bfa23586bb023c01dd21fa4f4d735
- size 90195
+ oid sha256:133cc31cfcdc1dcc58aa45a8a54fe6e3c19b784ec98a23c951dd8bbda0a112b6
+ size 159348