Metal3d committed
Commit 3a1cf13 (unverified)
1 Parent(s): 61fff43

Add sampling and temperature parameters and increase num tokens


I think it's a good idea to allow tweaking these values.

Also, allowing more tokens produces better results.

Files changed (1)
  1. app.py +35 -12
app.py CHANGED
@@ -76,7 +76,13 @@ def rebuild_messages(history: list):
 
 
 @spaces.GPU
-def bot(history: list, max_num_tokens: int, final_num_tokens: int):
+def bot(
+    history: list,
+    max_num_tokens: int,
+    final_num_tokens: int,
+    do_sample: bool,
+    temperature: float,
+):
     """Make the model answer the question"""
 
     # to get tokens as a stream, later in a thread
@@ -114,6 +120,8 @@ def bot(history: list, max_num_tokens: int, final_num_tokens: int):
         kwargs=dict(
             max_new_tokens=num_tokens,
             streamer=streamer,
+            do_sample=do_sample,
+            temperature=temperature,
         ),
     )
     t.start()
@@ -133,14 +141,14 @@ def bot(history: list, max_num_tokens: int, final_num_tokens: int):
         yield history
 
 
-with gr.Blocks(fill_height=True, title="Making any model reasoning") as demo:
+with gr.Blocks(fill_height=True, title="Making any LLM model reasoning") as demo:
     with gr.Row(scale=1):
         with gr.Column(scale=5):
             gr.Markdown(f"""
-# Force reasoning for any model
+# Force reasoning for any LLM
 
-This is a simple proof-of-concept to get any LLM model to reason ahead of its response.
-This interface uses *{model_name}* model which is **not** a reasoning model. The used method
+This is a simple proof-of-concept to get any LLM (Large Language Model) to reason ahead of its response.
+This interface uses the *{model_name}* model, **which is not a reasoning model**. The method used
 is only to force some "reasoning" steps with prefixes to help the model enhance the answer.
 
 See my related article here: [Make any model reasoning](https://huggingface.co/blog/Metal3d/making-any-model-reasoning)
@@ -158,10 +166,10 @@ with gr.Blocks(fill_height=True, title="Making any model reasoning") as demo:
                 autofocus=True,
             )
         with gr.Column(scale=1):
-            gr.Markdown("""## Tweaks""")
+            gr.Markdown("""## Tweaking""")
             num_tokens = gr.Slider(
                 50,
-                255,
+                1024,
                 100,
                 step=1,
                 label="Max tokens per reasoning step",
@@ -169,20 +177,29 @@ with gr.Blocks(fill_height=True, title="Making any model reasoning") as demo:
             )
             final_num_tokens = gr.Slider(
                 50,
-                255,
-                200,
+                1024,
+                512,
                 step=1,
                 label="Max tokens for the final answer",
                 interactive=True,
             )
+            do_sample = gr.Checkbox(True, label="Do sample")
+            temperature = gr.Slider(0.1, 1.0, 0.7, step=0.1, label="Temperature")
             gr.Markdown("""
             Using a smaller number of tokens in the reasoning steps will make the model
             answer faster, but it may not be able to go deep enough in its reasoning.
-            A good value is 100.
+            A good value is 100 to 512.
 
             Using a smaller number of tokens for the final answer will make the model
             less verbose, but it may not be able to give a complete answer.
-            A good value is 200 to 255.
+            A good value is 512 to 1024.
+
+            **Do sample** uses another strategy to select the next token to complete the
+            answer. It is usually better to leave it checked.
+
+            **Temperature** indicates how "creative" the model can be. 0.7 is a common value.
+            With too high a value (like 1.0) the model can become incoherent; with a low value
+            (like 0.3), it will produce very predictable answers.
             """)
             gr.Markdown("""
             This interface can work on a personal computer with 6GB of VRAM (e.g. an NVIDIA 3050/3060 laptop GPU).
@@ -196,7 +213,13 @@ with gr.Blocks(fill_height=True, title="Making any model reasoning") as demo:
         [msg, chatbot],  # outputs
     ).then(
        bot,
-        [chatbot, num_tokens, final_num_tokens],  # actually, the "history" input
+        [
+            chatbot,
+            num_tokens,
+            final_num_tokens,
+            do_sample,
+            temperature,
+        ],  # actually, the "history" input
         chatbot,  # to store the new history from the output
     )
 
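For readers who want to see where `do_sample` and `temperature` land at runtime, below is a minimal sketch of the threaded streaming pattern this commit extends. It is an illustration under assumptions, not code from this repository: the `stream_answer` helper and its `model`, `tokenizer`, and `prompt` arguments are hypothetical stand-ins around the `Thread` + `TextIteratorStreamer` idiom visible in the hunks above.

```python
# Minimal sketch (not repository code): how do_sample and temperature reach
# model.generate() in the Thread + TextIteratorStreamer pattern shown above.
from threading import Thread

from transformers import TextIteratorStreamer


def stream_answer(model, tokenizer, prompt: str,
                  num_tokens: int, do_sample: bool, temperature: float):
    """Yield the model's answer as text chunks while it is being generated."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # The streamer turns generated token IDs back into text, chunk by chunk.
    streamer = TextIteratorStreamer(
        tokenizer, skip_prompt=True, skip_special_tokens=True
    )
    thread = Thread(
        target=model.generate,
        kwargs=dict(
            **inputs,
            max_new_tokens=num_tokens,
            streamer=streamer,
            do_sample=do_sample,      # sampling vs. greedy decoding
            temperature=temperature,  # only used when do_sample is True
        ),
    )
    thread.start()
    for chunk in streamer:
        yield chunk  # forward each piece of text to the UI as it arrives
    thread.join()
```

Two details worth keeping in mind: in transformers, `temperature` has no effect when `do_sample=False` (generation falls back to greedy decoding), which is why the "Do sample" checkbox defaults to checked; and Gradio passes the `inputs` list to the callback positionally, so `[chatbot, num_tokens, final_num_tokens, do_sample, temperature]` must stay in the same order as the parameters of `bot()`.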