leonardlin committed · Commit ae8f95e · verified · 1 Parent(s): d695cb7

Update README.md

Files changed (1)
  1. README.md +1 -0
@@ -105,6 +105,7 @@ We believe these benchmarks will be generally useful and plan to open-source the
  All Shisa V2 models inherit the [chat templates](https://huggingface.co/docs/transformers/v4.37.1/chat_templating) of their respective base models and have been tested and validated for proper inference with both [vLLM](https://github.com/vllm-project/vllm) and [SGLang](https://github.com/sgl-project/sglang).
 
  Running sampler sweeps, we found the models operate well across a variety of temperatures in most settings. For translation tasks specifically, we recommend a lower temperature (0.2) to increase accuracy. For role-play and creative tasks, a higher temperature (e.g. 1.0) seems to give good results. To prevent cross-lingual token leakage, we recommend a top_p of 0.9 or a min_p of 0.1.
+
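As a rough illustration of the sampler recommendations above, the following sketch maps a task type to sampling settings. The helper name `sampling_kwargs`, the task labels, and the fallback temperature of 0.7 are assumptions made for this example and are not part of the README; only the 0.2, 1.0, 0.9, and 0.1 values come from the text.

```python
# Temperatures recommended in the text for two task types.
RECOMMENDED_TEMPERATURE = {"translation": 0.2, "creative": 1.0}


def sampling_kwargs(task, *, use_min_p=False, default_temperature=0.7):
    """Return sampling settings for a task (hypothetical helper).

    default_temperature is an arbitrary assumption for tasks the text
    does not cover; the text only says a variety of temperatures work.
    """
    kwargs = {"temperature": RECOMMENDED_TEMPERATURE.get(task, default_temperature)}
    # The text recommends top_p=0.9 OR min_p=0.1, so set only one of them.
    if use_min_p:
        kwargs["min_p"] = 0.1
    else:
        kwargs["top_p"] = 0.9
    return kwargs


if __name__ == "__main__":
    print(sampling_kwargs("translation"))
    print(sampling_kwargs("creative", use_min_p=True))
```

The resulting keys (`temperature`, `top_p`, `min_p`) match keyword arguments accepted by vLLM's `SamplingParams`, so the dictionary could be unpacked there; other inference stacks may name these settings differently.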
  No additional safety alignment has been done on these models, so they will largely inherit the base models' biases and safety profiles.
 
  ## Datasets