shisa-ai
/

shisa-v2-qwen2.5-7b

Text Generation

text-generation-inference

Model card Files Files and versions Metrics Training metrics Community

leonardlin commited on Apr 16

Commit

ae8f95e

·

verified ·

1 Parent(s): d695cb7

Update README.md

Files changed (1) hide show

README.md +1 -0

README.md CHANGED Viewed

@@ -105,6 +105,7 @@ We believe these benchmarks will be generally useful and plan to open-source the
 All Shisa V2 models inherit the [chat templates](https://huggingface.co/docs/transformers/v4.37.1/chat_templating) of their respective base models and have been tested and validated for proper inference with both [vLLM](https://github.com/vllm-project/vllm) and [SGLang](https://github.com/sgl-project/sglang).
 Running sampler sweeps, we found the models operate well across a variety of temperatures in most settings. For translation tasks specifically, we recommend a lower temperatures (0.2) to increase accuracy. For role-play and creative tasks, a higher temp (eg 1.0) seems to give good results. To prevent cross-lingual token leakage we recommend a top_p of 0.9 or min_p of 0.1.
 No additional safety alignment has been done on these models, so they will largely inherit the base models' biases and safety profiles.
 ## Datasets

 All Shisa V2 models inherit the [chat templates](https://huggingface.co/docs/transformers/v4.37.1/chat_templating) of their respective base models and have been tested and validated for proper inference with both [vLLM](https://github.com/vllm-project/vllm) and [SGLang](https://github.com/sgl-project/sglang).
 Running sampler sweeps, we found the models operate well across a variety of temperatures in most settings. For translation tasks specifically, we recommend a lower temperatures (0.2) to increase accuracy. For role-play and creative tasks, a higher temp (eg 1.0) seems to give good results. To prevent cross-lingual token leakage we recommend a top_p of 0.9 or min_p of 0.1.
 No additional safety alignment has been done on these models, so they will largely inherit the base models' biases and safety profiles.
 ## Datasets