Update README.md #3
opened by medmekk (HF Staff)

README.md CHANGED
@@ -110,8 +110,6 @@ For convenience and performance, we have provided `fp8`-quantized model checkpoi
 
 You can use the Qwen3-14B-FP8 model with serveral inference frameworks, including `transformers`, `sglang`, and `vllm`, as the original bfloat16 model.
 However, please pay attention to the following known issues:
-- `transformers`:
-  - there are currently issues with the "fine-grained fp8" method in `transformers` for distributed inference. You may need to set the environment variable `CUDA_LAUNCH_BLOCKING=1` if multiple devices are used in inference.
 
 ## Switching Between Thinking and Non-Thinking Mode
 
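For context, the note being removed described a workaround for multi-GPU inference with the fine-grained fp8 quantization in `transformers`: setting `CUDA_LAUNCH_BLOCKING=1` when inference is spread across multiple devices. Below is a minimal sketch of how that workaround would be applied, assuming the `Qwen/Qwen3-14B-FP8` checkpoint name and `device_map="auto"` sharding; the environment variable is only relevant if you actually hit the issue on your `transformers` version.

```python
import os

# Workaround from the removed note (assumption: only needed if the
# fine-grained fp8 issue appears during distributed inference).
# Must be set before any CUDA work is launched.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-14B-FP8"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",   # keep the dtype configuration stored in the fp8 checkpoint
    device_map="auto",    # shard the model across all visible GPUs (multi-device inference)
)

messages = [{"role": "user", "content": "Give me a short introduction to large language models."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```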