medmekk (HF Staff) committed
Commit 38d6077 · verified · 1 Parent(s): 7e0db87

Update README.md


same as here https://huggingface.co/Qwen/Qwen3-32B-FP8/discussions/6

Files changed (1)
  1. README.md +0 -2
README.md CHANGED
@@ -110,8 +110,6 @@ For convenience and performance, we have provided `fp8`-quantized model checkpoi
 
  You can use the Qwen3-14B-FP8 model with serveral inference frameworks, including `transformers`, `sglang`, and `vllm`, as the original bfloat16 model.
  However, please pay attention to the following known issues:
- - `transformers`:
- - there are currently issues with the "fine-grained fp8" method in `transformers` for distributed inference. You may need to set the environment variable `CUDA_LAUNCH_BLOCKING=1` if multiple devices are used in inference.
 
  ## Switching Between Thinking and Non-Thinking Mode
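For context, the sentence that survives this edit says the FP8 checkpoint can be loaded with `transformers` just like the original bfloat16 model. Below is a minimal sketch of that usage, assuming a recent `transformers` release (so the removed note about `CUDA_LAUNCH_BLOCKING=1` no longer applies); the `device_map`, `max_new_tokens`, and prompt values are illustrative choices, not taken from the README.

```python
# Minimal sketch: loading the FP8 checkpoint with transformers.
# Assumes `transformers` and `accelerate` are installed; parameter choices are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-14B-FP8"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the checkpoint's stored (fp8-quantized) weights
    device_map="auto",    # place layers across available devices automatically
)

# Build a chat prompt with the model's chat template and generate a reply.
messages = [{"role": "user", "content": "Give me a short introduction to large language models."}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, dropping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```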