Update README.md #3
opened by medmekk (HF Staff)

README.md CHANGED
@@ -110,8 +110,6 @@ For convenience and performance, we have provided `fp8`-quantized model checkpoi
 
 You can use the Qwen3-14B-FP8 model with serveral inference frameworks, including `transformers`, `sglang`, and `vllm`, as the original bfloat16 model.
 However, please pay attention to the following known issues:
-- `transformers`:
-  - there are currently issues with the "fine-grained fp8" method in `transformers` for distributed inference. You may need to set the environment variable `CUDA_LAUNCH_BLOCKING=1` if multiple devices are used in inference.
 
 ## Switching Between Thinking and Non-Thinking Mode
 
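For context, the note being removed described a workaround for multi-GPU inference with the fine-grained fp8 quantization in `transformers`: setting `CUDA_LAUNCH_BLOCKING=1` when inference is spread across multiple devices. Below is a minimal sketch of how that workaround would be applied, assuming the `Qwen/Qwen3-14B-FP8` checkpoint name and `device_map="auto"` sharding; the environment variable is only relevant if you actually hit the issue on your `transformers` version.

```python
import os

# Workaround from the removed note (assumption: only needed if the
# fine-grained fp8 issue appears during distributed inference).
# Must be set before any CUDA work is launched.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-14B-FP8"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",   # keep the dtype configuration stored in the fp8 checkpoint
    device_map="auto",    # shard the model across all visible GPUs (multi-device inference)
)

messages = [{"role": "user", "content": "Give me a short introduction to large language models."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```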