FP8 UE8M0 Support

#5
by haili-tian - opened

Regarding UE8M0 support in the demo inference code on GitHub (inference/model.py and kernel.py), I suspect the current implementation may be insufficient.

In the Linear class, weight.scale might need to be a single-byte type to properly store UE8M0 scale values.
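
As a minimal sketch of what that could look like, assuming UE8M0 stores the unsigned biased exponent (bias 127) of a power-of-two scale in a single byte; the helper names below are hypothetical, not from the repo:

```python
import torch

def fp32_scale_to_ue8m0(scale: torch.Tensor) -> torch.Tensor:
    # Hypothetical helper: round each fp32 scale up to the nearest power of
    # two (ceil avoids overflow when quantizing) and keep only its biased
    # exponent (bias 127), so each scale fits in one byte.
    e = torch.ceil(torch.log2(scale)) + 127.0
    return e.clamp(0, 255).to(torch.uint8)

def ue8m0_to_fp32(e: torch.Tensor) -> torch.Tensor:
    # Decode a UE8M0 byte back to the power-of-two scale 2^(e - 127).
    return torch.pow(2.0, e.to(torch.float32) - 127.0)
```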
Additionally, the fp8_gemm function should be updated to accept UE8M0-formatted scales for both the scale (a_s) and weight.scale (b_s) arguments. These UE8M0 scales represent only the exponent portion of the full fp32 scale, so the computation inside fp8_gemm() could then use these exponent values directly.
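
Here is a non-kernel reference sketch of that idea. To keep it self-contained it assumes per-row activation scales and per-output-channel weight scales (the actual kernel.py uses 128-wide blocks), and the function name is hypothetical:

```python
import torch

def fp8_gemm_ue8m0(a: torch.Tensor, a_s: torch.Tensor,
                   b: torch.Tensor, b_s: torch.Tensor) -> torch.Tensor:
    # Hypothetical UE8M0-aware fp8_gemm, reference math only (no Triton).
    # a: [M, K] FP8 activations, a_s: [M, 1] uint8 biased exponents.
    # b: [N, K] FP8 weights,     b_s: [N, 1] uint8 biased exponents.
    acc = a.to(torch.float32) @ b.to(torch.float32).T  # [M, N]
    # Because UE8M0 scales are pure powers of two, both can be applied to
    # the accumulator as one exponent shift: 2^(a_e - 127) * 2^(b_e - 127).
    exp = (a_s.to(torch.int32) - 127) + (b_s.to(torch.int32) - 127).T
    return torch.ldexp(acc, exp)
```

With per-block scales as in the current kernel, the exponent addition would instead have to happen per 128-element tile inside the Triton inner loop rather than once on the final accumulator.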
