FP8 UE8M0 Support

#5
by haili-tian - opened

Regarding UE8M0 support in the demo inference code on GitHub (inference/model.py and kernel.py), I suspect the current implementation may be insufficient.

In the Linear class, weight.scale might need to be a single-byte type to properly store UE8M0 scale values.
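
As a minimal sketch of what that could look like, assuming UE8M0 stores the unsigned biased exponent (bias 127) of a power-of-two scale in a single byte; the helper names below are hypothetical, not from the repo:

```python
import torch

def fp32_scale_to_ue8m0(scale: torch.Tensor) -> torch.Tensor:
    # Hypothetical helper: round each fp32 scale up to the nearest power of
    # two (ceil avoids overflow when quantizing) and keep only its biased
    # exponent (bias 127), so each scale fits in one byte.
    e = torch.ceil(torch.log2(scale)) + 127.0
    return e.clamp(0, 255).to(torch.uint8)

def ue8m0_to_fp32(e: torch.Tensor) -> torch.Tensor:
    # Decode a UE8M0 byte back to the power-of-two scale 2^(e - 127).
    return torch.pow(2.0, e.to(torch.float32) - 127.0)
```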
Additionally, the fp8_gemm function should be updated to accept UE8M0-formatted scales for both the scale (a_s) and weight.scale (b_s) arguments. These UE8M0 scales represent only the exponent portion of the full fp32 scale, so the computation inside fp8_gemm() could then use these exponent values directly.
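
Here is a non-kernel reference sketch of that idea. To keep it self-contained it assumes per-row activation scales and per-output-channel weight scales (the actual kernel.py uses 128-wide blocks), and the function name is hypothetical:

```python
import torch

def fp8_gemm_ue8m0(a: torch.Tensor, a_s: torch.Tensor,
                   b: torch.Tensor, b_s: torch.Tensor) -> torch.Tensor:
    # Hypothetical UE8M0-aware fp8_gemm, reference math only (no Triton).
    # a: [M, K] FP8 activations, a_s: [M, 1] uint8 biased exponents.
    # b: [N, K] FP8 weights,     b_s: [N, 1] uint8 biased exponents.
    acc = a.to(torch.float32) @ b.to(torch.float32).T  # [M, N]
    # Because UE8M0 scales are pure powers of two, both can be applied to
    # the accumulator as one exponent shift: 2^(a_e - 127) * 2^(b_e - 127).
    exp = (a_s.to(torch.int32) - 127) + (b_s.to(torch.int32) - 127).T
    return torch.ldexp(acc, exp)
```

With per-block scales as in the current kernel, the exponent addition would instead have to happen per 128-element tile inside the Triton inner loop rather than once on the final accumulator.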
