FP8 UE8M0 Support
#5
by haili-tian · opened
Regarding UE8M0 support in the demo inference code on GitHub (inference/model.py and kernel.py), I suspect the current implementation may be insufficient.
In the Linear class, weight.scale might need to be a single-byte type (e.g. torch.uint8) to properly store UE8M0 scale values, since UE8M0 encodes only an unsigned 8-bit exponent with no mantissa bits.
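A minimal sketch of what that could look like, assuming a block-quantized layout; `block_size` and the buffer registration are illustrative guesses, not the repo's actual API:

```python
import torch
from torch import nn

class Linear(nn.Module):
    """Sketch: FP8 weights with per-block UE8M0 scales stored as raw bytes."""

    def __init__(self, in_features: int, out_features: int, block_size: int = 128):
        super().__init__()
        # FP8 (e4m3) weight tensor, quantized offline; no gradients at inference.
        self.weight = nn.Parameter(
            torch.empty(out_features, in_features, dtype=torch.float8_e4m3fn),
            requires_grad=False,
        )
        # UE8M0 is exponent-only, so one unsigned byte per block scale suffices;
        # a byte e decodes to the power-of-two scale 2 ** (e - 127).
        scale_shape = (out_features // block_size, in_features // block_size)
        self.register_buffer("scale", torch.empty(scale_shape, dtype=torch.uint8))
        # Expose the scales as weight.scale, mirroring how the demo code
        # attaches them to the weight parameter.
        self.weight.scale = self.scale
```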
Additionally, the fp8_gemm function should be updated to accept UE8M0-formatted scales for both the activation scale (a_s) and weight.scale (b_s) arguments. These UE8M0 scales carry only the exponent of the full fp32 scale, so the computation inside fp8_gemm() would then work directly with these exponent-only values.
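One low-friction option, sketched below under those assumptions, is to decode the UE8M0 bytes back to fp32 powers of two at the kernel boundary; `fp8_gemm_ue8m0` and the import path are hypothetical, and the bias of 127 follows the OCP MX E8M0 convention:

```python
import torch

from kernel import fp8_gemm  # the repo's existing GEMM taking fp32 scales


def ue8m0_to_fp32(e: torch.Tensor) -> torch.Tensor:
    # A UE8M0 byte is a biased exponent (bias 127 in the OCP MX convention),
    # so it decodes to exactly 2 ** (e - 127); ldexp builds that power of two
    # without any rounding.
    return torch.ldexp(torch.ones_like(e, dtype=torch.float32),
                       e.to(torch.int32) - 127)


def fp8_gemm_ue8m0(a, a_s, b, b_s):
    # Hypothetical wrapper: a_s and b_s are uint8 UE8M0 exponents. Decode them
    # to fp32 scale factors and reuse the existing kernel unchanged.
    return fp8_gemm(a, ue8m0_to_fp32(a_s), b, ue8m0_to_fp32(b_s))
```

A deeper change would pass the raw exponent bytes into the Triton kernel itself and apply them via an exp2 there, avoiding the fp32 scale materialization, but that requires touching kernel.py as well.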