RMSNorm kernel for ROCm devices from https://github.com/huggingface/hf-rocm-kernels
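
For orientation, here is a minimal pure-Python sketch of what an RMSNorm operation computes on a single vector: each element is divided by the root mean square of the vector and then scaled by a learned weight. This is not the repository's kernel or its API. The function name `rms_norm`, the argument names, and the `eps` default are assumptions chosen for illustration only.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Reference RMSNorm for one vector (hypothetical helper, not the kernel's API).

    Computes y_i = x_i / sqrt(mean(x^2) + eps) * weight_i.
    """
    if not x:
        raise ValueError("x must be non-empty")
    if len(x) != len(weight):
        raise ValueError("x and weight must have the same length")
    # Mean of squares over the normalized dimension.
    mean_sq = sum(v * v for v in x) / len(x)
    # eps keeps the division stable when the input is close to all zeros.
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


# With unit weights and eps=0, the output has a mean square of 1.
y = rms_norm([3.0, 4.0], [1.0, 1.0], eps=0.0)
mean_sq_out = sum(v * v for v in y) / len(y)
```

A GPU kernel would typically perform the same reduction per row of a batched tensor; a reference like this can be useful for checking numerical agreement on small inputs.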