Can you post the patch?
I don't have Discord, so it would be nice to see what it is.
I believe this part is sufficient to run it:
diff --git a/exllamav2/architecture.py b/exllamav2/architecture.py
index b2a6280..67db6ef 100644
--- a/exllamav2/architecture.py
+++ b/exllamav2/architecture.py
@@ -496,7 +496,7 @@ class ExLlamaV2ArchParams:
# Cohere
- if arch_string == "CohereForCausalLM":
+ if arch_string in ("CohereForCausalLM", "Cohere2ForCausalLM"):
arch_recognized = True
self.lm.layer_keys += \
layer_keys_cohere_norms + \
I took a look at the code, and I'm pretty sure that patch isn't actually doing anything functional: the changes appear to be overwritten later in the file by the dedicated if arch_string == "Cohere2ForCausalLM" branch.
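A minimal sketch of why the patch ends up being a no-op, assuming the file is structured as a sequence of independent if-branches (the class and attribute names here are simplified and hypothetical, not the real ones from exllamav2/architecture.py): when arch_string is "Cohere2ForCausalLM", both the patched Cohere branch and the later dedicated Cohere2 branch match, and the later one reassigns the same attributes, discarding whatever the patch set.

```python
# Hypothetical simplified version of the branch structure in architecture.py.
class ArchParams:
    def __init__(self, arch_string):
        self.layer_keys = []

        # Patched Cohere branch: now also matches Cohere2ForCausalLM
        if arch_string in ("CohereForCausalLM", "Cohere2ForCausalLM"):
            self.layer_keys = ["cohere_norms", "cohere_mlp"]

        # Pre-existing dedicated Cohere2 branch further down the file:
        # it matches too, and overwrites what the branch above just set.
        if arch_string == "Cohere2ForCausalLM":
            self.layer_keys = ["cohere2_norms"]

params = ArchParams("Cohere2ForCausalLM")
print(params.layer_keys)  # only the second branch's settings survive
```

If the real code follows this shape, the patch changes which branch fires first but not the final configuration, which would explain why the quants behave the same with or without it.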
I also tested the quants, and they sadly seem to be broken (repetition issues). I think exllamav2 needs more explicit handling for this model.
Ok, I have removed it from the model card. Hopefully the quants will still be fine when exllamav2 fixes command-a support.
Yeah, let's hope that turboderp can take a look, but he seems to be busy developing exllamav3. I was planning to create more quantized models, but held off because the measurement pass looked problematic, and I was concerned the resulting quants would be suboptimal even if the support gets fixed.