Tarek07 committed
Commit 0383ccc · verified · 1 parent: 2364d82

Update README.md

Files changed (1): README.md (+3 −1)
README.md CHANGED
@@ -12,7 +12,9 @@ tags:
  - merge
  license: llama3.3
  ---
- **User y-ryan discovered an issue where the model had an invalid tensor shape for its weights ([1, 8192]), raising errors when loading with transformers, and fixed it here: [tmfi-us/Progenitor-V5-Final-LLaMa-70B](https://huggingface.co/tmfi-us/Progenitor-V5-Final-LLaMa-70B). I have no clue what the reason is, but despite that I was still able to use and even quant this model?! Testing the fixed version and this one gave me different outputs too, with this version's output being really good? If anyone understands this I would love to hear about it.**
+ **Upon further testing I found some logic issues!!**
+
+ //User y-ryan discovered an issue where the model had an invalid tensor shape for its weights ([1, 8192]), raising errors when loading with transformers, and fixed it here: [tmfi-us/Progenitor-V5-Final-LLaMa-70B](https://huggingface.co/tmfi-us/Progenitor-V5-Final-LLaMa-70B). I have no clue what the reason is, but despite that I was still able to use and even quant this model?! Testing the fixed version and this one gave me different outputs too, with this version's output being good? If anyone understands this I would love to hear about it.//
 
  This marks the culmination of my experiments with the Progenitor series. I fixed the typo I had earlier where it wasn't computing in float32, but computing a 6-model merge in float32 is a bit taxing on resources and time, so I saved it for the configuration I thought was the best (it's not something I can afford to do with every model I make, just the worthwhile ones). This one also uses Sicari's tokenizer, which I find the best.
  # merge
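
For anyone wanting to check a downloaded copy for the kind of shape problem described above, here is a minimal sketch (not part of the original card; the local `model-*.safetensors` glob is an assumption) that lists the tensor shapes in the checkpoint's safetensors shards and flags 2-D weights with a leading dimension of 1, such as the reported [1, 8192] entry:

```python
# Sketch: scan local safetensors shards for one-row 2-D weights like [1, 8192].
# Assumes the shards sit in the current directory; adjust the glob as needed.
import glob

from safetensors import safe_open

for shard in sorted(glob.glob("model-*.safetensors")):
    with safe_open(shard, framework="pt", device="cpu") as f:
        for name in f.keys():
            # get_slice() reads only the header metadata, not the full tensor.
            shape = f.get_slice(name).get_shape()
            # A norm weight stored as [1, 8192] instead of [8192] is the sort
            # of mismatch that makes transformers raise a shape error on load.
            if len(shape) == 2 and shape[0] == 1:
                print(f"{shard}: {name} -> {shape}")
```

If nothing is printed, the shards carry no weights of that suspicious shape; any hits point at the tensors that the fixed repo linked above reshapes.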