Zeus Labs

community

AI & ML interests

None defined yet.

ZeusLabs's activity

elinas posted an update 4 months ago
We conducted an experiment to revive LLaMA 1 33B, since it had unique prose, a lack of "GPT-isms" and "slop" in its pretraining data, and was one of the community favorites at the time. Over multiple finetune runs, we extended the model from its pretrained context length of 2048 tokens to ~12,000 tokens, adding approximately 500M tokens in the process. The effective length is 16,384, but it's better to stay toward the lower end of that range. It writes well and in multiple formats. In the future, we have some ideas like implementing GQA. Please take a look; we would love to hear your feedback!

ZeusLabs/Chronos-Divergence-33B
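
For those who want to try it, here is a minimal usage sketch with transformers (the prompt, dtype, and sampling settings below are placeholders chosen for illustration, not recommendations from the model card):

```python
# Minimal sketch: load the extended-context model and generate.
# Assumes enough VRAM for a 33B model in fp16; quantize or offload otherwise.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ZeusLabs/Chronos-Divergence-33B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Per the post, keep the context toward the lower end of the extended range
# (~12,000 tokens), even though the effective maximum is 16,384.
prompt = "Write the opening scene of a slow-burn mystery set in a coastal town."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.8,
)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```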
elinas updated a Space 4 months ago
Fizzarolli posted an update 8 months ago
hi everyone!

i wanted to share an experiment i did recently: upcycling phi-3 mini into an MoE.
while the benchmarks are within a margin of error and the two models performed similarly, i think it's an interesting base for trying to improve phi's performance further! (looking into HuggingFaceFW/fineweb-edu could be interesting, for example; i also left some other notes if anyone with access to more compute wants to try it themselves)

check it out! Fizzarolli/phi3-4x4b-v1
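
for anyone curious what the upcycling itself looks like, here's a rough conceptual sketch, assuming a mixtral-style top-k router: every expert starts as a copy of the dense model's MLP, and only the router is initialized from scratch. (the class and names below are illustrative, not the actual code behind phi3-4x4b-v1.)

```python
# Conceptual sketch of dense-to-MoE upcycling (illustrative, not the real recipe).
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class UpcycledMoE(nn.Module):
    def __init__(self, dense_mlp: nn.Module, hidden_size: int, num_experts: int = 4, top_k: int = 2):
        super().__init__()
        # Each expert begins as an exact copy of the pretrained dense MLP.
        self.experts = nn.ModuleList([copy.deepcopy(dense_mlp) for _ in range(num_experts)])
        # The router is the only randomly initialized component.
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, hidden)
        logits = self.router(x)                                    # (batch, seq, num_experts)
        weights, indices = torch.topk(logits, self.top_k, dim=-1)  # pick top_k experts per token
        weights = F.softmax(weights, dim=-1)                       # normalize over the chosen experts

        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                # Tokens whose slot-th choice is expert e get that expert's output, scaled by its weight.
                mask = (indices[..., slot] == e).unsqueeze(-1)
                if mask.any():
                    out = out + mask * weights[..., slot].unsqueeze(-1) * expert(x)
        return out
```

at initialization the upcycled layer reproduces the dense MLP exactly (identical expert copies, routing weights summing to one), so further training is what lets the router and experts specialize.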
Fizzarolli posted an update 8 months ago
Is anyone looking into some sort of decentralized/federated dataset generation or classification done by humans rather than synthetically?

From my experience trying models, a *lot* of modern finetunes are trained on what amounts to GPT-4 generated slop, which makes everything sound like a GPT-4 knock-off (see e.g. the Dolphin finetunes). I have a feeling this is a big part of why community finetunes haven't been quite as successful as Meta's own instruct tunes of Llama 3.