# Badger ν Llama 3.1 8B UltraLong Instruct
Badger is a recursive normalized denoised Fourier interpolation of the following models (a toy sketch of the idea follows the listing):
```python
# Badger Nu
# Each entry pairs a fine-tuned model with the base it was trained from.
models = [
    ('Llama-3.1-Nemotron-8B-UltraLong-1M-Instruct', 'Llama-3.1-8B-Instruct'),
    ('Skywork-o1-Open-Llama-3.1-8B', 'Llama-3.1-8B-Instruct'),
    ('Dolphin3.0-Llama3.1-8B', 'Llama-3.1-8B'),
    ('Llama-3.1-Nemotron-Nano-8B-v1', 'Llama-3.1-8B-Instruct'),
    ('cogito-v1-preview-llama-8B', 'Llama-3.1-8B'),
    ('Llama-3.1-Tulu-3.1-8B', 'Llama-3.1-8B'),
    ('DeepHermes-3-Llama-3-8B-Preview', 'Llama-3.1-8B'),
    ('Fireball-R1.1-Llama-3.1-8B', 'Llama-3.1-8B'),
    ('OpenMath2-Llama3.1-8B', 'Llama-3.1-8B-Instruct'),
    ('Foundation-Sec-8B', 'Llama-3.1-8B'),
    ('Bio-Medical-Llama-3-8B', 'Meta-Llama-3-8B-Instruct'),
    ('Llama-3.1-Hawkish-8B', 'Llama-3.1-8B-Instruct'),
    ('Einstein-v6.1-Llama3-8B', 'Meta-Llama-3-8B'),
    ('Llama-3-Instruct-8B-SimPO-v0.2', 'Meta-Llama-3-8B-Instruct'),
    ('Llama-3.1_OpenScholar-8B', 'Llama-3.1-8B-Instruct'),
    ('L3-8B-Stheno-v3.2', 'Meta-Llama-3-8B-Instruct'),
    ('L3.1-EtherealRainbow-v1.0-rc1-8B', 'Llama-3.1-8B-Instruct'),
    ('Llama3.1-8B-ShiningValiant2', 'Llama-3.1-8B-Instruct'),
    ('Pantheon-RP-1.0-8b-Llama-3', 'Meta-Llama-3-8B'),
    ('SillyTilly-SlopJob-8b-RP-ForFree', 'Meta-Llama-3-8B'),
    ('opus-v1.2-llama-3-8b-base-run3.4-epoch2', 'Meta-Llama-3-8B'),
    ('llama-3-fantasy-writer-8b', 'Meta-Llama-3-8B-Instruct'),
    ('Llama-3.1-SuperNova-Lite', 'Llama-3.1-8B-Instruct'),
]

# Merged as an additional task vector on top of the models above.
task_add = [
    ('meta-llama-3-8b-instruct-hf-ortho-baukit-2fail-128total', 'Meta-Llama-3-8B-Instruct'),
]
all_models = models + task_add

model_path = './models/l38/'  # local directory holding the source checkpoints
in_model = 'Llama-3.1-Nemotron-8B-UltraLong-1M-Instruct'
out_model = 'Llama-3.1-SuperNova-Lite'
root_model = 'Llama-3.1-Nemotron-8B-UltraLong-1M-Instruct'  # UltraLong is the merge base
```
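For intuition, here is a toy, single-tensor sketch of what the normalize/denoise/interpolate steps can look like. This is not the actual Badger pipeline; the function and its `noise_floor` knob are illustrative inventions.

```python
import torch

def fourier_task_merge(root_weight: torch.Tensor,
                       deltas: list[torch.Tensor],
                       noise_floor: float = 0.5) -> torch.Tensor:
    """Merge task vectors (fine-tune minus base) for one tensor in the
    frequency domain: average the spectra, zero low-magnitude
    coefficients as noise, and rescale to the mean delta norm."""
    n = root_weight.numel()
    # Interpolate: average the spectra of all task vectors.
    spectra = torch.stack([torch.fft.rfft(d.flatten().float()) for d in deltas])
    merged = spectra.mean(dim=0)
    # Denoise: drop coefficients below a fraction of the mean magnitude.
    mag = merged.abs()
    merged[mag < noise_floor * mag.mean()] = 0
    delta = torch.fft.irfft(merged, n=n)
    # Normalize: rescale to the average norm of the input task vectors.
    target = torch.stack([d.float().norm() for d in deltas]).mean()
    delta = delta * (target / (delta.norm() + 1e-8))
    return (root_weight.float() + delta.reshape(root_weight.shape)).to(root_weight.dtype)
```

As the name suggests, the real recipe is applied recursively over the model list rather than in a single averaging pass; treat the sketch purely as intuition for the three steps.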
- With thanks to NVIDIA, Arcee, Nous, the geniuses in SillyTilly, Cognitive Computations, and all of the other AI labs and independent model creators for your hard work!
Llama 3 may be the last open model trained in the US on the highly valuable LibGen dataset. While the use of this dataset has been highly controversial, there is no arguing that it represents some of the finest text data mankind has produced.
In light of this, and given that the open model community has made a lot of advancements since my last release, Badger Mu, I thought it might be time to give Llama 3 8B another look.
One of the primary motivators for this decision was Unsloth publishing turnkey GRPO notebooks, which I found quite easy to run on Paperspace A6000s using the shivamb25/unsloth-dev container. I'm really excited to try this model as the basis for further experiments.
## Format
Use the Llama 3 Instruct format.
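For example, with transformers' chat templating (the repo id below is a placeholder; substitute the actual path to this model's weights):

```python
from transformers import AutoTokenizer

# Placeholder repo id; point this at wherever the model weights live.
tokenizer = AutoTokenizer.from_pretrained('maldv/badger-nu-llama-3-8b')
messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': 'Hello, Badger.'},
]
# Renders the Llama 3 Instruct template:
# <|begin_of_text|><|start_header_id|>system<|end_header_id|> ... <|eot_id|>
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```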
## Models

We have a few strong clusters of models. UltraLong, being the most different, serves as the base; the reasoning models bear a lot of similarity to one another; and the remainder is a diverse set of unique specialist and creative models.
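Roughly, the grouping looks like this (an approximate annotation inferred from the model names; several models straddle categories):

```python
clusters = {
    'base (long context)': [
        'Llama-3.1-Nemotron-8B-UltraLong-1M-Instruct',
    ],
    'reasoning': [
        'Skywork-o1-Open-Llama-3.1-8B',
        'Llama-3.1-Nemotron-Nano-8B-v1',
        'cogito-v1-preview-llama-8B',
        'DeepHermes-3-Llama-3-8B-Preview',
        'Fireball-R1.1-Llama-3.1-8B',
    ],
    'unique specialists and creative models': [
        'OpenMath2-Llama3.1-8B',      # math
        'Foundation-Sec-8B',          # security
        'Bio-Medical-Llama-3-8B',     # medicine
        'Llama-3.1_OpenScholar-8B',   # scientific literature
        'L3-8B-Stheno-v3.2',          # roleplay
        'llama-3-fantasy-writer-8b',  # creative writing
        # ... plus the remaining instruct and RP tunes in the listing above
    ],
}
```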
## Correspondence to
Praxis Maldevide ([email protected])
## Citation

```bibtex
@article{badger-nu,
  title={Llama 3 Is All You Need: LibGen Is The Best Source Of Human Textual Data},
  author={Praxis Maldevide},
  journal={None},
  year={2025}
}
```