|
---
license: cc-by-nc-4.0
library_name: transformers
tags:
- llama3
---
|
|
|
 |
|
|
|
[GGUF](https://huggingface.co/mradermacher/badger-nu-llama-3.1-8B-UltraLong-GGUF) [iMat](https://huggingface.co/mradermacher/badger-nu-llama-3.1-8B-UltraLong-i1-GGUF) |
|
|
|
# Badger ν Llama 3.1 8B UltraLong Instruct |
|
|
|
Badger is a *recursive normalized denoised Fourier interpolation* of the following models:
|
|
|
```python
# Badger Nu: (model, base) pairs to merge
models = [
    ('Llama-3.1-Nemotron-8B-UltraLong-1M-Instruct', 'Llama-3.1-8B-Instruct'),
    ('Skywork-o1-Open-Llama-3.1-8B', 'Llama-3.1-8B-Instruct'),
    ('Dolphin3.0-Llama3.1-8B', 'Llama-3.1-8B'),
    ('Llama-3.1-Nemotron-Nano-8B-v1', 'Llama-3.1-8B-Instruct'),
    ('cogito-v1-preview-llama-8B', 'Llama-3.1-8B'),
    ('Llama-3.1-Tulu-3.1-8B', 'Llama-3.1-8B'),
    ('DeepHermes-3-Llama-3-8B-Preview', 'Llama-3.1-8B'),
    ('Fireball-R1.1-Llama-3.1-8B', 'Llama-3.1-8B'),
    ('OpenMath2-Llama3.1-8B', 'Llama-3.1-8B-Instruct'),
    ('Foundation-Sec-8B', 'Llama-3.1-8B'),
    ('Bio-Medical-Llama-3-8B', 'Meta-Llama-3-8B-Instruct'),
    ('Llama-3.1-Hawkish-8B', 'Llama-3.1-8B-Instruct'),
    ('Einstein-v6.1-Llama3-8B', 'Meta-Llama-3-8B'),
    ('Llama-3-Instruct-8B-SimPO-v0.2', 'Meta-Llama-3-8B-Instruct'),
    ('Llama-3.1_OpenScholar-8B', 'Llama-3.1-8B-Instruct'),
    ('L3-8B-Stheno-v3.2', 'Meta-Llama-3-8B-Instruct'),
    ('L3.1-EtherealRainbow-v1.0-rc1-8B', 'Llama-3.1-8B-Instruct'),
    ('Llama3.1-8B-ShiningValiant2', 'Llama-3.1-8B-Instruct'),
    ('Pantheon-RP-1.0-8b-Llama-3', 'Meta-Llama-3-8B'),
    ('SillyTilly-SlopJob-8b-RP-ForFree', 'Meta-Llama-3-8B'),
    ('opus-v1.2-llama-3-8b-base-run3.4-epoch2', 'Meta-Llama-3-8B'),
    ('llama-3-fantasy-writer-8b', 'Meta-Llama-3-8B-Instruct'),
    ('Llama-3.1-SuperNova-Lite', 'Llama-3.1-8B-Instruct'),
]
# Applied additively on top of the merge
task_add = [
    ('meta-llama-3-8b-instruct-hf-ortho-baukit-2fail-128total', 'Meta-Llama-3-8B-Instruct'),
]
all_models = models + task_add
model_path = "./models/l38/"
# Anchors for the recursive merge
in_model = "Llama-3.1-Nemotron-8B-UltraLong-1M-Instruct"
out_model = 'Llama-3.1-SuperNova-Lite'
root_model = 'Llama-3.1-Nemotron-8B-UltraLong-1M-Instruct'
```
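
The exact merge implementation isn't reproduced here, but as a rough, hedged sketch of what a per-tensor *normalized denoised Fourier interpolation* of task vectors can look like, consider the following. `fourier_merge`, the `keep` ratio, and the norm restoration are illustrative assumptions, and the recursive pairing implied by `in_model`/`out_model`/`root_model` above is omitted.

```python
# Illustrative sketch only; not the actual Badger recipe. Assumes each model
# has been resolved to weight tensors, and operates one tensor at a time.
import torch

def fourier_merge(base: torch.Tensor, tuned: list[torch.Tensor],
                  keep: float = 0.9) -> torch.Tensor:
    """Denoise each task vector in the frequency domain, renormalize, and average."""
    acc = torch.zeros_like(base, dtype=torch.float32)
    for t in tuned:
        delta = (t - base).float()                    # task vector for this model
        spec = torch.fft.fftn(delta)                  # move to the frequency domain
        mag = spec.abs()
        cutoff = torch.quantile(mag.flatten(), 1.0 - keep)
        spec = torch.where(mag >= cutoff, spec, torch.zeros_like(spec))  # "denoise"
        filtered = torch.fft.ifftn(spec).real
        filtered = filtered * (delta.norm() / (filtered.norm() + 1e-8))  # "normalize"
        acc += filtered
    return (base.float() + acc / len(tuned)).to(base.dtype)  # interpolate onto base
```

Zeroing low-magnitude frequency bins is one plausible reading of "denoised", and restoring each delta's original norm one plausible reading of "normalized"; the real recipe may differ on both counts.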
|
|
|
* with thanks to [NVIDIA](https://huggingface.co/nvidia), [Arcee](https://huggingface.co/arcee-ai), [Nous](https://huggingface.co/NousResearch), the geniuses in [SillyTilly](https://huggingface.co/SillyTilly), [Cognitive Computations](https://huggingface.co/cognitivecomputations), and all of the other AI labs and independent model creators for their hard work!
|
|
|
Llama 3 may be the last open model trained in the US on the highly valuable [LibGen](https://libgen.is/) dataset. While the use of this dataset has been controversial, there is no arguing that it represents some of the finest text data that mankind has produced.
|
|
|
In light of this, and given that the open model community has made significant advancements since my last release, Badger Mu, I thought it was time to give Llama 3 8B another look.
|
|
|
One of the primary motivators for this decision was [Unsloth publishing turnkey GRPO notebooks](https://docs.unsloth.ai/basics/reasoning-grpo-and-rl), which I found quite easy to run on Paperspace A6000s using the shivamb25/unsloth-dev container. I'm excited to try this model as the basis for further experiments.
|
|
|
### Format |
|
|
|
Use the Llama 3 Instruct format. |
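
For example, the bundled chat template will render it via `transformers`; the repo id below is a placeholder for wherever the weights live.

```python
from transformers import AutoTokenizer

# Placeholder repo id; substitute the actual model path.
tok = AutoTokenizer.from_pretrained("maldv/badger-nu-llama-3.1-8B-UltraLong")
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
# Renders the Llama 3 Instruct special tokens (<|start_header_id|>, <|eot_id|>, ...)
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```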
|
|
|
### Models |
|
|
|
 |
|
|
|
There are a few strong clusters among the models: UltraLong, the most distinct, serves as the base; the reasoning models bear a lot of similarity to one another; and the remainder is a diverse set of unique models.
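
One way to reproduce this kind of clustering, sketched under assumptions (`task_vector`, `similarity`, and single-file checkpoints are illustrative), is to compare each model's flattened task vector pairwise:

```python
# Illustrative sketch: pairwise cosine similarity between task vectors.
import torch
from safetensors.torch import load_file

def task_vector(model_file: str, base_file: str) -> torch.Tensor:
    m, b = load_file(model_file), load_file(base_file)
    # Flatten the per-tensor deltas into one long vector (shared keys only).
    return torch.cat([(m[k] - b[k]).flatten().float() for k in sorted(b) if k in m])

def similarity(a: torch.Tensor, b: torch.Tensor) -> float:
    return torch.nn.functional.cosine_similarity(a, b, dim=0).item()
```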
|
|
|
## Correspondence to |
|
Praxis Maldevide ([email protected]) |
|
|
|
## Citation |
|
<pre>
@article{badger-nu,
  title={Llama 3 Is All You Need: LibGen Is The Best Source Of Human Textual Data},
  author={Praxis Maldevide},
  journal={None},
  year={2025}
}
</pre>