|
---
license: cc-by-nc-4.0
library_name: transformers
tags:
- llama3
---
|
|
|
 |
|
|
|
[GGUF](https://huggingface.co/mradermacher/badger-nu-llama-3.1-8B-UltraLong-GGUF) [iMat](https://huggingface.co/mradermacher/badger-nu-llama-3.1-8B-UltraLong-i1-GGUF) |
|
|
|
# Badger ν Llama 3.1 8B UltraLong Instruct |
|
|
|
Badger is a *recursive normalized denoised Fourier interpolation* of the following models:
|
|
|
```python
# Badger Nu: (model, base) pairs to merge
models = [
    ('Llama-3.1-Nemotron-8B-UltraLong-1M-Instruct', 'Llama-3.1-8B-Instruct'),
    ('Skywork-o1-Open-Llama-3.1-8B', 'Llama-3.1-8B-Instruct'),
    ('Dolphin3.0-Llama3.1-8B', 'Llama-3.1-8B'),
    ('Llama-3.1-Nemotron-Nano-8B-v1', 'Llama-3.1-8B-Instruct'),
    ('cogito-v1-preview-llama-8B', 'Llama-3.1-8B'),
    ('Llama-3.1-Tulu-3.1-8B', 'Llama-3.1-8B'),
    ('DeepHermes-3-Llama-3-8B-Preview', 'Llama-3.1-8B'),
    ('Fireball-R1.1-Llama-3.1-8B', 'Llama-3.1-8B'),
    ('OpenMath2-Llama3.1-8B', 'Llama-3.1-8B-Instruct'),
    ('Foundation-Sec-8B', 'Llama-3.1-8B'),
    ('Bio-Medical-Llama-3-8B', 'Meta-Llama-3-8B-Instruct'),
    ('Llama-3.1-Hawkish-8B', 'Llama-3.1-8B-Instruct'),
    ('Einstein-v6.1-Llama3-8B', 'Meta-Llama-3-8B'),
    ('Llama-3-Instruct-8B-SimPO-v0.2', 'Meta-Llama-3-8B-Instruct'),
    ('Llama-3.1_OpenScholar-8B', 'Llama-3.1-8B-Instruct'),
    ('L3-8B-Stheno-v3.2', 'Meta-Llama-3-8B-Instruct'),
    ('L3.1-EtherealRainbow-v1.0-rc1-8B', 'Llama-3.1-8B-Instruct'),
    ('Llama3.1-8B-ShiningValiant2', 'Llama-3.1-8B-Instruct'),
    ('Pantheon-RP-1.0-8b-Llama-3', 'Meta-Llama-3-8B'),
    ('SillyTilly-SlopJob-8b-RP-ForFree', 'Meta-Llama-3-8B'),
    ('opus-v1.2-llama-3-8b-base-run3.4-epoch2', 'Meta-Llama-3-8B'),
    ('llama-3-fantasy-writer-8b', 'Meta-Llama-3-8B-Instruct'),
    ('Llama-3.1-SuperNova-Lite', 'Llama-3.1-8B-Instruct'),
]
# Applied additively on top of the merge
task_add = [
    ('meta-llama-3-8b-instruct-hf-ortho-baukit-2fail-128total', 'Meta-Llama-3-8B-Instruct'),
]
all_models = models + task_add
model_path = "./models/l38/"
# Anchors for the recursive merge
in_model = "Llama-3.1-Nemotron-8B-UltraLong-1M-Instruct"
out_model = 'Llama-3.1-SuperNova-Lite'
root_model = 'Llama-3.1-Nemotron-8B-UltraLong-1M-Instruct'
```
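
The exact merge implementation isn't reproduced here, but as a rough, hedged sketch of what a per-tensor *normalized denoised Fourier interpolation* of task vectors can look like, consider the following. `fourier_merge`, the `keep` ratio, and the norm restoration are illustrative assumptions, and the recursive pairing implied by `in_model`/`out_model`/`root_model` above is omitted.

```python
# Illustrative sketch only; not the actual Badger recipe. Assumes each model
# has been resolved to weight tensors, and operates one tensor at a time.
import torch

def fourier_merge(base: torch.Tensor, tuned: list[torch.Tensor],
                  keep: float = 0.9) -> torch.Tensor:
    """Denoise each task vector in the frequency domain, renormalize, and average."""
    acc = torch.zeros_like(base, dtype=torch.float32)
    for t in tuned:
        delta = (t - base).float()                    # task vector for this model
        spec = torch.fft.fftn(delta)                  # move to the frequency domain
        mag = spec.abs()
        cutoff = torch.quantile(mag.flatten(), 1.0 - keep)
        spec = torch.where(mag >= cutoff, spec, torch.zeros_like(spec))  # "denoise"
        filtered = torch.fft.ifftn(spec).real
        filtered = filtered * (delta.norm() / (filtered.norm() + 1e-8))  # "normalize"
        acc += filtered
    return (base.float() + acc / len(tuned)).to(base.dtype)  # interpolate onto base
```

Zeroing low-magnitude frequency bins is one plausible reading of "denoised", and restoring each delta's original norm one plausible reading of "normalized"; the real recipe may differ on both counts.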
|
|
|
* with thanks to [NVIDIA](https://huggingface.co/nvidia), [Arcee](https://huggingface.co/arcee-ai), [Nous](https://huggingface.co/NousResearch), the geniuses in [SillyTilly](https://huggingface.co/SillyTilly), [Cognitive Computations](https://huggingface.co/cognitivecomputations), and all of the other AI labs and independent model creators for their hard work!
|
|
|
Llama 3 may be the last open model trained in the US on the highly valuable [LibGen](https://libgen.is/) dataset. While the use of this dataset has been controversial, there is no arguing that it represents some of the finest text data that mankind has produced.
|
|
|
In light of this, and given that the open model community has made significant advancements since my last release, Badger Mu, I thought it was time to give Llama 3 8B another look.
|
|
|
One of the primary motivators for this decision was [Unsloth publishing turnkey GRPO notebooks](https://docs.unsloth.ai/basics/reasoning-grpo-and-rl), which I found quite easy to run on Paperspace A6000s using the shivamb25/unsloth-dev container. I'm excited to try this model as the basis for further experiments.
|
|
|
### Format |
|
|
|
Use the Llama 3 Instruct format. |
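
For example, the bundled chat template will render it via `transformers`; the repo id below is a placeholder for wherever the weights live.

```python
from transformers import AutoTokenizer

# Placeholder repo id; substitute the actual model path.
tok = AutoTokenizer.from_pretrained("maldv/badger-nu-llama-3.1-8B-UltraLong")
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
# Renders the Llama 3 Instruct special tokens (<|start_header_id|>, <|eot_id|>, ...)
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```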
|
|
|
### Models |
|
|
|
 |
|
|
|
There are a few strong clusters among the models: UltraLong, the most distinct, serves as the base; the reasoning models bear a lot of similarity to one another; and the remainder is a diverse set of unique models.
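
One way to reproduce this kind of clustering, sketched under assumptions (`task_vector`, `similarity`, and single-file checkpoints are illustrative), is to compare each model's flattened task vector pairwise:

```python
# Illustrative sketch: pairwise cosine similarity between task vectors.
import torch
from safetensors.torch import load_file

def task_vector(model_file: str, base_file: str) -> torch.Tensor:
    m, b = load_file(model_file), load_file(base_file)
    # Flatten the per-tensor deltas into one long vector (shared keys only).
    return torch.cat([(m[k] - b[k]).flatten().float() for k in sorted(b) if k in m])

def similarity(a: torch.Tensor, b: torch.Tensor) -> float:
    return torch.nn.functional.cosine_similarity(a, b, dim=0).item()
```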
|
|
|
## Correspondence to |
|
Praxis Maldevide ([email protected]) |
|
|
|
## Citation |
|
<pre>
@article{badger-nu,
  title={Llama 3 Is All You Need: LibGen Is The Best Source Of Human Textual Data},
  author={Praxis Maldevide},
  journal={None},
  year={2025}
}
</pre>