8B FP 16 weights

Prompt format is the same as Llama 3: https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-3/ Standard context length of 8192

This is a testing model trained on a custom 100MB dataset for 4 epochs geared for storytelling with a rolling context window, but might be good at other things too. There's significant evidence that the model is undertrained, longer training runs are baking now.

The dataset was constructed from cleaned long form dialogue, restructured, and then summarized with Llama-70B, and temporally stacked so that the summary of the past dialogue begins the next dialogue. Almost all samples were between 7500-8192 tokens long.

Downloads last month
16
Safetensors
Model size
8.03B params
Tensor type
BF16
Β·
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Model tree for Blackroot/Llama-3-Gamma-Twist

Quantizations
3 models

Spaces using Blackroot/Llama-3-Gamma-Twist 6