Blackroot/Llama-3-Gamma-Twist

8B FP 16 weights

Prompt format is the same as Llama 3: https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-3/ Standard context length of 8192

This is a testing model trained on a custom 100MB dataset for 4 epochs geared for storytelling with a rolling context window, but might be good at other things too. There's significant evidence that the model is undertrained, longer training runs are baking now.

The dataset was constructed from cleaned long form dialogue, restructured, and then summarized with Llama-70B, and temporally stacked so that the summary of the past dialogue begins the next dialogue. Almost all samples were between 7500-8192 tokens long.

Blackroot
/

Llama-3-Gamma-Twist

Model tree for Blackroot/Llama-3-Gamma-Twist

Spaces using Blackroot/Llama-3-Gamma-Twist 6