Praxis Bookwriter Llama 3.1 8B
My last iteration of a fantasy writer suffered from one glaring flaw: it did not follow instructions well. After much consideration, I decided it would make sense to introduce information about the chapter text into the conversation itself, linking the instructions to the text being generated.
For this, I took strides of 16,384 tokens across each of the books in the ~140M-token dataset and used R1 to generate a summary of each stride. With some careful modification, that summary became the first user turn. Each subsequent assistant turn contains approximately 512 tokens of content, and the following user turn is either a chapter header or one paragraph of content. The turns alternate this way until the entirety of the original stride is consumed.
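Roughly, the stride-to-conversation construction looks like the sketch below. The helper name, the paragraph-splitting heuristic, and the exact turn boundaries are illustrative assumptions, not the exact pipeline; the real data used R1-generated summaries and more careful chunking:

ASSISTANT_TOKENS = 512    # approximate size of each assistant turn

def build_conversation(stride_text, summary, tokenizer):
    """Sketch: turn one 16k-token stride into alternating user/assistant turns.

    `summary` is the (modified) R1 summary, ending with a '// Chapter n' line.
    """
    paragraphs = [p for p in stride_text.split("\n\n") if p.strip()]
    messages = [{"role": "user", "content": summary}]
    buffer, buffer_len = [], 0
    for para in paragraphs:
        n_tokens = len(tokenizer.encode(para, add_special_tokens=False))
        if buffer and buffer_len + n_tokens > ASSISTANT_TOKENS:
            # Close the ~512-token assistant turn, then hand the next
            # chapter header or paragraph back to the user role.
            messages.append({"role": "assistant", "content": "\n\n".join(buffer)})
            messages.append({"role": "user", "content": para})
            buffer, buffer_len = [], 0
        else:
            buffer.append(para)
            buffer_len += n_tokens
    if buffer:
        messages.append({"role": "assistant", "content": "\n\n".join(buffer)})
    return messages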
Crafting the prompt
The system prompt should contain some variation of:
You are the user's helpful writing assistant.
// Title: The Title of Your Story
// Author: Author Name For Style
// Tags: some comma, delimited list, of genres
In an initial test, I tried putting the summary in the system prompt, but the result was underwhelming. For this version, the first user turn should contain an overview of the setting (the summary), with the last line taking the form:
// Chapter n
This block can contain all manner of instructions about what to write in the following section. The summaries I used were between 500 and 1,500 tokens, so the more detail about the setting, locations, characters, their relationships, and plot points, the better.
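Put together, a first request looks something like this. The title, tags, and summary below are invented placeholders, and the chat template itself depends on how you serve the model:

system_prompt = (
    "You are the user's helpful writing assistant.\n"
    "// Title: The Shattered Crown\n"
    "// Author: Jane Doe\n"
    "// Tags: fantasy, adventure, political intrigue"
)

first_user_turn = (
    "The kingdom of Aldren is fracturing after the death of its king. "
    "Captain Mira Voss escorts the reluctant heir north through rebel "
    "territory while the court mage Halvard schemes to claim the regency.\n"
    "// Chapter 1"
)

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": first_user_turn},
]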
Training
This model was trained on a single Paperspace A6000 using Unsloth with rsLoRA:
from datasets import load_from_disk
from dotenv import dotenv_values
from unsloth import FastLanguageModel, is_bfloat16_supported
import torch
from transformers import TrainingArguments
from trl import SFTTrainer
import wandb
envconfig = dict(dotenv_values(".env"))

dtype = None           # let Unsloth auto-detect (bfloat16 where supported)
max_seq_length = 24576
load_in_4bit = True    # load the base weights in 4-bit to save VRAM
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Meta-Llama-3.1-8B",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
model = FastLanguageModel.get_peft_model(
    model,
    r = 128,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 128**.5,    # with rsLoRA the effective scale is lora_alpha / sqrt(r)
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
    use_rslora = True,
    loftq_config = None,
)
dataset = load_from_disk('bookdata')
ds_train = dataset
# eval split: a 32-example shuffled sample drawn from the same data (not held out)
ds_eval = dataset.shuffle(seed=12345).select(range(32))
targs = TrainingArguments(
    per_device_train_batch_size = 3,
    gradient_accumulation_steps = 4,    # effective batch size of 12
    learning_rate = 4e-5,
    weight_decay = 0,
    gradient_checkpointing = True,
    max_grad_norm = 1,
    warmup_steps = 5,
    num_train_epochs = 3,
    optim = "paged_adamw_32bit",
    lr_scheduler_type = "cosine",
    seed = 3407,
    fp16 = not is_bfloat16_supported(),
    bf16 = is_bfloat16_supported(),
    logging_steps = 1,
    per_device_eval_batch_size = 1,
    do_eval = True,
    eval_steps = 25,
    eval_strategy = "steps",
    save_strategy = "steps",
    save_steps = 20,
    save_total_limit = 3,
    output_dir = "outputs",
    report_to = "wandb",
)
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = ds_train,
    eval_dataset = ds_eval,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 6,
    packing = False,
    args = targs,
)
wandb.login(key=envconfig['wandb_key'])
wandb.init(
    project='bookwriter-596',
    config={
        "learning_rate": 4e-5,
        "architecture": 'llama 3.1 8b',
        "dataset": 'bookdata',
        "epochs": 3,
    }
)
# trainer_stats = trainer.train()           # first run: train from scratch
trainer.train(resume_from_checkpoint=True)   # later runs: resume from the latest checkpoint in outputs/
Merged
The rsLoRA adapter I trained was applied on top of badger-nu-llama-3.1-8B UltraLong, which is RoPE scaled, so in theory this model should be able to perform at context lengths exceeding my original training data. That said, my training data was limited to sequence lengths of around 20k tokens, so anything beyond that may be out of distribution.
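For reference, one way to fold a LoRA adapter into a different base model is PEFT's merge_and_unload. The paths below are placeholders, and this is a sketch rather than the exact merge script used here:

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path to the RoPE-scaled badger-nu UltraLong base weights.
base = AutoModelForCausalLM.from_pretrained(
    "path/to/badger-nu-llama-3.1-8B-UltraLong",
    torch_dtype = torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained("path/to/badger-nu-llama-3.1-8B-UltraLong")

# Load the trained rsLoRA adapter and merge its weights into the base model.
model = PeftModel.from_pretrained(base, "outputs/checkpoint-XXXX")
model = model.merge_and_unload()

model.save_pretrained("praxis-bookwriter-merged")
tokenizer.save_pretrained("praxis-bookwriter-merged")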
License
This model is released subject to the terms of both the Llama 3 license and CC-BY-NC-4.0.
Author
Praxis Maldevide
Citation
If you find our work helpful, feel free to cite us.
@misc{praxis-bookwriter-llama3.1-8b-sft,
title = {Praxis Bookwriter Llama3.1 8B},
url = {https://huggingface.co/maldv/praxis-bookwriter-llama3.1-8b-sft},
author = {Praxis Maldevide},
month = {May},
year = {2025}
}