---
license: cc-by-nc-4.0
library_name: transformers
language:
- en
tags:
- writing
base_model:
- maldv/badger-nu-llama-3.1-8B-UltraLong
pipeline_tag: text-generation
datasets:
- SillyTilly/fiction-writer-596
---

[GGUF](https://huggingface.co/mradermacher/praxis-bookwriter-llama3.1-8b-sft-GGUF) [iMat](https://huggingface.co/mradermacher/praxis-bookwriter-llama3.1-8b-sft-i1-GGUF)
# Praxis Bookwriter Llama 3.1 8B
My last iteration of fantasy writer suffered from one glaring flaw: it did not really follow instructions well.
After much consideration, I decided it would make sense to introduce information about the story chapter text
into the prompt itself, linking the instructions to the generated text.

To do this, I took strides of 16,384 tokens across each of the books in the ~140M token dataset and used R1 to generate a summary of each stride. With
some careful modification, I used this summary to build the first user turn. Each subsequent assistant turn carries approximately
512 tokens of content, and the following user turn is either a chapter header or one paragraph of content. This alternation continues until
the entirety of the original stride is consumed.
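To make the turn layout concrete, here is a minimal sketch of how one training conversation could be assembled from a stride. The function, its arguments, and the chunking are illustrative assumptions, not the actual preprocessing code.
```python
# Illustrative sketch of the per-stride conversation layout (not the real preprocessing script).
# `summary` is the R1-generated overview of the stride, `assistant_chunks` are ~512-token
# slices of the original text, and `user_turns` are the chapter headers or single
# paragraphs that sit between assistant turns.

def build_conversation(system_prompt: str, summary: str, chapter: int,
                       assistant_chunks: list[str], user_turns: list[str]) -> list[dict]:
    messages = [
        {"role": "system", "content": system_prompt},
        # First user turn: the stride summary, ending with the chapter marker.
        {"role": "user", "content": f"{summary}\n// Chapter {chapter}"},
    ]
    for i, chunk in enumerate(assistant_chunks):
        # Assistant turn: roughly 512 tokens of story text.
        messages.append({"role": "assistant", "content": chunk})
        # In-between user turn: a chapter header or one paragraph of content.
        if i < len(user_turns):
            messages.append({"role": "user", "content": user_turns[i]})
    return messages
```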
## Crafting the prompt
The system prompt should contain some variation of:
```text
You are the user's helpful writing assistant.
// Title: The Title of Your Story
// Author: Author Name For Style
// Tags: some comma, delimited list, of genres
```
In an initial test, I tried putting the summary in the system prompt. The result was underwhelming. For this
version, the first user turn should contain an overview of the setting (the summary), with the last line being of the format:
```
// Chapter n
```
This block can contain any variety of instruction about what to write in the frame that follows. The summaries I used were between 500 and 1,500 tokens, so the more detail about the setting, location, characters, their relationships, and plot points, the better.
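Putting it together, a generation call might look like the following. This is a hedged sketch: the title, author, tags, and summary text are placeholders, and it assumes the repository tokenizer ships a Llama 3 style chat template (otherwise the prompt string has to be assembled by hand).
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "maldv/praxis-bookwriter-llama3.1-8b-sft"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

messages = [
    {
        "role": "system",
        "content": (
            "You are the user's helpful writing assistant.\n"
            "// Title: The Shattered Crown\n"              # placeholder title
            "// Author: Author Name For Style\n"           # placeholder style author
            "// Tags: fantasy, adventure, political intrigue"
        ),
    },
    {
        # First user turn: a setting overview (placeholder text), ending with the chapter marker.
        "role": "user",
        "content": (
            "The kingdom of Veyra is fracturing after the death of its queen. "
            "Her two heirs each hold half of the royal seal and neither will yield.\n"
            "// Chapter 1"
        ),
    },
]

inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=512, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```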
## Training
This model was trained on a single Paperspace A6000 using Unsloth with rank-stabilized LoRA (rsLoRA):
```python
from datasets import load_from_disk
from dotenv import dotenv_values
from unsloth import FastLanguageModel, is_bfloat16_supported
import torch
from transformers import TrainingArguments
from trl import SFTTrainer
import wandb
envconfig = dict(dotenv_values(".env"))

# Loading configuration: auto dtype (bf16 where supported), ~24k context, 4-bit base weights.
dtype = None
max_seq_length = 24576
load_in_4bit = True
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Meta-Llama-3.1-8B",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
model = FastLanguageModel.get_peft_model(
    model,
    r = 128,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 128**.5,  # sqrt(r); with rsLoRA this gives a scaling factor of 1
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
    use_rslora = True,
    loftq_config = None,
)
dataset = load_from_disk('bookdata')
ds_train = dataset
# 32 randomly sampled rows used for periodic eval (drawn from the training set itself).
ds_eval = dataset.shuffle(seed=12345).select(range(32))
targs = TrainingArguments(
    per_device_train_batch_size = 3,
    gradient_accumulation_steps = 4,  # effective batch size of 12
    learning_rate = 4e-5,
    weight_decay = 0,
    gradient_checkpointing = True,
    max_grad_norm = 1,
    warmup_steps = 5,
    num_train_epochs = 3,
    optim = "paged_adamw_32bit",
    lr_scheduler_type = "cosine",
    seed = 3407,
    fp16 = not is_bfloat16_supported(),
    bf16 = is_bfloat16_supported(),
    logging_steps = 1,
    per_device_eval_batch_size = 1,
    do_eval = True,
    eval_steps = 25,
    eval_strategy = "steps",
    save_strategy = "steps",
    save_steps = 20,
    save_total_limit = 3,
    output_dir = "outputs",
    report_to = "wandb",
)
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = ds_train,
    eval_dataset = ds_eval,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 6,
    packing = False,
    args = targs,
)
wandb.login(key=envconfig['wandb_key'])
wandb.init(
    project = 'bookwriter-596',
    config = {
        "learning_rate": 4e-5,
        "architecture": 'llama 3.1 8b',
        "dataset": 'bookdata',
        "epochs": 3,
    },
)
# trainer_stats = trainer.train()  # fresh run
trainer.train(resume_from_checkpoint=True)  # resume from the latest checkpoint in `outputs`
```

## Merged
The rsLoRA I trained was applied on top of badger-nu-llama-3.1-8B-UltraLong, which is RoPE scaled, so in theory
this model should be able to perform at context lengths exceeding those of my original training data. That said,
my training data was limited to sequence lengths of around 20k tokens, so anything beyond that may be out of distribution.
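The merge itself can be reproduced with peft along these lines; this is a sketch under the assumption that the rsLoRA adapter is available locally, and `lora-checkpoint` is a placeholder path rather than a published artifact.
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the RoPE-scaled base the adapter is applied to.
base_id = "maldv/badger-nu-llama-3.1-8B-UltraLong"
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(base_id)

# `lora-checkpoint` is a placeholder path to the trained rsLoRA adapter weights.
merged = PeftModel.from_pretrained(base, "lora-checkpoint")
merged = merged.merge_and_unload()  # fold the adapter into the base weights

merged.save_pretrained("praxis-bookwriter-merged")
tokenizer.save_pretrained("praxis-bookwriter-merged")
```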
## License
This model is released under the terms of both the Llama 3.1 Community License and CC-BY-NC-4.0.
## Author
Praxis Maldevide
## Citation
If you find this work helpful, feel free to cite it:
```
@misc{praxis-bookwriter-llama3.1-8b-sft,
  title = {Praxis Bookwriter Llama3.1 8B},
  url = {https://huggingface.co/maldv/praxis-bookwriter-llama3.1-8b-sft},
  author = {Praxis Maldevide},
  month = {May},
  year = {2025}
}
```