---
license: cc-by-nc-4.0
library_name: transformers
language:
- en
tags:
- writing
base_model:
- maldv/badger-nu-llama-3.1-8B-UltraLong
pipeline_tag: text-generation
datasets:
- SillyTilly/fiction-writer-596
---

![image/png](https://cdn-uploads.huggingface.co/production/uploads/65b19c1b098c85365af5a83e/IYQqDTuu329tfl7IHo8J8.png)

[GGUF](https://huggingface.co/mradermacher/praxis-bookwriter-llama3.1-8b-sft-GGUF) [iMat](https://huggingface.co/mradermacher/praxis-bookwriter-llama3.1-8b-sft-i1-GGUF)

# Praxis Bookwriter Llama 3.1 8B

My last iteration of a fantasy writer suffered from one glaring flaw: it did not follow instructions well.
After much consideration, I decided it would make sense to introduce information about each chapter's text
into the prompt itself, linking the instructions to the text being generated.

For this, I took strides of 16,384 tokens across each of the books in the ~140M-token dataset and used R1 to generate a summary of each stride. With
some careful modification, this summary became the first user turn. Each subsequent assistant turn carries approximately
512 tokens of content, and the following user turn is either a chapter header or one paragraph of content. The turns alternate
this way until the entirety of the original stride is consumed.
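
The exact preprocessing code is not published; the sketch below only illustrates the alternation described above, assuming a per-stride `summary` and a list of `paragraphs` are already in hand. The helper names and the placeholder system prompt are hypothetical.

```python
from transformers import AutoTokenizer

# Tokenizer used only for counting tokens; matches the base model used in training.
tokenizer = AutoTokenizer.from_pretrained("unsloth/Meta-Llama-3.1-8B")

def build_conversation(summary: str, chapter_label: str, paragraphs: list[str]) -> list[dict]:
    """Assemble alternating turns from one 16,384-token stride (hypothetical sketch)."""
    system = (
        "You are the user's helpful writing assistant.\n\n"
        "// Title: ...\n// Author: ...\n// Tags: ..."
    )
    messages = [
        {"role": "system", "content": system},
        # First user turn: the (lightly edited) R1 summary plus the chapter marker.
        {"role": "user", "content": f"{summary}\n\n// {chapter_label}"},
    ]
    queue = list(paragraphs)
    while queue:
        # Assistant turn: roughly 512 tokens of consecutive paragraphs.
        chunk, count = [], 0
        while queue and count < 512:
            paragraph = queue.pop(0)
            chunk.append(paragraph)
            count += len(tokenizer.encode(paragraph, add_special_tokens=False))
        messages.append({"role": "assistant", "content": "\n\n".join(chunk)})
        if queue:
            # Next user turn: a chapter header or one paragraph of content.
            messages.append({"role": "user", "content": queue.pop(0)})
    return messages
```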

## Crafting the prompt

The system prompt should contain some variation of:

```text
You are the user's helpful writing assistant.

// Title: The Title of Your Story
// Author: Author Name For Style
// Tags: some comma, delimited list, of genres
```


In an initial test, I tried putting the summary in the system prompt. The result was underwhelming. For this
version, the first user turn should contain an overview of the setting (the summary), with the last line being of the format:

```text
// Chapter n
```

The content of this block can contain any variety of instructions about what to write in the following frame. The summaries I used were between 500 and 1500 tokens, so the more detail about setting, location, characters, their relationships, and plot points, the better.
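
At inference time, the prompt can be assembled with the Llama 3.1 chat format. The sketch below is a minimal example: the title, author, tags, summary text, and sampling settings are placeholders, and it assumes the repository's tokenizer ships the Llama 3.1 chat template.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "maldv/praxis-bookwriter-llama3.1-8b-sft"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

system = (
    "You are the user's helpful writing assistant.\n\n"
    "// Title: The Title of Your Story\n"
    "// Author: Author Name For Style\n"
    "// Tags: fantasy, adventure, slow burn"
)
# First user turn: a 500-1500 token overview of setting, characters,
# relationships, and plot points, ending with the chapter marker.
first_user_turn = (
    "A young cartographer inherits a map that redraws itself at night...\n"
    "(more setting, character, and plot detail here)\n"
    "// Chapter 1"
)

messages = [
    {"role": "system", "content": system},
    {"role": "user", "content": first_user_turn},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=768, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0, input_ids.shape[-1]:], skip_special_tokens=True))
```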

## Training

This model was trained on a single Paperspace A6000 using Unsloth with rsLoRA:

```python
from datasets import load_from_disk
from dotenv import dotenv_values
from unsloth import FastLanguageModel, is_bfloat16_supported
import torch
from transformers import TrainingArguments
from trl import SFTTrainer
import wandb

envconfig = dict(dotenv_values(".env"))

dtype = None
max_seq_length = 24576
load_in_4bit = True

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Meta-Llama-3.1-8B",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

model = FastLanguageModel.get_peft_model(
    model,
    r = 128,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 128**.5,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
    use_rslora = True,
    loftq_config = None,
)

dataset = load_from_disk('bookdata')
ds_train = dataset
ds_eval = dataset.shuffle(seed=12345).select(range(32))

targs = TrainingArguments(
    per_device_train_batch_size = 3,
    gradient_accumulation_steps = 4,
    learning_rate = 4e-5,
    weight_decay = 0,
    gradient_checkpointing = True,
    max_grad_norm = 1,
    warmup_steps = 5,
    num_train_epochs = 3,
    optim = "paged_adamw_32bit",
    lr_scheduler_type = "cosine",
    seed = 3407,
    fp16 = not is_bfloat16_supported(),
    bf16 = is_bfloat16_supported(),
    logging_steps = 1,
    per_device_eval_batch_size = 1,
    do_eval = True,
    eval_steps = 25,
    eval_strategy = "steps",
    save_strategy = "steps",
    save_steps = 20,
    save_total_limit = 3,
    output_dir = "outputs",
    report_to="wandb",
)

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = ds_train,
    eval_dataset = ds_eval,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 6,
    packing = False,
    args = targs,
)

wandb.login(key=envconfig['wandb_key'])
wandb.init(
    project='bookwriter-596',
    config={
        "learning_rate": 4e-5,
        "architecture": 'llama 3.1 8b',
        "dataset": 'bookdata',
        "epochs": 3,
    }
)

# First run: trainer_stats = trainer.train()
# Subsequent runs resume from the latest checkpoint in `outputs`:
trainer.train(resume_from_checkpoint=True)
```

![image/png](https://cdn-uploads.huggingface.co/production/uploads/65b19c1b098c85365af5a83e/WRbDGDT9kv9ZnJFFSIRPJ.png)

## Merged

The rsLoRA adapter I trained was applied on top of badger-nu-llama-3.1-8B-UltraLong, which is RoPE scaled, so in theory
this model should be able to perform at context lengths exceeding my original training data. That said,
my training data was limited to sequence lengths of around 20k tokens, so anything beyond that may be out-of-distribution.
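
A minimal sketch of how the adapter could be merged with PEFT, assuming the trained adapter is saved locally under a hypothetical `outputs/checkpoint-final` directory:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "maldv/badger-nu-llama-3.1-8B-UltraLong"
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)

# Load the rsLoRA adapter on top of the RoPE-scaled base and fold it into the weights.
merged = PeftModel.from_pretrained(base, "outputs/checkpoint-final").merge_and_unload()

tokenizer = AutoTokenizer.from_pretrained(base_id)
merged.save_pretrained("praxis-bookwriter-llama3.1-8b-sft")
tokenizer.save_pretrained("praxis-bookwriter-llama3.1-8b-sft")
```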

## License

This model is released under the limitations of both the Llama 3 license and CC-BY-NC-4.0.

## Author

Praxis Maldevide

## Citation

If you find our work helpful, feel free to cite us.

```
@misc{praxis-bookwriter-llama3.1-8b-sft,
    title = {Praxis Bookwriter Llama3.1 8B},
    url = {https://huggingface.co/maldv/praxis-bookwriter-llama3.1-8b-sft},
    author = {Praxis Maldevide},
    month = {May},
    year = {2025}
}
```