---
library_name: transformers
license: mit
language:
- en
base_model:
- meta-llama/Llama-3.2-1B
pipeline_tag: text-generation
tags:
- prompts
- lora
- fine-tuning
- stable-diffusion
- sdxl
datasets:
- KavinduHansaka/prompt-gen-10k-flux-sdxl

model-index:
- name: Llama-3.2-1B-ImageGen
  results:
  - task:
      type: text-generation
      name: Prompt Generation (Dev)
    dataset:
      type: json
      name: prompt_gen_refined_dev500
    metrics:
    - name: eval_loss
      type: loss
      value: 0.7515
      verified: false
    - name: perplexity
      type: perplexity
      value: 2.12
      verified: false
    - name: avg_target_words
      type: length
      value: 106.8
      verified: false

  - task:
      type: text-generation
      name: Prompt Generation (Test)
    dataset:
      type: json
      name: prompt_gen_refined_test500
    metrics:
    - name: eval_loss
      type: loss
      value: 0.7483
      verified: false
    - name: perplexity
      type: perplexity
      value: 2.11
      verified: false
    - name: avg_target_words
      type: length
      value: 106.7
      verified: false
---

# Llama-3.2-1B — Image Prompt Generation (LoRA Merged)

This repository provides a **LoRA-finetuned & merged** version of `meta-llama/Llama-3.2-1B`, specialized for **image prompt generation**.  
It is designed to create **cinematic, detailed, and structured prompts** for text-to-image models such as **Stable Diffusion XL** and **Flux**.

> **Note:** This is a prompt-generation model, not an instruction/chat model. It is trained to produce concise, creative prompts suitable for diffusion-based image synthesis.

---

## Model Details

- **Maintainer:** [KavinduHansaka](https://huggingface.co/KavinduHansaka)  
- **Base model:** [meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B)  
- **Model type:** Decoder-only causal LM (1B parameters)  
- **Languages:** English (prompt tags, stylistic descriptors)  
- **License:** MIT  
- **Finetuned with:** LoRA adapters, then merged  
- **Training dataset:** [prompt-gen-10k-flux-sdxl](https://huggingface.co/datasets/KavinduHansaka/prompt-gen-10k-flux-sdxl)  

### Model Sources
- **Merged model repo:** https://huggingface.co/KavinduHansaka/Llama-3.2-1B-ImageGen  
- **LoRA adapter repo:** https://huggingface.co/KavinduHansaka/Llama-3.2-1B-ImageGen-LoRA 
- **Training dataset:** https://huggingface.co/datasets/KavinduHansaka/prompt-gen-10k-flux-sdxl  

---

## What’s Included

- `config.json`, `generation_config.json`  
- Merged model weights (`model.safetensors`)  
- Tokenizer files (`tokenizer.json`, `tokenizer_config.json`, `special_tokens_map.json`)  

---

## Uses

### Direct Use
- Generate **stylized, cinematic, or structured prompts** for image synthesis models (Stable Diffusion, Flux, SDXL).  

### Downstream Use
- As a **base for further LoRA finetuning** on style-specific datasets.  
- As a **prompt generator inside T2I pipelines**.  

### Out-of-Scope Use
- General-purpose chat.  
- Safety-critical applications.  

---

## How to Get Started

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

REPO_ID = "KavinduHansaka/Llama-3.2-1B-ImageGen"

tok = AutoTokenizer.from_pretrained(REPO_ID)
model = AutoModelForCausalLM.from_pretrained(
    REPO_ID, device_map="auto", torch_dtype=torch.bfloat16
)

prompt = "Create a cinematic noir macro photo with film grain, 1:1 ratio, sharp focus."
inputs = tok(prompt, return_tensors="pt").to(model.device)

out = model.generate(**inputs, max_new_tokens=120, do_sample=True, temperature=0.5, top_p=0.9)
print(tok.decode(out[0], skip_special_tokens=True))
```

---

## Training Details

- **Training data:** [prompt-gen-10k-flux-sdxl](https://huggingface.co/datasets/KavinduHansaka/prompt-gen-10k-flux-sdxl)  
- **Training method:** LoRA with PEFT, adapters merged into base model.  
- **Precision:** bfloat16/float16 during training.  

---

## Technical Specifications

- **Architecture:** LLaMA 3.2 (1B parameters)  
- **Hardware:** NVIDIA GPU ≥6 GB VRAM  
- **Dependencies:** `transformers`, `peft`, `accelerate`, `torch`, `sentencepiece`  

---

## Citation

```
@misc{llama3.2-1b,
  title  = {LLaMA 3.2 (1B)},
  author = {Meta AI},
  year   = {2024},
  url    = {https://huggingface.co/meta-llama/Llama-3.2-1B}
}

@misc{llama3.2-1b-promptgen,
  title  = {Llama-3.2-1B Image Prompt Generator (LoRA Merged)},
  author = {Kavindu Hansaka Jayasinghe},
  year   = {2025},
  url    = {https://huggingface.co/KavinduHansaka/Llama-3.2-1B-ImageGen}
}
```