Llama-3.2-1B — Image Prompt Generation (LoRA Merged)
This repository provides a LoRA-finetuned and merged version of meta-llama/Llama-3.2-1B, specialized for image prompt generation.
It is designed to create cinematic, detailed, and structured prompts for text-to-image models such as Stable Diffusion XL and Flux.
Note: This is a prompt-generation model, not an instruction/chat model. It is trained to produce concise, creative prompts suitable for diffusion-based image synthesis.
Model Details
- Maintainer: KavinduHansaka
- Base model: meta-llama/Llama-3.2-1B
- Model type: Decoder-only causal LM (1B parameters)
- Languages: English (prompt tags, stylistic descriptors)
- License: MIT
- Finetuned with: LoRA adapters, then merged
- Training dataset: prompt-gen-10k-flux-sdxl
Model Sources
- Merged model repo: https://huggingface.co/KavinduHansaka/Llama-3.2-1B-ImageGen
- LoRA adapter repo: https://huggingface.co/KavinduHansaka/Llama-3.2-1B-ImageGen-LoRA
- Training dataset: https://huggingface.co/datasets/KavinduHansaka/prompt-gen-10k-flux-sdxl
What’s Included
- Config files (`config.json`, `generation_config.json`)
- Merged model weights (`model.safetensors`)
- Tokenizer files (`tokenizer.json`, `tokenizer_config.json`, `special_tokens_map.json`)
Uses
Direct Use
- Generate stylized, cinematic, or structured prompts for image synthesis models (Stable Diffusion, Flux, SDXL).
Downstream Use
- As a base for further LoRA finetuning on style-specific datasets.
- As a prompt generator inside text-to-image pipelines (see the sketch after this list).
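A minimal sketch of that wiring, assuming the diffusers `StableDiffusionXLPipeline` and the `stabilityai/stable-diffusion-xl-base-1.0` checkpoint (neither is part of this card; any text-to-image backend works):

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from diffusers import StableDiffusionXLPipeline

REPO_ID = "KavinduHansaka/Llama-3.2-1B-ImageGen"

# Stage 1: expand a short idea into a detailed diffusion prompt.
tok = AutoTokenizer.from_pretrained(REPO_ID)
lm = AutoModelForCausalLM.from_pretrained(
    REPO_ID, device_map="auto", torch_dtype=torch.bfloat16
)
idea = "Create a cinematic noir macro photo with film grain."
inputs = tok(idea, return_tensors="pt").to(lm.device)
out = lm.generate(**inputs, max_new_tokens=120, do_sample=True, temperature=0.5, top_p=0.9)
image_prompt = tok.decode(out[0], skip_special_tokens=True)

# Stage 2: feed the generated prompt to the (assumed) SDXL backend.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
pipe(prompt=image_prompt).images[0].save("output.png")
```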
Out-of-Scope Use
- General-purpose chat.
- Safety-critical applications.
How to Get Started
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

REPO_ID = "KavinduHansaka/Llama-3.2-1B-ImageGen"

# Load the tokenizer and the merged model (no PEFT needed at inference).
tok = AutoTokenizer.from_pretrained(REPO_ID)
model = AutoModelForCausalLM.from_pretrained(
    REPO_ID, device_map="auto", torch_dtype=torch.bfloat16
)

# Give the model a short idea; it expands it into a detailed image prompt.
prompt = "Create a cinematic noir macro photo with film grain, 1:1 ratio, sharp focus."
inputs = tok(prompt, return_tensors="pt").to(model.device)

# Moderate temperature keeps outputs creative but on-topic.
out = model.generate(**inputs, max_new_tokens=120, do_sample=True, temperature=0.5, top_p=0.9)
print(tok.decode(out[0], skip_special_tokens=True))
```
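Sampling is stochastic, so it is often worth drawing several candidate prompts in one call and keeping the best. A small extension of the snippet above, using the standard `num_return_sequences` argument of `generate`:

```python
# Draw four candidate prompts; a higher temperature adds diversity.
out = model.generate(
    **inputs,
    max_new_tokens=120,
    do_sample=True,
    temperature=0.8,
    top_p=0.9,
    num_return_sequences=4,
)
for i, seq in enumerate(out):
    print(f"--- candidate {i} ---")
    print(tok.decode(seq, skip_special_tokens=True))
```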
Training Details
- Training data: prompt-gen-10k-flux-sdxl (linked under Model Sources)
- Training method: LoRA via PEFT; adapters merged into the base model after training (see the sketch after this list).
- Precision: bfloat16/float16 during training.
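The merge script itself is not part of this card; with PEFT it typically comes down to a few lines, roughly as sketched here (the adapter repo is the one listed under Model Sources):

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Attach the LoRA adapter to the base model, then fold the adapter weights
# into the base weights so inference no longer needs PEFT.
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B", torch_dtype=torch.bfloat16
)
merged = PeftModel.from_pretrained(
    base, "KavinduHansaka/Llama-3.2-1B-ImageGen-LoRA"
).merge_and_unload()
merged.save_pretrained("Llama-3.2-1B-ImageGen")  # tokenizer is saved separately
```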
Technical Specifications
- Architecture: LLaMA 3.2 (1B parameters)
- Hardware: NVIDIA GPU ≥6 GB VRAM
- Dependencies: `transformers`, `peft`, `accelerate`, `torch`, `sentencepiece`
Citation
```bibtex
@misc{llama3.2-1b,
  title  = {LLaMA 3.2 (1B)},
  author = {Meta AI},
  year   = {2024},
  url    = {https://huggingface.co/meta-llama/Llama-3.2-1B}
}

@misc{llama3.2-1b-promptgen,
  title  = {Llama-3.2-1B Image Prompt Generator (LoRA Merged)},
  author = {Kavindu Hansaka Jayasinghe},
  year   = {2025},
  url    = {https://huggingface.co/KavinduHansaka/Llama-3.2-1B-ImageGen}
}
```
Evaluation results
All figures are self-reported.

| Metric           | prompt_gen_refined_dev500 | prompt_gen_refined_test500 |
|------------------|--------------------------:|---------------------------:|
| eval_loss        | 0.751                     | 0.748                       |
| perplexity       | 2.120                     | 2.110                       |
| avg_target_words | 106.800                   | 106.700                     |