Pythia 70M LMSYS Prompt Generator

This model generates user prompts in the style of the lmsys/lmsys-chat-1m dataset. Because access to the original dataset is gated, this model offers an accessible way to generate similar prompts. It is a fine-tuned version of EleutherAI/pythia-70m-deduped.

Evaluation results on the validation set are:

  • Loss: 2.6662
  • Accuracy: 0.5068
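
Assuming the reported loss is the mean token-level cross-entropy in nats (the usual causal language modeling objective), it corresponds to a validation perplexity of about 14.4:

import math

val_loss = 2.6662
print(f"Validation perplexity: {math.exp(val_loss):.2f}")  # ~14.38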

Example usage

from transformers import pipeline, set_seed

generator = pipeline('text-generation', model='agentlans/pythia-70m-lmsys-prompts', device='cuda')

set_seed(20250906) # For reproducibility
# Generate starting from empty string
results = generator("", max_length=3000, num_return_sequences=5, do_sample=True)

for i, x in enumerate(results, 1):
    print(f"**Prompt {i}:**\n\n```\n{x['generated_text']}\n```\n")

Sample output:

Prompt 1:

Which are the number of 10 cars to buy for 20 cars for a 3,000 person in 20 years?
Answer Choices: (A) the best car in the world. (B) The reason why... [truncated for brevity]

Prompt 2:

can you tell me which version is better to serve as a chatgpt manager.

Prompt 3:

write a story using the following NAME_1 game, choose the theme, do a story... [truncated for brevity]

Prompt 4:

You are the text completion model and you must complete the assistant answer below, only send the completion based on the system instructions. Don't repeat your answer sentences.
user: descriptive answer for python how can I import yurt to another language in python?
assistant:

Prompt 5:

write a story with 10 paragraphs describing how a person is reading a book called "NAME_1".

Limitations

  • Generated prompts may be incoherent or nonsensical.
  • The underlying EleutherAI Pythia model has limited capability with code and non-English text.
  • Some outputs may reflect offensive or inappropriate content present in the original dataset.
  • Outputs contain anonymization placeholders such as NAME_1 (inherited from the dataset's PII redaction), which appear verbatim and unpopulated; see the substitution sketch after this list.
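
If the placeholders are unwanted, they can be substituted in post-processing. A minimal sketch, assuming the NAME_<n> redaction convention; the replacement names are arbitrary:

import random
import re

EXAMPLE_NAMES = ["Alice", "Bob", "Carol", "Dave"]  # arbitrary substitutes

def fill_placeholders(text):
    """Replace each distinct NAME_<n> placeholder with a consistent random name."""
    mapping = {}
    def substitute(match):
        placeholder = match.group(0)
        if placeholder not in mapping:
            mapping[placeholder] = random.choice(EXAMPLE_NAMES)
        return mapping[placeholder]
    return re.sub(r"NAME_\d+", substitute, text)

print(fill_placeholders("write a story where NAME_1 meets NAME_2 and NAME_1 wins."))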

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 5.0
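
For reference, a minimal sketch of how these hyperparameters map onto Hugging Face TrainingArguments (the output directory and the surrounding Trainer setup are assumptions, not details from the original run):

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="pythia-70m-lmsys-prompts",  # assumed; not from the original run
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=5.0,
)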

Training results

Training Loss | Epoch | Step  | Accuracy | Validation Loss
3.3254        | 1.0   | 4963  | 0.4213   | 3.2209
2.9236        | 2.0   | 9926  | 0.4686   | 2.9025
2.7526        | 3.0   | 14889 | 0.4861   | 2.7927
2.6830        | 4.0   | 19852 | 0.4999   | 2.7131
2.6099        | 5.0   | 24815 | 0.5068   | 2.6662

Framework versions

  • Transformers 4.51.3
  • Pytorch 2.6.0+cu124
  • Datasets 3.2.0
  • Tokenizers 0.21.0
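
One way to pin a matching environment (the PyTorch index URL assumes the CUDA 12.4 build indicated by 2.6.0+cu124):

pip install transformers==4.51.3 datasets==3.2.0 tokenizers==0.21.0
pip install torch==2.6.0 --index-url https://download.pytorch.org/whl/cu124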