Pythia 70M LMSYS Prompt Generator
This model generates user prompts in the style of the lmsys/lmsys-chat-1m dataset. Because access to the original dataset is gated, this model offers an accessible way to generate similar prompts. It is a fine-tuned version of EleutherAI/pythia-70m-deduped.
Evaluation results on the validation set are:
- Loss: 2.6662
- Accuracy: 0.5068
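Assuming the reported loss is the standard per-token cross-entropy for causal language models, it corresponds to a validation perplexity of roughly 14.4:

```python
import math

# Validation cross-entropy loss reported above
val_loss = 2.6662

# Perplexity is the exponential of the per-token cross-entropy loss
perplexity = math.exp(val_loss)
print(f"{perplexity:.1f}")  # about 14.4
```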
Example usage
````python
from transformers import pipeline, set_seed

generator = pipeline('text-generation', model='agentlans/pythia-70m-lmsys-prompts', device='cuda')
set_seed(20250906)  # For reproducibility

# Generate starting from an empty string
results = generator("", max_length=3000, num_return_sequences=5, do_sample=True)
for i, x in enumerate(results, 1):
    print(f"**Prompt {i}:**\n\n```\n{x['generated_text']}\n```\n")
````
Sample output:
Prompt 1:
Which are the number of 10 cars to buy for 20 cars for a 3,000 person in 20 years?
Answer Choices: (A) the best car in the world. (B) The reason why... [truncated for brevity]
Prompt 2:
can you tell me which version is better to serve as a chatgpt manager.
Prompt 3:
write a story using the following NAME_1 game, choose the theme, do a story... [truncated for brevity]
Prompt 4:
You are the text completion model and you must complete the assistant answer below, only send the completion based on the system instructions. Don't repeat your answer sentences.
user: descriptive answer for python how can I import yurt to another language in python?
assistant:
Prompt 5:
write a story with 10 paragraphs describing how a person is reading a book called "NAME_1".
Limitations
- Generated prompts may be incoherent or nonsensical.
- The underlying EleutherAI Pythia model has limited capability with code and non-English text.
- Some outputs may reflect offensive or inappropriate content present in the original dataset.
- Name placeholders such as `NAME_1` are used and may appear untranslated or unpopulated.
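If the `NAME_n` placeholders are unwanted, one option is to substitute them before using the generated prompts. A minimal sketch (the `fill_name_placeholders` helper is illustrative, not part of this model):

```python
import re

def fill_name_placeholders(prompt, names):
    """Replace NAME_1, NAME_2, ... placeholders with names from the list.

    Placeholders beyond the length of the list are left untouched.
    """
    def repl(match):
        idx = int(match.group(1)) - 1  # NAME_1 -> names[0]
        return names[idx] if 0 <= idx < len(names) else match.group(0)
    return re.sub(r"NAME_(\d+)", repl, prompt)

print(fill_name_placeholders("write a story about NAME_1 and NAME_2.", ["Alice", "Bob"]))
# write a story about Alice and Bob.
```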
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: linear
- num_epochs: 5.0
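With 4,963 optimizer steps per epoch (see the training results table) and 5 epochs, training ran for 24,815 steps in total. Assuming no warmup (none is listed), the linear scheduler decays the learning rate from 5e-05 to 0 over the run; a sketch of that schedule:

```python
def linear_lr(step, base_lr=5e-05, total_steps=24815):
    """Linearly decay base_lr to 0 over total_steps (no warmup assumed)."""
    return base_lr * max(0.0, 1 - step / total_steps)

print(linear_lr(0))      # 5e-05 at the start of training
print(linear_lr(24815))  # 0.0 at the end of epoch 5
```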
Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|---|---|---|---|---|
| 3.3254 | 1.0 | 4963 | 3.2209 | 0.4213 |
| 2.9236 | 2.0 | 9926 | 2.9025 | 0.4686 |
| 2.7526 | 3.0 | 14889 | 2.7927 | 0.4861 |
| 2.6830 | 4.0 | 19852 | 2.7131 | 0.4999 |
| 2.6099 | 5.0 | 24815 | 2.6662 | 0.5068 |
Framework versions
- Transformers 4.51.3
- Pytorch 2.6.0+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0