---
library_name: transformers
tags:
- chatml
- mistral
- synthetic data
- finetune
license: apache-2.0
language:
- en
datasets:
- ITG/PlatVR-sft
---

# PlatVR-sft - Hermes 2 Pro - Mistral 7B


![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/646f4b19075e11ca78db58a6/5HZJYp1DuYP47nu-U7F7M.jpeg)
*Image generated by [Copilot Designer](https://copilot.microsoft.com/images/create).*

## Model Details

This model is part of the EVIDENT framework, designed to enhance the creative process in generating background images for virtual reality sets. It interprets user instructions to generate and modify prompts for text-to-image models. This is the SFT version of the model; [DPO](https://huggingface.co/ITG/PlatVR-dpo) and [KTO](https://huggingface.co/ITG/PlatVR-kto) versions are also available.

The [demo](https://youtu.be/NKevZLvaGaA) integrates a diffusion model to test prompt-image alignment, along with mechanisms for user feedback and iterative prompt refinement, to enhance user creativity and satisfaction.

The instruction categories are:
- **Addition**: Includes new elements or features.
- **Condensation**: Summarizes the description.
- **Modification**: Alters specific aspects of the description to change the scene.
- **Rearrangement**: Reorders sentences within the description.
- **Removal**: Eliminates specific details from the description.
- **Rephrase**: Rewrites parts of the description.
- **Scene Change**: Switches the overall context of the description.
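
To make the categories concrete, the sketch below pairs each one with a sample user instruction. The `InstructionCategory` enum and every example instruction are hypothetical illustrations written for this card; they are not taken from the training dataset.

```python
from enum import Enum


class InstructionCategory(Enum):
    """The seven instruction categories listed above."""
    ADDITION = "Addition"
    CONDENSATION = "Condensation"
    MODIFICATION = "Modification"
    REARRANGEMENT = "Rearrangement"
    REMOVAL = "Removal"
    REPHRASE = "Rephrase"
    SCENE_CHANGE = "Scene Change"


# Hypothetical example instructions, one per category (illustrative only).
EXAMPLE_INSTRUCTIONS = {
    InstructionCategory.ADDITION: "Add a light snowfall to the scene.",
    InstructionCategory.CONDENSATION: "Summarize the description in one sentence.",
    InstructionCategory.MODIFICATION: "Make the sky overcast instead of clear.",
    InstructionCategory.REARRANGEMENT: "Describe the foreground before the background.",
    InstructionCategory.REMOVAL: "Remove the mention of the river.",
    InstructionCategory.REPHRASE: "Rewrite the first sentence using different words.",
    InstructionCategory.SCENE_CHANGE: "Move the scene to a desert at sunset.",
}

for category, instruction in EXAMPLE_INSTRUCTIONS.items():
    print(f"{category.value}: {instruction}")
```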

The output language of the model is English, but other languages can be used as input (quality depends on the number of tokens seen during pre-training for the given language).

### Model Description

Developed as part of the EVIDENT framework, this model leverages a large language model fine-tuned on synthetic data to generate and refine text prompts for creating virtual reality backgrounds.

- **Developed by:** [ITG](https://itg.es/en)
- **Model type:** Text-to-Text for Image Prompt Generation
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Finetuned from model:** [Hermes 2 Pro](https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B)

### Model Sources

- **Demo video:** [EVIDENT Demo](https://youtu.be/NKevZLvaGaA)

## Uses

### Prompt Format

The model uses ChatML as its prompt format.

Here is the original prompt that was used in the fine-tuning process:

```
<|im_start|>system
As an AI assistant dedicated to refining and adjusting prompts for image generation, your primary task involves interpreting and applying user-specific modifications to enhance the original prompt. Your modifications may include:

Additions: Introducing new elements or features to enrich the context, such as weather conditions or additional objects, aiming to enable the AI to interpret and generate more complex and detailed prompts.
Condensations: Summarizing longer descriptions into more concise forms without losing essential meaning, aiming at generating relevant images from shorter prompts.
Modifications: Altering specific details within the descriptions to change the scene.
Rearrangement: Changing the order of sentences or phrases to test the AI's context understanding and narrative flow.
Removal: Eliminating redundant or non-essential information to clarify the prompt.
Rephrase: Rewriting sentences or phrases to convey the same meaning using different words or structures.
Scene Change: Altering the setting or background to create a completely new context.
Your goal is to skillfully adapt the new prompt in line with the user's precise directives, ensuring the essence of their vision is captured—all while maintaining responses exclusively in English, regardless of the original prompt's language.

It is crucial that the revised prompt strictly adheres to the user's intent, incorporating their specified changes with precision. Additionally, ensure the new prompt does not suggest alterations that imply dynamics or qualities unsuitable for visual representation, such as smell, scent, or sound, which cannot be captured in an image.

Your role is to ensure the prompt is optimized for image generation, clearly reflecting the user's adjustments while respecting these guidelines, with a consistent use of English for all responses. The focus should be on creating a vivid, static depiction that stays true to the conceptual and aesthetic requirements set forth by the user, communicated effectively in English.

Remember, the new prompt must not contain references to smell, scent, or sound, which cannot be captured in an image.

Below is the original prompt that you will meticulously refine:
{original_prompt}<|im_end|>
<|im_start|>user
{instruction}<|im_end|>
<|im_start|>assistant
```

### Notes

- **{original_prompt}**: The previous prompt that the system returned to the user.

- **{instruction}**: The instruction that the user gives to the system to modify the previous model response.

- **Note:** For the first iteration, {original_prompt} is the user's input and {instruction} is the generic 'Enhance the original prompt.'.
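
The iteration scheme described in these notes can be sketched as a small loop. This is a minimal illustration under assumptions, not the EVIDENT implementation: `build_chatml_prompt`, `refine`, and the `generate` callable are hypothetical names, and `system_text` stands in for the full system message shown in the prompt format above (everything up to and including "Below is the original prompt that you will meticulously refine:").

```python
from typing import Callable, List

# Generic instruction used for the first iteration, as stated in the notes.
DEFAULT_INSTRUCTION = "Enhance the original prompt."


def build_chatml_prompt(system_text: str, original_prompt: str,
                        instruction: str = DEFAULT_INSTRUCTION) -> str:
    """Assemble a ChatML prompt in the layout shown in the prompt format section."""
    return (
        f"<|im_start|>system\n{system_text}\n{original_prompt}<|im_end|>\n"
        f"<|im_start|>user\n{instruction}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


def refine(generate: Callable[[str], str], system_text: str,
           user_input: str, instructions: List[str]) -> List[str]:
    """Run the first enhancement pass, then apply each instruction in turn.

    `generate` is any function that maps a prompt string to the model's reply.
    """
    # First iteration: the user's input is the original prompt.
    current = generate(build_chatml_prompt(system_text, user_input))
    history = [current]
    for instruction in instructions:
        # Later iterations: the previous model response becomes the original prompt.
        current = generate(build_chatml_prompt(system_text, current, instruction))
        history.append(current)
    return history
```

With a running inference server, `generate` could wrap a client call such as `lambda p: client.text_generation(prompt=p, max_new_tokens=512)`.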


### Direct Use

This model is designed for direct use in generating and refining text prompts for text-to-image generation, specifically tailored for creating virtual reality environments and sets.

Serve the model with Text Generation Inference (TGI):

```bash
docker run --gpus all --rm --shm-size 1g -p 8080:80 -v ~/huggingface/hub/:/data ghcr.io/huggingface/text-generation-inference:latest --model-id ITG/PlatVR-sft
```

Python client:

```python
from huggingface_hub import InferenceClient

# Connect to the TGI server started with the Docker command above (host port 8080).
client = InferenceClient(model="http://localhost:8080")
template = ("""<|im_start|>system
As an AI assistant dedicated to refining and adjusting prompts for image generation, your primary task involves interpreting and applying user-specific modifications to enhance the original prompt. Your modifications may include:

Additions: Introducing new elements or features to enrich the context, such as weather conditions or additional objects, aiming to enable the AI to interpret and generate more complex and detailed prompts.
Condensations: Summarizing longer descriptions into more concise forms without losing essential meaning, aiming at generating relevant images from shorter prompts.
Modifications: Altering specific details within the descriptions to change the scene.
Rearrangement: Changing the order of sentences or phrases to test the AI's context understanding and narrative flow.
Removal: Eliminating redundant or non-essential information to clarify the prompt.
Rephrase: Rewriting sentences or phrases to convey the same meaning using different words or structures.
Scene Change: Altering the setting or background to create a completely new context.
Your goal is to skillfully adapt the new prompt in line with the user's precise directives, ensuring the essence of their vision is captured—all while maintaining responses exclusively in English, regardless of the original prompt's language.

It is crucial that the revised prompt strictly adheres to the user's intent, incorporating their specified changes with precision. Additionally, ensure the new prompt does not suggest alterations that imply dynamics or qualities unsuitable for visual representation, such as smell, scent, or sound, which cannot be captured in an image.

Your role is to ensure the prompt is optimized for image generation, clearly reflecting the user's adjustments while respecting these guidelines, with a consistent use of English for all responses. The focus should be on creating a vivid, static depiction that stays true to the conceptual and aesthetic requirements set forth by the user, communicated effectively in English.

Remember, the new prompt must not contain references to smell, scent, or sound, which cannot be captured in an image.

Below is the original prompt that you will meticulously refine:
{original_prompt}<|im_end|>
<|im_start|>user
{instruction}<|im_end|>
<|im_start|>assistant
""")

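# The input may be in another language; the model is trained to reply in English.
instruction = "Add details to the original prompt in a single sentence."
original_prompt = "Una montaña"  # Spanish for "A mountain"
# Fill both placeholders of the ChatML template and request a completion.
input_prompt = template.format(original_prompt=original_prompt, instruction=instruction)
print(client.text_generation(prompt=input_prompt, max_new_tokens=512))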
```

### Downstream Use

The model can be fine-tuned or integrated into larger ecosystems or applications that require dynamic, user-driven creation of prompts for visual content.


### Out-of-Scope Use

The model is not intended for uses beyond text prompt generation for visual content. 

## Bias, Risks, and Limitations

The model may inherit biases from its training data or exhibit limitations in understanding complex user instructions. Potential risks include generating inappropriate or unintended content based on ambiguous prompts.

## Evaluation Metrics

See [the KTO version of the model](https://huggingface.co/ITG/PlatVR-kto#evaluation-metrics) for the full evaluation report.

### Recommendations

Users should be aware of the model's limitations and biases. It is recommended to monitor the outputs for unintended content and refine prompts accordingly.
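
One lightweight way to monitor outputs is a keyword check for the non-visual qualities that the system prompt asks the model to avoid (smell, scent, sound). The sketch below is an assumption-laden illustration rather than part of the released tooling: the function name `find_non_visual_terms` is hypothetical, and a plain keyword match will miss paraphrases and may flag harmless uses.

```python
import re
from typing import List

# Terms taken from the system prompt's restriction on non-visual qualities.
NON_VISUAL_TERMS = ("smell", "scent", "sound")

# Match each term plus any suffix (e.g. "sounds", "scented"), case-insensitively.
_PATTERN = re.compile(r"\b(?:" + "|".join(NON_VISUAL_TERMS) + r")\w*\b", re.IGNORECASE)


def find_non_visual_terms(prompt: str) -> List[str]:
    """Return the distinct flagged words found in a generated prompt, sorted."""
    return sorted({match.group(0).lower() for match in _PATTERN.finditer(prompt)})


flagged = find_non_visual_terms("A pine forest with the scent of rain and distant sounds.")
print(flagged)  # ['scent', 'sounds']
```

A non-empty result could trigger a follow-up instruction to the model, for example asking it to remove the flagged detail.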

### Demo Example

![image/png](https://cdn-uploads.huggingface.co/production/uploads/646f4b19075e11ca78db58a6/ZKIvKElm5bJuG7xH51iqa.png)

## Request Demo

- Contact Email: huggingface@itg.es

## Model Card Contact

- Contact Email: huggingface@itg.es