Model Card for phi-2-chat-v05

Phi-2-chat-v05 is a finetuned version of Phi-2, trained to improve the model's understanding of instructions and multi-turn conversations. In essence: it now has a concept of shutting up after an answer is given, as opposed to just switching into random-generator mode.

Finetuning used 25k records from the HuggingFaceH4/ultrachat_200k dataset.
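
For reference, this is roughly how such a subset can be drawn with the datasets library. The exact sampling and preprocessing used for this finetune aren't documented, so the split choice, shuffle seed, and subset selection below are illustrative assumptions, not the actual training pipeline:

# A minimal sketch (not the actual training pipeline) of pulling 25k records
# from the finetuning dataset.
from datasets import load_dataset

# "train_sft" is the supervised-finetuning split of ultrachat_200k; the
# shuffle seed and simple head-selection are assumptions for illustration.
dataset = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")
subset = dataset.shuffle(seed=42).select(range(25_000))
print(subset)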

Prompt format

<|system|>
You are a helpful assistant....
<|user|>
Why is the sky blue?
<|assistant|>
The sky appears blue because of a phenomenon called Rayleigh scattering. When sunlight enters the Earth's atmosphere [...]
<|user|>
Who was the phenomenon named after?
<|assistant|>

The model generates its output after the special token <|assistant|>. That token must be present in the input to get a reliable response. Alternatively, you can use the tokenizer's chat_template, as shown below.
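
If you build the prompt by hand instead of using the template, it can look like the sketch below. The exact whitespace and any end-of-turn tokens are assumptions here; the tokenizer's chat_template (used in the full example below) is the authoritative source:

system = "You are a helpful assistant."
user = "Why is the sky blue?"
# Hand-built prompt ending in <|assistant|>, so the model knows to answer.
prompt = f"<|system|>\n{system}\n<|user|>\n{user}\n<|assistant|>\n"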

How to use it?

Dependencies

pip install -U torch transformers einops

Code for inference.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "WeeRobots/phi-2-chat-v05"

model = AutoModelForCausalLM.from_pretrained(model_id, device_map={"": 0}, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False, trust_remote_code=True)

payload = tokenizer.apply_chat_template([
    { 'role': 'system', 'content': '''You are a state machine. The user will add state slot values and you'll keep track of them.''' },
    { 'role': 'user', 'content': '''Place 15 into slot apple''' },
    { 'role': 'assistant', 'content': '''Roger that.''' },
    { 'role': 'user', 'content': '''Bananas slot should be 20''' },
    { 'role': 'assistant', 'content': '''Certainly''' },
    { 'role': 'user', 'content': '''What is the value of Apple + Banana?''' },
], tokenize=False, add_generation_prompt=True,)
device = "cuda"
model_input = tokenizer(payload, return_tensors="pt").to(device)
with torch.no_grad():
  # IMPORTANT: always set eos_token_id in this call. The model is trained to
  # emit the EOS token at the right time, but without this setting generation
  # may continue with irrelevant text; with it, the model stops in the right place.
  model_response = model.generate(**model_input, max_new_tokens=512, eos_token_id=tokenizer.eos_token_id)
  print(tokenizer.decode(model_response[0], skip_special_tokens=False))
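
To print only the assistant's reply rather than the full decoded conversation, you can slice the prompt tokens off before decoding. A minimal sketch, continuing from the variables above:

# The prompt occupies the first input_ids.shape[1] positions of the output,
# so everything past that index is newly generated text.
prompt_len = model_input["input_ids"].shape[1]
reply = tokenizer.decode(model_response[0][prompt_len:], skip_special_tokens=True)
print(reply)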

Not production quality

Be aware that this finetune wasn't thoroughly tested and isn't meant to be used in production; it is intended only for experimentation or hobby projects.

Model details

Model size: 2.78B params
Tensor type: FP16 (Safetensors)