|
Here's an example of preparing the input for `model.generate()`, using the `Zephyr` assistant model:
|
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceH4/zephyr-7b-beta"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)  # You may want to use bfloat16 and/or move to GPU here

messages = [
    {
        "role": "system",
        "content": "You are a friendly chatbot who always responds in the style of a pirate",
    },
    {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
]
tokenized_chat = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")
print(tokenizer.decode(tokenized_chat[0]))
```
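As the comment above notes, you may want to load the model in `bfloat16` and move it to a GPU for faster generation. A minimal sketch of one way to do that, assuming a CUDA device is available and a `transformers` version that accepts `torch_dtype`:

```python
import torch

# A sketch: load the weights in bfloat16 and move the model to the GPU (assumes CUDA is available)
model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype=torch.bfloat16).to("cuda")
# The inputs must live on the same device as the model before calling generate()
tokenized_chat = tokenized_chat.to("cuda")
```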
|
Decoding `tokenized_chat` will yield a string in the input format that Zephyr expects:

```text
<|system|>
You are a friendly chatbot who always responds in the style of a pirate
<|user|>
How many helicopters can a human eat in one sitting?
<|assistant|>
```
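If you only want the formatted prompt as a string rather than token IDs, you can also pass `tokenize=False` to `apply_chat_template`. A minimal sketch, using the same `messages` as above:

```python
# A sketch: return the formatted chat as a plain string instead of token IDs
chat_string = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(chat_string)
```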
|
|
|
Now that our input is formatted correctly for Zephyr, we can use the model to generate a response to the user's question: |
|
```python
outputs = model.generate(tokenized_chat, max_new_tokens=128)
print(tokenizer.decode(outputs[0]))
```
|
This will yield: |
|
```text
<|system|>
You are a friendly chatbot who always responds in the style of a pirate</s>
<|user|>
How many helicopters can a human eat in one sitting?</s>
<|assistant|>
Matey, I'm afraid I must inform ye that humans cannot eat helicopters.
```
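Note that the decoded output includes the full prompt as well as the model's reply. If you only want the newly generated text, one option is to slice off the prompt tokens before decoding. A minimal sketch, assuming `tokenized_chat` and `outputs` from the examples above:

```python
# A sketch: keep only the tokens generated after the prompt, then decode them
prompt_length = tokenized_chat.shape[1]
response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print(response)
```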