This repository contains the AWQ-quantized 4-bit version of Stockmark-2-100B-Instruct-beta.
Please load the model with the float16 data type; bfloat16 is not supported by this model. The example below loads the model with Transformers and generates a response:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and the 4-bit AWQ model in float16 (bfloat16 is not supported).
tokenizer = AutoTokenizer.from_pretrained("stockmark/Stockmark-2-100B-Instruct-beta-AWQ")
model = AutoModelForCausalLM.from_pretrained(
    "stockmark/Stockmark-2-100B-Instruct-beta-AWQ",
    device_map="auto",
    torch_dtype=torch.float16,
)

instruction = "自然言語処理とは?"  # "What is natural language processing?"

# Build the prompt with the model's chat template.
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": instruction}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# Generate a response with sampling.
with torch.inference_mode():
    tokens = model.generate(
        input_ids,
        max_new_tokens=512,
        do_sample=True,
        temperature=0.7,
        top_p=0.95,
        repetition_penalty=1.05,
    )

output = tokenizer.decode(tokens[0], skip_special_tokens=True)
print(output)
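For higher-throughput inference, the AWQ checkpoint should also load in an AWQ-capable engine such as vLLM. The snippet below is a minimal sketch, not an officially documented configuration for this model: the quantization="awq" and dtype="float16" arguments are assumptions based on vLLM's standard AWQ support and the float16 requirement above.

from vllm import LLM, SamplingParams

# Minimal sketch: serve the AWQ checkpoint with vLLM (assumes an AWQ-capable vLLM build).
llm = LLM(
    model="stockmark/Stockmark-2-100B-Instruct-beta-AWQ",
    quantization="awq",  # assumption: vLLM's AWQ kernel path
    dtype="float16",     # bfloat16 is not supported by this model
)
sampling = SamplingParams(temperature=0.7, top_p=0.95, repetition_penalty=1.05, max_tokens=512)

# Build the prompt with the chat template, as in the Transformers example above.
prompt = llm.get_tokenizer().apply_chat_template(
    [{"role": "user", "content": "自然言語処理とは?"}],  # "What is natural language processing?"
    add_generation_prompt=True,
    tokenize=False,
)
print(llm.generate([prompt], sampling)[0].outputs[0].text)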
Base model: stockmark/Stockmark-2-100B-Instruct-beta