Stockmark-2-100B-Instruct-beta-AWQ

This repository contains the 4-bit AWQ-quantized version of Stockmark-2-100B-Instruct-beta.

Example

Please use the float16 data type when loading the model; the bfloat16 data type is not supported by this model.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Loading an AWQ checkpoint through transformers requires the autoawq package.
tokenizer = AutoTokenizer.from_pretrained("stockmark/Stockmark-2-100B-Instruct-beta-AWQ")
model = AutoModelForCausalLM.from_pretrained(
    "stockmark/Stockmark-2-100B-Instruct-beta-AWQ",
    device_map="auto",
    torch_dtype=torch.float16,  # bfloat16 is not supported
)

instruction = "自然言語処理とは?"  # "What is natural language processing?"

# Build the prompt with the model's chat template and move it to the model's device.
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": instruction}], add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.inference_mode():
    tokens = model.generate(
        input_ids,
        max_new_tokens=512,
        do_sample=True,
        temperature=0.7,
        top_p=0.95,
        repetition_penalty=1.05,
    )

output = tokenizer.decode(tokens[0], skip_special_tokens=True)
print(output)
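
Note that tokens[0] contains the prompt tokens followed by the generated tokens, so the decoded output above includes the input. To print only the model's reply, slice off the prompt first:

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(tokens[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(response)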
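
AWQ checkpoints of this size are often run with a dedicated inference engine rather than plain transformers. The following is an untested sketch that assumes vLLM supports this model's architecture and its AWQ quantization; the sampling parameters mirror the example above.

from vllm import LLM, SamplingParams

# Assumption: vLLM can load this checkpoint. The AWQ quantization is detected
# from the checkpoint config, and dtype="float16" matches the note above.
# A model of this size may also need tensor_parallel_size set for multiple GPUs.
llm = LLM(model="stockmark/Stockmark-2-100B-Instruct-beta-AWQ", dtype="float16")
params = SamplingParams(temperature=0.7, top_p=0.95, repetition_penalty=1.05, max_tokens=512)

outputs = llm.chat([{"role": "user", "content": "自然言語処理とは?"}], params)
print(outputs[0].outputs[0].text)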