Stockmark-2-100B-Instruct-beta-AWQ

This repository contains the 4-bit AWQ-quantized version of Stockmark-2-100B-Instruct-beta.

Example

Please use the float16 data type when loading the model; the bfloat16 data type is not supported by this model.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Loading an AWQ checkpoint through transformers requires the autoawq package.
tokenizer = AutoTokenizer.from_pretrained("stockmark/Stockmark-2-100B-Instruct-beta-AWQ")
model = AutoModelForCausalLM.from_pretrained(
    "stockmark/Stockmark-2-100B-Instruct-beta-AWQ",
    device_map="auto",
    torch_dtype=torch.float16,  # bfloat16 is not supported
)

instruction = "自然言語処理とは?"  # "What is natural language processing?"

# Build the prompt with the model's chat template and move it to the model's device.
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": instruction}], add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.inference_mode():
    tokens = model.generate(
        input_ids,
        max_new_tokens=512,
        do_sample=True,
        temperature=0.7,
        top_p=0.95,
        repetition_penalty=1.05,
    )

output = tokenizer.decode(tokens[0], skip_special_tokens=True)
print(output)
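
Note that tokens[0] contains the prompt tokens followed by the generated tokens, so the decoded output above includes the input. To print only the model's reply, slice off the prompt first:

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(tokens[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(response)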
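
AWQ checkpoints of this size are often run with a dedicated inference engine rather than plain transformers. The following is an untested sketch that assumes vLLM supports this model's architecture and its AWQ quantization; the sampling parameters mirror the example above.

from vllm import LLM, SamplingParams

# Assumption: vLLM can load this checkpoint. The AWQ quantization is detected
# from the checkpoint config, and dtype="float16" matches the note above.
# A model of this size may also need tensor_parallel_size set for multiple GPUs.
llm = LLM(model="stockmark/Stockmark-2-100B-Instruct-beta-AWQ", dtype="float16")
params = SamplingParams(temperature=0.7, top_p=0.95, repetition_penalty=1.05, max_tokens=512)

outputs = llm.chat([{"role": "user", "content": "自然言語処理とは?"}], params)
print(outputs[0].outputs[0].text)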