Gemma-7B-it in 8-bit with bitsandbytes
This is the repository for Gemma-7B-it quantized to 8-bit using bitsandbytes. The original model card and license for Gemma-7B-it can be found in the google/gemma-7b-it repository. This is the instruction-tuned variant of Gemma-7B, not the base model.
Usage
Please see the original Gemma-7B-it model card for intended uses and limitations.
You can use this model as follows:
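# Requires transformers, accelerate and bitsandbytes, plus a CUDA GPU for 8-bit inference.
from transformers import AutoModelForCausalLM, AutoTokenizer

# The 8-bit quantization settings are stored with the checkpoint, so no extra arguments are needed.
model = AutoModelForCausalLM.from_pretrained(
    "merve/gemma-7b-it-8bit",
    device_map="auto",  # place the quantized weights on the available GPU(s)
)
# The tokenizer is loaded from the original (gated) google/gemma-7b-it repository.
tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b-it")

chat = [
    {"role": "user", "content": "Write a hello world program"},
]
# The chat template already inserts <bos>, so don't add special tokens again when encoding.
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=150)
print(tokenizer.decode(outputs[0]))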