Instructions for using google/gemma-7b-it with libraries, inference providers, notebooks, and local apps.

Libraries

How to use google/gemma-7b-it with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="google/gemma-7b-it")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
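
When the pipeline is called with a list of chat messages, recent Transformers versions appear to return the whole conversation under `generated_text`, with the model's turn appended last. The helper below is a minimal sketch for pulling out that final reply. `last_reply` is a hypothetical name, and the output shape is an assumption about current pipeline behavior, so check it against your installed version. A stand-in result is used so the sketch runs without downloading the model.

```python
def last_reply(pipe_output):
    """Return the assistant text from a chat-style text-generation result.

    Assumed shape (not guaranteed across versions):
    [{"generated_text": [..., {"role": "assistant", "content": "<text>"}]}]
    """
    generated = pipe_output[0]["generated_text"]
    if isinstance(generated, str):
        # Plain-string prompts yield a plain string instead of a message list.
        return generated
    # Chat input: the last message in the list is the newly generated turn.
    return generated[-1]["content"]


# Stand-in for the value returned by `pipe(messages)`.
example_output = [
    {
        "generated_text": [
            {"role": "user", "content": "Who are you?"},
            {"role": "assistant", "content": "I am Gemma."},
        ]
    }
]
print(last_reply(example_output))
```

In real use, pass the return value of `pipe(messages)` in place of `example_output`.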

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b-it")
model = AutoModelForCausalLM.from_pretrained("google/gemma-7b-it")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

llama-cpp-python

How to use google/gemma-7b-it with llama-cpp-python:

# !pip install llama-cpp-python huggingface-hub
# (Llama.from_pretrained needs huggingface-hub to download the GGUF file)

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="google/gemma-7b-it",
	filename="gemma-7b-it.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
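
`create_chat_completion` returns an OpenAI-style dictionary rather than a bare string. As a hedged sketch of how the reply text might be read out of it, the hypothetical helper `reply_text` below assumes the usual `choices[0].message.content` layout. A hand-written response stands in for the real call so the example runs without the model file.

```python
def reply_text(response):
    """Extract the assistant message from an OpenAI-style chat completion dict."""
    choices = response.get("choices", [])
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]


# Hand-written stand-in for the dict returned by llm.create_chat_completion(...).
fake_response = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Paris."},
            "finish_reason": "stop",
        }
    ]
}
print(reply_text(fake_response))
```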

Local Apps

llama.cpp

How to use google/gemma-7b-it with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf google/gemma-7b-it
# Run inference directly in the terminal:
llama-cli -hf google/gemma-7b-it

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf google/gemma-7b-it
# Run inference directly in the terminal:
llama-cli -hf google/gemma-7b-it

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf google/gemma-7b-it
# Run inference directly in the terminal:
./llama-cli -hf google/gemma-7b-it

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf google/gemma-7b-it
# Run inference directly in the terminal:
./build/bin/llama-cli -hf google/gemma-7b-it

Use Docker

docker model run hf.co/google/gemma-7b-it

vLLM

How to use google/gemma-7b-it with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "google/gemma-7b-it"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "google/gemma-7b-it",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
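
The same request can be issued from Python using only the standard library. This is a sketch under stated assumptions, not an official client: `build_chat_request` and `send` are hypothetical helper names, and the default `base_url` mirrors the curl call above. Building the request is kept separate from sending it so the payload can be inspected without a running server.

```python
import json
import urllib.request


def build_chat_request(prompt, model="google/gemma-7b-it",
                       base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and return the decoded JSON body (needs a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


request = build_chat_request("What is the capital of France?")
print(request.full_url)
# With the server running:
# print(send(request)["choices"][0]["message"]["content"])
```

Passing `base_url="http://localhost:30000"` should target a server listening on port 30000 instead, such as the SGLang setup described later on this page.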

Use Docker

docker model run hf.co/google/gemma-7b-it

SGLang

How to use google/gemma-7b-it with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "google/gemma-7b-it" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "google/gemma-7b-it",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "google/gemma-7b-it" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "google/gemma-7b-it",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama

How to use google/gemma-7b-it with Ollama:

```
ollama run hf.co/google/gemma-7b-it
```

Unsloth Studio

How to use google/gemma-7b-it with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for google/gemma-7b-it to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for google/gemma-7b-it to start chatting

Use Hugging Face Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for google/gemma-7b-it to start chatting

Docker Model Runner

How to use google/gemma-7b-it with Docker Model Runner:

```
docker model run hf.co/google/gemma-7b-it
```

Lemonade

How to use google/gemma-7b-it with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull google/gemma-7b-it

Run and chat with the model

lemonade run user.gemma-7b-it-{{QUANT_TAG}}

List all available models

lemonade list

Install & run google/gemma-7b-it easily using llmpm
#101 opened 2 months ago by sarthak-saxena

Asking for Access
#100 opened 4 months ago by Bonhemde

Request: DOI
#98 opened 12 months ago by JacksonOsvaldo

Access to the gated repo & gemma-7b-it model from hugging face
#97 opened over 1 year ago by hk199

Request: DOI
#96 opened over 1 year ago by GOjira491

Request: DOI
#95 opened almost 2 years ago by naman2k23

Rename README.md to !huggingface-cli login
#94 opened almost 2 years ago by Thanhtran2209

Rename README.md to !huggingface-cli login
#91 opened about 2 years ago by mandyLO

Bug in logits for BOS token.
#90 opened about 2 years ago by Izarel

Inference with RTX 3090 got OOM
#89 opened about 2 years ago by kathylee (➕ 1)

Does Gemma support agent capabilities, or do they have to be fine-tuned in?
#87 opened about 2 years ago by qijizhuahuli

[AUTOMATED] Model Memory Requirements
#84 opened about 2 years ago by model-sizer-bot

[AUTOMATED] Model Memory Requirements
#83 opened about 2 years ago by model-sizer-bot

Weird Output "emphat emphat emphat"
#82 opened about 2 years ago by bill13031

ValueError: Trying to set a tensor of shape torch.Size([4096, 3072]) in "weight" (which has shape torch.Size([6291456, 1])), this look incorrect.
#81 opened about 2 years ago by Subhasisdasgupta

add_special_tokens=False results in poor generation
#80 opened about 2 years ago by DMaksimov (👀 1)

MPS does not support cumsum op with int64 input
#79 opened about 2 years ago by adityapotdar

"No output was generated. Something went wrong."
#78 opened about 2 years ago by JJJJJPSYCHIC

Can several different prompts be handled together?
#77 opened about 2 years ago by WENJINLIU

PermissionError
#75 opened about 2 years ago by Youngdong2

Question 2
#74 opened about 2 years ago by wqljwj

Difficulty importing Pipeline - AttributeError: module 'keras._tf_keras.keras' has no attribute '__internal__'
#71 opened about 2 years ago by mqureshi

How to set a system instruction?
#69 opened about 2 years ago by areumtecnologia (👍 2)

Need info on pre-training and instruction-tuning data
#64 opened about 2 years ago by markding

Import Error when trying to quantize and get any of the gemma models working on my local machine
#63 opened about 2 years ago by Prajwalll (👍 1)

undefined symbol error
#62 opened about 2 years ago by Cgodwin

My vibe-check of Gemma-7B-it. It's pretty good!
#60 opened about 2 years ago by harpreetsahota

How to get a different responce from the model using the same input
#59 opened about 2 years ago by deleted

Racial discrimination just like in Gemini
#57 opened about 2 years ago by justinian336

gemma-7b-it doesn't answer for some questions and returns '/n'
#55 opened about 2 years ago by mudogruer

gemma-2b-it model works but gemma-7b-it model generates errors
#51 opened about 2 years ago by saurabhkumar

Could not find GemmaForCausalLM neither in <module 'transformers.models.gemma'
#44 opened about 2 years ago by chenwei1984

Instrct version models keep missing the first few letters of the answer
#43 opened about 2 years ago by cooldog

<pad> spam issue
#40 opened about 2 years ago by Zewsic

Is it a joke?😅
#39 opened about 2 years ago by Horned (👍 6)

Buggy GGUF Output
#38 opened about 2 years ago by mattjcly (😔 2)

Alignment Issues
#37 opened about 2 years ago by deleted (🤝👍 9)

Bug about number generation?
#30 opened about 2 years ago by myownskyW7
