CUDA error: misaligned address
#24 opened by msi-sbraun-11
Hi there,
I am trying to use the Gemma 3 12B instruction-tuned model (google/gemma-3-12b-it) to generate QA pairs. The pipeline is defined as follows:
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    pipeline,
)
from langchain_huggingface import ChatHuggingFace, HuggingFacePipeline

model_id = "google/gemma-3-12b-it"

# 4-bit quantization via bitsandbytes
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float32,
    device_map="cuda",
    quantization_config=bnb_config,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
if tokenizer.pad_token is None:
    eos_token_id = model.config.eos_token_id
    eos_token = tokenizer.decode(eos_token_id)
    tokenizer.pad_token = eos_token  # this is a string, which is expected

text_gen_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    torch_dtype=torch.float32,
    top_p=0.95,
    top_k=70,
    temperature=1.25,
    do_sample=True,
    repetition_penalty=1.3,
)

llm = HuggingFacePipeline(pipeline=text_gen_pipeline)
model = ChatHuggingFace(llm=llm)
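For context, the chat model is then called roughly like this (a minimal sketch; the real prompts ask for QA pairs over document chunks and are omitted, so the message content below is just a placeholder):

from langchain_core.messages import HumanMessage

# placeholder prompt; the real prompts ask for QA pairs over a text chunk
messages = [HumanMessage(content="Generate two question-answer pairs about the water cycle.")]
response = model.invoke(messages)
print(response.content)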
When I call this model through its invoke function, at some point it throws the following error:
File "/home/nokia-proj/miniconda3/envs/vrag/lib/python3.10/site-packages/transformers/integrations/sdpa_attent
ion.py", line 54, in sdpa_attention_forward
attn_output = torch.nn.functional.scaled_dot_product_attention(
RuntimeError: CUDA error: misaligned address
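For what it's worth, one variant I have been considering is to keep the compute dtype consistent and to avoid the SDPA kernel that appears in the traceback. This is only a sketch of what I would try, not something I have confirmed helps; bnb_4bit_compute_dtype and attn_implementation are the standard BitsAndBytesConfig / from_pretrained options:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # keep the compute dtype consistent instead of float32
)

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-12b-it",
    torch_dtype=torch.bfloat16,
    device_map="cuda",
    quantization_config=bnb_config,
    attn_implementation="eager",  # bypass scaled_dot_product_attention from the traceback
)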
Any ideas why this error occurs and how I can resolve it?
Thank you!