Model Description

This is an 8-bit GPTQ-quantized version of Qwen3-0.6B. The calibration dataset contains only Hungarian news.
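
For reference, a quantization of this kind can be reproduced with the GPTQ integration in 🤗 Transformers (backed by Optimum). The sketch below is illustrative rather than the exact script used for this model; in particular, hungarian_news_texts is a hypothetical stand-in for the actual news calibration corpus.

from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

base_model = "Qwen/Qwen3-0.6B"
tokenizer = AutoTokenizer.from_pretrained(base_model)

# hypothetical calibration data: a list of raw Hungarian news texts
hungarian_news_texts = [
    "Budapesten ma mutatták be az új közlekedési koncepciót.",
    "A jegybank kamatdöntést hozott a keddi ülésén.",
]

# 8-bit GPTQ configuration, calibrated on the Hungarian samples
gptq_config = GPTQConfig(
    bits=8,
    dataset=hungarian_news_texts,
    tokenizer=tokenizer,
)

# quantization runs while the weights are loaded; a GPU is required
# (a free-tier T4 is enough for a 0.6B model, see Hardware below)
quantized_model = AutoModelForCausalLM.from_pretrained(
    base_model,
    device_map="auto",
    quantization_config=gptq_config,
)

quantized_model.save_pretrained("Qwen3-0.6B-8bit-gptq_hungarian_news")
tokenizer.save_pretrained("Qwen3-0.6B-8bit-gptq_hungarian_news")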

This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.

  • Developed by: Gábor Madarász
  • Model type: Causal language model (decoder-only Transformer)
  • Language(s) (NLP): Hungarian, English
  • License: apache-2.0
  • Finetuned from model: Qwen/Qwen3-0.6B

Uses

Chat in Hungarian with "thinking" mode.

Direct Use

This model produces better Hungarian than the original Qwen3-0.6B, but it is not perfect.

[More Information Needed]

Downstream Use [optional]

[More Information Needed]

Out-of-Scope Use

[More Information Needed]

Bias, Risks, and Limitations

[More Information Needed]

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.

How to Get Started with the Model

Use the code below to get started with the model.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "GaborMadarasz/Qwen3-0.6B-8bit-gptq_hungarian_news"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Szia! Mi lehet az autóbalesetek legfőbb okozója?!"  # "Hi! What might be the main cause of car accidents?!"
messages = [
    {"role": "user", "content": "Always answer in hungarian!\n" + prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True # Switches between thinking and non-thinking modes. Default is True.
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=568,
    do_sample=True,
    temperature=0.6,
    top_k=20,
    top_p=0.96,
    repetition_penalty=1.2,
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# parsing thinking content
try:
    # rindex finding 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)

Evaluation

[More Information Needed]

Compute Infrastructure

Quantization was performed on a free Google Colab instance.

Hardware

1× NVIDIA T4 GPU

Model Card Contact

[email protected]
