Model Description

This is an 8-bit GPTQ-quantized version of Qwen3-0.6B. The calibration dataset contains only Hungarian news.
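
For reference, a quantization of this kind can be reproduced with the GPTQ integration in 🤗 Transformers (backed by Optimum). The sketch below is illustrative rather than the exact script used for this model; in particular, hungarian_news_texts is a hypothetical stand-in for the actual news calibration corpus.

from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

base_model = "Qwen/Qwen3-0.6B"
tokenizer = AutoTokenizer.from_pretrained(base_model)

# hypothetical calibration data: a list of raw Hungarian news texts
hungarian_news_texts = [
    "Budapesten ma mutatták be az új közlekedési koncepciót.",
    "A jegybank kamatdöntést hozott a keddi ülésén.",
]

# 8-bit GPTQ configuration, calibrated on the Hungarian samples
gptq_config = GPTQConfig(
    bits=8,
    dataset=hungarian_news_texts,
    tokenizer=tokenizer,
)

# quantization runs while the weights are loaded; a GPU is required
# (a free-tier T4 is enough for a 0.6B model, see Hardware below)
quantized_model = AutoModelForCausalLM.from_pretrained(
    base_model,
    device_map="auto",
    quantization_config=gptq_config,
)

quantized_model.save_pretrained("Qwen3-0.6B-8bit-gptq_hungarian_news")
tokenizer.save_pretrained("Qwen3-0.6B-8bit-gptq_hungarian_news")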

This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.

  • Developed by: Gábor Madarász
  • Model type: Causal language model (decoder-only Transformer)
  • Language(s) (NLP): Hungarian, English
  • License: apache-2.0
  • Finetuned from model: Qwen/Qwen3-0.6B

Uses

Chat in Hungarian with "thinking" mode.

Direct Use

This model produces better Hungarian than the original Qwen3-0.6B, but it is not perfect.

[More Information Needed]

Downstream Use [optional]

[More Information Needed]

Out-of-Scope Use

[More Information Needed]

Bias, Risks, and Limitations

[More Information Needed]

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.

How to Get Started with the Model

Use the code below to get started with the model.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "GaborMadarasz/Qwen3-0.6B-8bit-gptq_hungarian_news"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Szia! Mi lehet az autóbalesetek legfőbb okozója?!"  # "Hi! What might be the main cause of car accidents?!"
messages = [
    {"role": "user", "content": "Always answer in hungarian!\n" + prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True # Switches between thinking and non-thinking modes. Default is True.
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=568,
    do_sample=True,
    temperature=0.6,
    top_k=20,
    top_p=0.96,
    repetition_penalty=1.2,
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# parsing thinking content
try:
    # rindex finding 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)

Evaluation

[More Information Needed]

Compute Infrastructure

Quantization was performed on a free Google Colab instance.

Hardware

1× NVIDIA T4 GPU

Model Card Contact

[email protected]
