Model Description
This is an 8-bit GPTQ-quantized Qwen3-0.6B model. The calibration dataset contains only Hungarian news; a sketch of a comparable quantization setup follows the details below.
This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.
- Developed by: Gábor Madarász
- Model type: Transformer
- Language(s) (NLP): Hungarian, English
- License: apache-2.0
- Quantized from model: Qwen3-0.6B
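The exact quantization script is not included in this card. As a minimal sketch, an 8-bit GPTQ quantization with a Hungarian news calibration set could be reproduced with transformers' GPTQConfig; the calibration sentences and output directory below are placeholders, not the actual data used for this model.

# A minimal sketch, not the exact script used for this model.
# Requires the optimum and auto-gptq (or gptqmodel) packages to be installed.
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

base_model = "Qwen/Qwen3-0.6B"
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Placeholder calibration samples; the real calibration set was Hungarian news articles.
calibration_texts = [
    "A kormány új intézkedéseket jelentett be a hétvégén.",
    "A jövő héten fokozatosan melegszik az idő.",
]

gptq_config = GPTQConfig(
    bits=8,                     # 8-bit weights, matching this model
    dataset=calibration_texts,  # custom calibration data (a list of strings is accepted)
    tokenizer=tokenizer,
)

# Quantization runs while loading; it needs a GPU such as the T4 mentioned below.
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=gptq_config,
    device_map="auto",
)

model.save_pretrained("Qwen3-0.6B-8bit-gptq_hungarian_news")
tokenizer.save_pretrained("Qwen3-0.6B-8bit-gptq_hungarian_news")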
Uses
Chat in Hungarian with "thinking" mode.
Direct Use
This model produces better Hungarian than the original Qwen3-0.6B, though it is not perfect.
Downstream Use [optional]
[More Information Needed]
Out-of-Scope Use
[More Information Needed]
Bias, Risks, and Limitations
[More Information Needed]
Recommendations
Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.
How to Get Started with the Model
Use the code below to get started with the model.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "GaborMadarasz/Qwen3-0.6B-8bit-gptq_hungarian_news"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Szia! Mi lehet az autóbalesetek legfőbb okozója?!"  # "Hi! What might be the main cause of car accidents?!"
messages = [
    {"role": "user", "content": "Always answer in hungarian!\n" + prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True  # Switches between thinking and non-thinking modes. Default is True.
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=568,
    do_sample=True,
    temperature=0.6,
    top_k=20,
    repetition_penalty=1.2,
    top_p=0.96,
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# parse the thinking content
try:
    # rindex: find the last occurrence of token 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)
Evaluation
Compute Infrastructure
Quantized on a free Google Colab instance.
Hardware
1× NVIDIA T4 GPU
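To confirm that the 8-bit model fits comfortably in the T4's 16 GB of memory, transformers' get_memory_footprint helper reports the memory taken by a loaded model's weights:

# Approximate memory used by the loaded model's parameters, in GB.
print(f"Model weight footprint: {model.get_memory_footprint() / 1e9:.2f} GB")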
Model Card Contact