Qwen 3 4B – Multilingual Fine-Tuned

This is a fine-tuned version of Qwen 3 4B, optimized using the agentlans/multilingual-sft dataset to improve performance across 100+ languages and dialects.

Compared to the original Qwen 3 4B, this model focuses on clear, concise outputs, minimizing verbose reasoning. It's designed as a compact, multilingual alternative similar in behaviour to the Aya models.

πŸ’‘ Intended Use

  • Enhanced multilingual support for over 100 languages
  • Generates short, direct answers rather than long chain-of-thought responses
  • Suitable for general-purpose multilingual tasks where speed and clarity matter (see the usage sketch below)
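A minimal generation sketch with 🤗 Transformers is shown below. It assumes this repository's model ID, a CUDA-capable GPU, and the `accelerate` package for `device_map="auto"`; the prompt and generation settings are illustrative, not recommendations.

```python
# Minimal usage sketch (assumptions: this repo's model ID, a CUDA GPU,
# and the `accelerate` package installed for device_map="auto").
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "agentlans/Qwen3-4B-multilingual-sft"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are published in BF16
    device_map="auto",
)

# Single-turn prompt: the model is tuned for short, direct answers.
messages = [{"role": "user", "content": "¿Cuál es la capital de Australia?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Because the model is optimized for single-turn question answering, keep prompts self-contained rather than relying on long multi-turn histories.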

⚠️ Limitations

  • Inherits known biases and limitations from the base Qwen 3 4B model
  • Performance may vary across languages and specific domains
  • Not intended for highly specialized or low-resource language tasks
  • Optimized for single-turn question answering, not for long conversations

πŸ› οΈ Training Details

Dataset

  • agentlans/multilingual-sft

Method

  • Fine-tuned using LoRA (Low-Rank Adaptation)
    • rank=32, alpha=64, dropout=0.3
  • Base model quantized to 4-bit precision with bitsandbytes (BnB)
  • Attention computed with FlashAttention 2 (a configuration sketch follows this list)
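The snippet below is a minimal sketch of this setup using peft, transformers, and bitsandbytes, not the actual training script. The `target_modules` list and the BF16 compute dtype are assumptions.

```python
# Sketch of the fine-tuning setup described above (QLoRA-style recipe).
# Assumptions: target_modules and BF16 compute dtype; base repo ID is Qwen/Qwen3-4B.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit BnB quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumption: BF16 compute
)

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-4B",
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",  # FlashAttention 2
    torch_dtype=torch.bfloat16,
)

lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.3,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
```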

Hyperparameters

  • Learning rate: 5e-5
  • Batch size: 1 (with gradient accumulation for effective batch size of 8)
  • Optimizer: AdamW (betas=(0.9, 0.999), epsilon=1e-8)
  • Scheduler: Cosine decay
  • Epochs: 1
  • Random seed: 42 (these settings are sketched as TrainingArguments below)
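Expressed as `transformers.TrainingArguments`, the settings above would look roughly like the sketch below; `output_dir` and the `bf16` flag are placeholders/assumptions rather than values taken from the training run.

```python
# Sketch of the optimizer/schedule settings listed above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen3-4b-multilingual-sft",  # placeholder
    learning_rate=5e-5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,           # effective batch size 8
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
    bf16=True,                               # assumption: BF16 training
)
```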

Software

  • peft==0.15.1
  • transformers==4.51.3
  • torch==2.6.0+cu124
  • datasets==3.5.0
  • tokenizers==0.21.1

πŸ“„ License

This model is released under the Apache 2.0 License.
