Qwen 3 4B – Multilingual Fine-Tuned
This is a fine-tuned version of Qwen 3 4B, optimized using the agentlans/multilingual-sft dataset to improve performance across 100+ languages and dialects.
Compared to the original Qwen 3 4B, this model focuses on clear, concise outputs, minimizing verbose reasoning. It's designed as a compact, multilingual alternative similar in behaviour to the Aya models.
💡 Intended Use
- Enhanced multilingual support for over 100 languages
- Generates short, direct answers rather than long chain-of-thought responses
- Suitable for general-purpose multilingual tasks where speed and clarity matter
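As a quick illustration, here is a minimal inference sketch using the Transformers library. The repository id below is a placeholder (this card does not state the exact repo name), so substitute this model's actual id; the prompt is only an example:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id; replace with this model's actual Hugging Face id.
model_id = "your-username/qwen3-4b-multilingual-sft"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# A short multilingual prompt; the model is tuned to answer concisely.
messages = [{"role": "user", "content": "¿Cuál es la capital de Japón?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```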
⚠️ Limitations
- Inherits known biases and limitations from the base Qwen 3 4B model
- Performance may vary across languages and specific domains
- Not intended for highly specialized or low-resource language tasks
- Optimized for single-turn question answering, not for long conversations
🛠️ Training Details
Dataset
- 100,000 samples from agentlans/multilingual-sft
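A minimal sketch of loading this dataset with the 🤗 Datasets library is shown below; the split name and column layout are not described on this card, so the code simply loads and inspects it:

```python
from datasets import load_dataset

# Load the SFT dataset used for fine-tuning; the "train" split is an assumption.
dataset = load_dataset("agentlans/multilingual-sft", split="train")
print(dataset)       # dataset size and column names
print(dataset[0])    # first example
```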
Method
- Fine-tuned using LoRA (Low-Rank Adaptation) with `rank=32`, `alpha=64`, `dropout=0.3`
- Quantized to 4-bit precision with bitsandbytes (BnB)
- Attention accelerated with FlashAttention 2
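A minimal sketch of how such a setup is typically assembled with `peft` and `bitsandbytes` follows. It reflects the values listed above, but the base model repo id, the `nf4` quantization type, the compute dtype, and the LoRA target modules are assumptions not stated on this card:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit quantization via bitsandbytes; nf4 and bfloat16 are assumed defaults.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-4B",                          # assumed base model repo
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",  # FlashAttention 2
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# LoRA adapter with the rank/alpha/dropout listed above;
# the target modules are an assumption.
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.3,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()
```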
Hyperparameters
- Learning rate: `5e-5`
- Batch size: `1` (with gradient accumulation for an effective batch size of 8)
- Optimizer: AdamW (`betas=(0.9, 0.999)`, `epsilon=1e-8`)
- Scheduler: cosine decay
- Epochs: 1
- Random seed: 42
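These settings map onto a standard 🤗 `TrainingArguments` configuration roughly as sketched below; the output directory name is hypothetical, and the full training script is not published on this card:

```python
from transformers import TrainingArguments, set_seed

set_seed(42)

training_args = TrainingArguments(
    output_dir="qwen3-4b-multilingual-lora",  # hypothetical path
    learning_rate=5e-5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,            # effective batch size of 8
    num_train_epochs=1,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    seed=42,
)
```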
Software
- `peft==0.15.1`
- `transformers==4.51.3`
- `torch==2.6.0+cu124`
- `datasets==3.5.0`
- `tokenizers==0.21.1`
📄 License
This model is released under the Apache 2.0 License.