Qwen 3 4B – Multilingual Fine-Tuned

This is a fine-tuned version of Qwen 3 4B, optimized using the agentlans/multilingual-sft dataset to improve performance across 100+ languages and dialects.

Compared to the original Qwen 3 4B, this model focuses on clear, concise outputs, minimizing verbose reasoning. It's designed as a compact, multilingual alternative similar in behaviour to the Aya models.

πŸ’‘ Intended Use

  • Enhanced multilingual support for over 100 languages
  • Generates short, direct answers rather than long chain-of-thought responses
  • Suitable for general-purpose multilingual tasks where speed and clarity matter (see the usage sketch below)
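A minimal generation sketch with 🤗 Transformers is shown below. It assumes this repository's model ID, a CUDA-capable GPU, and the `accelerate` package for `device_map="auto"`; the prompt and generation settings are illustrative, not recommendations.

```python
# Minimal usage sketch (assumptions: this repo's model ID, a CUDA GPU,
# and the `accelerate` package installed for device_map="auto").
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "agentlans/Qwen3-4B-multilingual-sft"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are published in BF16
    device_map="auto",
)

# Single-turn prompt: the model is tuned for short, direct answers.
messages = [{"role": "user", "content": "¿Cuál es la capital de Australia?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Because the model is optimized for single-turn question answering, keep prompts self-contained rather than relying on long multi-turn histories.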

⚠️ Limitations

  • Inherits known biases and limitations from the base Qwen 3 4B model
  • Performance may vary across languages and specific domains
  • Not intended for highly specialized or low-resource language tasks
  • Optimized for single-turn question answering, not for long conversations

πŸ› οΈ Training Details

Dataset

  • agentlans/multilingual-sft

Method

  • Fine-tuned using LoRA (Low-Rank Adaptation)
    • rank=32, alpha=64, dropout=0.3
  • Base model quantized to 4-bit precision with bitsandbytes (BnB)
  • Attention computed with FlashAttention 2 (a configuration sketch follows this list)
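The snippet below is a minimal sketch of this setup using peft, transformers, and bitsandbytes, not the actual training script. The `target_modules` list and the BF16 compute dtype are assumptions.

```python
# Sketch of the fine-tuning setup described above (QLoRA-style recipe).
# Assumptions: target_modules and BF16 compute dtype; base repo ID is Qwen/Qwen3-4B.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit BnB quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumption: BF16 compute
)

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-4B",
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",  # FlashAttention 2
    torch_dtype=torch.bfloat16,
)

lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.3,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
```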

Hyperparameters

  • Learning rate: 5e-5
  • Batch size: 1 (with gradient accumulation for effective batch size of 8)
  • Optimizer: AdamW (betas=(0.9, 0.999), epsilon=1e-8)
  • Scheduler: Cosine decay
  • Epochs: 1
  • Random seed: 42 (these settings are sketched as TrainingArguments below)
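Expressed as `transformers.TrainingArguments`, the settings above would look roughly like the sketch below; `output_dir` and the `bf16` flag are placeholders/assumptions rather than values taken from the training run.

```python
# Sketch of the optimizer/schedule settings listed above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen3-4b-multilingual-sft",  # placeholder
    learning_rate=5e-5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,           # effective batch size 8
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
    bf16=True,                               # assumption: BF16 training
)
```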

Software

  • peft==0.15.1
  • transformers==4.51.3
  • torch==2.6.0+cu124
  • datasets==3.5.0
  • tokenizers==0.21.1

πŸ“„ License

This model is released under the Apache 2.0 License.
