Qwen3-Embedding-4B Encoder Router for Combined_Routing_Dataset_V1.3

This is a trained encoder router model based on Qwen/Qwen3-Embedding-4B that selects, for each incoming query, which language model to route it to. The router was trained on the Combined_Routing_Dataset_V1.3, which contains data from the Zubi collection.

Model Description

  • Base Model: Qwen/Qwen3-Embedding-4B
  • Training Dataset: hazyresearch/Combined_Routing_Dataset_V1.3
  • Routing Target: Llama family models only
  • Loss Function: Focal Loss (α=1.0, γ=2.0), sketched after this list
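
The focal loss above follows the standard binary formulation (Lin et al., 2017). A minimal PyTorch sketch with the α and γ values used here; the function name is illustrative, not the repository's actual implementation:

import torch
import torch.nn.functional as F

def binary_focal_loss(logits, targets, alpha=1.0, gamma=2.0):
    # Per-element binary cross-entropy; equals -log(p_t), where p_t is
    # the model's predicted probability for the true label.
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-bce)
    # (1 - p_t)^gamma down-weights easy, well-classified examples so
    # training concentrates on hard ones; with alpha=1.0 the alpha
    # term is a uniform scale.
    return (alpha * (1.0 - p_t) ** gamma * bce).mean()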

Dataset Information

The Combined_Routing_Dataset_V1.3 merges multiple specialized routing datasets into a single comprehensive training set. It includes:

  • Multi-domain coverage: Combined data from various conversational and reasoning domains
  • Diverse query types: Mix of conversational, reasoning, and task-oriented queries
  • Balanced representation: Carefully curated combination of different data sources
  • Enhanced generalization: Training on combined data improves router performance across different scenarios

This combined approach enables the router to handle a wider variety of user queries and make better routing decisions across different domains.

Training Configuration

{
  "model_name": "Qwen/Qwen3-Embedding-4B",
  "max_length": 2048,
  "mlp_hidden_dims": [
    2560,
    1280,
    640
  ],
  "num_epochs": 30,
  "batch_size": 32,
  "learning_rate": 3e-05,
  "warmup_ratio": 0.1,
  "weight_decay": 0.01,
  "dropout_rate": 0.1,
  "gradient_accumulation_steps": 4,
  "max_grad_norm": 1.0,
  "use_amp": true,
  "use_multi_gpu": false,
  "multi_gpu_strategy": "dataparallel",
  "num_gpus": null,
  "dataset_paths": [
    "hazyresearch/Combined_Routing_Dataset_V1.3"
  ],
  "max_rows": null,
  "use_pareto_optimal": false,
  "use_cheapest_best": false,
  "single_best_model": false,
  "filter_solvable": false,
  "excluded_models": [],
  "llama_family_only": true,
  "output_dir": "checkpoints/Combined_Routing_Dataset_V1.3/Qwen3-Embedding-4_3e5_2048_30ep_LLAMA_FAMILY_ONLY_FOCAL_V1.2",
  "use_wandb": true,
  "early_stopping_patience": 3,
  "early_stopping_min_delta": 1e-05,
  "loss_type": "focal",
  "focal_alpha": 1.0,
  "focal_gamma": 2.0,
  "temperature": 1.0,
  "seed": 42
}
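
Note that batch_size and gradient_accumulation_steps combine multiplicatively, giving an effective batch size of 32 × 4 = 128. A quick way to confirm this from the config.json shipped in this repository:

import json

with open("config.json") as f:
    cfg = json.load(f)

effective_batch = cfg["batch_size"] * cfg["gradient_accumulation_steps"]
print(effective_batch)  # 128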

Training Command

python3 train_encoder_router.py \
    --model_name Qwen/Qwen3-Embedding-4B \
    --dataset_paths hazyresearch/Combined_Routing_Dataset_V1.3 \
    --output_dir checkpoints/Combined_Routing_Dataset_V1.3/Qwen3-Embedding-4B_3e-05_32_30ep \
    --use_wandb \
    --num_epochs 30 \
    --batch_size 32 \
    --learning_rate 3e-05 \
    --llama_family_only \
    --loss_type focal \
    --focal_alpha 1.0 \
    --focal_gamma 2.0

Evaluation Command

python3 evaluation/eval_encoder_router.py \
    --model_path training/checkpoints/Combined_Routing_Dataset_V1.3/Qwen3-Embedding-4B_3e-05_32_30ep/best_model.pt \
    --config_path training/checkpoints/Combined_Routing_Dataset_V1.3/Qwen3-Embedding-4B_3e-05_32_30ep/config.json \
    --eval_dataset_path hazyresearch/Combined_Routing_Dataset_V1.3

How It Works

This encoder router uses a transformer encoder with an MLP classification head to predict which model will perform best on a given query (a sketch of the head follows this list). The training process involves:

  1. Multi-label Classification: The model learns to predict correctness probabilities for multiple target models simultaneously
  2. Focal Loss Training: Uses focal loss to handle class imbalance and focus on hard examples
  3. Llama Family Focus: Specialized for routing among Llama family models
  4. Early Stopping: Training with patience-based early stopping to prevent overfitting
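
A minimal sketch of that classification head, assuming the encoder's pooled 2560-dimensional embedding feeds the mlp_hidden_dims layers from the configuration above. The class name, pooling, and num_models value are illustrative assumptions, not the repository's exact code:

import torch.nn as nn

class EncoderRouterHead(nn.Module):
    # MLP head stacked on top of the text encoder's pooled embedding.
    # hidden_dims mirrors mlp_hidden_dims in config.json; num_models is
    # a placeholder for the number of Llama-family routing targets.
    def __init__(self, embed_dim=2560, hidden_dims=(2560, 1280, 640),
                 num_models=8, dropout=0.1):
        super().__init__()
        layers, in_dim = [], embed_dim
        for h in hidden_dims:
            layers += [nn.Linear(in_dim, h), nn.ReLU(), nn.Dropout(dropout)]
            in_dim = h
        # One logit per candidate model: multi-label rather than softmax,
        # since several models can answer the same query correctly.
        layers.append(nn.Linear(in_dim, num_models))
        self.mlp = nn.Sequential(*layers)

    def forward(self, pooled_embedding):
        return self.mlp(pooled_embedding)  # sigmoid is applied in the loss

At inference time, the router selects the candidate with the highest predicted correctness probability.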

Usage

To use this model with the Zubi routing system:

from routing.classes.EncoderRouter import RouterEvaluator

# Load the trained router
model_path = "path/to/best_model.pt"
config_path = "path/to/config.json"

# Initialize evaluator
evaluator = RouterEvaluator(model_path, config_path)

# Route a query
query = "Your query here"
selected_model = evaluator.predict_best_model(query)
print(f"Selected model: {selected_model}")

Repository Structure

β”œβ”€β”€ best_model.pt     # Trained model checkpoint
β”œβ”€β”€ config.json       # Training configuration
└── README.md         # This file

License

This model is released under the Apache 2.0 License.

More Information

For more details about the encoder router system, training procedures, and evaluation methods, please refer to the Encoder Router README in the Zubi repository.
