Qwen3-Embedding-4B Encoder Router for Combined_Routing_Dataset_V1.3
This is a trained encoder router based on Qwen/Qwen3-Embedding-4B that selects, for each incoming query, which candidate language model should handle it. The router was trained on the Combined_Routing_Dataset_V1.3, which contains data from the Zubi collection.
Model Description
- Base Model: Qwen/Qwen3-Embedding-4B
- Training Dataset: hazyresearch/Combined_Routing_Dataset_V1.3
- Routing Target: Llama family models only
- Loss Function: Focal Loss (α=1.0, γ=2.0)
Dataset Information
The Combined_Routing_Dataset_V1.3 merges multiple specialized routing datasets into one comprehensive training set. It provides:
- Multi-domain coverage: data drawn from a range of conversational and reasoning domains
- Diverse query types: a mix of conversational, reasoning, and task-oriented queries
- Balanced representation: a curated combination of the constituent data sources
- Enhanced generalization: training on the merged data is intended to improve routing quality across scenarios that any single source covers only partially

This combined approach lets the router handle a wider variety of user queries and make better routing decisions across domains. The dataset can be loaded directly from the Hugging Face Hub, as sketched below.
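Since the dataset is hosted on the Hugging Face Hub, it should load with the standard `datasets` API. This is a minimal sketch; check the dataset card for the actual splits and column schema:

```python
from datasets import load_dataset

# Load the combined routing dataset from the Hugging Face Hub.
ds = load_dataset("hazyresearch/Combined_Routing_Dataset_V1.3")

# Inspect the available splits and columns before training.
print(ds)
```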
Training Configuration
```json
{
"model_name": "Qwen/Qwen3-Embedding-4B",
"max_length": 2048,
"mlp_hidden_dims": [
2560,
1280,
640
],
"num_epochs": 30,
"batch_size": 32,
"learning_rate": 3e-05,
"warmup_ratio": 0.1,
"weight_decay": 0.01,
"dropout_rate": 0.1,
"gradient_accumulation_steps": 4,
"max_grad_norm": 1.0,
"use_amp": true,
"use_multi_gpu": false,
"multi_gpu_strategy": "dataparallel",
"num_gpus": null,
"dataset_paths": [
"hazyresearch/Combined_Routing_Dataset_V1.3"
],
"max_rows": null,
"use_pareto_optimal": false,
"use_cheapest_best": false,
"single_best_model": false,
"filter_solvable": false,
"excluded_models": [],
"llama_family_only": true,
"output_dir": "checkpoints/Combined_Routing_Dataset_V1.3/Qwen3-Embedding-4_3e5_2048_30ep_LLAMA_FAMILY_ONLY_FOCAL_V1.2",
"use_wandb": true,
"early_stopping_patience": 3,
"early_stopping_min_delta": 1e-05,
"loss_type": "focal",
"focal_alpha": 1.0,
"focal_gamma": 2.0,
"temperature": 1.0,
"seed": 42
}
```
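A few of these fields interact: with gradient accumulation, the effective batch size per optimizer update is `batch_size × gradient_accumulation_steps` = 32 × 4 = 128. A small sketch for inspecting the config shipped with the checkpoint:

```python
import json

# Read the training configuration stored next to the checkpoint.
with open("config.json") as f:
    config = json.load(f)

# Gradient accumulation multiplies the per-step batch size:
# 32 * 4 = 128 examples per optimizer update.
effective_batch = config["batch_size"] * config["gradient_accumulation_steps"]
print(f"Effective batch size: {effective_batch}")
```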
Training Command
```bash
python3 train_encoder_router.py \
    --model_name Qwen/Qwen3-Embedding-4B \
    --dataset_paths hazyresearch/Combined_Routing_Dataset_V1.3 \
    --output_dir checkpoints/Combined_Routing_Dataset_V1.3/Qwen3-Embedding-4B_3e-05_32_30ep \
    --use_wandb \
    --num_epochs 30 \
    --batch_size 32 \
    --learning_rate 3e-05 \
    --llama_family_only \
    --loss_type focal \
    --focal_alpha 1.0 \
    --focal_gamma 2.0
```
Evaluation Command
```bash
python3 evaluation/eval_encoder_router.py \
    --model_path training/checkpoints/Combined_Routing_Dataset_V1.3/Qwen3-Embedding-4B_3e-05_32_30ep/best_model.pt \
    --config_path training/checkpoints/Combined_Routing_Dataset_V1.3/Qwen3-Embedding-4B_3e-05_32_30ep/config.json \
    --eval_dataset_path hazyresearch/Combined_Routing_Dataset_V1.3
```
How It Works
This encoder router uses a transformer encoder with an MLP classification head to predict which model will perform best on a given query. The training process involves:
- Multi-label Classification: for each candidate model, the router predicts the probability that the model answers the query correctly, so one query yields several binary targets
- Focal Loss Training: focal loss down-weights well-classified examples, mitigating class imbalance and focusing training on hard queries (see the sketch below)
- Llama Family Focus: routing is restricted to Llama family models (`llama_family_only: true`)
- Early Stopping: patience-based early stopping (patience 3) halts training once validation stops improving, preventing overfitting
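The sketch below illustrates the classification head and the loss. It is a minimal reconstruction, not the repository's code: `RouterHead` and `FocalLoss` are hypothetical names, and the 2560-dimensional pooled embedding is an assumption about how the Qwen3-Embedding-4B encoder output is pooled. The hidden sizes, dropout, and focal parameters are taken from the training configuration above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FocalLoss(nn.Module):
    """Multi-label focal loss: per-model binary cross-entropy, with
    well-classified examples down-weighted by (1 - p_t)^gamma."""

    def __init__(self, alpha: float = 1.0, gamma: float = 2.0):
        super().__init__()
        self.alpha, self.gamma = alpha, gamma

    def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
        p_t = torch.exp(-bce)  # probability assigned to the true label
        return (self.alpha * (1.0 - p_t) ** self.gamma * bce).mean()


class RouterHead(nn.Module):
    """MLP head matching mlp_hidden_dims = [2560, 1280, 640]: maps a pooled
    query embedding to one correctness logit per candidate model."""

    def __init__(self, embed_dim: int, num_models: int,
                 hidden_dims=(2560, 1280, 640), dropout: float = 0.1):
        super().__init__()
        layers, in_dim = [], embed_dim
        for h in hidden_dims:
            layers += [nn.Linear(in_dim, h), nn.ReLU(), nn.Dropout(dropout)]
            in_dim = h
        layers.append(nn.Linear(in_dim, num_models))
        self.mlp = nn.Sequential(*layers)

    def forward(self, pooled_embedding: torch.Tensor) -> torch.Tensor:
        return self.mlp(pooled_embedding)


# Toy forward/backward pass; random embeddings stand in for the encoder.
head = RouterHead(embed_dim=2560, num_models=4)
loss_fn = FocalLoss(alpha=1.0, gamma=2.0)
embeddings = torch.randn(8, 2560)             # pooled query embeddings
labels = torch.randint(0, 2, (8, 4)).float()  # per-model correctness labels
loss = loss_fn(head(embeddings), labels)
loss.backward()
```

At inference time, the model with the highest predicted correctness probability is selected, e.g. via `logits.argmax(dim=-1)`.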
Usage
To use this model with the Zubi routing system:
```python
from routing.classes.EncoderRouter import RouterEvaluator

# Load the trained router checkpoint and its training configuration
model_path = "path/to/best_model.pt"
config_path = "path/to/config.json"
evaluator = RouterEvaluator(model_path, config_path)

# Route a query to the model predicted to handle it best
query = "Your query here"
selected_model = evaluator.predict_best_model(query)
print(f"Selected model: {selected_model}")
```
Repository Structure
```
├── best_model.pt   # Trained model checkpoint
├── config.json     # Training configuration
└── README.md       # This file
```
License
This model is released under the Apache 2.0 License.
More Information
For more details about the encoder router system, training procedures, and evaluation methods, please refer to the Encoder Router README in the Zubi repository.