SemiQwenn - Distilled Qwen2.5 7B

Model Description

SemiQwenn-7B is a Qwen2.5-7B model distilled from Devstral through Supervised Fine-Tuning (SFT) distillation: the student is fine-tuned on Devstral's responses to the training dataset, capturing the teacher model's capabilities while leveraging the capacity of the 7B-parameter architecture. The model was created as part of a datathon project focused on efficient language model training and deployment.
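
The exact distillation pipeline is not reproduced in this card. As a rough illustration only, the sketch below shows how teacher responses could be collected from a Devstral-style model with Transformers; the checkpoint placeholder, prompt handling, and generation settings are assumptions, not the project's actual script.

# Hypothetical sketch of collecting teacher responses for SFT distillation.
# "devstral-teacher" is a placeholder, not a real checkpoint name.
from transformers import AutoModelForCausalLM, AutoTokenizer

teacher_id = "devstral-teacher"
teacher = AutoModelForCausalLM.from_pretrained(teacher_id, torch_dtype="auto", device_map="auto")
teacher_tok = AutoTokenizer.from_pretrained(teacher_id)

def teacher_response(instruction: str) -> str:
    messages = [{"role": "user", "content": instruction}]
    input_ids = teacher_tok.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(teacher.device)
    output = teacher.generate(input_ids, max_new_tokens=512, do_sample=False)
    # Keep only the newly generated tokens as the distillation target
    return teacher_tok.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)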

Model Details

  • Model Name: SemiQwenn-7B
  • Student Model: Qwen2.5-7B
  • Teacher Model: Devstral
  • Model Size: 7 billion parameters
  • Training Method: SFT (Supervised Fine-Tuning) Distillation with QLoRA adapters
  • Language(s): English (primary), with multilingual capabilities inherited from base model
  • License: Same as base Qwen2.5 model
  • Model Type: Causal Language Model

Training Details

Training Data

  • Dataset: Code Alpaca + GSM8K (30k samples)
  • Training Split: Stratified split for balanced learning
  • Data Format: JSONL with instruction-response pairs (a loading sketch follows this list)
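
The field names below are an assumption about the JSONL layout (one instruction-response pair per line); the project's actual keys may differ.

import json

# Assumed record layout per line: {"instruction": ..., "response": ...}
def load_pairs(path: str):
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            pairs.append({"prompt": record["instruction"], "completion": record["response"]})
    return pairs

# Example: pairs = load_pairs("train.jsonl")  # hypothetical file name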

Training Configuration

  • Training Method: QLoRA (Quantized Low-Rank Adaptation); see the setup sketch after this list
  • Teacher Model: Devstral (for SFT distillation)
  • Training Framework: Transformers/PEFT
  • Hardware: GPU-optimized training with quantization
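
The training script itself is not part of this card. The following is a minimal QLoRA setup sketch with Transformers, PEFT, and TRL; every hyperparameter (4-bit quantization settings, LoRA rank and target modules, batch sizes) is illustrative rather than taken from the actual run.

# Hypothetical QLoRA fine-tuning setup; hyperparameters are illustrative only.
import torch
from datasets import Dataset
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig
from trl import SFTTrainer, SFTConfig

base_id = "Qwen/Qwen2.5-7B"
bnb = BitsAndBytesConfig(
    load_in_4bit=True, bnb_4bit_quant_type="nf4", bnb_4bit_compute_dtype=torch.bfloat16
)
model = AutoModelForCausalLM.from_pretrained(base_id, quantization_config=bnb, device_map="auto")

lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Tiny inline dataset for illustration; the actual run used ~30k Code Alpaca + GSM8K pairs.
train_dataset = Dataset.from_list([
    {"text": "### Instruction:\nWhat is 15 * 24?\n\n### Response:\n15 * 24 = 360"},
])

trainer = SFTTrainer(
    model=model,
    args=SFTConfig(
        output_dir="semiqwenn-7b-adapter",  # placeholder adapter path
        num_train_epochs=1, per_device_train_batch_size=1, gradient_accumulation_steps=8,
    ),
    train_dataset=train_dataset,
    peft_config=lora,
)
trainer.train()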

Training Process

  • Fine-tuned using SFT distillation from Devstral (teacher) to Qwen2.5-7B (student)
  • QLoRA adapters applied to the student model for memory-efficient training
  • Adapters merged with base model for final deployment (see the merge sketch after this list)
  • Optimized to transfer Devstral's knowledge to the larger Qwen architecture
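
Merging the trained adapter back into the base model for release can be sketched with PEFT's merge_and_unload; the adapter and output paths below are placeholders.

# Hypothetical merge step; paths are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B", torch_dtype="auto")
merged = PeftModel.from_pretrained(base, "semiqwenn-7b-adapter").merge_and_unload()
merged.save_pretrained("SemiQwenn-7b")
AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B").save_pretrained("SemiQwenn-7b")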

Performance

GSM8K Evaluation Results

  • Demonstrates strong performance on mathematical reasoning (GSM8K); detailed scores are available in the project evaluation files (a scoring sketch follows this list)
  • Comparisons with the base model, the teacher model (Devstral), and the smaller SemiQwenn variants are included there
  • Expected to show the best results within the SemiQwenn model family
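
The project's evaluation harness is not included here. The sketch below is a minimal zero-shot GSM8K accuracy check under assumed prompting (no few-shot examples or chain-of-thought formatting): generate an answer per question and compare the last number in the output with the gold answer after "####".

# Minimal GSM8K accuracy sketch; not the project's evaluation harness.
import re
from datasets import load_dataset
from transformers import pipeline

generate = pipeline(
    "text-generation", model="alfiwillianz/SemiQwenn-7b",
    torch_dtype="auto", device_map="auto",
)

def last_number(text: str):
    numbers = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return numbers[-1] if numbers else None

gsm8k = load_dataset("openai/gsm8k", "main", split="test").select(range(100))  # small subset
correct = 0
for example in gsm8k:
    gold = example["answer"].split("####")[-1].strip()
    prediction = generate(
        example["question"], max_new_tokens=256, do_sample=False, return_full_text=False
    )[0]["generated_text"]
    if last_number(prediction) == last_number(gold):
        correct += 1
print(f"accuracy on subset: {correct / len(gsm8k):.2%}")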

Resource Usage

  • Higher computational requirements than smaller variants
  • Best performance-to-size ratio in the SemiQwenn family
  • Suitable for applications where performance is prioritized over efficiency

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer (F16 weights, placed on GPU if available)
model = AutoModelForCausalLM.from_pretrained(
    "alfiwillianz/SemiQwenn-7b", torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("alfiwillianz/SemiQwenn-7b")

# Example usage
prompt = "Solve this math problem: What is 15 * 24?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
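
If the released tokenizer ships the chat template inherited from Qwen2.5 (an assumption worth verifying in tokenizer_config.json), prompts can also be built with apply_chat_template, continuing from the variables above:

# Assumes the tokenizer includes a Qwen2.5-style chat template.
messages = [{"role": "user", "content": "Write a Python function that reverses a string."}]
chat_inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
chat_outputs = model.generate(chat_inputs, max_new_tokens=200)
print(tokenizer.decode(chat_outputs[0][chat_inputs.shape[-1]:], skip_special_tokens=True))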

Model Architecture

  • Architecture: Transformer-based decoder-only model
  • Attention: Grouped-query attention (GQA), as in the Qwen2.5 base model
  • Vocabulary Size: Inherited from Qwen2.5 tokenizer
  • Context Length: Inherited from the Qwen2.5-7B base model
  • Parameters: approximately 7.62 billion, released as F16 safetensors (see the config check below)
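
To confirm the exact values (vocabulary size, context window, attention head counts) rather than rely on this summary, the published configuration can be inspected directly; the field names follow the standard Qwen2 config schema in Transformers.

from transformers import AutoConfig

config = AutoConfig.from_pretrained("alfiwillianz/SemiQwenn-7b")
print(config.vocab_size, config.max_position_embeddings,
      config.num_attention_heads, config.num_key_value_heads)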

Intended Use

Primary Use Cases

  • Mathematical reasoning and complex problem solving
  • Advanced code generation and understanding
  • Educational applications requiring high accuracy
  • Research in efficient language models
  • Production applications requiring strong performance
  • Complex reasoning tasks

Out-of-Scope Uses

  • This model should not be used for generating harmful, biased, or inappropriate content
  • Not suitable for high-stakes decision making without human oversight
  • Not designed for real-time critical applications

Limitations and Biases

  • Higher computational requirements compared to smaller models
  • May inherit biases from training data and base model
  • Performance may vary on tasks outside the training distribution
  • Limited by the knowledge cutoff of the base model
  • Requires more memory and compute resources for inference

Ethical Considerations

  • Model outputs should be reviewed for accuracy, especially in educational contexts
  • Users should be aware of potential biases and limitations
  • Appropriate safeguards should be implemented for production use
  • Consider computational impact and resource usage

Citation

If you use SemiQwenn-7B in your research or applications, please cite:

@misc{semiqwenn7b2025,
  title={SemiQwenn-7B: A Distilled Qwen2.5 7B Model},
  author={Alfi Willianz},
  year={2025},
  note={Knowledge distilled model based on Qwen2.5-7B}
}

Acknowledgments

  • Built upon Qwen2.5 by Alibaba Cloud
  • Training methodology inspired by knowledge distillation techniques
  • Part of Datathon 2025 project on efficient language models
  • QLoRA methodology for efficient large model training

Model Files

This repository contains:

  • Merged model weights combining QLoRA adapters with base model
  • Tokenizer configuration
  • Model configuration files
  • Training artifacts and logs

Contact

For questions about this model or the training process, please refer to the project documentation or contact the development team.

