SemiQwenn - Distilled Qwen2.5 7B

Model Description

SemiQwenn-7B is a Qwen2.5-7B model distilled from Devstral through Supervised Fine-Tuning (SFT) distillation: the student is fine-tuned on Devstral's responses to the training dataset, capturing the teacher model's capabilities while leveraging the capacity of the 7B-parameter architecture. The model was created as part of a datathon project focused on efficient language model training and deployment.
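
The exact distillation pipeline is not reproduced in this card. As a rough illustration only, the sketch below shows how teacher responses could be collected from a Devstral-style model with Transformers; the checkpoint placeholder, prompt handling, and generation settings are assumptions, not the project's actual script.

# Hypothetical sketch of collecting teacher responses for SFT distillation.
# "devstral-teacher" is a placeholder, not a real checkpoint name.
from transformers import AutoModelForCausalLM, AutoTokenizer

teacher_id = "devstral-teacher"
teacher = AutoModelForCausalLM.from_pretrained(teacher_id, torch_dtype="auto", device_map="auto")
teacher_tok = AutoTokenizer.from_pretrained(teacher_id)

def teacher_response(instruction: str) -> str:
    messages = [{"role": "user", "content": instruction}]
    input_ids = teacher_tok.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(teacher.device)
    output = teacher.generate(input_ids, max_new_tokens=512, do_sample=False)
    # Keep only the newly generated tokens as the distillation target
    return teacher_tok.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)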

Model Details

  • Model Name: SemiQwenn-7B
  • Student Model: Qwen2.5-7B
  • Teacher Model: Devstral
  • Model Size: 7 billion parameters
  • Training Method: SFT (Supervised Fine-Tuning) Distillation with QLoRA adapters
  • Language(s): English (primary), with multilingual capabilities inherited from base model
  • License: Same as base Qwen2.5 model
  • Model Type: Causal Language Model

Training Details

Training Data

  • Dataset: Code Alpaca + GSM8K (30k samples)
  • Training Split: Stratified split for balanced learning
  • Data Format: JSONL with instruction-response pairs (a loading sketch follows this list)
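
The field names below are an assumption about the JSONL layout (one instruction-response pair per line); the project's actual keys may differ.

import json

# Assumed record layout per line: {"instruction": ..., "response": ...}
def load_pairs(path: str):
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            pairs.append({"prompt": record["instruction"], "completion": record["response"]})
    return pairs

# Example: pairs = load_pairs("train.jsonl")  # hypothetical file name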

Training Configuration

  • Training Method: QLoRA (Quantized Low-Rank Adaptation); see the setup sketch after this list
  • Teacher Model: Devstral (for SFT distillation)
  • Training Framework: Transformers/PEFT
  • Hardware: GPU-optimized training with quantization
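
The training script itself is not part of this card. The following is a minimal QLoRA setup sketch with Transformers, PEFT, and TRL; every hyperparameter (4-bit quantization settings, LoRA rank and target modules, batch sizes) is illustrative rather than taken from the actual run.

# Hypothetical QLoRA fine-tuning setup; hyperparameters are illustrative only.
import torch
from datasets import Dataset
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig
from trl import SFTTrainer, SFTConfig

base_id = "Qwen/Qwen2.5-7B"
bnb = BitsAndBytesConfig(
    load_in_4bit=True, bnb_4bit_quant_type="nf4", bnb_4bit_compute_dtype=torch.bfloat16
)
model = AutoModelForCausalLM.from_pretrained(base_id, quantization_config=bnb, device_map="auto")

lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Tiny inline dataset for illustration; the actual run used ~30k Code Alpaca + GSM8K pairs.
train_dataset = Dataset.from_list([
    {"text": "### Instruction:\nWhat is 15 * 24?\n\n### Response:\n15 * 24 = 360"},
])

trainer = SFTTrainer(
    model=model,
    args=SFTConfig(
        output_dir="semiqwenn-7b-adapter",  # placeholder adapter path
        num_train_epochs=1, per_device_train_batch_size=1, gradient_accumulation_steps=8,
    ),
    train_dataset=train_dataset,
    peft_config=lora,
)
trainer.train()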

Training Process

  • Fine-tuned using SFT distillation from Devstral (teacher) to Qwen2.5-7B (student)
  • QLoRA adapters applied to the student model for memory-efficient training
  • Adapters merged with base model for final deployment (see the merge sketch after this list)
  • Optimized to transfer Devstral's knowledge to the larger Qwen architecture
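
Merging the trained adapter back into the base model for release can be sketched with PEFT's merge_and_unload; the adapter and output paths below are placeholders.

# Hypothetical merge step; paths are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B", torch_dtype="auto")
merged = PeftModel.from_pretrained(base, "semiqwenn-7b-adapter").merge_and_unload()
merged.save_pretrained("SemiQwenn-7b")
AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B").save_pretrained("SemiQwenn-7b")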

Performance

GSM8K Evaluation Results

  • Demonstrates strong performance on mathematical reasoning (GSM8K); detailed scores are available in the project evaluation files (a scoring sketch follows this list)
  • Comparisons with the base model, the teacher model (Devstral), and the smaller SemiQwenn variants are included there
  • Expected to show the best results within the SemiQwenn model family
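
The project's evaluation harness is not included here. The sketch below is a minimal zero-shot GSM8K accuracy check under assumed prompting (no few-shot examples or chain-of-thought formatting): generate an answer per question and compare the last number in the output with the gold answer after "####".

# Minimal GSM8K accuracy sketch; not the project's evaluation harness.
import re
from datasets import load_dataset
from transformers import pipeline

generate = pipeline(
    "text-generation", model="alfiwillianz/SemiQwenn-7b",
    torch_dtype="auto", device_map="auto",
)

def last_number(text: str):
    numbers = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return numbers[-1] if numbers else None

gsm8k = load_dataset("openai/gsm8k", "main", split="test").select(range(100))  # small subset
correct = 0
for example in gsm8k:
    gold = example["answer"].split("####")[-1].strip()
    prediction = generate(
        example["question"], max_new_tokens=256, do_sample=False, return_full_text=False
    )[0]["generated_text"]
    if last_number(prediction) == last_number(gold):
        correct += 1
print(f"accuracy on subset: {correct / len(gsm8k):.2%}")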

Resource Usage

  • Higher computational requirements than smaller variants
  • Best performance-to-size ratio in the SemiQwenn family
  • Suitable for applications where performance is prioritized over efficiency

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer (F16 weights, placed on GPU if available)
model = AutoModelForCausalLM.from_pretrained(
    "alfiwillianz/SemiQwenn-7b", torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("alfiwillianz/SemiQwenn-7b")

# Example usage
prompt = "Solve this math problem: What is 15 * 24?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
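
If the released tokenizer ships the chat template inherited from Qwen2.5 (an assumption worth verifying in tokenizer_config.json), prompts can also be built with apply_chat_template, continuing from the variables above:

# Assumes the tokenizer includes a Qwen2.5-style chat template.
messages = [{"role": "user", "content": "Write a Python function that reverses a string."}]
chat_inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
chat_outputs = model.generate(chat_inputs, max_new_tokens=200)
print(tokenizer.decode(chat_outputs[0][chat_inputs.shape[-1]:], skip_special_tokens=True))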

Model Architecture

  • Architecture: Transformer-based decoder-only model
  • Attention: Grouped-query attention (GQA), as in the Qwen2.5 base model
  • Vocabulary Size: Inherited from Qwen2.5 tokenizer
  • Context Length: Inherited from the Qwen2.5-7B base model
  • Parameters: approximately 7.62 billion, released as F16 safetensors (see the config check below)
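
To confirm the exact values (vocabulary size, context window, attention head counts) rather than rely on this summary, the published configuration can be inspected directly; the field names follow the standard Qwen2 config schema in Transformers.

from transformers import AutoConfig

config = AutoConfig.from_pretrained("alfiwillianz/SemiQwenn-7b")
print(config.vocab_size, config.max_position_embeddings,
      config.num_attention_heads, config.num_key_value_heads)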

Intended Use

Primary Use Cases

  • Mathematical reasoning and complex problem solving
  • Advanced code generation and understanding
  • Educational applications requiring high accuracy
  • Research in efficient language models
  • Production applications requiring strong performance
  • Complex reasoning tasks

Out-of-Scope Uses

  • This model should not be used for generating harmful, biased, or inappropriate content
  • Not suitable for high-stakes decision making without human oversight
  • Not designed for real-time critical applications

Limitations and Biases

  • Higher computational requirements compared to smaller models
  • May inherit biases from training data and base model
  • Performance may vary on tasks outside the training distribution
  • Limited by the knowledge cutoff of the base model
  • Requires more memory and compute resources for inference

Ethical Considerations

  • Model outputs should be reviewed for accuracy, especially in educational contexts
  • Users should be aware of potential biases and limitations
  • Appropriate safeguards should be implemented for production use
  • Consider computational impact and resource usage

Citation

If you use SemiQwenn-7B in your research or applications, please cite:

@misc{semiqwenn7b2025,
  title={SemiQwenn-7B: A Distilled Qwen2.5 7B Model},
  author={Alfi Willianz},
  year={2025},
  note={Knowledge distilled model based on Qwen2.5-7B}
}

Acknowledgments

  • Built upon Qwen2.5 by Alibaba Cloud
  • Training methodology inspired by knowledge distillation techniques
  • Part of Datathon 2025 project on efficient language models
  • QLoRA methodology for efficient large model training

Model Files

This repository contains:

  • Merged model weights combining QLoRA adapters with base model
  • Tokenizer configuration
  • Model configuration files
  • Training artifacts and logs

Contact

For questions about this model or the training process, please refer to the project documentation or contact the development team.

