Bleta-Meditor 27B GRPO Albanian Reasoning Model

Model Description

  • Developed by: Klei Aliaj
  • Model type: Bleta-Meditor 27B fine-tuned with GRPO for Albanian reasoning tasks
  • License: apache-2.0
  • Finetuned from model: Bleta-Meditor 27B (based on Gemma 3 architecture)
  • Language: Albanian
  • Framework: Hugging Face Transformers

This model is a fine-tuned version of Bleta-Meditor 27B, optimized for the Albanian language using Group Relative Policy Optimization (GRPO) to improve its reasoning capabilities. Bleta is an Albanian-language adaptation of Google's Gemma 3 architecture.

Capabilities & Training

Fine-tuning Approach

This Albanian language model was fine-tuned using GRPO (Group Relative Policy Optimization), a reinforcement learning technique that optimizes a model against explicit reward functions. The model was trained to:

  1. Follow a specific reasoning format with dedicated sections for workings and solutions (a reward of this kind is sketched below)
  2. Produce correct mathematical solutions in Albanian
  3. Show clear step-by-step reasoning processes
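
As an illustration of the first objective, here is a minimal sketch of a format-adherence reward in the style expected by TRL's GRPOTrainer. The actual reward functions used in training are not published, so the regex and scoring below are assumptions.

```python
import re

# Reward pattern: the completion must contain a working-out section followed
# by a <SOLUTION> block (tags taken from the "Special Formatting" section).
FORMAT_RE = re.compile(
    r"<start_working_out>.+?<end_working_out>.*?<SOLUTION>.+?</SOLUTION>",
    re.DOTALL,
)

def format_reward(completions, **kwargs):
    """Return 1.0 for completions that follow the tag format, 0.0 otherwise."""
    return [1.0 if FORMAT_RE.search(c) else 0.0 for c in completions]
```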

Special Formatting

The model has been trained to follow a specific reasoning format, illustrated below:

  • Working out/reasoning sections are enclosed within <start_working_out> and <end_working_out> tags
  • Final solutions are provided between <SOLUTION> and </SOLUTION> tags
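
A response in the expected format therefore has the following skeleton (placeholder content, not actual model output):

```
<start_working_out>
... step-by-step reasoning, in Albanian ...
<end_working_out>
<SOLUTION>... final answer ...</SOLUTION>
```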

Training Configuration

  • Framework: Hugging Face's TRL library
  • Optimization: LoRA fine-tuning (r=8, alpha=8)
  • Reward Functions: Format adherence, answer accuracy, and reasoning quality (see the sketch after this list)
  • Language Focus: Optimized for Albanian
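
The following is a minimal, hypothetical sketch of how such a run could be wired up with TRL's GRPOTrainer and a PEFT LoRA config. Only GRPO, the TRL library, and the LoRA ranks (r=8, alpha=8) come from this card; the base checkpoint id, dataset, and trainer arguments are illustrative assumptions.

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer

# Hypothetical prompt dataset; GRPOTrainer expects a "prompt" column.
dataset = load_dataset("json", data_files="albanian_math.jsonl", split="train")

trainer = GRPOTrainer(
    model="klei1/bleta-meditor-27b",       # assumed base checkpoint id
    reward_funcs=[format_reward],          # e.g. the format reward sketched earlier
    args=GRPOConfig(output_dir="bleta-grpo", num_generations=4),
    train_dataset=dataset,
    peft_config=LoraConfig(r=8, lora_alpha=8, task_type="CAUSAL_LM"),
)
trainer.train()
```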

Technical Specifications

Available Formats

This model is available in two formats, each with a usage sketch below:

  • Standard adapter format (adapter_model.safetensors)
  • GGUF 8-bit quantized format (bleta-meditor-27b-finetune.Q8_0.gguf) for use with llama.cpp
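
A hedged usage sketch for each format follows. The adapter repo id is taken from the citation URL below; the base model id is an assumption, since the exact base checkpoint is not stated in this card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "klei1/bleta-meditor-27b"  # assumed base model repo
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto", device_map="auto")
# Attach the LoRA adapter from this repository.
model = PeftModel.from_pretrained(model, "klei1/bleta-meditor-27b-finetune")

inputs = tokenizer("Sa është 12 × 7?", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=512)[0]))
```

The GGUF file can be loaded directly by llama.cpp or by a binding such as llama-cpp-python:

```python
from llama_cpp import Llama

llm = Llama(model_path="bleta-meditor-27b-finetune.Q8_0.gguf", n_ctx=8192)
out = llm("Sa është 12 × 7?", max_tokens=512)
print(out["choices"][0]["text"])
```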

Bleta-Meditor Architecture Benefits

  • 27B parameters
  • 128K-token context window
  • QK normalization
  • Interleaved attention: 5 sliding-window layers per global attention layer
  • 1024-token sliding-window span
  • Optimized for the Albanian language

Limitations

  • While this model performs well on Albanian reasoning tasks, particularly mathematical problems, it may still occasionally produce incorrect solutions for complex problems.
  • The model's performance might vary depending on problem complexity and wording.
  • Like all language models, it may occasionally hallucinate or provide incorrect information outside its training domain.

Acknowledgments

  • Google for developing the Gemma 3 architecture
  • Hugging Face for their TRL library and GRPO implementation

Citation

If you use this model in your research, please cite:

@misc{klei_aliaj_bleta_meditor,
  author = {Klei Aliaj},
  title = {Bleta-Meditor 27B GRPO Albanian Reasoning Model},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/klei1/bleta-meditor-27b-finetune}}
}