Bleta-Meditor 27B GRPO Albanian Reasoning Model

Model Description

  • Developed by: Klei Aliaj
  • Model type: Bleta-Meditor 27B fine-tuned with GRPO for Albanian reasoning tasks
  • License: apache-2.0
  • Finetuned from model: Bleta-Meditor 27B (based on Gemma 3 architecture)
  • Language: Albanian
  • Framework: Hugging Face Transformers

This model is a fine-tuned version of Bleta-Meditor 27B, optimized for the Albanian language using Group Relative Policy Optimization (GRPO) to improve its reasoning capabilities. Bleta is an Albanian-language adaptation of Google's Gemma 3 architecture.

Capabilities & Training

Fine-tuning Approach

This Albanian language model was fine-tuned using GRPO (Group Relative Policy Optimization), a reinforcement learning technique that optimizes a model against explicit reward functions. The model was trained to:

  1. Follow a specific reasoning format with dedicated sections for workings and solutions (a reward of this kind is sketched below)
  2. Produce correct mathematical solutions in Albanian
  3. Show clear step-by-step reasoning processes
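
As an illustration of the first objective, here is a minimal sketch of a format-adherence reward in the style expected by TRL's GRPOTrainer. The actual reward functions used in training are not published, so the regex and scoring below are assumptions.

```python
import re

# Reward pattern: the completion must contain a working-out section followed
# by a <SOLUTION> block (tags taken from the "Special Formatting" section).
FORMAT_RE = re.compile(
    r"<start_working_out>.+?<end_working_out>.*?<SOLUTION>.+?</SOLUTION>",
    re.DOTALL,
)

def format_reward(completions, **kwargs):
    """Return 1.0 for completions that follow the tag format, 0.0 otherwise."""
    return [1.0 if FORMAT_RE.search(c) else 0.0 for c in completions]
```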

Special Formatting

The model has been trained to follow a specific reasoning format, illustrated below:

  • Working out/reasoning sections are enclosed within <start_working_out> and <end_working_out> tags
  • Final solutions are provided between <SOLUTION> and </SOLUTION> tags
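
A response in the expected format therefore has the following skeleton (placeholder content, not actual model output):

```
<start_working_out>
... step-by-step reasoning, in Albanian ...
<end_working_out>
<SOLUTION>... final answer ...</SOLUTION>
```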

Training Configuration

  • Framework: Hugging Face's TRL library
  • Optimization: LoRA fine-tuning (r=8, alpha=8)
  • Reward Functions: Format adherence, answer accuracy, and reasoning quality (see the sketch after this list)
  • Language Focus: Optimized for Albanian
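
The following is a minimal, hypothetical sketch of how such a run could be wired up with TRL's GRPOTrainer and a PEFT LoRA config. Only GRPO, the TRL library, and the LoRA ranks (r=8, alpha=8) come from this card; the base checkpoint id, dataset, and trainer arguments are illustrative assumptions.

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer

# Hypothetical prompt dataset; GRPOTrainer expects a "prompt" column.
dataset = load_dataset("json", data_files="albanian_math.jsonl", split="train")

trainer = GRPOTrainer(
    model="klei1/bleta-meditor-27b",       # assumed base checkpoint id
    reward_funcs=[format_reward],          # e.g. the format reward sketched earlier
    args=GRPOConfig(output_dir="bleta-grpo", num_generations=4),
    train_dataset=dataset,
    peft_config=LoraConfig(r=8, lora_alpha=8, task_type="CAUSAL_LM"),
)
trainer.train()
```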

Technical Specifications

Available Formats

This model is available in two formats, each with a usage sketch below:

  • Standard adapter format (adapter_model.safetensors)
  • GGUF 8-bit quantized format (bleta-meditor-27b-finetune.Q8_0.gguf) for use with llama.cpp
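
A hedged usage sketch for each format follows. The adapter repo id is taken from the citation URL below; the base model id is an assumption, since the exact base checkpoint is not stated in this card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "klei1/bleta-meditor-27b"  # assumed base model repo
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto", device_map="auto")
# Attach the LoRA adapter from this repository.
model = PeftModel.from_pretrained(model, "klei1/bleta-meditor-27b-finetune")

inputs = tokenizer("Sa është 12 × 7?", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=512)[0]))
```

The GGUF file can be loaded directly by llama.cpp or by a binding such as llama-cpp-python:

```python
from llama_cpp import Llama

llm = Llama(model_path="bleta-meditor-27b-finetune.Q8_0.gguf", n_ctx=8192)
out = llm("Sa është 12 × 7?", max_tokens=512)
print(out["choices"][0]["text"])
```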

Bleta-Meditor Architecture Benefits

  • 27B parameters
  • 128K-token context window
  • QK normalization
  • Interleaved attention: 5 sliding-window layers per global attention layer
  • 1024-token sliding-window span
  • Optimized for the Albanian language

Limitations

  • While this model performs well on Albanian reasoning tasks, particularly mathematical problems, it may still occasionally produce incorrect solutions for complex problems.
  • The model's performance might vary depending on problem complexity and wording.
  • Like all language models, it may occasionally hallucinate or provide incorrect information outside its training domain.

Acknowledgments

  • Google for developing the Gemma 3 architecture
  • Hugging Face for their TRL library and GRPO implementation

Citation

If you use this model in your research, please cite:

@misc{klei_aliaj_bleta_meditor,
  author = {Klei Aliaj},
  title = {Bleta-Meditor 27B GRPO Albanian Reasoning Model},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/klei1/bleta-meditor-27b-finetune}}
}