base_model: unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- qwen2
- gguf
license: apache-2.0
language:
- en
- es
datasets:
- Kukedlc/dpo-orpo-spanish-15k
library_name: transformers
Fine-Tuned Model
fjmgAI/b1-R1-Zero-3B-GGUF
Base Model
unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit
Fine-Tuning Method
Fine-tuning was performed using unsloth
, an efficient fine-tuning framework optimized for low-resource environments and Huggingface's TRL library.
Dataset
Description
A Spanish-language dataset containing 15,000 examples, designed for Direct Preference Optimization (DPO) or Outcome-Regularized Preference Optimization (ORPO).
Adaptation
The dataset was adapted to a reasoning-based format for GPRO, enhancing its ability to guide preference-based decision-making during fine-tuning. This adaptation ensures better alignment with instruction-following tasks in Spanish.
Fine-Tuning Details
- The model was trained using the GPRO algorithm, leveraging structured preference data to refine its response generation.
- The model was fine-tuned to maintain its 4-bit quantization (
bnb-4bit
) for memory efficiency while aligning its outputs with the characteristics of the Spanish dataset. - The focus was on retaining the model's instructional abilities while improving its understanding and generation of Spanish text.
Purpose
This fine-tuned model is intended for Spanish-language applications that require efficient AI that follows instructions using a lightweight reasoning process.
- Developed by: fjmgAI
- License: apache-2.0