---
base_model: unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit
tags:
  - text-generation-inference
  - transformers
  - unsloth
  - qwen2
  - gguf
license: apache-2.0
language:
  - en
  - es
datasets:
  - Kukedlc/dpo-orpo-spanish-15k
library_name: transformers
---

## Fine-Tuned Model

fjmgAI/b1-R1-Zero-3B-GGUF

## Base Model

unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit

## Fine-Tuning Method

Fine-tuning was performed with Unsloth, an efficient fine-tuning framework optimized for low-resource environments, together with Hugging Face's TRL library.
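
The exact training script is not included in this card, but loading the base model with Unsloth in 4-bit typically looks like the sketch below; the sequence length and LoRA settings are assumptions, not the values used for this model.

```python
# Illustrative sketch only -- not the actual training script for this model.
# Assumes the unsloth and trl packages are installed (pip install unsloth trl).
from unsloth import FastLanguageModel

# Load the 4-bit base model; max_seq_length is an assumed value.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters so only a small set of weights is trained (ranks assumed).
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
```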

## Dataset

Kukedlc/dpo-orpo-spanish-15k

### Description

A Spanish-language dataset containing 15,000 examples, designed for Direct Preference Optimization (DPO) or Odds Ratio Preference Optimization (ORPO).

### Adaptation

The dataset was adapted to a reasoning-based format for GRPO (Group Relative Policy Optimization), so that its preference data could guide reasoning-style generation during fine-tuning. This adaptation ensures better alignment with instruction-following tasks in Spanish.
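
A minimal sketch of such an adaptation is shown below, assuming the dataset exposes `prompt` and `chosen` columns and that reasoning is elicited with `<think>`-style tags; both are assumptions, not the exact recipe used for this model.

```python
# Hypothetical adaptation sketch; column names, split name, and the reasoning
# template are assumptions rather than the card author's exact recipe.
from datasets import load_dataset

# Spanish system prompt, roughly: "Answer in Spanish. Reason first inside
# <think>...</think>, then give the final answer."
SYSTEM_PROMPT = (
    "Responde en español. Razona primero dentro de <think>...</think> "
    "y después da la respuesta final."
)

def to_reasoning_format(example):
    # GRPO only needs prompts; the preferred answer is kept as a reference column.
    return {
        "prompt": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": example["prompt"]},
        ],
        "reference": example["chosen"],
    }

dataset = load_dataset("Kukedlc/dpo-orpo-spanish-15k", split="train")
dataset = dataset.map(to_reasoning_format)
```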

## Fine-Tuning Details

- The model was trained using the GRPO algorithm, leveraging structured preference data to refine its response generation (see the sketch after this list).
- The model was fine-tuned to maintain its 4-bit quantization (bnb-4bit) for memory efficiency while aligning its outputs with the characteristics of the Spanish dataset.
- The focus was on retaining the model's instructional abilities while improving its understanding and generation of Spanish text.
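
Continuing from the two sketches above, a hedged illustration of GRPO training with TRL's `GRPOTrainer` might look like the following; the reward function and every hyperparameter are assumed values, not the ones used for this model.

```python
# Illustrative GRPO setup with TRL; all hyperparameters and the reward are assumptions.
from trl import GRPOConfig, GRPOTrainer

def reward_reasoning_format(completions, **kwargs):
    # Toy reward: favor completions that contain an explicit <think>...</think> block.
    # With conversational prompts, each completion is a list of message dicts.
    texts = [c if isinstance(c, str) else c[0]["content"] for c in completions]
    return [1.0 if "<think>" in t and "</think>" in t else 0.0 for t in texts]

config = GRPOConfig(
    output_dir="b1-R1-Zero-3B",
    num_generations=4,          # completions sampled per prompt (assumed)
    max_completion_length=512,  # assumed
    learning_rate=5e-6,         # assumed
)

trainer = GRPOTrainer(
    model=model,                 # 4-bit Unsloth model from the first sketch
    args=config,
    train_dataset=dataset,       # adapted dataset from the previous sketch
    reward_funcs=[reward_reasoning_format],
)
trainer.train()
```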

## Purpose

This fine-tuned model is intended for Spanish-language applications that require an efficient, instruction-following model with a lightweight reasoning process.
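
Because this repository ships GGUF weights, one common way to run the model locally is with `llama-cpp-python`; the GGUF filename below is a placeholder, so substitute the actual file from this repository.

```python
# Hedged usage sketch with llama-cpp-python (pip install llama-cpp-python).
from llama_cpp import Llama

llm = Llama(
    model_path="b1-R1-Zero-3B.Q4_K_M.gguf",  # placeholder; use the file from this repo
    n_ctx=2048,
)

out = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Explica brevemente qué es la fotosíntesis."},
    ],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```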

- Developed by: fjmgAI
- License: apache-2.0