Gazelle v0.2 is Tincans' mid-March release of a joint speech-language model.

This repo contains an experimental DPO finetune. To our knowledge, this is the first multimodal DPO finetune of a speech-language model: audio in, text out.

The datasets used were snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset (first iteration) and jondurbin/truthy-dpo-v0.1. We trained for 2 epochs with max_lr=3e-4, batch size 32, 10 warmup steps, and cosine decay.
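As a rough illustration of those hyperparameters, the sketch below wires them into TRL's DPOTrainer on the text side only. It is not the training code for this release: the Gazelle model classes and audio projector are not shown, the backbone name, dataset split names, and column handling are assumptions, and the per-device batch / accumulation split is just one way to reach an effective batch size of 32.

```python
# Minimal text-only sketch of the DPO setup described above (not the authors' code).
# Assumptions: a Mistral-7B-Instruct backbone stands in for the Gazelle model, and
# both preference datasets are reduced to prompt/chosen/rejected columns.
from datasets import concatenate_datasets, load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Preference data named in the card; split names here are assumptions
# (the card uses only the first Snorkel iteration).
snorkel = load_dataset("snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset", split="train")
truthy = load_dataset("jondurbin/truthy-dpo-v0.1", split="train")
train_dataset = concatenate_datasets([
    snorkel.select_columns(["prompt", "chosen", "rejected"]),
    truthy.select_columns(["prompt", "chosen", "rejected"]),
])

model_name = "mistralai/Mistral-7B-Instruct-v0.2"  # placeholder backbone
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

args = DPOConfig(
    output_dir="gazelle-v0.2-dpo",
    num_train_epochs=2,             # 2 epochs
    learning_rate=3e-4,             # max_lr = 3e-4
    lr_scheduler_type="cosine",     # cosine decay
    warmup_steps=10,                # 10 warmup steps
    per_device_train_batch_size=4,  # 4 x 8 accumulation = effective batch size 32
    gradient_accumulation_steps=8,
    bf16=True,
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,  # newer TRL versions name this processing_class
)
trainer.train()
```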

We can see tell-tale signs of preference modeling at play, particularly longer replies than the base instruction-tuned model produces. Overall, we view the quality as mixed; we welcome experimentation but do not recommend production use.

Please see this notebook for an inference example.
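For readers without the notebook handy, the snippet below is a rough sketch of audio-in, text-out inference. It assumes the companion gazelle Python package from the Gazelle release (GazelleForConditionalGeneration, a Wav2Vec2 audio front end, and an <|audio|> placeholder token in the prompt); those names and the exact generate arguments are assumptions drawn from that package, so defer to the linked notebook for the canonical example.

```python
# Rough inference sketch; class names and arguments follow the Gazelle release's
# companion package and are assumptions here, not taken from this card.
import torch
import torchaudio
from transformers import AutoTokenizer, Wav2Vec2Processor
from gazelle import GazelleForConditionalGeneration  # from the tincans-ai/gazelle repo

model_id = "tincans-ai/gazelle-v0.2-dpo"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = GazelleForConditionalGeneration.from_pretrained(model_id, torch_dtype=torch.bfloat16)
audio_processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")

# Load a mono clip and resample to the 16 kHz expected by the audio encoder.
waveform, sr = torchaudio.load("question.wav")
if sr != 16000:
    waveform = torchaudio.transforms.Resample(sr, 16000)(waveform)
audio_values = audio_processor(
    waveform.squeeze().numpy(), sampling_rate=16000, return_tensors="pt"
).input_values

# The prompt references the audio via a placeholder token.
prompt = "Listen to <|audio|> and answer the question it contains."
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

output_ids = model.generate(
    input_ids=input_ids,
    audio_values=audio_values.to(torch.bfloat16),
    max_new_tokens=64,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```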

Model size: 7.37B params (Safetensors, BF16)