kaitchup/OPT-350M-RM-DSChat

Tags: Text Generation, text-generation-inference


Model Card for OPT-350M-RM-DSChat

This model is a reward model for RLHF, fine-tuned with DeepSpeed Chat. It is based on OPT-350M.
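A reward model turns a prompt plus a candidate response into a single scalar score. The sketch below assumes the value-head design used in DeepSpeed Chat's reward-model code (a linear head on top of the OPT backbone emits one value per token, and the sequence's score is the value at the last response token before padding starts). It shows only that read-out step in plain Python; the function name `end_score` and the toy token ids and values are hypothetical.

```python
from typing import Sequence


def end_score(token_values: Sequence[float], input_ids: Sequence[int],
              prompt_length: int, pad_id: int) -> float:
    """Return the reward for one sequence (hypothetical helper).

    Assumption: the score is the per-token value at the last token before
    the first padding token that follows the prompt.
    """
    # Look for padding only in the response part of the sequence.
    end = len(input_ids)
    for i in range(prompt_length, len(input_ids)):
        if input_ids[i] == pad_id:
            end = i
            break
    # Read the score at the final real token of the response.
    return token_values[end - 1]


# Toy example: a 2-token prompt, a 2-token response, then two pad tokens (id 1).
values = [0.1, 0.2, 0.5, 0.9, 0.0, 0.0]
ids = [5, 6, 7, 8, 1, 1]
print(end_score(values, ids, prompt_length=2, pad_id=1))  # prints: 0.9
```

Reading a single position keeps the score tied to the complete response rather than to a partial prefix.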

Model Details

Model Description

Developed by: The Kaitchup
Model type: Reward model
Language(s) (NLP): English
License: cc-by-nc-sa-4.0
Finetuned from model: facebook/opt-350m

Model Sources

The model has been trained with the procedure described in this article:

Train Instruct LLMs On Your GPU with DeepSpeed Chat — Step #2: Training a Reward Model


Model size: 331M params
Tensor type: FP16
Weights format: Safetensors
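As a rough check on GPU memory, FP16 stores 2 bytes per parameter, so the weights alone take about 331M × 2 bytes. The sketch below does that arithmetic; it counts weights only (no activations, gradients, or optimizer states), uses the rounded 331M figure above, and the helper name is hypothetical.

```python
def weight_memory_bytes(num_params: int, bytes_per_param: int = 2) -> int:
    """Bytes needed to hold the weights alone (FP16 = 2 bytes per parameter)."""
    return num_params * bytes_per_param


params = 331_000_000  # rounded parameter count from the model card
fp16_bytes = weight_memory_bytes(params)
print(f"{fp16_bytes / 1e6:.0f} MB ({fp16_bytes / 2**30:.2f} GiB)")  # prints: 662 MB (0.62 GiB)
```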

