Model Card for kaitchup/OPT-350M-RM-DSChat

This model is a reward model for RLHF, fine-tuned using DeepSpeed Chat. It is based on OPT-350M.

Model Details

Model Description

  • Developed by: The Kaitchup
  • Model type: Reward model
  • Language(s) (NLP): English
  • License: cc-by-nc-sa-4.0
  • Finetuned from model: facebook/opt-350m

Model Sources

The model has been trained with the procedure described in this article:

Train Instruct LLMs On Your GPU with DeepSpeed Chat — Step #2: Training a Reward Model
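
The article covers the actual training procedure. As a rough illustration of what reward model training usually optimizes, here is a minimal sketch of a pairwise ranking loss: given the scalar reward for a preferred ("chosen") response and for a less preferred ("rejected") one, the loss is the negative log-sigmoid of their difference. This is an assumption about the general technique, not code from DeepSpeed Chat or from this model's training run, and the function name `pairwise_reward_loss` is hypothetical.

```python
import math


def pairwise_reward_loss(chosen_rewards, rejected_rewards):
    """Mean of -log(sigmoid(r_chosen - r_rejected)) over preference pairs.

    Illustrative only: a hypothetical helper, not part of DeepSpeed Chat.
    """
    if len(chosen_rewards) != len(rejected_rewards):
        raise ValueError("Each chosen reward needs a matching rejected reward.")
    if not chosen_rewards:
        raise ValueError("At least one preference pair is required.")

    total = 0.0
    for r_chosen, r_rejected in zip(chosen_rewards, rejected_rewards):
        margin = r_chosen - r_rejected
        # -log(sigmoid(m)) equals log(1 + exp(-m)); this form avoids overflow
        # for large positive or negative margins.
        total += max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return total / len(chosen_rewards)


if __name__ == "__main__":
    # The loss shrinks as the chosen response is scored further above the rejected one.
    print(pairwise_reward_loss([2.0, 1.5], [0.0, 1.0]))
```

The loss is small when the model ranks the chosen response well above the rejected one and grows roughly linearly when the ranking is reversed.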

  • Weights format: Safetensors
  • Model size: 331M params
  • Tensor type: FP16

Datasets used to train kaitchup/OPT-350M-RM-DSChat