Model Details

Training data: 21K reward samples, released as weiminw/heliumos_reward_score and normalized from nvidia/helpsteer2. A minimal loading sketch follows this list.
Base model: We use the Qwen2.5 Instruct series (3B, 7B) as our base models.
Score: ranges from 0 to 1. A score below 0.5 means the response has little value, i.e., it is incorrect or does not follow the instructions.
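
To inspect the released training data, it can be loaded with the datasets library. The sketch below is a minimal example under assumptions: the split name and record layout are not documented here, so check the dataset card for the actual schema.

from datasets import load_dataset

# Load the 21K reward samples released with this model.
dataset = load_dataset("weiminw/heliumos_reward_score")

# Print the schema and one record; the "train" split name is an assumption.
print(dataset)
print(dataset["train"][0])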

How to use

Here's a minimal example of using Heliumos-RM-3B:

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load model and tokenizer
model_path = "weiminw/Heliumos-RM-3B"
model = AutoModelForSequenceClassification.from_pretrained(
    model_path,
    # torch_dtype="auto" loads in float32: higher precision, but uses more GPU memory (about 15 GB).
    # float16 uses roughly half of that, with about 3% precision loss.
    # This reward model scores in the 0-1 range; due to bfloat16's limited precision, do not load in bfloat16.
    torch_dtype=torch.float16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_path)


# Define the messages
messages = [
    {'role': 'user', 'content': 'what is 92 * 23'},
    {'role': 'assistant', 'content': 'the answer is 2116'}
]

# Generate prompt and get the model's output
encoded_text = tokenizer.apply_chat_template(messages, return_dict=True, return_tensors="pt", tokenize=True).to(model.device)
score = model(**encoded_text).logits.item()  # extract the scalar reward from the classification head

# Print result
print(f"Model output for the evaluation: {score}")  # e.g., 0.6885 means the response's value for the question is 68.85%.