## Model Details
- **Training data:** 21K reward samples, released as weiminw/heliumos_reward_score, with scores normalized from nvidia/helpsteer2 (see the loading sketch after this list).
- **Base model:** Qwen2.5-Instruct series (3B and 7B).
- **Score:** ranges from 0 to 1. A score below 0.5 means the response has little value, i.e., it is incorrect or does not follow the instructions.
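
For reference, the released training data can be inspected directly with the `datasets` library. The sketch below is a minimal example; the `score` column name and schema are assumptions for illustration and are not confirmed by this card:

```python
from datasets import load_dataset

# Load the released reward data (dataset ID taken from this card).
ds = load_dataset("weiminw/heliumos_reward_score", split="train")

# Inspect the schema; the column names used below are assumptions.
print(ds.column_names)

# Example: keep only samples judged valuable (score >= 0.5),
# assuming the normalized score is stored in a "score" column.
valuable = ds.filter(lambda row: row["score"] >= 0.5)
print(f"{len(valuable)} / {len(ds)} samples scored >= 0.5")
```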
## How to use
Here's a minimal example of using Heliumos-Reward-3B:
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load model and tokenizer
model_path = "weiminw/Heliumos-RM-3B"
model = AutoModelForSequenceClassification.from_pretrained(
    model_path,
    # torch_dtype="auto" loads the model in float32, which is more precise but
    # needs about 15 GB of VRAM; float16 uses roughly half of that, with about
    # 3% precision loss. This reward model scores in the range 0-1, so do not
    # load it in bfloat16: its lower precision distorts the scores.
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Define the messages
messages = [
    {'role': 'user', 'content': 'what is 92 * 23'},
    {'role': 'assistant', 'content': 'the answer is 2116'}
]

# Apply the chat template and score the conversation
encoded_text = tokenizer.apply_chat_template(messages, return_dict=True, return_tensors="pt", tokenize=True)
with torch.no_grad():
    score = model(**encoded_text).logits.item()

# Print result
print(f"Model output for the evaluation: {score}")  # e.g. 0.6885 means the response is 68.85% correct/valuable for the question.
```