---
license: apache-2.0
datasets:
- weiminw/heliumos_reward_score
---

# Model Detail

**Training data:** 21K reward samples, released as weiminw/heliumos_reward_score, with scores normalized following nvidia/helpsteer2.

**Base model:** we use the Qwen2.5 Instruct series (3B, 7B) as our base models.

**Score:** ranges from 0 to 1. A score below 0.5 means the AI's response has little value, i.e. the response is incorrect or does not follow the instructions.

# How to use

Here's a minimal example of using Heliumos-Reward-3B:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load model and tokenizer
model_path = "weiminw/Heliumos-RM-3B"
model = AutoModelForSequenceClassification.from_pretrained(
    model_path,
    # torch_dtype="auto" loads the model in float32, which is more accurate but
    # uses more VRAM (about 15 GB); float16 needs roughly half of that, with
    # about 3% accuracy loss. This reward model scores in the range 0-1; because
    # of bfloat16's limited precision, do not load the model in bfloat16.
    torch_dtype=torch.float16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Define the messages
messages = [
    {'role': 'user', 'content': 'what is 92 * 23'},
    {'role': 'assistant', 'content': 'the answer is 2116'}
]

# Apply the chat template and score the conversation
encoded_text = tokenizer.apply_chat_template(
    messages, return_dict=True, return_tensors="pt", tokenize=True
).to(model.device)
score = model(**encoded_text).logits[0][0].item()

# Print result: e.g. a score of 0.6885 means the response is judged
# 68.85% correct/valuable for the question.
print(f"Model output for the evaluation: {score}")
```
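Building on the example above, here is a minimal sketch of using the score to rank candidate responses. It assumes the model and tokenizer are already loaded as shown, and that the classification head outputs a single logit (`num_labels=1`), so the score is read directly from `output.logits`; the `get_score` helper is illustrative, not part of the model's API.

```python
import torch

def get_score(messages):
    """Score a conversation with the reward model; higher means a more valuable response."""
    encoded = tokenizer.apply_chat_template(
        messages, return_dict=True, return_tensors="pt", tokenize=True
    ).to(model.device)
    with torch.no_grad():
        output = model(**encoded)
    # Assumes num_labels=1, so logits has shape (1, 1)
    return output.logits[0][0].item()

# Compare a correct and an incorrect candidate answer to the same question
question = {'role': 'user', 'content': 'what is 92 * 23'}
good = get_score([question, {'role': 'assistant', 'content': 'the answer is 2116'}])
bad = get_score([question, {'role': 'assistant', 'content': 'the answer is 2000'}])

# A score below 0.5 indicates a response with little value
print(f"correct answer score: {good:.4f}, wrong answer score: {bad:.4f}")
```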