v0.1
PRM Model adapted from: https://huggingface.co/deepseek-ai/deepseek-math-7b-rl
This is a process reward model trained primarily on a flattened version of PRM800K using LoRA, with the adapter weights merged back into the full model.
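For context, the sketch below shows what such a LoRA fine-tuning setup might look like with the `peft` library. The hyperparameters, label count, and target modules here are illustrative assumptions, not the published training recipe.

```python
# Illustrative sketch only: the exact hyperparameters and label scheme were not published.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForSequenceClassification

base = AutoModelForSequenceClassification.from_pretrained(
    "deepseek-ai/deepseek-math-7b-rl",
    num_labels=2,  # assumed label count for per-step ratings
)
lora_config = LoraConfig(
    task_type="SEQ_CLS",
    r=16, lora_alpha=32, lora_dropout=0.05,  # assumed LoRA hyperparameters
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
)
model = get_peft_model(base, lora_config)
# ... fine-tune on (flattened PRM800K text, step label) pairs ...
model = model.merge_and_unload()  # merge the LoRA weights back into the full model
```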
1. How to Use
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

prm_tokenizer = AutoTokenizer.from_pretrained("mukaj/deepseek-math-7b-rl-prm-v0.1")
prm_tokenizer.pad_token = prm_tokenizer.eos_token
prm_model = AutoModelForSequenceClassification.from_pretrained("mukaj/deepseek-math-7b-rl-prm-v0.1").eval().to("cuda")
prm_model.config.pad_token_id = prm_tokenizer.pad_token_id  # needed for batched inference

# batch_candidates: a list of strings, each a problem followed by a candidate solution
encoded_inputs = [prm_tokenizer.encode(candidate, return_tensors="pt") for candidate in batch_candidates]
max_length = max(input_id.shape[1] for input_id in encoded_inputs)  # longest sequence in the batch

# Right-pad every sequence to max_length so the batch can be stacked into one tensor
padded_inputs = [
    torch.nn.functional.pad(input_id, (0, max_length - input_id.size(1)), value=prm_tokenizer.pad_token_id)
    for input_id in encoded_inputs
]
input_ids = torch.cat(padded_inputs, dim=0).to("cuda")

with torch.no_grad():
    outputs = prm_model(input_ids)
logits = outputs.logits          # shape: (batch_size, num_labels)
scores = logits.softmax(dim=-1)  # class probabilities per candidate
log_probs = scores.log()
```
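To select the best candidate from the batch, you can rank by the probability of the positive class. A minimal sketch, assuming the highest label index corresponds to a positive rating (check `prm_model.config.id2label` for the actual mapping):

```python
# Assumption: the last label index is the positive ("good solution") class;
# verify against prm_model.config.id2label before relying on this.
positive_scores = scores[:, -1]
best_idx = positive_scores.argmax().item()
print(f"best candidate #{best_idx} with score {positive_scores[best_idx]:.4f}")
```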
2. License
This code repository is licensed under the MIT License. The use of DeepSeekMath models is subject to the Model License. DeepSeekMath supports commercial use.
See the LICENSE-MODEL file for more details.
3. Contact
If you have any questions, please raise an issue or contact the original team at [email protected].