
v0.1

PRM model adapted from: https://huggingface.co/deepseek-ai/deepseek-math-7b-rl

This is a process reward model, trained mostly on a flattened version of PRM800K using LoRA and then merged back into the full model.
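For illustration, here is a minimal sketch of what "flattening" PRM800K step annotations into sequence-classification examples might look like. The preprocessing, field names, and label mapping below are assumptions for illustration, not the actual training pipeline:

# Hypothetical sketch only: PRM800K rates each solution step; one plausible
# flattening turns every solution prefix into a (text, label) pair.
def flatten_example(question, steps, ratings):
    """question: str; steps: list[str]; ratings: list[int] (PRM800K uses -1/0/1)."""
    pairs = []
    prefix = question
    for step, rating in zip(steps, ratings):
        prefix = prefix + "\n" + step
        pairs.append({"text": prefix, "label": rating})  # label mapping is an assumption
    return pairs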

1. How to Use

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

prm_tokenizer = AutoTokenizer.from_pretrained("mukaj/deepseek-math-7b-rl-prm-v0.1")
prm_tokenizer.pad_token = prm_tokenizer.eos_token

# Move the model to the same device as the inputs below
prm_model = AutoModelForSequenceClassification.from_pretrained(
    "mukaj/deepseek-math-7b-rl-prm-v0.1"
).to("cuda").eval()

# batch_candidates: a list of candidate solution strings to score
encoded_inputs = [prm_tokenizer.encode(candidate, return_tensors="pt") for candidate in batch_candidates]

# Right-pad every sequence to the length of the longest one in the batch
max_length = max(input_id.shape[1] for input_id in encoded_inputs)
padded_inputs = [
    torch.nn.functional.pad(input_id, (0, max_length - input_id.size(1)), value=prm_tokenizer.pad_token_id)
    for input_id in encoded_inputs
]
input_ids = torch.cat(padded_inputs, dim=0).to("cuda")

with torch.no_grad():
    outputs = prm_model(input_ids)

# outputs.logits has shape (batch_size, num_labels); score every candidate
# in the batch, not just the first one
scores = outputs.logits.softmax(dim=-1)
log_probs = scores.log()
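As a usage example, the per-candidate scores can then rank candidates, e.g. for best-of-n selection. Which class index corresponds to a "good" candidate depends on the training label mapping; index 1 below is an assumption:

# Assumption: class index 1 is the positive / correct label
POSITIVE_CLASS = 1
candidate_scores = scores[:, POSITIVE_CLASS]   # one score per candidate
best_idx = candidate_scores.argmax().item()
best_candidate = batch_candidates[best_idx]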

2. License

This code repository is licensed under the MIT License. The use of DeepSeekMath models is subject to the Model License. DeepSeekMath supports commercial use.

See the LICENSE-MODEL for more details.

3. Contact

If you have any questions, please raise an issue or contact the original team at [email protected].
