DiffuCoder-7B-Base

The DiffuCoder-7B-Base model is our foundational masked diffusion LLM for code generation.

  • Training recipe: Adapted using DiffuLLaMA's adaptation approach and trained on a large corpus of code in two stages: 65B tokens in Stage 1 and 65B tokens in Stage 2.

  • Benchmarks: Strong baseline performance on HumanEval, MBPP and BigCodeBench.

More details and usage examples:

import torch
from transformers import AutoModel, AutoTokenizer

model_path = "apple/DiffuCoder-7B-Base"
model = AutoModel.from_pretrained(model_path, torch_dtype=torch.bfloat16, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = model.to("cuda").eval()

prompt = """
from typing import List

def has_close_elements(numbers: List[float], threshold: float) -> bool:
    \"\"\"
    Check if in given list of numbers, are any two numbers closer to each other than given threshold.
    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)
    False
    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)
    True
    \"\"\"
"""

# Tokens unmasked per diffusion step: diffusion timesteps * TOKEN_PER_STEP = total new tokens
TOKEN_PER_STEP = 1

inputs = tokenizer(prompt, return_tensors="pt")
input_ids = inputs.input_ids.to(device="cuda")
attention_mask = inputs.attention_mask.to(device="cuda")

output = model.diffusion_generate(
    input_ids,
    attention_mask=attention_mask,
    max_new_tokens=256,
    output_history=True,
    return_dict_in_generate=True,
    steps=256//TOKEN_PER_STEP,
    temperature=0.2,
    top_p=0.95,
    alg="entropy",
    alg_temp=0.,
)
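# Drop the prompt tokens from each returned sequence and decode only the new part.
generations = [
    tokenizer.decode(g[len(p) :].tolist())
    for p, g in zip(input_ids, output.sequences)
]

# Print the first generation, truncated at the first end-of-sequence token.
print(generations[0].split(tokenizer.eos_token)[0])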
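
The `steps`, `TOKEN_PER_STEP`, and `alg="entropy"` settings in the snippet above can be illustrated with a toy sketch. It assumes that `alg="entropy"` means the lowest-entropy (most confident) masked positions are unmasked first. That reading is based on the option name and is not taken from the model's code. The helpers `steps_for` and `pick_positions` are hypothetical names introduced only for this illustration, and the toy distributions stay fixed, whereas a real model would re-predict them at every step.

```python
import math
from typing import List, Optional


def entropy(probs: List[float]) -> float:
    """Shannon entropy (in nats) of one categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def steps_for(max_new_tokens: int, tokens_per_step: int) -> int:
    """Same arithmetic as `steps=256//TOKEN_PER_STEP` in the usage snippet."""
    return max_new_tokens // tokens_per_step


def pick_positions(dists: List[Optional[List[float]]], k: int) -> List[int]:
    """Return up to k still-masked positions, lowest entropy first.

    dists[i] is the predicted distribution at position i, or None once
    that position has been unmasked. (Assumed behaviour, for illustration.)
    """
    masked = [(entropy(d), i) for i, d in enumerate(dists) if d is not None]
    masked.sort()
    return [i for _, i in masked[:k]]


# Toy example: 4 masked positions, 2 tokens unmasked per step -> 2 steps.
toy = [
    [0.98, 0.01, 0.01],  # very confident prediction
    [0.34, 0.33, 0.33],  # nearly uniform, least confident
    [0.70, 0.20, 0.10],
    [0.50, 0.50, 0.00],
]
dists: List[Optional[List[float]]] = list(toy)
order = []
for _ in range(steps_for(len(toy), 2)):
    chosen = pick_positions(dists, 2)
    order.append(chosen)
    for i in chosen:
        dists[i] = None  # mark as unmasked

print(order)  # [[0, 3], [2, 1]]
```

With `TOKEN_PER_STEP = 1` and `max_new_tokens=256`, the same arithmetic gives 256 steps. Raising `TOKEN_PER_STEP` reduces the number of steps, so more positions are committed per step.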

Acknowledgement

To power this HuggingFace model release, we reuse Dream's modeling architecture and generation utils.

Model size: 7.62B params (Safetensors, tensor type F16)

Model tree for apple/DiffuCoder-7B-Base

Base model: Qwen/Qwen2.5-7B