CodeV-R1-Distill-Qwen-7B

1. Introduction

Large language models (LLMs) trained via reinforcement learning with verifiable reward (RLVR) have achieved breakthroughs on tasks with explicit, automatable verification, such as software programming and mathematical problem solving. Extending RLVR to electronic design automation (EDA), especially the automatic generation of hardware description languages (HDLs) like Verilog from natural-language (NL) specifications, however, poses three key challenges: the lack of automated and accurate verification environments, the scarcity of high-quality NL–code pairs, and the prohibitive computational cost of RLVR.

To this end, we introduce CodeV-R1, an RLVR framework for training Verilog-generation LLMs, continuing the line of work begun with CodeV. First, we develop a rule-based testbench generator that performs robust equivalence checking against golden references. Second, we propose a round-trip data synthesis method that pairs open-source Verilog snippets with LLM-generated NL descriptions, verifies code–NL–code consistency via the generated testbenches, and filters out inequivalent examples to yield a high-quality dataset. Third, we employ a two-stage distill-then-RL training pipeline: distillation to cold-start reasoning abilities, followed by adaptive DAPO, our novel RLVR algorithm that reduces training cost by adaptively adjusting the sampling rate.

CodeV-R1-Distill-Qwen-7B is the model obtained after the distillation stage. The RL-trained model, CodeV-R1-Qwen-7B, is provided separately. For more training details, please refer to our paper.

2. Evaluation Results

During evaluation, the maximum generation length is set to 16,384 tokens and the temperature to 0.6; 20 responses are sampled per query to estimate the pass@1 score.
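With n = 20 samples per query, pass@1 can be computed with the standard unbiased pass@k estimator. A minimal sketch (the function name `pass_at_k` is ours, for illustration only):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k),
    given n sampled responses of which c pass the testbench."""
    if n - c < k:
        return 1.0  # every size-k subset contains a passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# With k = 1, this reduces to the fraction of passing samples:
# e.g. 13 of 20 correct gives pass@1 = 0.65.
print(pass_at_k(20, 13, 1))
```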

Our evaluation covers the Verilog benchmarks VerilogEval and RTLLM. For VerilogEval v2, we examine zero-shot scenarios in both specification-to-RTL translation and code completion. For RTLLM, we report results on version 1.1, which offers a broader set of baselines for comparison. Furthermore, we find that acquiring the reasoning process for Verilog problems, as distilled from DeepSeek-R1, also enhances the model's out-of-domain mathematical capabilities.

VerilogEval (v2)

| Model | Model size | Type | Spec-to-RTL | Completion |
|---|---|---|---|---|
| GPT-4o | Undisclosed | General | 62.5% | 59.0% |
| GPT-4 Turbo | Undisclosed | General | 61.1% | 53.9% |
| GPT-4 | Undisclosed | General | 32.0% | 42.3% |
| Mistral Large | Undisclosed | General | 37.5% | 34.0% |
| Llama3.1 | 405B | General | 57.2% | 56.4% |
| Llama3.1 | 70B | General | 42.8% | 35.3% |
| Llama3 | 70B | General | 43.9% | 37.8% |
| Llama2 | 70B | General | 5.3% | 1.3% |
| Llama3.1 | 8B | General | 19.1% | 2.6% |
| CodeLlama | 70B | Coding | 34.9% | 37.2% |
| DeepSeek Coder | 33B | Coding | 21.7% | 25.0% |
| CodeGemma | 7B | Coding | 9.5% | 8.3% |
| DeepSeek Coder | 6.7B | Coding | 29.6% | 24.4% |
| RTL-Coder | 6.7B | Verilog RTL | 36.8% | 35.9% |
| CodeV-R1-distill (ours) | 7B | Verilog RTL | 65.2% | 65.5% |
| CodeV-R1 (ours) | 7B | Verilog RTL | 68.8% | 69.9% |

RTLLM (v1.1)

| Model | Model size | Type | Pass@1 |
|---|---|---|---|
| GPT-4o | Undisclosed | General | 33.8% |
| GPT-3.5 Turbo | Undisclosed | General | 28.3% |
| Llama3.1 | 405B | General | 38.9% |
| Nemotron-4 | 340B | General | 18.9% |
| Llama3.1 | 8B | General | 19.1% |
| CodeLlama | 7B | Coding | 17.9% |
| CodeQwen | 7B | Coding | 24.1% |
| Starcoder2 | 15B | Coding | 15.5% |
| DeepSeek Coder | 6.7B | Coding | 23.1% |
| DeepSeek-Coder-V2 | 16B | Coding | 33.1% |
| DeepSeek-Coder-V2 | 236B | Coding | 34.5% |
| RTL-Coder | 6.7B | Verilog RTL | 36.8% |
| CraftRTL | 6.7B | Verilog RTL | 53.1% |
| CodeV-R1-distill (ours) | 7B | Verilog RTL | 56.2% |
| CodeV-R1 (ours) | 7B | Verilog RTL | 72.9% |

3. Usage

CodeV-R1-Distill-Qwen-7B can be used in the same manner as Qwen or Llama models.

For instance, you can easily start a service using vLLM:

```shell
vllm serve zhuyaoyu/CodeV-R1-Distill-Qwen-7B --tensor-parallel-size 2 --max-model-len 16384 --enforce-eager
```
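The server exposes an OpenAI-compatible API. A minimal stdlib-only client sketch using the settings from our evaluation (the helper names and the default port 8000 are assumptions; `SYSTEM_PROMPT` is truncated here — use the full prompt from Usage Recommendations below):

```python
import json
from urllib import request

# Truncated for brevity; substitute the full system prompt from Usage Recommendations.
SYSTEM_PROMPT = "You are a helpful assistant. The assistant first thinks about the reasoning process ..."

def build_request(question: str,
                  url: str = "http://localhost:8000/v1/chat/completions") -> request.Request:
    """Assemble one chat-completion request with the evaluation settings."""
    payload = {
        "model": "zhuyaoyu/CodeV-R1-Distill-Qwen-7B",
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
        "temperature": 0.6,      # sampling temperature used in our evaluation
        "max_tokens": 16384,     # maximum generation length used in our evaluation
    }
    return request.Request(url, data=json.dumps(payload).encode("utf-8"),
                           headers={"Content-Type": "application/json"})

def generate(question: str) -> str:
    """Send the request and return the model's completion text."""
    with request.urlopen(build_request(question)) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```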

Usage Recommendations

During training and evaluation, we use the following system prompt:

You are a helpful assistant. The assistant first thinks about the reasoning process in the mind and then provides the user with the answer. The reasoning process and answer are enclosed within <think> </think> and <answer> </answer> tags, respectively, i.e., <think> reasoning process here </think><answer> answer here </answer>. Now the user asks you to write verilog code. After thinking, when you finally reach a conclusion, enclose the final verilog code in ```verilog ``` within <answer> </answer> tags. i.e., <answer> ```verilog\n module top_module(in, out, ...) ... ``` </answer>.\n

It is recommended to use this prompt during inference.
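With this prompt, the final code can be recovered from a completion that follows the format. A small sketch (the helper `extract_verilog` is ours, not part of the release):

```python
import re
from typing import Optional

# \x60 is the backtick character, spelled as an escape so this snippet
# stays safe to embed inside a Markdown code fence.
_ANSWER_RE = re.compile(
    r"<answer>.*?\x60{3}verilog\s*(.*?)\x60{3}.*?</answer>", re.DOTALL
)

def extract_verilog(completion: str) -> Optional[str]:
    """Return the Verilog code inside <answer> ```verilog ... ``` </answer>,
    or None if the completion does not follow the expected format."""
    match = _ANSWER_RE.search(completion)
    return match.group(1).strip() if match else None

fence = "\x60\x60\x60"
sample = (
    "<think>plan the mux</think>"
    f"<answer>{fence}verilog\nmodule top_module(input a, input b, input sel, output out);\n"
    f"  assign out = sel ? b : a;\nendmodule\n{fence}</answer>"
)
print(extract_verilog(sample))  # the module text, fences and tags stripped
```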

4. License

CodeV-R1-Distill-Qwen-7B is derived from the Qwen2.5 series, originally licensed under the Apache 2.0 License, and is fine-tuned on 87k samples curated with DeepSeek-R1.

5. Citation

If you find our model helpful, please cite our paper:

@misc{zhu2025codevr1reasoningenhancedveriloggeneration,
      title={CodeV-R1: Reasoning-Enhanced Verilog Generation}, 
      author={Yaoyu Zhu and Di Huang and Hanqi Lyu and Xiaoyun Zhang and Chongxiao Li and Wenxuan Shi and Yutong Wu and Jianan Mu and Jinghua Wang and Yang Zhao and Pengwei Jin and Shuyao Cheng and Shengwen Liang and Xishan Zhang and Rui Zhang and Zidong Du and Qi Guo and Xing Hu and Yunji Chen},
      year={2025},
      eprint={2505.24183},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2505.24183}, 
}