CodeV-R1-Distill-Qwen-7B
1. Introduction
Large language models (LLMs) trained via reinforcement learning with verifiable reward (RLVR) have achieved breakthroughs on tasks with explicit, automatable verification, such as software programming and mathematical problems. Extending RLVR to electronic design automation (EDA), especially automatically generating hardware description languages (HDLs) like Verilog from natural-language (NL) specifications, however, poses three key challenges: the lack of automated and accurate verification environments, the scarcity of high-quality NL–code pairs, and the prohibitive computational cost of RLVR.
To this end, we introduce CodeV-R1, an RLVR framework for training Verilog-generation LLMs, building on our earlier work CodeV. First, we develop a rule-based testbench generator that performs robust equivalence checking against golden references. Second, we propose a round-trip data synthesis method that pairs open-source Verilog snippets with LLM-generated NL descriptions, verifies code–NL–code consistency via the generated testbenches, and filters out inequivalent examples to yield a high-quality dataset. Third, we employ a two-stage distill-then-RL training pipeline: distillation for the cold start of reasoning abilities, followed by adaptive DAPO, our novel RLVR algorithm that reduces training cost by adaptively adjusting the sampling rate.
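The sketch below illustrates the shape of the round-trip filtering step. It is a minimal illustration, not the released implementation: `describe_code`, `generate_code`, `make_testbench`, and `check_equivalence` are hypothetical placeholders for the LLM calls, the rule-based testbench generator, and the equivalence checker.

```python
# Hypothetical sketch of the round-trip data-synthesis filter described above.
# All four helper functions are placeholders, not the released code.

def round_trip_filter(verilog_snippets):
    """Keep only NL-code pairs whose round trip passes equivalence checking."""
    dataset = []
    for golden in verilog_snippets:
        nl_spec = describe_code(golden)        # LLM: Verilog -> NL description
        candidate = generate_code(nl_spec)     # LLM: NL description -> Verilog
        testbench = make_testbench(golden)     # rule-based testbench generator
        if check_equivalence(testbench, golden, candidate):
            dataset.append({"instruction": nl_spec, "code": golden})
    return dataset
```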
This model, CodeV-R1-Distill-Qwen-7B, is the checkpoint obtained after the distillation stage. The RL-trained model, CodeV-R1-Qwen-7B, is provided here. For more training details, please refer to our paper.
2. Evaluation Results
During evaluation, the maximum generation length is set to 16,384 tokens, the temperature is set to 0.6, and 20 responses are sampled per query to estimate the pass@1 score.
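For reference, pass@1 from multiple samples is conventionally computed with the unbiased pass@k estimator of Chen et al. (2021); the sketch below assumes that standard formula, as the exact computation is not spelled out here.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: n total samples, c of them correct."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With n=20 samples and k=1, this reduces to the fraction of correct samples:
print(pass_at_k(20, 13, 1))  # 0.65
```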
Our evaluation covers Verilog benchmarks including VerilogEval and RTLLM. For VerilogEval v2, we examine zero-shot performance on both specification-to-RTL translation and code completion. For RTLLM, we report results on version 1.1, which has a broader set of published baselines for comparison. Furthermore, we find that acquiring the reasoning process for Verilog problems, as distilled from DeepSeek-R1, enhances the model's out-of-domain mathematical capabilities.
VerilogEval (v2)
Model | Model size | Type | Spec-to-RTL | Completion |
---|---|---|---|---|
GPT-4o | Undisclosed | General | 62.5% | 59.0% |
GPT-4 Turbo | Undisclosed | General | 61.1% | 53.9% |
GPT-4 | Undisclosed | General | 32.0% | 42.3% |
Mistral Large | Undisclosed | General | 37.5% | 34.0% |
Llama3.1 | 405B | General | 57.2% | 56.4% |
Llama3.1 | 70B | General | 42.8% | 35.3% |
Llama3 | 70B | General | 43.9% | 37.8% |
Llama2 | 70B | General | 5.3% | 1.3% |
Llama3.1 | 8B | General | 19.1% | 2.6% |
CodeLlama | 70B | Coding | 34.9% | 37.2% |
DeepSeek Coder | 33B | Coding | 21.7% | 25.0% |
CodeGemma | 7B | Coding | 9.5% | 8.3% |
DeepSeek Coder | 6.7B | Coding | 29.6% | 24.4% |
RTL-Coder | 6.7B | Verilog RTL | 36.8% | 35.9% |
CodeV-R1-distill (ours) | 7B | Verilog RTL | 65.2% | 65.5% |
CodeV-R1 (ours) | 7B | Verilog RTL | 68.8% | 69.9% |
RTLLM (v1.1)
Model | Model size | Type | Pass@1 |
---|---|---|---|
GPT-4o | Undisclosed | General | 33.8% |
GPT-3.5 Turbo | Undisclosed | General | 28.3% |
Llama3.1 | 405B | General | 38.9% |
Nemotron-4 | 340B | General | 18.9% |
Llama3.1 | 8B | General | 19.1% |
CodeLlama | 7B | Coding | 17.9% |
CodeQwen | 7B | Coding | 24.1% |
Starcoder2 | 15B | Coding | 15.5% |
DeepSeek Coder | 6.7B | Coding | 23.1% |
DeepSeek-Coder-V2 | 16B | Coding | 33.1% |
DeepSeek-Coder-V2 | 236B | Coding | 34.5% |
RTL-Coder | 6.7B | Verilog RTL | 36.8% |
CraftRTL | 6.7B | Verilog RTL | 53.1% |
CodeV-R1-distill (ours) | 7B | Verilog RTL | 56.2% |
CodeV-R1 (ours) | 7B | Verilog RTL | 72.9% |
3. Usage
CodeV-R1-Distill-Qwen-7B can be utilized in the same manner as Qwen or Llama models.
For instance, you can easily start a service using vLLM:
```bash
vllm serve zhuyaoyu/CodeV-R1-Distill-Qwen-7B --tensor-parallel-size 2 --max-model-len 16384 --enforce-eager
```
Usage Recommendations
During training and evaluation, we use the following system prompt:

````
You are a helpful assistant. The assistant first thinks about the reasoning process in the mind and then provides the user with the answer. The reasoning process and answer are enclosed within <think> </think> and <answer> </answer> tags, respectively, i.e., <think> reasoning process here </think><answer> answer here </answer>. Now the user asks you to write verilog code. After thinking, when you finally reach a conclusion, enclose the final verilog code in ```verilog ``` within <answer> </answer> tags. i.e., <answer> ```verilog\n module top_module(in, out, ...) ... ``` </answer>.\n
````

It is recommended to use this prompt during inference as well.
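For example, you can query the vLLM server started above through its OpenAI-compatible API with this system prompt. This is an illustrative client, not part of the release; the truncated `SYSTEM_PROMPT` stands in for the full prompt shown above, and the user message is a placeholder for your own specification.

```python
from openai import OpenAI

# Full system prompt as given above (shortened here for brevity).
SYSTEM_PROMPT = (
    "You are a helpful assistant. The assistant first thinks about the reasoning "
    "process in the mind and then provides the user with the answer. ..."
)

# vLLM's OpenAI-compatible server listens on http://localhost:8000/v1 by default.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="zhuyaoyu/CodeV-R1-Distill-Qwen-7B",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Write a Verilog module named top_module that ..."},  # your spec here
    ],
    temperature=0.6,      # matches the evaluation setting
    max_tokens=16384,
)
print(resp.choices[0].message.content)
```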
4. License
CodeV-R1-Distill-Qwen-7B is derived from the Qwen2.5 series, originally licensed under the Apache 2.0 License, and fine-tuned on 87k samples curated with DeepSeek-R1.
5. Citation
If you find our model helpful, please cite our paper:
```bibtex
@misc{zhu2025codevr1reasoningenhancedveriloggeneration,
      title={CodeV-R1: Reasoning-Enhanced Verilog Generation},
      author={Yaoyu Zhu and Di Huang and Hanqi Lyu and Xiaoyun Zhang and Chongxiao Li and Wenxuan Shi and Yutong Wu and Jianan Mu and Jinghua Wang and Yang Zhao and Pengwei Jin and Shuyao Cheng and Shengwen Liang and Xishan Zhang and Rui Zhang and Zidong Du and Qi Guo and Xing Hu and Yunji Chen},
      year={2025},
      eprint={2505.24183},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2505.24183},
}
```