---
library_name: transformers
tags:
- verilog
- reasoning
- reinforcement-learning
- rtl
---

# VeriReason-codeLlama-7b-RTLCoder-Verilog-GRPO-reasoning-tb

For implementation details, visit our GitHub repository: [VeriReason](https://github.com/NellyW8/VeriReason) and our [project page](https://nellyw8.github.io/VeriReason/).

Check out our paper: [VeriReason: Reinforcement Learning with Testbench Feedback for Reasoning-Enhanced Verilog Generation](https://arxiv.org/abs/2505.11849)

## Update Log

- 2025.05.17: Initial release of VeriReason-codeLlama-7b-RTLCoder-Verilog-GRPO-reasoning-tb

## Project Description

This study introduces VeriReason, a novel approach that uses reinforcement learning with testbench feedback to enhance the performance of pre-trained models for Verilog RTL code generation. VeriReason combines supervised fine-tuning with Guided Reward Proximal Optimization (GRPO) reinforcement learning, specifically tailored for RTL code generation. Using our curated high-quality training examples alongside a feedback-driven reward model, VeriReason achieves 83.1% functional correctness on the VerilogEval Machine benchmark, substantially outperforming both comparable-sized models and much larger commercial systems such as GPT-4 Turbo.

The model integrates explicit reasoning capabilities with reinforcement learning for Verilog generation, establishing a new state of the art for automated RTL synthesis. Our 7B-parameter model, based on Code Llama, demonstrates up to a 2.8× increase in first-attempt functional correctness compared to baseline methods and exhibits robust generalization to unseen designs.

## Installation

To install this project, follow these steps:

1. Clone the repository: `git clone https://github.com/NellyW8/VeriReason.git`
2. Navigate to the project directory: `cd VeriReason`
3. Install the dependencies as specified in the repository

## Usage

You can use the model with the transformers library:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "Nellyw888/VeriReason-codeLlama-7b-RTLCoder-Verilog-GRPO-reasoning-tb"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
model.eval()

prompt = """
Please act as a professional Verilog designer. Develop a module that implements an 8-bit comparator. The module should have two 8-bit inputs and one output. If the first input is greater than the second input, the output should be high. Otherwise, the output should be low.
First, think through the design approach, considering the functionality, inputs, outputs, and implementation details. Then provide the complete Verilog code implementation.
Respond in the following format: ... ```verilog ... ```
"""

# Tokenize the prompt and place it on the same device as the model
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)

# do_sample=True is needed for temperature and top_p to take effect
outputs = model.generate(input_ids, max_length=1024, do_sample=True, temperature=0.2, top_p=0.95)
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)
```
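The model is trained to produce its reasoning first and the final design inside a fenced `verilog` code block, so you will usually want to pull the code out of the decoded text before simulating or synthesizing it. Below is a minimal, illustrative sketch; the `extract_verilog` helper, the regular expression, and the output file name are assumptions for this example, not part of the released code:

```python
import re

def extract_verilog(generated_text):
    """Return the contents of the last fenced verilog block, or None if absent."""
    matches = re.findall(r"```verilog\s*(.*?)```", generated_text, re.DOTALL)
    return matches[-1].strip() if matches else None

# 'result' is the decoded model output from the example above
verilog_code = extract_verilog(result)
if verilog_code is not None:
    with open("comparator.v", "w") as f:  # illustrative output path
        f.write(verilog_code)
```

The extracted file can then be handed to your usual simulation flow (for example, compiled against a testbench) to check functional correctness.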
## Training

The GRPO (Guided Reward Proximal Optimization) training is based on the OpenR1 framework. For training with GRPO:

1. Move the necessary files to the OpenR1 directory:

   ```bash
   mv verilog_rewards_tb.py verilog_train_tb.py src/open-r1/
   ```

2. Create a directory for the Verilog recipe:

   ```bash
   mkdir verilog_recipe
   mv verilog_grpo_tb.yaml verilog_recipe/
   ```

3. Run training:

   ```bash
   NCCL_DEBUG=INFO TORCH_DISTRIBUTED_DEBUG=DETAIL CUDA_VISIBLE_DEVICES=0,1,2 ACCELERATE_USE_NCCL=1 accelerate launch --config_file recipes/accelerate_configs/zero3.yaml --num_processes=3 src/open_r1/verilog_train_rtlcoder.py --config verilog_recipe/verilog_grpo_tb.yaml --use_vllm=false
   ```

## Citation

Please cite our paper if you use our model or dataset:

```bibtex
@misc{wang2025verireason,
  title={VeriReason: Reinforcement Learning with Testbench Feedback for Reasoning-Enhanced Verilog Generation},
  author={Yiting Wang and Guoheng Sun and Wanghao Ye and Gang Qu and Ang Li},
  year={2025},
  eprint={2505.11849},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2505.11849},
}
```

## Acknowledgement

This repo benefits from OpenR1 and LLaMA-Factory.