VERL Code Datasets
High-quality code generation datasets in VERL format: Python, competitive programming, and Verilog HDL for RL training
Viewer • Updated • 927k • 9.27k • 1Note Unified code reasoning dataset with 7 splits (958K+ examples): Python, competitive programming, and Verilog. Includes rstar-coder (386K), kodcode (435K), and 5 other datasets.
sungyub/kodcode-v1-verl
Viewer • Updated • 435k • 57Note Largest dataset. High-quality Python from LeetCode, HumanEval, docs. Includes GPT-4 quality metrics (89.8% retention).
sungyub/rstar-coder-verl
Viewer • Updated • 345k • 72Note Second largest. Microsoft rStar-Coder with test case-based evaluation. Synthetic large-scale dataset.
sungyub/acecode-87k-verl
Viewer • Updated • 87.1k • 69Note TIGER-Lab AceCode. Uses pytest-style assertions for Sandbox Fusion compatibility.
sungyub/eurus-2-code-verl
Viewer • Updated • 25.1k • 56Note Competitive programming: CodeContests, TACO, APPS, Codeforces. Schema unified with skywork format.
sungyub/skywork-or1-code-verl
Viewer • Updated • 14.1k • 55Note Reference standard with model difficulty ratings. 80.4% cleaned of instruction prefixes.
sungyub/code-contests-plus-verl
Viewer • Updated • 6.54k • 34Note ByteDance Code-Contests-Plus. Sandbox-validated test cases (72.1% success rate).
sungyub/codev-r1-verl
Viewer • Updated • 3.13k • 83Note Verilog HDL for hardware design. Filtered version with 87.1% test pass rate.