Diff Datasets
Datasets containing GitHub diffs
10.7M rows • 369 downloads • 3 likes
Note: Diffs only, no full files.
bigcode/github-commits-diff-dedup-pjjs-april
146k rows • 532 downloads • 3 likes
Note: Contains the full old and new files.
ASSERT-KTH/megadiff-single-function
72.4k rows • 152 downloads • 2 likes
Note: Refined version of "ASSERT-KTH/megadiff" in which each row holds an old buggy function and its corrected version. Code only, no commit messages.
Paper: Megadiff: A Dataset of 600k Java Source Code Changes Categorized by Diff Size (https://arxiv.org/pdf/2108.04631)
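Megadiff groups changes by diff size. As an illustration only, one plausible way to measure that size for a buggy/fixed function pair is to count added and removed lines with Python's standard `difflib`. The helper name `diff_size` is hypothetical, and the paper's exact definition of diff size may differ.

```python
import difflib


def diff_size(old_code: str, new_code: str) -> int:
    """Count added plus removed lines between two versions of a function.

    Hypothetical helper: one plausible diff-size measure, not necessarily
    the definition used by the Megadiff paper.
    """
    old_lines = old_code.splitlines()
    new_lines = new_code.splitlines()
    # autojunk=False so frequent lines (e.g. lone braces) are never ignored.
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    removed = added = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed += i2 - i1  # lines present only in the old version
        if tag in ("replace", "insert"):
            added += j2 - j1  # lines present only in the new version
    return added + removed


buggy = "int abs(int x) {\n  return x;\n}"
fixed = "int abs(int x) {\n  return x < 0 ? -x : x;\n}"
print(diff_size(buggy, fixed))  # one line removed, one added
```

Working on opcodes rather than on the text of a unified diff avoids misreading a changed line such as `--i;` as a `---` file header.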
ASSERT-KTH/megadiff
657k rows • 505 downloads • 1 like
mamiksik/processed-commit-diffs
77.8k rows • 38 downloads • 3 likes
Note: Patches only, no diffs or full files. Taken only from high-quality files.
epinnock/commit-diffs
117k rows • 21 downloads • 1 like
Note: Has the new file, the old file, and the diff.
bigcode/commitpackft
702k rows • 299k downloads • 75 likes
Note: Has full old and new files. A version of bigcode/commitpack filtered for high-quality commit messages.
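Entries such as bigcode/github-commits-diff-dedup-pjjs-april, epinnock/commit-diffs and bigcode/commitpackft ship full old and new file contents. When a diff is needed from such a pair, a minimal sketch using Python's standard `difflib` module could regenerate one. The function name and the `a/`/`b/` path prefixes (borrowed from git's output style) are assumptions. Check each dataset card for the actual column names holding the old and new file text.

```python
import difflib


def make_unified_diff(old_contents: str, new_contents: str, path: str) -> str:
    """Render a git-style unified diff from full old and new file contents.

    Hypothetical helper; the a/ and b/ prefixes only mimic git's format.
    Assumes both inputs end with a newline, otherwise the last line of a
    hunk is joined to the next diff line.
    """
    diff_lines = difflib.unified_diff(
        old_contents.splitlines(keepends=True),
        new_contents.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff_lines)


old = "def add(a, b):\n    return a - b\n"
new = "def add(a, b):\n    return a + b\n"
print(make_unified_diff(old, new, "calc.py"))
```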
ObscuraCoder/commit-chronicle
3.01M rows • 250 downloads • 2 likes
Note: Diff and commit message only. Filtered version of JetBrains-Research/commit-chronicle.
JetBrains-Research/commit-chronicle
10.9M rows • 195 downloads • 10 likes
Note: Diffs with metadata.
chargoddard/commitpack-ft-instruct
491k rows • 14 downloads • 2 likes
Note: Adds a prefix question to the commit message to form an instruction. Data taken from bigcode/commitpackft.
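The exact prefix questions used by chargoddard/commitpack-ft-instruct are not listed here. The following is therefore only a hypothetical sketch of the idea: a prefix question is prepended to the commit message to form the instruction, and the old and new code serve as input and output. The default prefix text, the function name, and the `instruction`/`input`/`output` keys are all assumptions.

```python
def to_instruction_example(
    commit_message: str,
    old_code: str,
    new_code: str,
    prefix: str = "Can you make the following change to this code?",  # hypothetical prefix
) -> dict[str, str]:
    """Turn one commit into an instruction-tuning record (hypothetical format)."""
    # Prepend the prefix question so the commit message reads as an instruction.
    instruction = f"{prefix} {commit_message.strip()}"
    return {"instruction": instruction, "input": old_code, "output": new_code}


record = to_instruction_example(
    "Fix off-by-one error in loop bound\n",
    "for i in range(n + 1):\n",
    "for i in range(n):\n",
)
print(record["instruction"])
```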
Maxscha/commitbench
1.66M rows • 57 downloads • 9 likes
Note: Four years old, but appears to be good quality. Diff and commit message.
ASSERT-KTH/repairllama-datasets
460k rows • 330 downloads • 2 likes
Note: Six splits of input/output pairs, where the input is buggy code and the output is its correction.
Paper: RepairLLaMA: Efficient Representations and Fine-Tuned Adapters for Program Repair (https://arxiv.org/abs/2312.15698)