---
inference: false
datasets:
- bigcode/commitpackft
model-index:
- name: patched-coder-34b
  results:
  - task:
      type: text-generation
    dataset:
      type: openai_humaneval
      name: HumanEval
    metrics:
    - name: pass@1
      type: pass@1
      value: 53.567
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalFix Python
    metrics:
    - name: pass@1
      type: pass@1
      value: 41.341
      verified: false
  - task:
      type: text-generation
    dataset:
      type: patched-codes/static-analysis-eval
      name: Static Analysis Eval
    metrics:
    - name: pass@1
      type: pass@1
      value: 51.316
      verified: false
license: llama2
---
# Model Card for patched-coder-34b

This is an instruction fine-tuned model focused on the task of patching code. Patching may include fixing bugs, remediating security vulnerabilities,
performing API migrations, and other kinds of code maintenance.

## Model Details

### Model Description

- **Developed by:** [codelion](https://huggingface.co/codelion)
- **Model type:** Code Llama
- **Finetuned from model:** [CodeLlama-34b-Python](https://huggingface.co/codellama/CodeLlama-34b-Python-hf)


## How to Get Started with the Model

Make sure to install Transformers from the main git branch:

```bash
pip install git+https://github.com/huggingface/transformers.git
```
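
You can then load the model with `transformers`. A minimal loading sketch (the repo id below is assumed from this model card; adjust it if the model lives under a different namespace):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "patched-codes/patched-coder-34b"  # assumed repo id for this card

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the bf16 compute dtype used during training
    device_map="auto",           # shard across available GPUs
)
```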

## How to Prompt the Model

This model accepts the Alpaca instruction format.

For example: 

```
### Instruction:
{instruction}

### Input:
{input}

### Response:
...
```
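
A minimal generation sketch using this format (continues from the loading snippet above; the instruction, input, and generation parameters are all illustrative):

```python
instruction = "Fix the bug in the following function."
input_code = "def add(a, b):\n    return a - b"

prompt = (
    "### Instruction:\n"
    f"{instruction}\n\n"
    "### Input:\n"
    f"{input_code}\n\n"
    "### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens (the model's response).
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```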

## Bias, Risks, and Limitations

This model has undergone very limited testing. Additional safety testing should be performed before any real-world deployments.

## Training Details

- **GPU:** A100 80 GB
- **Time:** ~8 hrs

### Training Data

The model was fine-tuned on [commitpackft](https://huggingface.co/datasets/bigcode/commitpackft), an open dataset of code commits.
We started with the commits for the `python` language from the dataset and then filtered for commits related to fixing bugs, as sketched below.
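
A filtering sketch along these lines (the exact bug-fix filter is not specified in this card; the keyword list and the `message` field name are assumptions based on the commitpackft schema):

```python
from datasets import load_dataset

# Load the python subset of commitpackft.
ds = load_dataset("bigcode/commitpackft", "python", split="train")

# Keep commits whose message suggests a bug fix; the keyword list is illustrative.
BUGFIX_KEYWORDS = ("fix", "bug", "issue", "error", "patch")
bugfix_commits = ds.filter(
    lambda ex: any(kw in ex["message"].lower() for kw in BUGFIX_KEYWORDS)
)
print(f"{len(bugfix_commits)} bug-fix commits out of {len(ds)}")
```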

### Training Procedure 

The model was instruction fine-tuned to follow natural-language instructions related to code. We load the quantized base model in 4-bit precision
and then use QLoRA for Parameter-Efficient Fine-Tuning (PEFT) with Flash Attention. The model was trained for 2 epochs.

#### Training Hyperparameters

**Training regime:** 

The following `bitsandbytes` quantization config was used during training:
  - quant_method: bitsandbytes
  - load_in_8bit: False
  - load_in_4bit: True
  - llm_int8_threshold: 6.0
  - llm_int8_skip_modules: None
  - llm_int8_enable_fp32_cpu_offload: False
  - llm_int8_has_fp16_weight: False
  - bnb_4bit_quant_type: nf4
  - bnb_4bit_use_double_quant: True
  - bnb_4bit_compute_dtype: bfloat16
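
Expressed in code, this corresponds to a setup roughly like the following (the quantization config mirrors the values listed above; the LoRA hyperparameters are not given in this card and are illustrative placeholders):

```python
import torch
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Quantization config mirroring the values listed above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLlama-34b-Python-hf",
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",  # Flash Attention, as described above
    device_map="auto",
)
base = prepare_model_for_kbit_training(base)

# LoRA hyperparameters below are assumed, not taken from this card.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
```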

## Evaluation

We evaluated the model on the `HumanEval` (code generation) and `HumanEvalFix Python` (bug fixing) benchmarks using the
[Code Generation LM Evaluation Harness](https://github.com/bigcode-project/bigcode-evaluation-harness).

To evaluate the model on vulnerability remediation, we used the `Static Analysis Eval` benchmark available [here](https://huggingface.co/datasets/patched-codes/static-analysis-eval).

### Results

| Model | HumanEval | HumanEvalFix Python | Static Analysis Eval |
| ----- | --------- | ------------------- | -------------------- |
| patched-coder-34b | 53.57 | 41.34 | 51.32 |
| CodeLlama-34b-Python | 53.29 | 33.14 | 27.63 |
| GPT-4 | 86.6 | 47 | 55.26 |

Based on the results on these benchmarks, patched-coder-34b is the state-of-the-art open code LLM. Other code LLMs (e.g., from WizardCoder and Phind) are trained
either on unknown proprietary datasets or on data generated via OpenAI's APIs, which makes them unviable for commercial use.