inference: false
datasets:
  - bigcode/commitpackft
model-index:
  - name: patched-coder-34b
    results:
      - task:
          type: text-generation
        dataset:
          type: openai_humaneval
          name: HumanEval
        metrics:
          - name: pass@1
            type: pass@1
            value: 53.567
            verified: false
      - task:
          type: text-generation
        dataset:
          type: bigcode/humanevalpack
          name: HumanEvalFix Python
        metrics:
          - name: pass@1
            type: pass@1
            value: 41.341
            verified: false
      - task:
          type: text-generation
        dataset:
          type: patched-codes/static-analysis-eval
          name: Static Analysis Eval
        metrics:
          - name: pass@1
            type: pass@1
            value: 51.316
            verified: false
license: llama2

Model Card for patched-coder-34b

This is an instruction fine-tuned model focused on the task of patching code. Patching may include fixing bugs, remediating security vulnerabilities, performing API migrations, and other kinds of code maintenance.

Model Details

Model Description

Developed by: codelion
Model type: Code Llama
Finetuned from model: CodeLlama-34b-Python

How to Get Started with the Model

Make sure to install Transformers from the main git branch:

pip install git+https://github.com/huggingface/transformers.git

How to Prompt the Model

This model accepts prompts in the Alpaca instruction format.

For example:

### Instruction:
{instruction}

### Input:
{input}

### Response:
...
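A small helper can build prompts in this format and recover the model's answer from the generated text. This is only a sketch. `build_prompt` and `extract_response` are hypothetical names. The exact whitespace is our reading of the template above rather than a documented requirement: one blank line between sections, and a newline after `### Response:`.

```python
def build_prompt(instruction: str, code_input: str) -> str:
    """Fill the Alpaca-style template shown above (hypothetical helper)."""
    # Assumption: sections are separated by one blank line and the prompt ends
    # right after "### Response:" so the model continues with its answer.
    return (
        f"### Instruction:\n{instruction}\n\n"
        f"### Input:\n{code_input}\n\n"
        "### Response:\n"
    )


def extract_response(generated: str) -> str:
    """Return the text after the last '### Response:' marker (hypothetical helper)."""
    _, marker, answer = generated.rpartition("### Response:")
    # rpartition returns an empty marker when it is absent; fall back to the whole text.
    return answer.strip() if marker else generated.strip()


prompt = build_prompt(
    "Fix the off-by-one error in this loop.",
    "for i in range(len(xs) + 1):\n    print(xs[i])",
)
print(prompt)
```

Pass `prompt` to the model. Then apply `extract_response` to the decoded output, which usually still contains the prompt text.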

Bias, Risks, and Limitations

This model has undergone very limited testing. Additional safety testing should be performed before any real-world deployments.

Training Details

GPU: A100 80 GB
Time: ~8 hrs

Training Data

The model was fine-tuned on commitpackft (bigcode/commitpackft), an open dataset of code commits. We started with the Python commits in the dataset and then kept only the commits related to fixing bugs.
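The card does not say how bug-fix commits were identified. As an illustration only, a simple keyword heuristic over commit messages might look like the sketch below. All of its parts are assumptions, not the filtering actually used to build the training set: the regular expression, the `is_bug_fix` and `select_bug_fixes` helpers, and the record fields `lang` and `message`.

```python
import re

# Hypothetical heuristic: treat a commit as a bug fix if its message contains a
# bug-fix keyword as a whole word. Not the card's actual filtering criterion.
BUG_FIX_RE = re.compile(r"\b(fix|fixes|fixed|fixing|bug|bugs|bugfix)\b", re.IGNORECASE)


def is_bug_fix(message: str) -> bool:
    """True if the commit message contains a bug-fix keyword as a whole word."""
    return BUG_FIX_RE.search(message) is not None


def select_bug_fixes(commits: list[dict]) -> list[dict]:
    """Keep Python commits whose messages look like bug fixes (assumed record fields)."""
    return [c for c in commits if c["lang"] == "Python" and is_bug_fix(c["message"])]


sample = [
    {"lang": "Python", "message": "Fix crash when config file is empty"},
    {"lang": "Python", "message": "Add usage section to README"},
    {"lang": "Go", "message": "Fix race in worker pool"},
]
print(select_bug_fixes(sample))
```

Whole-word matching keeps words such as "prefix" from being counted as fixes. A real pipeline would probably need a more careful classifier than keywords.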

Training Procedure

We used instruction fine-tuning to teach the model to follow natural-language instructions related to code. We loaded the base model quantized to 4 bits and then applied QLoRA for Parameter-Efficient Fine-Tuning (PEFT), with Flash Attention. The model was trained for 2 epochs.
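In QLoRA, the 4-bit base weights stay frozen and only small low-rank adapter matrices are trained. A frozen weight matrix W is augmented with an update B A, where B and A share a small inner dimension r (the rank). LoRA also scales this update by a constant, which does not change the parameter count. The sketch below counts the trainable parameters an adapter adds for a single matrix. The 8192 × 8192 shape and rank 16 are hypothetical values chosen for illustration, since the card does not report the LoRA rank or target modules.

```python
def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters LoRA adds for one frozen d_out x d_in weight matrix W.

    LoRA learns B (d_out x rank) and A (rank x d_in); the update B @ A has W's
    shape but only rank * (d_out + d_in) trainable entries.
    """
    return rank * (d_out + d_in)


# Hypothetical values for illustration only: the card states neither the LoRA
# rank nor which projections were adapted.
D_MODEL = 8192
RANK = 16

full_params = D_MODEL * D_MODEL
adapter_params = lora_trainable_params(D_MODEL, D_MODEL, RANK)
print(f"full: {full_params:,}  LoRA: {adapter_params:,}  ratio: {adapter_params / full_params:.4%}")
```

With these illustrative numbers, the adapter trains well under 1% of the parameters of the matrix it adapts. Combined with 4-bit storage of the frozen weights, this is what makes fine-tuning a 34B model on a single GPU practical.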

Training Hyperparameters

Training regime: QLoRA on a 4-bit (nf4) quantized base model with bfloat16 compute

The following bitsandbytes quantization config was used during training:

quant_method: bitsandbytes
load_in_8bit: False
load_in_4bit: True
llm_int8_threshold: 6.0
llm_int8_skip_modules: None
llm_int8_enable_fp32_cpu_offload: False
llm_int8_has_fp16_weight: False
bnb_4bit_quant_type: nf4
bnb_4bit_use_double_quant: True
bnb_4bit_compute_dtype: bfloat16
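A rough estimate of the memory needed just to hold the weights shows why the base model is loaded in 4 bits. This sketch assumes the nominal 34 billion parameters. It also ignores several costs, so the real footprint is higher:

- the per-block quantization constants, which `bnb_4bit_use_double_quant` compresses further
- the LoRA adapters
- activations
- optimizer state

`weight_memory_gib` is a hypothetical helper.

```python
def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate memory needed to store the model weights alone, in GiB."""
    bytes_needed = num_params * bits_per_param / 8
    return bytes_needed / 2**30


# Assumption: the nominal "34b" size; the exact parameter count differs slightly.
NOMINAL_PARAMS = 34e9

bf16_gib = weight_memory_gib(NOMINAL_PARAMS, 16)  # unquantized bfloat16 weights
nf4_gib = weight_memory_gib(NOMINAL_PARAMS, 4)    # 4-bit weights, ignoring quantization constants

print(f"bf16 weights: {bf16_gib:.1f} GiB, 4-bit weights: {nf4_gib:.1f} GiB")
```

Under these assumptions, the 4-bit weights need about 16 GiB versus about 63 GiB in bfloat16. That leaves most of the A100's 80 GB for activations, adapters, and optimizer state. With `bnb_4bit_compute_dtype: bfloat16`, the stored 4-bit weights are dequantized to bfloat16 for the matrix multiplications.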

Evaluation

We evaluated the model on the HumanEval (code generation) and HumanEvalFix Python (bug fixing) benchmarks using the Code Generation LM Evaluation Harness.

To evaluate the model on vulnerability remediation, we used the Static Analysis Eval benchmark (patched-codes/static-analysis-eval on the Hugging Face Hub).
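pass@1 measures how often a generated solution passes a problem's unit tests. The card does not state how many samples were drawn per problem or which decoding settings were used. The sketch below is the standard unbiased pass@k estimator introduced with HumanEval. It is shown to clarify what the metric measures, not as a record of how these particular numbers were produced.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem, given n generated samples of which c pass the tests."""
    if n - c < k:
        # Every size-k subset of the samples contains at least one correct solution.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(per_problem: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) pairs; multiply by 100 for a percentage."""
    return sum(pass_at_k(n, c, k) for n, c in per_problem) / len(per_problem)
```

With k = 1 the estimator reduces to c / n for each problem. A reported pass@1 of 53.567 therefore corresponds to an average per-problem success rate of about 53.6%.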

Results

All scores are pass@1 (%).

Model                   HumanEval   HumanEvalFix Python   Static Analysis Eval
patched-coder-34b       53.57       41.34                 51.32
CodeLlama-34b-Python    53.29       33.14                 27.63
GPT-4                   86.6        47                    55.26

Based on the results on these benchmarks, we consider patched-coder-34b the state-of-the-art (SOTA) open code LLM. Other code LLMs (e.g., WizardCoder and Phind) were trained either on undisclosed proprietary datasets or with OpenAI's APIs, which makes them unviable for commercial use.
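The per-benchmark gains over the base model can be computed from the results table above. The dictionary below copies the table's values, and `gains` is an illustrative helper.

```python
# pass@1 scores (%) copied from the results table above.
SCORES = {
    "patched-coder-34b": {"HumanEval": 53.57, "HumanEvalFix Python": 41.34, "Static Analysis Eval": 51.32},
    "CodeLlama-34b-Python": {"HumanEval": 53.29, "HumanEvalFix Python": 33.14, "Static Analysis Eval": 27.63},
}


def gains(model: str, baseline: str) -> dict:
    """Absolute pass@1 difference (percentage points) of `model` over `baseline` per benchmark."""
    return {
        bench: round(SCORES[model][bench] - SCORES[baseline][bench], 2)
        for bench in SCORES[model]
    }


for bench, delta in gains("patched-coder-34b", "CodeLlama-34b-Python").items():
    print(f"{bench}: {delta:+.2f} points")
```

The gains are about +0.28 points on HumanEval, +8.20 on HumanEvalFix Python, and +23.69 on Static Analysis Eval. The fine-tuning leaves general code generation essentially unchanged and mainly improves bug fixing and vulnerability remediation.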