
tFINE-base-300m-instruct-L2

This is an "instruct" model, fine-tuned from pszemraj/tFINE-base-300m in the following phases:

  1. two epochs on Super-Natural Instructions
  2. instruct tuning on 2M "easy"/L1 instructions
  3. instruct tuning on 1M "harder"/L2 instructions

Usage example

from transformers import pipeline

# load the checkpoint into a text2text-generation pipeline
pipe = pipeline(
    "text2text-generation",
    model="pszemraj/tFINE-300m-instruct-L2",
)
prompt = "write a python script to download a file from a url and save as a local file using requests. explain how it works"
res = pipe(
    prompt,
    num_beams=4,             # beam search with 4 beams
    early_stopping=True,
    max_new_tokens=384,
    no_repeat_ngram_size=7,  # block verbatim repeats of 7-grams
)
print(res[0]["generated_text"])

Quick eval

Quick eval for: pszemraj/tFINE-base-300m-instruct-L2

hf (pretrained=pszemraj/tFINE-base-300m-instruct-L2,trust_remote_code=True,dtype=bfloat16), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 8

| Tasks         | Version | Filter           | n-shot | Metric        |  Value | Stderr   |
|---------------|--------:|------------------|-------:|---------------|-------:|----------|
| boolq         |       2 | none             |      0 | acc ↑         | 0.6193 | ± 0.0085 |
| openbookqa    |       1 | none             |      0 | acc ↑         | 0.1440 | ± 0.0157 |
|               |         | none             |      0 | acc_norm ↑    | 0.3040 | ± 0.0206 |
| piqa          |       1 | none             |      0 | acc ↑         | 0.6083 | ± 0.0114 |
|               |         | none             |      0 | acc_norm ↑    | 0.6061 | ± 0.0114 |
| social_iqa    |       0 | none             |      0 | acc ↑         | 0.3823 | ± 0.0110 |
| tinyArc       |       0 | none             |     25 | acc_norm ↑    | 0.3469 | ± N/A    |
| tinyGSM8k     |       0 | flexible-extract |      5 | exact_match ↑ | 0.0371 | ± N/A    |
|               |         | strict-match     |      5 | exact_match ↑ | 0.0154 | ± N/A    |
| tinyHellaswag |       0 | none             |     10 | acc_norm ↑    | 0.3044 | ± N/A    |
| tinyMMLU      |       0 | none             |      0 | acc_norm ↑    | 0.3311 | ± N/A    |
| winogrande    |       1 | none             |      0 | acc ↑         | 0.5107 | ± 0.0140 |
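
These numbers come from lm-evaluation-harness with the configuration shown above. A minimal sketch of reproducing the zero-shot rows via its Python API, assuming the lm_eval package (v0.4+) is installed; the task list below is only a subset of the table:

import lm_eval

# sketch: mirrors the model_args and batch_size from the eval header above
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=pszemraj/tFINE-base-300m-instruct-L2,trust_remote_code=True,dtype=bfloat16",
    tasks=["boolq", "openbookqa", "piqa", "social_iqa", "winogrande"],
    batch_size=8,
)
print(results["results"])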

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a Seq2SeqTrainingArguments sketch follows the list):

  • learning_rate: 3e-05
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 17868
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1.0
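
For reference, a hedged sketch of how the listed values map onto transformers' Seq2SeqTrainingArguments; the output_dir is a placeholder, the Adam betas/epsilon above are the library defaults, and anything not listed keeps its default:

from transformers import Seq2SeqTrainingArguments

# sketch only: maps the hyperparameter list above onto training arguments
args = Seq2SeqTrainingArguments(
    output_dir="tFINE-300m-instruct-L2",  # placeholder path
    learning_rate=3e-5,
    per_device_train_batch_size=4,   # train_batch_size
    per_device_eval_batch_size=8,    # eval_batch_size
    gradient_accumulation_steps=16,  # 4 x 16 = 64 total train batch size
    seed=17868,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    num_train_epochs=1.0,
    # adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-8 are the defaults
)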
