---
library_name: transformers
license: mit
base_model:
  - google/gemma-2b
---

## Model Details

This model is google/gemma-2b fine-tuned on 100,000 CLRS-Text examples.
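A minimal loading sketch is below. The repo id and the prompt format are placeholders, since the card does not state them; substitute the actual Hugging Face repo name for this checkpoint and a real CLRS-Text prompt.

```python
# Minimal usage sketch — repo id and prompt are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "smcleish/gemma-2b-clrs-text"  # placeholder, not confirmed by the card
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")

prompt = "insertion_sort: [5, 2, 4, 6, 1, 3]\n"  # illustrative CLRS-Text-style prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```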

## Training Details

- Learning rate: 1e-4 with 150 warmup steps, then cosine-decayed to 5e-6, using the AdamW optimiser (a schedule sketch follows this list)
- Batch size: 128
- Loss computed over the answer tokens only, not the question (a masking sketch follows this list)
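The learning-rate schedule can be reproduced with a plain PyTorch `LambdaLR`. The sketch below assumes linear warmup and a total step count of 10,000; neither is stated in the card, so adjust both to your run.

```python
# Sketch of the schedule described above: warmup for 150 steps, then cosine
# decay from the 1e-4 peak down to a 5e-6 floor, with AdamW.
import math
import torch

model = torch.nn.Linear(8, 8)  # stand-in for the actual model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

warmup_steps, total_steps = 150, 10_000  # total_steps is assumed, not from the card
peak_lr, min_lr = 1e-4, 5e-6

def lr_lambda(step: int) -> float:
    if step < warmup_steps:
        return step / max(1, warmup_steps)  # linear warmup up to peak_lr (assumed linear)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    # Scale so the lr decays from peak_lr at progress=0 to min_lr at progress=1.
    return (min_lr + (peak_lr - min_lr) * cosine) / peak_lr

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```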
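Answer-only loss is commonly implemented by masking the question tokens out of the labels; Hugging Face models ignore positions labeled -100 when computing cross-entropy. A minimal sketch, with illustrative variable names, assuming each example concatenates the question and answer into one sequence:

```python
# Answer-only loss masking sketch. Token ids and lengths here are made up.
import torch

input_ids = torch.tensor([[10, 11, 12, 13, 14, 15]])  # question + answer tokens
question_len = 3                                       # number of question tokens

labels = input_ids.clone()
labels[:, :question_len] = -100  # -100 = ignore_index for the LM loss
# model(input_ids=input_ids, labels=labels) then trains on the answer only.
```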