---
base_model: meta-llama/Llama-2-7b-hf
library_name: peft
license: llama2
datasets:
- timdettmers/openassistant-guanaco
language:
- en
- th
- zh
metrics:
- accuracy
pipeline_tag: question-answering
---
## Model Details
### Model Description
- **Developed by:** Jixin Yang @ HKUST
- **Model type:** PEFT (LoRA) fine-tuned LLaMA-2 7B for backward text generation
- **Finetuned from model:** meta-llama/Llama-2-7b-hf
## Uses
This model is designed for backward text generation: given an output (answer) text, it generates the corresponding input (prompt).
## How to Get Started with the Model
Use the code below to get started with the model.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "jasperyeoh2/llama2-7b-backward-model"

# With `peft` installed, transformers resolves the adapter and its Llama-2 base model.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# Pass the output (answer) text; the model generates the corresponding input (prompt).
input_text = "Output text to reverse"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
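If you prefer to load the base model and apply the LoRA adapter explicitly with `peft`, here is a minimal sketch (assuming, per the metadata above, that this repository hosts a PEFT adapter for `meta-llama/Llama-2-7b-hf`):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the gated Llama-2 base model in half precision.
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    torch_dtype=torch.float16,
    device_map="auto",
)

# Attach the backward-generation LoRA adapter.
model = PeftModel.from_pretrained(base_model, "jasperyeoh2/llama2-7b-backward-model")

# Tokenizer taken from the base model; the adapter repo may ship its own as well.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
```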
## Training Details
### Training Data
- Dataset: [OpenAssistant-Guanaco](https://huggingface.co/datasets/timdettmers/openassistant-guanaco)
- Number of examples used: ~3,200
- Task: instruction backtranslation (answer → prompt); see the sketch below for an illustrative way such examples could be constructed
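The exact prompt template used to flip each example is not documented in this card. The snippet below is purely illustrative, assuming the `### Human:` / `### Assistant:` formatting used by openassistant-guanaco and a hypothetical reversed template:

```python
# Illustrative only: turn a guanaco-style record into a backward
# (answer -> prompt) training example. The template used for this model
# may differ; this sketch handles the first conversation turn only.
def to_backward_example(text: str) -> str:
    human, rest = text.split("### Assistant:", maxsplit=1)
    prompt = human.replace("### Human:", "").strip()
    answer = rest.split("### Human:", maxsplit=1)[0].strip()
    # Hypothetical reversed template: condition on the answer, predict the prompt.
    return f"### Assistant: {answer}\n### Human: {prompt}"


record = "### Human: What is the capital of France?### Assistant: The capital of France is Paris."
print(to_backward_example(record))
```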
### Training Procedure
#### Training Hyperparameters
The adapter was trained with the following setup (a configuration sketch follows the list):
- Method: PEFT with LoRA (Low-Rank Adaptation)
- Quantization: 4-bit (NF4)
- LoRA config:
- `r`: 8
- `alpha`: 16
- `target_modules`: ["q_proj", "v_proj"]
- `dropout`: 0.05
- Max sequence length: 512 tokens
- Epochs: 10
- Batch size: 2
- Gradient accumulation steps: 8
- Effective batch size: 16
- Learning rate: 2e-5
- Scheduler: linear with warmup
- Optimizer: AdamW
- Early stopping: enabled (patience=2)
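For reference, here is how these settings could map onto `transformers`/`peft` configuration objects. This is a sketch, not the original training script (which is not included in this repository); the `output_dir`, warmup ratio, and evaluation/saving strategies below are assumptions.

```python
import torch
from transformers import BitsAndBytesConfig, TrainingArguments, EarlyStoppingCallback
from peft import LoraConfig

# 4-bit NF4 quantization for the frozen base model.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# LoRA on the attention query/value projections.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="llama2-7b-backward",   # hypothetical path
    num_train_epochs=10,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,     # effective batch size 16
    learning_rate=2e-5,
    lr_scheduler_type="linear",
    warmup_ratio=0.03,                 # assumption: warmup size not stated
    optim="adamw_torch",
    evaluation_strategy="epoch",       # assumption
    save_strategy="epoch",             # assumption
    load_best_model_at_end=True,       # needed for early stopping
    metric_for_best_model="eval_loss",
)

# Early stopping with patience=2, passed to the Trainer as a callback.
early_stopping = EarlyStoppingCallback(early_stopping_patience=2)
```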
#### Metrics
Training and evaluation were tracked with Weights & Biases: [training logs](https://wandb.ai/jyang577-hong-kong-university-of-science-and-technology/huggingface?nw=nwuserjyang577).
### Results
- Final eval loss: ~1.436
- Final train loss: ~1.4
- Training completed in ~8 epochs (early stopping)
### Compute Infrastructure
#### Hardware
- GPU: 1× NVIDIA A800 (80 GB)
- CUDA: 12.1
#### Software
- OS: Ubuntu 20.04
- Python: 3.10
- Transformers: 4.38.2
- PEFT: 0.15.1
- Accelerate: 0.28.0
- BitsAndBytes: 0.41.2
### Framework versions
- PEFT 0.15.1