Update README.md
README.md CHANGED

## Model description

The adapter was fine-tuned on a Google Colab A100 GPU with DPO on the [distilabel-intel-orca-dpo-pairs](https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs) dataset. To scale LoRA approaches for LLMs, I recommend looking at [predibase/lorax](https://github.com/predibase/lorax).
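
For reference, a DPO fine-tune like this can be reproduced along the lines below with [trl](https://github.com/huggingface/trl) and peft. This is a minimal sketch rather than the exact training script: the LoRA hyperparameters, the prompt/chosen/rejected column mapping, and the `DPOTrainer` arguments (which differ between trl versions) are assumptions.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

# base model and tokenizer (phi-2 has no pad token, so reuse EOS for padding)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype=torch.float16)

# map the dataset onto the prompt/chosen/rejected columns DPO expects
# (column names follow the dataset card; adjust if they change)
dataset = load_dataset("argilla/distilabel-intel-orca-dpo-pairs", split="train")
dataset = dataset.map(
    lambda row: {
        "prompt": f"Instruct: {row['input']}\nOutput: ",
        "chosen": row["chosen"],
        "rejected": row["rejected"],
    }
)

# LoRA adapter configuration (rank and target modules are illustrative)
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "dense"],
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="phi2-lora-dpo",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=5e-5,
    num_train_epochs=1,
    logging_steps=10,
)

# with ref_model=None, DPOTrainer derives the reference model internally;
# in newer trl versions beta/tokenizer move into DPOConfig instead
trainer = DPOTrainer(
    model,
    ref_model=None,
    args=training_args,
    beta=0.1,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```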

You can play around with the model as shown below. We load the LoRA adapter and a bitsandbytes quantization config (the latter only when CUDA is available).

```python
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig
)
from peft import PeftConfig, PeftModel

# template used for fine-tuning
# template = """\
# Instruct: {instruction}\n
# Output: {response}"""

if torch.cuda.is_available():
    device = torch.device("cuda")
    print(f"Using {torch.cuda.get_device_name(0)}")
    # 4-bit NF4 quantization, only used when running on CUDA
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type='nf4',
        bnb_4bit_compute_dtype='float16',
        bnb_4bit_use_double_quant=False,
    )
elif torch.backends.mps.is_available():
    device = torch.device("mps")
    bnb_config = None
else:
    device = torch.device("cpu")
    bnb_config = None
    print("No GPU available, using CPU instead.")

# load the adapter config, the tokenizer, the (optionally quantized) base model,
# and then the LoRA adapter on top of it
config = PeftConfig.from_pretrained("davidberenstein1957/phi2-lora-quantized-distilabel-intel-orca-dpo-pairs")
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype=torch.float16, quantization_config=bnb_config)
model = PeftModel.from_pretrained(model, "davidberenstein1957/phi2-lora-quantized-distilabel-intel-orca-dpo-pairs").to(device)

# prompt following the template used during fine-tuning
prompt = "Instruct: What is the capital of France?\nOutput:"
inputs = tokenizer(prompt, return_tensors="pt", return_attention_mask=False).to(device)

outputs = model.generate(**inputs)
text = tokenizer.batch_decode(outputs)[0]
```
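
Note that `generate` without arguments falls back to the model's default generation settings, which often cap the output at around 20 tokens in total. For a fuller answer you will typically want to pass `max_new_tokens` and strip the prompt from the decoded text; the values below are illustrative, not tuned settings.

```python
# generate a longer completion and keep only the part after the prompt
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=False,
    pad_token_id=tokenizer.eos_token_id,
)
completion = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
answer = completion[len(prompt):].strip()
print(answer)
```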

## Intended uses & limitations