Update README.md
README.md CHANGED

## Model description

The adapter was fine-tuned on a Google Colab A100 GPU with DPO on the [distilabel-intel-orca-dpo-pairs](https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs) dataset. To scale LoRA approaches for LLMs, I recommend looking at [predibase/lorax](https://github.com/predibase/lorax).
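
For reference, a DPO fine-tune like this can be reproduced along the lines below with [trl](https://github.com/huggingface/trl) and peft. This is a minimal sketch rather than the exact training script: the LoRA hyperparameters, the prompt/chosen/rejected column mapping, and the `DPOTrainer` arguments (which differ between trl versions) are assumptions.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

# base model and tokenizer (phi-2 has no pad token, so reuse EOS for padding)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype=torch.float16)

# map the dataset onto the prompt/chosen/rejected columns DPO expects
# (column names follow the dataset card; adjust if they change)
dataset = load_dataset("argilla/distilabel-intel-orca-dpo-pairs", split="train")
dataset = dataset.map(
    lambda row: {
        "prompt": f"Instruct: {row['input']}\nOutput: ",
        "chosen": row["chosen"],
        "rejected": row["rejected"],
    }
)

# LoRA adapter configuration (rank and target modules are illustrative)
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "dense"],
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="phi2-lora-dpo",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=5e-5,
    num_train_epochs=1,
    logging_steps=10,
)

# with ref_model=None, DPOTrainer derives the reference model internally;
# in newer trl versions beta/tokenizer move into DPOConfig instead
trainer = DPOTrainer(
    model,
    ref_model=None,
    args=training_args,
    beta=0.1,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```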

You can play around with the model as shown below. We load the LoRA adapter and a bitsandbytes quantization config (the latter only when CUDA is available).

```python
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig
)
from peft import PeftConfig, PeftModel

# template used for fine-tuning
# template = """\
# Instruct: {instruction}\n
# Output: {response}"""

if torch.cuda.is_available():
    device = torch.device("cuda")
    print(f"Using {torch.cuda.get_device_name(0)}")
    # 4-bit NF4 quantization, only used when running on CUDA
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type='nf4',
        bnb_4bit_compute_dtype='float16',
        bnb_4bit_use_double_quant=False,
    )
elif torch.backends.mps.is_available():
    device = torch.device("mps")
    bnb_config = None
else:
    device = torch.device("cpu")
    bnb_config = None
    print("No GPU available, using CPU instead.")

# load the adapter config, the tokenizer, the (optionally quantized) base model,
# and then the LoRA adapter on top of it
config = PeftConfig.from_pretrained("davidberenstein1957/phi2-lora-quantized-distilabel-intel-orca-dpo-pairs")
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype=torch.float16, quantization_config=bnb_config)
model = PeftModel.from_pretrained(model, "davidberenstein1957/phi2-lora-quantized-distilabel-intel-orca-dpo-pairs").to(device)

# prompt following the template used during fine-tuning
prompt = "Instruct: What is the capital of France?\nOutput:"
inputs = tokenizer(prompt, return_tensors="pt", return_attention_mask=False).to(device)

outputs = model.generate(**inputs)
text = tokenizer.batch_decode(outputs)[0]
```
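
Note that `generate` without arguments falls back to the model's default generation settings, which often cap the output at around 20 tokens in total. For a fuller answer you will typically want to pass `max_new_tokens` and strip the prompt from the decoded text; the values below are illustrative, not tuned settings.

```python
# generate a longer completion and keep only the part after the prompt
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=False,
    pad_token_id=tokenizer.eos_token_id,
)
completion = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
answer = completion[len(prompt):].strip()
print(answer)
```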

## Intended uses & limitations