davidberenstein1957 committed
Commit e6c1c4a · verified · 1 Parent(s): 42ec821

Update README.md

Files changed (1): README.md (+46 -1)
README.md CHANGED
@@ -37,7 +37,52 @@ It achieves the following results on the evaluation set:
 
 ## Model description
 
-This model is a basic fine-tune
+The adapter was fine-tuned on a Google Colab A100 GPU using DPO and the [distilabel-intel-orca-dpo-pairs](https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs) dataset. To scale LoRA approaches for LLMs, I recommend looking at [predibase/lorax](https://github.com/predibase/lorax).
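+
+As a rough, hedged sketch of what such a DPO run could look like (assuming trl's `DPOTrainer` in its ~0.7-era API; the hyperparameters, LoRA settings, and prompt-template mapping below are illustrative assumptions, not the original training script):
+
+```python
+# pip install trl peft transformers datasets
+from datasets import load_dataset
+from peft import LoraConfig
+from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
+from trl import DPOTrainer
+
+tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
+tokenizer.pad_token = tokenizer.eos_token  # phi-2's tokenizer has no pad token
+model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")
+
+# map the dataset onto the prompt/chosen/rejected columns DPOTrainer expects
+dataset = load_dataset("argilla/distilabel-intel-orca-dpo-pairs", split="train")
+dataset = dataset.map(
+    lambda row: {
+        "prompt": f"Instruct: {row['input']}\nOutput: ",
+        "chosen": row["chosen"],
+        "rejected": row["rejected"],
+    }
+)
+
+trainer = DPOTrainer(
+    model,
+    ref_model=None,  # with peft_config set, trl derives the frozen reference model itself
+    beta=0.1,        # illustrative value
+    args=TrainingArguments(output_dir="phi2-dpo", per_device_train_batch_size=2, num_train_epochs=1),
+    train_dataset=dataset,
+    tokenizer=tokenizer,
+    peft_config=LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32, lora_dropout=0.05),
+)
+trainer.train()
+```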
+
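+For the lorax route, a minimal client-side sketch (assuming a LoRAX server is already deployed and reachable at a hypothetical local endpoint, and that `lorax-client` is installed; the URL and generation parameters are illustrative):
+
+```python
+# pip install lorax-client
+from lorax import Client
+
+client = Client("http://127.0.0.1:8080")  # hypothetical local LoRAX deployment
+
+prompt = "Instruct: What is the capital of France?\nOutput:"
+response = client.generate(
+    prompt,
+    adapter_id="davidberenstein1957/phi2-lora-quantized-distilabel-intel-orca-dpo-pairs",
+    max_new_tokens=64,
+)
+print(response.generated_text)
+```
+
+The appeal of this setup is that a single copy of the base model can serve many adapters, loaded per request.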
+
+You can also play around with the model locally, as shown below. We load the LoRA adapter and a bitsandbytes quantization config (the latter only when CUDA is available).
+
+```python
+import torch
+from transformers import (
+    AutoModelForCausalLM,
+    AutoTokenizer,
+    BitsAndBytesConfig,
+)
+from peft import PeftModel
+
+# template used for the fine-tune
+# template = """\
+# Instruct: {instruction}\n
+# Output: {response}"""
+
+# pick a device; on CUDA, also build a 4-bit bitsandbytes config
+if torch.cuda.is_available():
+    device = torch.device("cuda")
+    print(f"Using {torch.cuda.get_device_name(0)}")
+    bnb_config = BitsAndBytesConfig(
+        load_in_4bit=True,
+        bnb_4bit_quant_type="nf4",
+        bnb_4bit_compute_dtype="float16",
+        bnb_4bit_use_double_quant=False,
+    )
+elif torch.backends.mps.is_available():
+    device = torch.device("mps")
+    bnb_config = None
+else:
+    device = torch.device("cpu")
+    bnb_config = None
+    print("No GPU available, using CPU instead.")
+
+# fp16 matmuls are not implemented on CPU, so fall back to fp32 there
+dtype = torch.float32 if device.type == "cpu" else torch.float16
+
+tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
+model = AutoModelForCausalLM.from_pretrained(
+    "microsoft/phi-2",
+    torch_dtype=dtype,
+    quantization_config=bnb_config,
+    device_map="auto" if bnb_config is not None else None,
+)
+model = PeftModel.from_pretrained(model, "davidberenstein1957/phi2-lora-quantized-distilabel-intel-orca-dpo-pairs")
+if bnb_config is None:
+    # 4-bit quantized models are placed on the GPU at load time and must not be moved
+    model = model.to(device)
+
+prompt = "Instruct: What is the capital of France?\nOutput:"
+inputs = tokenizer(prompt, return_tensors="pt", return_attention_mask=False).to(model.device)
+
+outputs = model.generate(**inputs, max_new_tokens=32)
+text = tokenizer.batch_decode(outputs)[0]
+print(text)
+```
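+
+Note that `text` contains the prompt followed by the completion, so the model's answer appears after the final `Output:` marker from the template above.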
 
 ## Intended uses & limitations