Update README.md
README.md
@@ -18,6 +18,184 @@ base_model: LeroyDyer/Mixtral_AI_Vision-Instruct_X
- **License:** apache-2.0
- **Finetuned from model:** LeroyDyer/Mixtral_AI_Vision-Instruct_X

This Mistral model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

# Vision/multimodal capabilities:

If you want to use vision functionality:

* You must use the latest version of [KoboldCpp](https://github.com/LostRuins/koboldcpp).

To use this model's multimodal (**vision**) capabilities, you need to load the matching **mmproj** file, which can be found inside this model repo ([LeroyDyer/Mixtral_AI_Vision-Instruct_X](https://huggingface.co/LeroyDyer/Mixtral_AI_Vision-Instruct_X)).

* You can load the **mmproj** by using the corresponding section in the interface:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/65d4cf2693a0a3744955f324/UxH8OteeRbD1av1re0yNZ.png)

## Vision/multimodal capabilities:

* For loading in 4-bit, use the 4-bit mmproj file: mmproj-Mixtral_AI_Vision-Instruct_X-Q4_0

* For loading in 8-bit, use the 8-bit mmproj file: mmproj-Mixtral_AI_Vision-Instruct_X-Q8_0

* For loading in 16-bit, use the f16 mmproj file: mmproj-Mixtral_AI_Vision-Instruct_X-f16

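The pairing between quantization level and mmproj file listed above can be captured in a small lookup. This is a minimal sketch, not something taken from the model repo: the helper names `mmproj_for` and `koboldcpp_args`, the `.gguf` extension, and the example model file name are all assumptions. `--model` and `--mmproj` are KoboldCpp launch options, but check `--help` for your version before relying on them.

```python
# Quantization label -> mmproj name, exactly as listed in this README.
MMPROJ_BY_QUANT = {
    "Q4_0": "mmproj-Mixtral_AI_Vision-Instruct_X-Q4_0",
    "Q8_0": "mmproj-Mixtral_AI_Vision-Instruct_X-Q8_0",
    "f16": "mmproj-Mixtral_AI_Vision-Instruct_X-f16",
}


def mmproj_for(quant: str) -> str:
    """Return the mmproj name for a quantization label (case-insensitive)."""
    by_lower = {key.lower(): name for key, name in MMPROJ_BY_QUANT.items()}
    try:
        return by_lower[quant.lower()]
    except KeyError:
        raise ValueError(
            f"no mmproj listed for {quant!r}; expected one of {sorted(MMPROJ_BY_QUANT)}"
        ) from None


def koboldcpp_args(model_path: str, quant: str, ext: str = ".gguf") -> list[str]:
    """Build launch arguments that pair a model file with its mmproj.

    Assumption: the mmproj file uses the given extension and sits in the
    current working directory.
    """
    return ["--model", model_path, "--mmproj", mmproj_for(quant) + ext]


if __name__ == "__main__":
    # "my-model-Q4_0.gguf" is a hypothetical local file name.
    print(koboldcpp_args("my-model-Q4_0.gguf", "q4_0"))
```

The returned list can be appended to whatever command you use to start KoboldCpp; the same pairing applies when you pick the files by hand in the interface.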
## Extended capabilities:

```
* mistralai/Mistral-7B-Instruct-v0.1 - Prime-Base
* ChaoticNeutrals/Eris-LelantaclesV2-7b - role play
* ChaoticNeutrals/Eris_PrimeV3-Vision-7B - vision
* rvv-karma/BASH-Coder-Mistral-7B - coding
* Locutusque/Hercules-3.1-Mistral-7B - Unhinging
* KoboldAI/Mistral-7B-Erebus-v3 - NSFW
* Locutusque/Hyperion-2.1-Mistral-7B - CHAT
* Severian/Nexus-IKM-Mistral-7B-Pytorch - Thinking
* NousResearch/Hermes-2-Pro-Mistral-7B - Generalizing
* mistralai/Mistral-7B-Instruct-v0.2 - BASE
* Nitral-AI/ProdigyXBioMistral_7B - medical
* Nitral-AI/Infinite-Mika-7b - 128k - Context Expansion enforcement
* Nous-Yarn-Mistral-7b-128k - 128k - Context Expansion
* yanismiraoui/Yarn-Mistral-7b-128k-sharded
* ChaoticNeutrals/Eris_Prime-V2-7B - Roleplay
```

# "image-text-text"

## using transformers

``` python
import requests
import torch
from IPython.display import display  # notebook-only helper (Jupyter/Colab)
from PIL import Image
from transformers import AutoProcessor, BitsAndBytesConfig, LlavaForConditionalGeneration

# Load the weights in 4-bit and compute in float16 (needs bitsandbytes and a CUDA GPU).
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

model_id = "LeroyDyer/Mixtral_AI_Vision-Instruct_X"

processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, quantization_config=quantization_config, device_map="auto"
)

# Download two sample images and show them in the notebook.
image1 = Image.open(requests.get("https://llava-vl.github.io/static/images/view.jpg", stream=True).raw)
image2 = Image.open(requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)
display(image1)
display(image2)

# One prompt per image; "<image>" marks where the image features are inserted.
prompts = [
    "USER: <image>\nWhat are the things I should be cautious about when I visit this place? What should I bring with me?\nASSISTANT:",
    "USER: <image>\nPlease describe this image\nASSISTANT:",
]

inputs = processor(text=prompts, images=[image1, image2], padding=True, return_tensors="pt").to("cuda")
# Inspect the shape of each tensor the processor produced.
for k, v in inputs.items():
    print(k, v.shape)
```

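The snippet above stops once `inputs` has been prepared. A minimal sketch of the remaining step, generating and decoding the answers, might look like the following. The helper names `extract_answer` and `generate_answers` and the `max_new_tokens` default are illustrative assumptions; `model.generate` and `processor.batch_decode` are the standard `transformers` calls. Splitting on `ASSISTANT:` assumes the decoded text echoes the prompt format used above.

```python
def extract_answer(decoded: str, marker: str = "ASSISTANT:") -> str:
    """Return the text after the last marker, or the whole string if it is absent."""
    _, found, tail = decoded.rpartition(marker)
    return tail.strip() if found else decoded.strip()


def generate_answers(model, processor, inputs, max_new_tokens: int = 200) -> list[str]:
    """Generate one answer per prompt and drop the echoed prompt text."""
    # `inputs` is the mapping returned by the processor (input_ids, pixel_values, ...).
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    decoded = processor.batch_decode(output_ids, skip_special_tokens=True)
    return [extract_answer(text) for text in decoded]
```

With the objects from the snippet above, `generate_answers(model, processor, inputs)` would return a list with one string per image.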
## Using pipeline

``` python
import requests
from PIL import Image
from transformers import pipeline

model_id = "LeroyDyer/Mixtral_AI_Vision-Instruct_X"
pipe = pipeline("image-to-text", model=model_id)
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/ai2d-demo.jpg"

image = Image.open(requests.get(url, stream=True).raw)
question = "What does the label 15 represent? (1) lava (2) core (3) tunnel (4) ash cloud"
# Adjacent string literals are concatenated into a single prompt string.
prompt = (
    "A chat between a curious human and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the human's questions."
    f"###Human: <image>\n{question}###Assistant:"
)

outputs = pipe(image, prompt=prompt, generate_kwargs={"max_new_tokens": 200})
print(outputs)
```

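Because this prompt layout is easy to get wrong by hand, it can help to build it with a small function. The sketch below is illustrative only: `build_prompt` and `first_generated_text` are hypothetical helper names, and it assumes the pipeline returns a list of dicts with a `generated_text` key, as `image-to-text` pipelines usually do.

```python
SYSTEM_PREAMBLE = (
    "A chat between a curious human and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the human's questions."
)


def build_prompt(question: str, preamble: str = SYSTEM_PREAMBLE) -> str:
    """Format a question in the ###Human / ###Assistant layout used above."""
    return f"{preamble}###Human: <image>\n{question}###Assistant:"


def first_generated_text(outputs: list[dict]) -> str:
    """Return the text after the last '###Assistant:' in the first result."""
    text = outputs[0]["generated_text"]
    # rpartition returns the whole string as the tail when the marker is missing.
    return text.rpartition("###Assistant:")[2].strip()
```

For example, `pipe(image, prompt=build_prompt(question), generate_kwargs={"max_new_tokens": 200})` would reuse the pipeline from the snippet above.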
## Mistral ChatTemplating

### Instruction format

To leverage instruction fine-tuning, your prompt should be surrounded by `[INST]` and `[/INST]` tokens. The very first instruction should begin with a beginning-of-sentence (BOS) token id; subsequent instructions should not. The assistant's generation is ended by the end-of-sentence (EOS) token id.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("LeroyDyer/Mixtral_AI_Vision-Instruct_X")

chat = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
    {"role": "user", "content": "I'd like to show off how chat templating works!"},
]

# tokenize=False returns the formatted prompt string instead of token ids.
print(tokenizer.apply_chat_template(chat, tokenize=False))
```

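To make the instruction-format rules concrete, here is a minimal pure-Python sketch of them. It is an illustration, not this tokenizer's actual template: the function name `format_mistral_chat` is hypothetical, the literal `<s>` and `</s>` strings stand in for the BOS and EOS tokens, and the exact whitespace is an assumption. In real code, prefer `tokenizer.apply_chat_template`, which uses the template shipped with the model.

```python
BOS, EOS = "<s>", "</s>"  # assumed string forms of the BOS and EOS tokens


def format_mistral_chat(messages: list[dict]) -> str:
    """Render alternating user/assistant messages in the [INST] format."""
    parts = [BOS]  # BOS appears once, before the very first instruction
    for index, message in enumerate(messages):
        expected = "user" if index % 2 == 0 else "assistant"
        if message["role"] != expected:
            raise ValueError("roles must alternate user/assistant, starting with user")
        if message["role"] == "user":
            parts.append(f"[INST] {message['content']} [/INST]")
        else:
            # Each assistant turn is closed by the EOS token.
            parts.append(f"{message['content']}{EOS} ")
    return "".join(parts)


if __name__ == "__main__":
    chat = [
        {"role": "user", "content": "Hello, how are you?"},
        {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
        {"role": "user", "content": "I'd like to show off how chat templating works!"},
    ]
    print(format_mistral_chat(chat))
```

Only the first `[INST]` is preceded by `<s>`, and every assistant reply ends with `</s>`, which mirrors the three rules stated above.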
# TextToText

``` python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to load the model onto

model = AutoModelForCausalLM.from_pretrained("LeroyDyer/Mixtral_AI_Vision-Instruct_X")
tokenizer = AutoTokenizer.from_pretrained("LeroyDyer/Mixtral_AI_Vision-Instruct_X")

messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
    {"role": "user", "content": "Do you have mayonnaise recipes?"},
]

# Apply the chat template and return the token ids as a PyTorch tensor.
encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")

model_inputs = encodeds.to(device)
model.to(device)

# Sample up to 1000 new tokens, then decode the full sequence (prompt plus reply).
generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
```