oddadmix committed on
Commit 7f5101f · verified · 1 Parent(s): 5e01d8d

Update README.md

Files changed (1)
  1. README.md +66 -17
README.md CHANGED

@@ -1,5 +1,12 @@
---
library_name: peft
+ base_model:
+ - unsloth/Qwen2-VL-2B-Instruct-bnb-4bit
+ pipeline_tag: image-text-to-text
+ tags:
+ - ocr
+ - urdu
+ - qwen2vl
---

# Qaari 0.1 Urdu: OCR Model for Urdu Language

@@ -59,27 +66,70 @@ The model has been tested and optimized for the following font sizes:

## Usage

- ```python
- from transformers import AutoProcessor, AutoModelForVision2Seq
- import requests
- from PIL import Image
-
- # Load model and processor
- model = AutoModelForVision2Seq.from_pretrained("your-username/qaari-0.1-urdu")
- processor = AutoProcessor.from_pretrained("your-username/qaari-0.1-urdu")
-
- # Prepare image
- url = "path_to_your_urdu_text_image.jpg"
- image = Image.open(requests.get(url, stream=True).raw)
-
- # Process image and generate text
- inputs = processor(images=image, return_tensors="pt")
- outputs = model.generate(**inputs, max_length=512)
- text = processor.batch_decode(outputs, skip_special_tokens=True)[0]
-
- print(text)
+ <!-- [Try Qari - Google Colab](https://colab.research.google.com/github/NAMAA-ORG/public-notebooks/blob/main/Qari_Free_Colab.ipynb) -->
+
+ You can load this model with the `transformers` and `qwen_vl_utils` libraries:
+ ```
+ !pip install -U transformers qwen_vl_utils "accelerate>=0.26.0" peft
+ !pip install -U bitsandbytes
+ ```
+
+ ```python
+ from PIL import Image
+ from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
+ import torch
+ import os
+ from qwen_vl_utils import process_vision_info
+
+ # Load the fine-tuned OCR model and its processor
+ model_name = "oddadmix/Qaari-0.1-Urdu-OCR-Qwen2VL-2B"
+ model = Qwen2VLForConditionalGeneration.from_pretrained(
+     model_name,
+     torch_dtype="auto",
+     device_map="auto"
+ )
+ processor = AutoProcessor.from_pretrained(model_name)
+ max_tokens = 2000
+
+ prompt = "Below is the image of one page of a document, as well as some raw textual content that was previously extracted for it. Just return the plain text representation of this document as if you were reading it naturally. Do not hallucinate."
+
+ # Save a temporary copy of the page image and reference it via file://
+ src = "image.png"
+ image = Image.open("urdu_page.jpg")  # replace with the path to your page image
+ image.save(src)
+
+ # Build a chat-style request containing the image and the OCR prompt
+ messages = [
+     {
+         "role": "user",
+         "content": [
+             {"type": "image", "image": f"file://{src}"},
+             {"type": "text", "text": prompt},
+         ],
+     }
+ ]
+ text = processor.apply_chat_template(
+     messages, tokenize=False, add_generation_prompt=True
+ )
+ image_inputs, video_inputs = process_vision_info(messages)
+ inputs = processor(
+     text=[text],
+     images=image_inputs,
+     videos=video_inputs,
+     padding=True,
+     return_tensors="pt",
+ )
+ inputs = inputs.to("cuda")
+
+ # Generate and decode only the newly generated tokens
+ generated_ids = model.generate(**inputs, max_new_tokens=max_tokens)
+ generated_ids_trimmed = [
+     out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
+ ]
+ output_text = processor.batch_decode(
+     generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
+ )[0]
+ os.remove(src)  # delete the temporary copy
+ print(output_text)

```

+
## Limitations

- Performance may degrade when using fonts not included in the fine-tuning dataset

@@ -121,5 +171,4 @@ If you use this model in your research, please cite:

## License

- This model is subject to the [license terms](https://huggingface.co/Qwen/Qwen2-VL-2B/blob/main/LICENSE) of the base Qwen2-VL-2B model.
-
+ This model is subject to the [license terms](https://huggingface.co/Qwen/Qwen2-VL-2B/blob/main/LICENSE) of the base Qwen2-VL-2B model.
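
The updated metadata declares `library_name: peft` with `base_model: unsloth/Qwen2-VL-2B-Instruct-bnb-4bit`, while the usage snippet loads `oddadmix/Qaari-0.1-Urdu-OCR-Qwen2VL-2B` directly. If the repository exposes the fine-tune as a PEFT (LoRA) adapter rather than merged weights, it could also be attached to the 4-bit base roughly as follows; this is a sketch under that assumption, not something stated in the diff:

```python
# Sketch only: assumes the repo contains a PEFT adapter compatible with the
# 4-bit Unsloth base listed under `base_model` (bitsandbytes must be installed).
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from peft import PeftModel

base_id = "unsloth/Qwen2-VL-2B-Instruct-bnb-4bit"      # from the card metadata
adapter_id = "oddadmix/Qaari-0.1-Urdu-OCR-Qwen2VL-2B"  # repo used in the usage snippet

base = Qwen2VLForConditionalGeneration.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)    # attach the OCR fine-tune
processor = AutoProcessor.from_pretrained(base_id)     # processor comes from the base model
```

Separately, the usage snippet round-trips the page through a temporary file and a `file://` URI; `qwen_vl_utils.process_vision_info` also accepts an in-memory `PIL.Image` in the `"image"` field, so the temporary file step can usually be skipped.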