oddadmix committed on
Commit 7f5101f · verified · 1 Parent(s): 5e01d8d

Update README.md

Files changed (1)
  1. README.md +66 -17
README.md CHANGED

@@ -1,5 +1,12 @@
---
library_name: peft
+ base_model:
+ - unsloth/Qwen2-VL-2B-Instruct-bnb-4bit
+ pipeline_tag: image-text-to-text
+ tags:
+ - ocr
+ - urdu
+ - qwen2vl
---

# Qaari 0.1 Urdu: OCR Model for Urdu Language

@@ -59,27 +66,70 @@ The model has been tested and optimized for the following font sizes:

## Usage

- ```python
- from transformers import AutoProcessor, AutoModelForVision2Seq
- import requests
- from PIL import Image
-
- # Load model and processor
- model = AutoModelForVision2Seq.from_pretrained("your-username/qaari-0.1-urdu")
- processor = AutoProcessor.from_pretrained("your-username/qaari-0.1-urdu")
-
- # Prepare image
- url = "path_to_your_urdu_text_image.jpg"
- image = Image.open(requests.get(url, stream=True).raw)
-
- # Process image and generate text
- inputs = processor(images=image, return_tensors="pt")
- outputs = model.generate(**inputs, max_length=512)
- text = processor.batch_decode(outputs, skip_special_tokens=True)[0]
-
- print(text)
+ <!-- [Try Qari - Google Colab](https://colab.research.google.com/github/NAMAA-ORG/public-notebooks/blob/main/Qari_Free_Colab.ipynb) -->
+
+ You can load this model with the `transformers` and `qwen_vl_utils` libraries:
+ ```
+ !pip install -U transformers qwen_vl_utils "accelerate>=0.26.0" peft
+ !pip install -U bitsandbytes
+ ```
+
+ ```python
+ from PIL import Image
+ from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
+ import torch
+ import os
+ from qwen_vl_utils import process_vision_info
+
+ # Load the fine-tuned OCR model and its processor
+ model_name = "oddadmix/Qaari-0.1-Urdu-OCR-Qwen2VL-2B"
+ model = Qwen2VLForConditionalGeneration.from_pretrained(
+     model_name,
+     torch_dtype="auto",
+     device_map="auto"
+ )
+ processor = AutoProcessor.from_pretrained(model_name)
+ max_tokens = 2000
+
+ prompt = "Below is the image of one page of a document, as well as some raw textual content that was previously extracted for it. Just return the plain text representation of this document as if you were reading it naturally. Do not hallucinate."
+
+ # Save a temporary copy of the page image and reference it via file://
+ src = "image.png"
+ image = Image.open("urdu_page.jpg")  # replace with the path to your page image
+ image.save(src)
+
+ # Build a chat-style request containing the image and the OCR prompt
+ messages = [
+     {
+         "role": "user",
+         "content": [
+             {"type": "image", "image": f"file://{src}"},
+             {"type": "text", "text": prompt},
+         ],
+     }
+ ]
+ text = processor.apply_chat_template(
+     messages, tokenize=False, add_generation_prompt=True
+ )
+ image_inputs, video_inputs = process_vision_info(messages)
+ inputs = processor(
+     text=[text],
+     images=image_inputs,
+     videos=video_inputs,
+     padding=True,
+     return_tensors="pt",
+ )
+ inputs = inputs.to("cuda")
+
+ # Generate and decode only the newly generated tokens
+ generated_ids = model.generate(**inputs, max_new_tokens=max_tokens)
+ generated_ids_trimmed = [
+     out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
+ ]
+ output_text = processor.batch_decode(
+     generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
+ )[0]
+ os.remove(src)  # delete the temporary copy
+ print(output_text)

```

+
## Limitations

- Performance may degrade when using fonts not included in the fine-tuning dataset

@@ -121,5 +171,4 @@ If you use this model in your research, please cite:

## License

- This model is subject to the [license terms](https://huggingface.co/Qwen/Qwen2-VL-2B/blob/main/LICENSE) of the base Qwen2-VL-2B model.
-
+ This model is subject to the [license terms](https://huggingface.co/Qwen/Qwen2-VL-2B/blob/main/LICENSE) of the base Qwen2-VL-2B model.
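
The updated metadata declares `library_name: peft` with `base_model: unsloth/Qwen2-VL-2B-Instruct-bnb-4bit`, while the usage snippet loads `oddadmix/Qaari-0.1-Urdu-OCR-Qwen2VL-2B` directly. If the repository exposes the fine-tune as a PEFT (LoRA) adapter rather than merged weights, it could also be attached to the 4-bit base roughly as follows; this is a sketch under that assumption, not something stated in the diff:

```python
# Sketch only: assumes the repo contains a PEFT adapter compatible with the
# 4-bit Unsloth base listed under `base_model` (bitsandbytes must be installed).
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from peft import PeftModel

base_id = "unsloth/Qwen2-VL-2B-Instruct-bnb-4bit"      # from the card metadata
adapter_id = "oddadmix/Qaari-0.1-Urdu-OCR-Qwen2VL-2B"  # repo used in the usage snippet

base = Qwen2VLForConditionalGeneration.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)    # attach the OCR fine-tune
processor = AutoProcessor.from_pretrained(base_id)     # processor comes from the base model
```

Separately, the usage snippet round-trips the page through a temporary file and a `file://` URI; `qwen_vl_utils.process_vision_info` also accepts an in-memory `PIL.Image` in the `"image"` field, so the temporary file step can usually be skipped.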