Files changed (1) hide show
  1. README.md +17 -17
README.md CHANGED
@@ -7,39 +7,39 @@ base_model:
7
  pipeline_tag: image-text-to-text
8
  library_name: transformers
9
  tags:
10
- - Highlights
11
  - Generation
12
  - OCR
13
  - KIE
 
14
  ---
15
 
16
- ![fxghdfgh.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/svbsK16pmYR9Q4FoFHNYs.png)
17
-
18
- # **Needle-2B-VL-Highlights**
19
 
20
- > [!Note]
21
- > The **Needle-2B-VL-Highlights** model is a fine-tuned version of *Qwen2-VL-2B-Instruct*, specifically optimized for **image highlights extraction**, **messy handwriting recognition**, **Optical Character Recognition (OCR)**, **English language understanding**, and **math problem solving with LaTeX formatting**. This model uses a conversational visual-language interface to effectively handle multi-modal tasks.
22
 
23
- [![Open Demo in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/#fileId=https%3A//huggingface.co/prithivMLmods/Needle-2B-VL-Highlights/blob/main/Callisto_OCR3_2B_Instruct.ipynb)
 
 
 
24
 
25
  # **Key Enhancements:**
26
 
27
- * **State-of-the-art image comprehension** across varying resolutions and aspect ratios:
28
- Needle-2B-VL-Highlights delivers top-tier performance on benchmarks such as MathVista, DocVQA, RealWorldQA, and MTVQA.
29
 
30
- * **Image Highlighting Expertise**:
31
  Specially tuned to **identify and summarize key visual elements** in an image — ideal for **creating visual highlights**, annotations, and summaries.
32
 
33
- * **Handwriting OCR Enhanced**:
34
  Recognizes **messy and complex handwritten notes** with precision, perfect for digitizing real-world documents.
35
 
36
- * **Video Content Understanding**:
37
- Capable of processing videos longer than 20 minutes for **context-aware Q&A, transcription**, and **highlight extraction**.
38
 
39
- * **Multi-device Integration**:
40
  Can be used as an intelligent agent for mobile phones, robots, and other devices — able to **understand visual scenes and execute actions**.
41
 
42
- * **Multilingual OCR Support**:
43
  In addition to English and Chinese, supports OCR for European languages, Japanese, Korean, Arabic, and Vietnamese.
44
 
45
  # **Run with Transformers🤗**
@@ -76,7 +76,7 @@ from docx.enum.text import WD_ALIGN_PARAGRAPH
76
 
77
  # Define model options
78
  MODEL_OPTIONS = {
79
- "Needle-2B-VL-Highlights": "prithivMLmods/Needle-2B-VL-Highlights",
80
  }
81
 
82
  # Preload models and processors into CUDA
@@ -288,7 +288,7 @@ with gr.Blocks(css=css) as demo:
288
  model_choice = gr.Dropdown(
289
  label="Model Selection",
290
  choices=list(MODEL_OPTIONS.keys()),
291
- value="Needle-2B-VL-Highlights"
292
  )
293
  input_media = gr.File(
294
  label="Upload Image", type="filepath"
 
7
  pipeline_tag: image-text-to-text
8
  library_name: transformers
9
  tags:
 
10
  - Generation
11
  - OCR
12
  - KIE
13
+ - Highlights-Generator
14
  ---
15
 
16
+ ![WASP.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/BpmMrx7Vsm3Pnfqb2xGxC.png)
 
 
17
 
18
+ # **WASP-2B-VL-Highlights**
 
19
 
20
+ > \[!Note]
21
+ > The **WASP-2B-VL-Highlights** model is a fine-tuned version of *Qwen2-VL-2B-Instruct*, specifically optimized for **image highlights extraction**, **messy handwriting recognition**, **Optical Character Recognition (OCR)**, **English language understanding**, and **math problem solving with LaTeX formatting**. This model uses a conversational visual-language interface to effectively handle multi-modal tasks.
22
+
23
+ [![Open Demo in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/#fileId=https%3A//huggingface.co/prithivMLmods/WASP-2B-VL-Highlights/blob/main/Callisto_OCR3_2B_Instruct.ipynb)
24
 
25
  # **Key Enhancements:**
26
 
27
+ * **State-of-the-art image comprehension** across varying resolutions and aspect ratios:
28
+ WASP-2B-VL-Highlights delivers top-tier performance on benchmarks such as MathVista, DocVQA, RealWorldQA, and MTVQA.
29
 
30
+ * **Image Highlighting Expertise**:
31
  Specially tuned to **identify and summarize key visual elements** in an image — ideal for **creating visual highlights**, annotations, and summaries.
32
 
33
+ * **Handwriting OCR Enhanced**:
34
  Recognizes **messy and complex handwritten notes** with precision, perfect for digitizing real-world documents.
35
 
36
+ * **Video Content Understanding**:
37
+ Capable of processing videos longer than 20 minutes for **context-aware Q\&A, transcription**, and **highlight extraction**.
38
 
39
+ * **Multi-device Integration**:
40
  Can be used as an intelligent agent for mobile phones, robots, and other devices — able to **understand visual scenes and execute actions**.
41
 
42
+ * **Multilingual OCR Support**:
43
  In addition to English and Chinese, supports OCR for European languages, Japanese, Korean, Arabic, and Vietnamese.
44
 
45
  # **Run with Transformers🤗**
 
76
 
77
  # Define model options
78
  MODEL_OPTIONS = {
79
+ "Needle-2B-VL-Highlights": "prithivMLmods/WASP-2B-VL-Highlights",
80
  }
81
 
82
  # Preload models and processors into CUDA
 
288
  model_choice = gr.Dropdown(
289
  label="Model Selection",
290
  choices=list(MODEL_OPTIONS.keys()),
291
+ value="WASP-2B-VL-Highlights"
292
  )
293
  input_media = gr.File(
294
  label="Upload Image", type="filepath"