prithivMLmods/WASP-2B-VL-Highlights

@@ -7,39 +7,39 @@ base_model:
 pipeline_tag: image-text-to-text
 library_name: transformers
 tags:
-- Highlights
 - Generation
 - OCR
 - KIE
 ---
-![fxghdfgh.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/svbsK16pmYR9Q4FoFHNYs.png)
-# **Needle-2B-VL-Highlights**
->  [!Note]
->  The **Needle-2B-VL-Highlights** model is a fine-tuned version of *Qwen2-VL-2B-Instruct*, specifically optimized for **image highlights extraction**, **messy handwriting recognition**, **Optical Character Recognition (OCR)**, **English language understanding**, and **math problem solving with LaTeX formatting**. This model uses a conversational visual-language interface to effectively handle multi-modal tasks.
-[![Open Demo in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/#fileId=https%3A//huggingface.co/prithivMLmods/Needle-2B-VL-Highlights/blob/main/Callisto_OCR3_2B_Instruct.ipynb)
 # **Key Enhancements:**
-* **State-of-the-art image comprehension** across varying resolutions and aspect ratios:
-  Needle-2B-VL-Highlights delivers top-tier performance on benchmarks such as MathVista, DocVQA, RealWorldQA, and MTVQA.
-* **Image Highlighting Expertise**:
   Specially tuned to **identify and summarize key visual elements** in an image — ideal for **creating visual highlights**, annotations, and summaries.
-* **Handwriting OCR Enhanced**:
   Recognizes **messy and complex handwritten notes** with precision, perfect for digitizing real-world documents.
-* **Video Content Understanding**:
-  Capable of processing videos longer than 20 minutes for **context-aware Q&A, transcription**, and **highlight extraction**.
-* **Multi-device Integration**:
   Can be used as an intelligent agent for mobile phones, robots, and other devices — able to **understand visual scenes and execute actions**.
-* **Multilingual OCR Support**:
   In addition to English and Chinese, supports OCR for European languages, Japanese, Korean, Arabic, and Vietnamese.
 # **Run with Transformers🤗**
@@ -76,7 +76,7 @@ from docx.enum.text import WD_ALIGN_PARAGRAPH
 # Define model options
 MODEL_OPTIONS = {
-    "Needle-2B-VL-Highlights": "prithivMLmods/Needle-2B-VL-Highlights",
 }
 # Preload models and processors into CUDA
@@ -288,7 +288,7 @@ with gr.Blocks(css=css) as demo:
                 model_choice = gr.Dropdown(
                     label="Model Selection",
                     choices=list(MODEL_OPTIONS.keys()),
-                    value="Needle-2B-VL-Highlights"
                 )
                 input_media = gr.File(
                     label="Upload Image", type="filepath"

 pipeline_tag: image-text-to-text
 library_name: transformers
 tags:
 - Generation
 - OCR
 - KIE
+- Highlights-Generator
 ---
+![WASP.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/BpmMrx7Vsm3Pnfqb2xGxC.png)
+ # **WASP-2B-VL-Highlights**
+> [!Note]
+> The **WASP-2B-VL-Highlights** model is a fine-tuned version of *Qwen2-VL-2B-Instruct*, specifically optimized for **image highlights extraction**, **messy handwriting recognition**, **Optical Character Recognition (OCR)**, **English language understanding**, and **math problem solving with LaTeX formatting**. This model uses a conversational visual-language interface to effectively handle multi-modal tasks.
+[![Open Demo in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/#fileId=https%3A//huggingface.co/prithivMLmods/WASP-2B-VL-Highlights/blob/main/Callisto_OCR3_2B_Instruct.ipynb)
 # **Key Enhancements:**
+* **State-of-the-art image comprehension** across varying resolutions and aspect ratios:
+  WASP-2B-VL-Highlights delivers top-tier performance on benchmarks such as MathVista, DocVQA, RealWorldQA, and MTVQA.
+* **Image Highlighting Expertise**:
   Specially tuned to **identify and summarize key visual elements** in an image — ideal for **creating visual highlights**, annotations, and summaries.
+* **Handwriting OCR Enhanced**:
   Recognizes **messy and complex handwritten notes** with precision, perfect for digitizing real-world documents.
+* **Video Content Understanding**:
+  Capable of processing videos longer than 20 minutes for **context-aware Q&A, transcription**, and **highlight extraction**.
+* **Multi-device Integration**:
   Can be used as an intelligent agent for mobile phones, robots, and other devices — able to **understand visual scenes and execute actions**.
+* **Multilingual OCR Support**:
   In addition to English and Chinese, supports OCR for European languages, Japanese, Korean, Arabic, and Vietnamese.
 # **Run with Transformers🤗**
 # Define model options
 MODEL_OPTIONS = {
+    "WASP-2B-VL-Highlights": "prithivMLmods/WASP-2B-VL-Highlights",
 }
 # Preload models and processors into CUDA
                 model_choice = gr.Dropdown(
                     label="Model Selection",
                     choices=list(MODEL_OPTIONS.keys()),
+                    value="WASP-2B-VL-Highlights"
                 )
                 input_media = gr.File(
                     label="Upload Image", type="filepath"
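
Review note: the dropdown takes its `choices` from the keys of `MODEL_OPTIONS`, so a rename like this one has to keep the dictionary key and the `value=` default in sync. Below is a minimal sketch of a guard that derives the default from the dictionary instead of repeating the string. The helper `default_choice` and the constant `DEFAULT_MODEL` are hypothetical names introduced for illustration and are not part of the app in this diff.

```python
# Hypothetical guard; MODEL_OPTIONS mirrors the diff, the helper is illustrative only.
MODEL_OPTIONS = {
    "WASP-2B-VL-Highlights": "prithivMLmods/WASP-2B-VL-Highlights",
}


def default_choice(options: dict[str, str], preferred: str) -> str:
    """Return `preferred` if it is a key of `options`, else the first key."""
    if not options:
        raise ValueError("MODEL_OPTIONS must contain at least one model")
    if preferred in options:
        return preferred
    # Fall back to the first declared model so the default is always a valid choice.
    return next(iter(options))


DEFAULT_MODEL = default_choice(MODEL_OPTIONS, "WASP-2B-VL-Highlights")

# In the app this would be passed on as (assumed usage, not run here):
# gr.Dropdown(label="Model Selection",
#             choices=list(MODEL_OPTIONS.keys()),
#             value=DEFAULT_MODEL)
```

With this in place, a stale name passed as `preferred` resolves to an existing key instead of leaving the dropdown with a default that is not among its choices.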