jadechoghari committed
Commit • c659477
1 Parent(s): 03d67e9
Update README.md
README.md CHANGED
@@ -10,43 +10,58 @@ This is the **Gemma-2B** version of ferret-ui. It follows from [this paper](http
```diff
- You will need first to download `builder.py`, `conversation.py`, and `
- from inference import
- model_path = 'jadechoghari/Ferret-UI-Gemma2b'
- print("Result:", result)
- ### Task requiring bounding box
- Choose a task from ['widgetcaptions', 'taperception', 'ocr', 'icon_recognition', 'widget_classification', 'example_0']
- ### Task with no image processing
- Choose a task from ['screen2words', 'detailed_description', 'conversation_perception', 'gpt4']
```

## How to Use 🤗📱

You will first need to download `builder.py`, `conversation.py`, `inference.py`, and `model_UI.py` locally.
```bash
wget https://huggingface.co/jadechoghari/Ferret-UI-Gemma2b/raw/main/conversation.py
wget https://huggingface.co/jadechoghari/Ferret-UI-Gemma2b/raw/main/builder.py
wget https://huggingface.co/jadechoghari/Ferret-UI-Gemma2b/raw/main/inference.py
wget https://huggingface.co/jadechoghari/Ferret-UI-Gemma2b/raw/main/model_UI.py
```

### Usage:
```python
from inference import inference_and_run

image_path = "appstore_reminders.png"
prompt = "Describe the image in detail"

# Call the function without a box
processed_image, inference_text = inference_and_run(image_path, prompt, conv_mode="ferret_gemma_instruct", model_path="jadechoghari/Ferret-UI-Gemma2b")

# Output the inference text
print("Inference Text:", inference_text)
```
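
A pattern that falls out of the example above: since `inference_and_run` takes the image path and prompt directly, you can loop it over several prompts for the same screenshot. This is a minimal sketch, not from the original README; the prompt strings are hypothetical, and depending on how `inference.py` is written, each call may reload the model.

```python
# Sketch (assumed usage): reuse inference_and_run over several prompts
# for the same screenshot. Prompt strings here are hypothetical.
prompts = [
    "Describe the image in detail",
    "What app is shown on this screen?",
]

for p in prompts:
    _, text = inference_and_run(
        image_path, p,
        conv_mode="ferret_gemma_instruct",
        model_path="jadechoghari/Ferret-UI-Gemma2b",
    )
    print(p, "->", text)
```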

```python
# Task with a bounding box
image_path = "appstore_reminders.png"
prompt = "What's inside the selected region?"
box = [189, 906, 404, 970]

processed_image, inference_text = inference_and_run(
    image_path=image_path,
    prompt=prompt,
    conv_mode="ferret_gemma_instruct",
    model_path="jadechoghari/Ferret-UI-Gemma2b",
    box=box
)

# Output the inference text and optionally save the processed image
print("Inference Text:", inference_text)
```
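
The comment above mentions optionally saving the processed image. The README does not document the type of `processed_image`, so the sketch below assumes it is a `PIL.Image.Image` (a common convention for vision pipelines); the output filename is just an example. The `box` argument presumably follows the `[x1, y1, x2, y2]` format that the grounding templates below reference.

```python
# Sketch, assuming processed_image is a PIL.Image.Image
# (the return type is not documented in the README).
if processed_image is not None:
    processed_image.save("appstore_reminders_processed.png")
    print("Saved processed image to appstore_reminders_processed.png")
```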

```python
# GROUNDING PROMPTS
GROUNDING_TEMPLATES = [
    '\nProvide the bounding boxes of the mentioned objects.',
    '\nInclude the coordinates for each mentioned object.',
    '\nLocate the objects with their coordinates.',
    '\nAnswer in [x1, y1, x2, y2] format.',
    '\nMention the objects and their locations using the format [x1, y1, x2, y2].',
    '\nDraw boxes around the mentioned objects.',
    '\nUse boxes to show where each thing is.',
    '\nTell me where the objects are with coordinates.',
    '\nList where each object is with boxes.',
    '\nShow me the regions with boxes.'
]
```
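
These templates are plain strings, so a natural (if unverified) way to use them is to append one to a prompt before calling `inference_and_run`, asking the model to answer with coordinates. The base prompt below is a hypothetical example; the README does not show this combination directly.

```python
# Sketch: append a grounding template so the model is asked to answer
# in [x1, y1, x2, y2] coordinates. The base prompt is hypothetical.
grounded_prompt = "Find the Reminders app icon" + GROUNDING_TEMPLATES[3]

processed_image, inference_text = inference_and_run(
    image_path="appstore_reminders.png",
    prompt=grounded_prompt,
    conv_mode="ferret_gemma_instruct",
    model_path="jadechoghari/Ferret-UI-Gemma2b",
)
print("Inference Text:", inference_text)
```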
|