Commit c659477 by jadechoghari · 1 Parent(s): 03d67e9

Update README.md

Files changed (1)
  1. README.md +38 -23
README.md CHANGED
@@ -10,43 +10,58 @@ This is the **Gemma-2B** version of ferret-ui. It follows from [this paper](http
 
  ## How to Use 🤗📱
 
- You will need first to download `builder.py`, `conversation.py`, and `inference.py` locally.
 
  ```bash
  wget https://huggingface.co/jadechoghari/Ferret-UI-Gemma2b/raw/main/conversation.py
  wget https://huggingface.co/jadechoghari/Ferret-UI-Gemma2b/raw/main/builder.py
  wget https://huggingface.co/jadechoghari/Ferret-UI-Gemma2b/raw/main/inference.py
  ```
 
  ### Usage:
  ```python
- from inference import infer_ui_task
- # Pass an image and the online model path
- image_path = 'image.jpg'
- model_path = 'jadechoghari/Ferret-UI-Gemma2b'
- ```
 
- ### Task not requiring bounding box
- Choose a task from ['widget_listing', 'find_text', 'find_icons', 'find_widget', 'conversation_interaction']
- ```python
- task = 'conversation_interaction'
- result = infer_ui_task(image_path, "How do I navigate to the Games tab?", model_path, task)
- print("Result:", result)
  ```
 
- ### Task requiring bounding box
- Choose a task from ['widgetcaptions', 'taperception', 'ocr', 'icon_recognition', 'widget_classification', 'example_0']
  ```python
- task = 'widgetcaptions'
- region = (50, 50, 200, 200)
- result = infer_ui_task(image_path, "Describe the contents of the box.", model_path, task, region=region)
- print("Result:", result)
  ```
 
- ### Task with no image processing
- Choose a task from ['screen2words', 'detailed_description', 'conversation_perception', 'gpt4']
  ```python
- task = 'detailed_description'
- result = infer_ui_task(image_path, "Please describe the screen in detail.", model_path, task)
- print("Result:", result)
  ```
 
 
  ## How to Use 🤗📱
 
+ You will need first to download `builder.py`, `conversation.py`, `inference.py` and `model_UI.py` locally.
 
  ```bash
  wget https://huggingface.co/jadechoghari/Ferret-UI-Gemma2b/raw/main/conversation.py
  wget https://huggingface.co/jadechoghari/Ferret-UI-Gemma2b/raw/main/builder.py
  wget https://huggingface.co/jadechoghari/Ferret-UI-Gemma2b/raw/main/inference.py
+ wget https://huggingface.co/jadechoghari/Ferret-UI-Gemma2b/raw/main/model_UI.py
  ```
 
  ### Usage:
  ```python
+ from inference import inference_and_run
+ image_path = "appstore_reminders.png"
+ prompt = "Describe the image in details"
 
+ # Call the function without a box
+ processed_image, inference_text = inference_and_run(image_path, prompt, conv_mode="ferret_gemma_instruct", model_path="jadechoghari/Ferret-UI-Gemma2b")
+
+ # Output processed text
+ print("Inference Text:", inference_text)
  ```
 
  ```python
+ # Task with bounding boxes
+ image_path = "appstore_reminders.png"
+ prompt = "What's inside the selected region?"
+ box = [189, 906, 404, 970]
+
+ processed_image, inference_text = inference_and_run(
+ image_path=image_path,
+ prompt=prompt,
+ conv_mode="ferret_gemma_instruct",
+ model_path="jadechoghari/Ferret-UI-Gemma2b",
+ box=box
+ )
+
+ # output the inference text and optionally save the processed image
+ print("Inference Text:", inference_text)
  ```
 
  ```python
+ # GROUNDING PROMPTS
+ GROUNDING_TEMPLATES = [
+ '\nProvide the bounding boxes of the mentioned objects.',
+ '\nInclude the coordinates for each mentioned object.',
+ '\nLocate the objects with their coordinates.',
+ '\nAnswer in [x1, y1, x2, y2] format.',
+ '\nMention the objects and their locations using the format [x1, y1, x2, y2].',
+ '\nDraw boxes around the mentioned objects.',
+ '\nUse boxes to show where each thing is.',
+ '\nTell me where the objects are with coordinates.',
+ '\nList where each object is with boxes.',
+ '\nShow me the regions with boxes.'
+ ]
  ```
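
The updated README lists `GROUNDING_TEMPLATES` but does not show how they are meant to be used. Since every template begins with a newline, one plausible reading is that a template is appended to the user's question to request box coordinates in the answer. The sketch below illustrates that reading only; `build_grounding_prompt` is a hypothetical helper that is not part of the repository, the list is a three-item subset of the one in the diff, and the commented-out call assumes `inference_and_run` accepts the same arguments as in the usage examples above.

```python
import random
from typing import Optional

# Subset of the GROUNDING_TEMPLATES list added in this commit.
GROUNDING_TEMPLATES = [
    '\nProvide the bounding boxes of the mentioned objects.',
    '\nAnswer in [x1, y1, x2, y2] format.',
    '\nShow me the regions with boxes.',
]


def build_grounding_prompt(question: str, rng: Optional[random.Random] = None) -> str:
    """Hypothetical helper: append one randomly chosen grounding template.

    Assumption: the templates are suffixes for the prompt; the README
    does not state this explicitly.
    """
    rng = rng or random.Random()
    return question + rng.choice(GROUNDING_TEMPLATES)


question = "Where is the search button?"
prompt = build_grounding_prompt(question, random.Random(0))
print(repr(prompt))

# Assumed usage, mirroring the README examples (requires the downloaded
# helper files and the model weights, so it is left commented out):
# from inference import inference_and_run
# processed_image, inference_text = inference_and_run(
#     "appstore_reminders.png",
#     prompt,
#     conv_mode="ferret_gemma_instruct",
#     model_path="jadechoghari/Ferret-UI-Gemma2b",
# )
```

Passing a seeded `random.Random` makes the chosen template reproducible, which can help when comparing model outputs across runs.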