---
library_name: transformers
tags:
- agent
- action
- vlm
base_model: Qwen/Qwen2.5-VL-3B-Instruct
license: cc-by-nc-4.0
---

# Model Card for Proxy Lite

<!-- Provide a quick summary of what the model is/does. -->

<div align="center">
  <img src="https://raw.githubusercontent.com/convergence-ai/proxy-lite/refs/heads/main/assets/proxy-lite.png" alt="Proxy Lite logo" width="600" height="auto" style="margin-bottom: 20px;" />
  <h2>
    A mini, open-weights, version of <a href="https://proxy.convergence.ai">Proxy</a>.
  </h2>
</div>

## Model Description

<!-- Provide a longer summary of what this model is. -->

- **Developed by:** Convergence AI
- **Model type:** 3B Vision-Language Model
- **Agent type:** Web-browsing Agent
- **License:** CC-BY-NC-4.0
- **Finetuned from model:** Qwen/Qwen2.5-VL-3B-Instruct
- [Running the agent](https://github.com/convergence-ai/proxy-lite)

## Running Proxy Lite on the web

<!-- Provide the basic links for the model. -->

See the [GitHub repository](https://github.com/convergence-ai/proxy-lite) to run Proxy Lite in a browser:

```bash
git clone https://github.com/convergence-ai/proxy-lite.git
cd proxy-lite
make proxy
proxy "Find some markets near Kings Cross and tell me their ratings."
```

<div align="center">
     <img src="https://raw.githubusercontent.com/convergence-ai/proxy-lite/refs/heads/main/assets/demo.gif" alt="Proxy Lite Demo" />
</div>

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

Proxy Lite is designed and trained to complete automated tasks in a web browser.

Full code for running the model is available in the [github repository](https://github.com/convergence-ai/proxy-lite).

This includes a CLI tool for running the model, as well as a Streamlit app.

You can use this [endpoint](https://huggingface.co/spaces/convergence-ai/demo-api) for small-scale testing.

---

#### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

We recommend hosting your own endpoint with vLLM; you can use the following command:

```bash
vllm serve convergence-ai/proxy-lite-3b \
    --trust-remote-code \
    --enable-auto-tool-choice \
    --tool-call-parser hermes \
    --port 8008
```

The tool arguments are **very important** for parsing the tool calls from the model appropriately.

> **Important:** Qwen-2.5-VL support in `transformers` is not yet available in the latest release, so be sure to install from source.
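
If your installed `transformers` release predates Qwen-2.5-VL support, installing from source is a one-liner (newer releases may already include it, in which case this step is unnecessary):

```bash
pip install git+https://github.com/huggingface/transformers.git
```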

#### Message History

For full guidance on using and prompting Proxy Lite, please refer to the [repository](https://github.com/convergence-ai/proxy-lite); in short, the model expects a message history of the form:

```python
message_history = [
    {
        "role": "system", 
        "content": "You are Proxy Lite...", # Full system prompt in src/proxy_lite/agents/proxy_lite_agent.py
    }, # System prompt
    {
        "role": "user", 
        "content": "Find some markets near Kings Cross and tell me their ratings.",
    }, # Set the task
    {
        "role": "user", 
        "content": [
            {"type": "image_url", "image_url": {base64_encoded_screenshot} },
            {"type": "text", "text": "URL: https://www.google.com/ \n- [0] <a>About</a> \n- [1] <a>Store</a>...."}
        ] # This is the observation from the environment
    },
]
```

The message history then builds up over the episode, alternating between the assistant (who takes the *action*) and the user (who provides the *observation*).

> **Context-Window Management:** When making calls to the model, all observations other than the current one are discarded to reduce the number of image tokens required. Because the model's responses include reflections on those observations and remain in the message history, the model is still aware of the full trajectory when planning new actions.
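
As an illustration of that pruning strategy, here is a minimal sketch. It assumes observations are the multi-part `user` messages containing an `image_url` item; the helper name `prune_observations` is hypothetical and not part of the proxy-lite API:

```python
def prune_observations(message_history: list[dict]) -> list[dict]:
    """Keep only the most recent screenshot observation to save image tokens.

    An 'observation' is assumed to be a user message whose content is a list
    containing an image_url part; all earlier observations are dropped, while
    the assistant's reflections and actions are kept in full.
    """
    def is_observation(msg: dict) -> bool:
        return (
            msg["role"] == "user"
            and isinstance(msg["content"], list)
            and any(part.get("type") == "image_url" for part in msg["content"])
        )

    observation_indices = [i for i, msg in enumerate(message_history) if is_observation(msg)]
    stale = set(observation_indices[:-1])  # every observation except the latest
    return [msg for i, msg in enumerate(message_history) if i not in stale]
```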

#### Tools

You should also pass the `Tools` the model has access to; these define the action space available to the model. You can do this with `transformers`:

```python
from qwen_vl_utils import process_vision_info
from transformers import AutoProcessor

from proxy_lite.tools import ReturnValueTool, BrowserTool
from proxy_lite.serializer import OpenAICompatableSerializer

processor = AutoProcessor.from_pretrained("convergence-ai/proxy-lite-3b")
tools = OpenAICompatableSerializer().serialize_tools([ReturnValueTool(), BrowserTool(session=None)])

templated_messages = processor.apply_chat_template(
    message_history, tokenize=False, add_generation_prompt=True, tools=tools
)

image_inputs, video_inputs = process_vision_info(message_history)

batch = processor(
    text=[templated_messages],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
```
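
From here, generation follows the usual `transformers` flow. The sketch below assumes a recent `transformers` build that exposes `Qwen2_5_VLForConditionalGeneration`, a GPU via `device_map="auto"`, and reuses the `batch` and `processor` prepared above; adjust dtype and `max_new_tokens` to your setup:

```python
import torch
from transformers import Qwen2_5_VLForConditionalGeneration

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "convergence-ai/proxy-lite-3b",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Move the prepared inputs onto the model's device and generate.
batch = batch.to(model.device)
output_ids = model.generate(**batch, max_new_tokens=512)

# Strip the prompt tokens before decoding so only the new response remains.
response_ids = output_ids[:, batch["input_ids"].shape[1]:]
print(processor.batch_decode(response_ids, skip_special_tokens=True)[0])
```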

Alternatively, you can send the messages to the endpoint directly, which will handle the formatting:

```python
from openai import OpenAI

client = OpenAI(base_url="http://convergence-ai-demo-api.hf.space/v1")

response = client.chat.completions.create(
    model="convergence-ai/proxy-lite-3b",
    messages=message_history,
    tools=tools,
    tool_choice="auto",
)
```

---

## Evaluation

Proxy Lite scored 72.4% on the [WebVoyager](https://huggingface.co/datasets/convergence-ai/WebVoyager2025Valid) benchmark, placing it 1st out of all available open-weights models.

A breakdown of the results by website is shown below:

| Website         | Success Rate (%) | Finish Rate (%) | Avg. Steps |
|-----------------|------------------|-----------------|------------|
| Allrecipes      | 87.8             | 95.1            | 10.3       |
| Amazon          | 70.0             | 90.0            | 7.1        |
| Apple           | 82.1             | 89.7            | 10.7       |
| ArXiv           | 60.5             | 79.1            | 16.0       |
| BBC News        | 69.4             | 77.8            | 15.9       |
| Booking         | 70.0             | 85.0            | 24.8       |
| Cambridge Dict. | 86.0             | 97.7            | 5.7        |
| Coursera        | 82.5             | 97.5            | 4.7        |
| ESPN            | 53.8             | 87.2            | 14.9       |
| GitHub          | 85.0             | 92.5            | 10.0       |
| Google Flights  | 38.5             | 51.3            | 34.8       |
| Google Map      | 78.9             | 94.7            | 9.6        |
| Google Search   | 71.4             | 92.9            | 6.0        |
| Huggingface     | 68.6             | 74.3            | 18.4       |
| Wolfram Alpha   | 78.3             | 93.5            | 6.1        |


---

## Out-of-Scope Use

Proxy Lite is specifically designed to automate routine tasks within a web browser environment. However, it should **not be used** for:

- **High-Stakes or Safety-Critical Applications:**  
  _Avoid using Proxy Lite for tasks such as financial transactions, healthcare operations, legal decision-making, or emergency responses, where any error could lead to serious harm or significant financial loss._

- **Unauthorized or Invasive Data Extraction:**  
  _Automated scraping or extraction of data from websites should only be performed with explicit permission. Proxy Lite should not be used to bypass websites' terms of service, copyright restrictions, or privacy policies._

- **Interactions with Malicious or Unverified Websites:**  
  _Using the model to navigate or interact with suspicious or untrusted websites may expose the system to security threats such as malware, phishing attacks, or other forms of cyber exploitation._

- **Compliance-Regulated or Legally Sensitive Actions:**  
  _Tasks that require adherence to strict legal or regulatory standards (e.g., processing personal data or sensitive information) should employ additional safeguards beyond what the model provides._

---

## Citation

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

**BibTeX:**

```bibtex
@article{proxy-lite,
  title={Proxy Lite - A Mini, Open-weights, Autonomous Assistant},
  author={Convergence AI},
  year={2025}
}
```