GraspMolmo is a generalizable open-vocabulary task-oriented grasping (TOG) model.
## Code Sample
```python
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor, GenerationConfig

img = Image.open("<path_to_image>")
task = "Pour coffee from the blue mug."

# Load the processor and model; trust_remote_code is required because
# GraspMolmo ships custom processing and generation code.
processor = AutoProcessor.from_pretrained("allenai/GraspMolmo", torch_dtype="auto", device_map="auto", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("allenai/GraspMolmo", torch_dtype="auto", device_map="auto", trust_remote_code=True)

# Build the grasp-pointing prompt and preprocess the image/text pair.
prompt = f"Point to where I should grasp to accomplish the following task: {task}"
inputs = processor.process(images=img, text=prompt, return_tensors="pt")
inputs = {k: v.to(model.device).unsqueeze(0) for k, v in inputs.items()}  # add batch dimension

# Generate, then decode only the newly generated tokens.
output = model.generate_from_batch(inputs, GenerationConfig(max_new_tokens=256, stop_strings="<|endoftext|>"), tokenizer=processor.tokenizer)
generated_tokens = output[0, inputs["input_ids"].size(1):]
generated_text = processor.tokenizer.decode(generated_tokens, skip_special_tokens=True)
print(generated_text)
```
Running the above code should produce output similar to the following:
```
In order to accomplish the task "Pour coffee from the blue mug.", the optimal grasp is described as follows: "The grasp is on the middle handle of the blue mug, with fingers grasping the sides of the handle.".

<point x="28.6" y="20.7" alt="Where to grasp the object">Where to grasp the object</point>
```
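
In Molmo's pointing format, the `x` and `y` values in the `<point>` tag are percentages (0 to 100) of the image width and height. The sketch below, using a hypothetical `parse_point` helper that is not part of the GraspMolmo API, shows one way to convert the point into pixel coordinates:

```python
import re

def parse_point(generated_text: str, image_width: int, image_height: int):
    """Extract the first <point> tag from the model output and convert its
    percentage coordinates to pixels. Illustrative helper, not part of the
    GraspMolmo API."""
    match = re.search(r'<point x="([\d.]+)" y="([\d.]+)"', generated_text)
    if match is None:
        return None
    x_pct, y_pct = float(match.group(1)), float(match.group(2))
    return (x_pct / 100.0 * image_width, y_pct / 100.0 * image_height)

# e.g. on a 640x480 image, the sample point above maps to roughly (183, 99).
print(parse_point(generated_text, img.width, img.height))
```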