Commit 1f54ced by abhaybd · verified · Parent: ff0d5b0

Update README.md

Files changed (1): README.md (+26 −1)
README.md CHANGED
@@ -17,4 +17,29 @@ GraspMolmo is a generalizable open-vocabulary task-oriented grasping (TOG) model
 
 ## Code Sample
 
-Coming soon!
+```python
+from PIL import Image
+from transformers import AutoModelForCausalLM, AutoProcessor, GenerationConfig
+
+img = Image.open("<path_to_image>")
+task = "Pour coffee from the blue mug."
+
+processor = AutoProcessor.from_pretrained("allenai/GraspMolmo", torch_dtype="auto", device_map="auto", trust_remote_code=True)
+model = AutoModelForCausalLM.from_pretrained("allenai/GraspMolmo", torch_dtype="auto", device_map="auto", trust_remote_code=True)
+
+prompt = f"Point to where I should grasp to accomplish the following task: {task}"
+inputs = processor.process(images=img, text=prompt, return_tensors="pt")
+inputs = {k: v.to(model.device).unsqueeze(0) for k, v in inputs.items()}
+
+output = model.generate_from_batch(inputs, GenerationConfig(max_new_tokens=256, stop_strings="<|endoftext|>"), tokenizer=processor.tokenizer)
+generated_tokens = output[0, inputs["input_ids"].size(1):]
+generated_text = processor.tokenizer.decode(generated_tokens, skip_special_tokens=True)
+print(generated_text)
+```
+
+Running the above code could result in the following output:
+```
+In order to accomplish the task "Pour coffee from the blue mug.", the optimal grasp is described as follows: "The grasp is on the middle handle of the blue mug, with fingers grasping the sides of the handle.".
+
+<point x="28.6" y="20.7" alt="Where to grasp the object">Where to grasp the object</point>
+```
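The `<point>` tag in the example output carries the predicted grasp location. In the Molmo family of models, point coordinates are conventionally expressed as percentages of the image width and height, so a downstream consumer typically converts them to pixels. A minimal sketch of that conversion, assuming the percentage convention holds for GraspMolmo; the `parse_points` helper and its regex are illustrative, not part of the repository:

```python
import re

def parse_points(text: str, width: int, height: int) -> list[tuple[float, float]]:
    """Extract Molmo-style <point x=".." y=".."> tags from model output.

    Assumes x and y are percentages (0-100) of the image dimensions,
    which is the convention used by the Molmo model family.
    Returns (x, y) pairs in pixel coordinates.
    """
    points = []
    for m in re.finditer(r'<point x="([\d.]+)" y="([\d.]+)"', text):
        x_pct, y_pct = float(m.group(1)), float(m.group(2))
        points.append((x_pct / 100.0 * width, y_pct / 100.0 * height))
    return points

# Using the point from the example output, for a hypothetical 640x480 image:
output_text = '<point x="28.6" y="20.7" alt="Where to grasp the object">Where to grasp the object</point>'
print(parse_points(output_text, 640, 480))
```

For the example point above on a 640x480 image, this yields roughly (183.0, 99.4) pixels, which could then be matched against grasp candidates from a grasp proposal pipeline.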