GraspMolmo is a generalizable open-vocabulary task-oriented grasping (TOG) model.
## Code Sample
```python
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor, GenerationConfig

img = Image.open("<path_to_image>")
task = "Pour coffee from the blue mug."

# Load the processor and model; trust_remote_code is required because
# GraspMolmo ships custom processing and generation code.
processor = AutoProcessor.from_pretrained("allenai/GraspMolmo", torch_dtype="auto", device_map="auto", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("allenai/GraspMolmo", torch_dtype="auto", device_map="auto", trust_remote_code=True)

# Build the grasp-pointing prompt and preprocess the image/text pair.
prompt = f"Point to where I should grasp to accomplish the following task: {task}"
inputs = processor.process(images=img, text=prompt, return_tensors="pt")
inputs = {k: v.to(model.device).unsqueeze(0) for k, v in inputs.items()}  # add batch dimension

# Generate, then decode only the newly generated tokens.
output = model.generate_from_batch(inputs, GenerationConfig(max_new_tokens=256, stop_strings="<|endoftext|>"), tokenizer=processor.tokenizer)
generated_tokens = output[0, inputs["input_ids"].size(1):]
generated_text = processor.tokenizer.decode(generated_tokens, skip_special_tokens=True)
print(generated_text)
```
Running the above code should produce output similar to the following:
```
In order to accomplish the task "Pour coffee from the blue mug.", the optimal grasp is described as follows: "The grasp is on the middle handle of the blue mug, with fingers grasping the sides of the handle.".

<point x="28.6" y="20.7" alt="Where to grasp the object">Where to grasp the object</point>
```
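
In Molmo's pointing format, the `x` and `y` values in the `<point>` tag are percentages (0 to 100) of the image width and height. The sketch below, using a hypothetical `parse_point` helper that is not part of the GraspMolmo API, shows one way to convert the point into pixel coordinates:

```python
import re

def parse_point(generated_text: str, image_width: int, image_height: int):
    """Extract the first <point> tag from the model output and convert its
    percentage coordinates to pixels. Illustrative helper, not part of the
    GraspMolmo API."""
    match = re.search(r'<point x="([\d.]+)" y="([\d.]+)"', generated_text)
    if match is None:
        return None
    x_pct, y_pct = float(match.group(1)), float(match.group(2))
    return (x_pct / 100.0 * image_width, y_pct / 100.0 * image_height)

# e.g. on a 640x480 image, the sample point above maps to roughly (183, 99).
print(parse_point(generated_text, img.width, img.height))
```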