nielsr (HF Staff) committed
Commit 46b7254 · verified · 1 parent: fa40fb0

Improve model card: Add robotics pipeline tag and canonical links


This PR enhances the model card for MolmoAct-7B-D by:

- Adding the `pipeline_tag: robotics` to the metadata, which helps users discover the model via the Hugging Face Hub's pipeline filters (e.g., at https://huggingface.co/models?pipeline_tag=robotics).
- Updating the paper link in the "Quick links" section to point to the Hugging Face Papers page ([https://huggingface.co/papers/2508.07917](https://huggingface.co/papers/2508.07917)) for consistency and improved discoverability within the Hub.
- Adding a direct link to the GitHub repository ([https://github.com/allenai/MolmoAct](https://github.com/allenai/MolmoAct)) in the "Quick links" section for easier access to the codebase.

These changes will help researchers and practitioners more easily find and understand the model's capabilities and resources.
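As a concrete illustration of the discoverability point, here is a minimal sketch of how the new tag surfaces the model through the Hub API (assuming a recent `huggingface_hub` client; exact keyword arguments may vary by version):

```python
# pip install huggingface_hub
from huggingface_hub import HfApi

api = HfApi()

# With `pipeline_tag: robotics` in the card metadata, the model shows up when
# filtering the Hub by pipeline tag, mirroring the web UI filter at
# https://huggingface.co/models?pipeline_tag=robotics
for model in api.list_models(pipeline_tag="robotics", author="allenai"):
    print(model.id)
```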

Files changed (1): README.md (+7 -5)
README.md CHANGED
```diff
@@ -1,11 +1,12 @@
 ---
-license: apache-2.0
-language:
-- en
 base_model:
 - Qwen/Qwen2.5-7B
 - google/siglip2-so400m-patch14-384
+language:
+- en
 library_name: transformers
+license: apache-2.0
+pipeline_tag: robotics
 tags:
 - molmoact
 - molmo
@@ -21,7 +22,7 @@ tags:
 # MolmoAct 7B-D
 
 MolmoAct is a fully open-source action reasoning model for robotic manipulation developed by the Allen Institute for AI. MolmoAct is trained on a subset of OXE and MolmoAct Dataset, a dataset with 10k high-quality trajectories of a single-arm Franka robot performing 93 unique manipulation tasks in both home and tabletop environments. It has state-of-the-art performance among vision-language-action models on multiple benchmarks while being fully open-source. You can find all models in the MolmoAct family [here](https://huggingface.co/collections/allenai/molmoact-689697591a3936fba38174d7).
-**Learn more about MolmoAct** in our announcement [blog post](https://allenai.org/blog/molmoact) or the [paper](https://huggingface.co/allenai/MolmoAct-7B-D-0812/blob/main/MolmoAct_Technical_Report.pdf).
+**Learn more about MolmoAct** in our announcement [blog post](https://allenai.org/blog/molmoact) or the [paper](https://huggingface.co/papers/2508.07917).
 
 **MolmoAct 7B-D** is based on [Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B) and uses [SigLip2](https://huggingface.co/google/siglip2-so400m-patch14-384) as the vision backbone, which is initialized using Molmo's pre-training approach. It is first pre-trained on MolmoAct's [Pre-training Mixture](https://huggingface.co/datasets/allenai/MolmoAct-Pretraining-Mixture), and then mid-trained on [MolmoAct Dataset](https://huggingface.co/datasets/allenai/MolmoAct-Midtraining-Mixture). This model is intended to be used for downstream post-training.
 
@@ -30,7 +31,8 @@ This checkpoint is a **preview** of the MolmoAct release. All artifacts used in
 Quick links:
 - 📂 [All Models](https://huggingface.co/collections/allenai/molmoact-689697591a3936fba38174d7)
 - 📂 [All Data](https://huggingface.co/collections/allenai/molmoact-data-mixture-6897e583e13b6c2cf3ea2b80)
-- 📃 [Paper](https://arxiv.org/pdf/2508.07917)
+- 📄 [Paper](https://huggingface.co/papers/2508.07917)
+- 💻 [GitHub Repository](https://github.com/allenai/MolmoAct)
 - 🎥 [Blog Post](https://allenai.org/blog/molmoact)
 - 🎥 [Video](https://youtu.be/-_wag1X25OE?si=Xi_kUaJTmcQBx1f6)
 
```
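Since the card declares `library_name: transformers`, the checkpoint is meant to load through the standard auto classes. A hedged sketch follows; the auto class, dtype, and `trust_remote_code` usage are assumptions based on the pattern of other Molmo-family releases, and the model card's own usage section is authoritative:

```python
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

# Repo id taken from the links above.
repo = "allenai/MolmoAct-7B-D-0812"

# Assumption: Molmo-family checkpoints ship custom modeling code,
# hence trust_remote_code=True.
processor = AutoProcessor.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,  # assumption: bf16 inference, as with Molmo
    device_map="auto",
)
```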