tsungyi committed
Commit 19561cf · verified · 1 Parent(s): c786949

Update README.md

Files changed (1):
  1. README.md +2 -1
README.md CHANGED
@@ -129,7 +129,7 @@ See [Cosmos-Reason1](https://github.com/nvidia-cosmos/cosmos-reason1) for detail
 
 # Evaluation
 
-Please see our [technical paper](https://arxiv.org/pdf/2503.15558) for detailed evaluations on physical common sense and embodied reasoning. Part of the evaluation datasets are released under [Cosmos-Reason1-Benchmark](https://huggingface.co/datasets/nvidia/Cosmos-Reason1-Benchmark-Sample). The embodied reasoning datasets and benchmarks focus on the following areas: robotics (RoboVQA, BridgeDataV2, Agibot, RobFail), ego-centric human demonstration (HoloAssist), and Autonomous Vehicle (AV) driving video data. The AV dataset is collected and annotated by NVIDIA.
+Please see our [technical paper](https://arxiv.org/pdf/2503.15558) for detailed evaluations on physical common sense and embodied reasoning. Part of the evaluation datasets is released under [Cosmos-Reason1-Benchmark](https://huggingface.co/datasets/nvidia/Cosmos-Reason1-Benchmark). The embodied reasoning datasets and benchmarks focus on the following areas: robotics (RoboVQA, BridgeDataV2, Agibot, RoboFail), ego-centric human demonstration (HoloAssist), and Autonomous Vehicle (AV) driving video data. The AV dataset is collected and annotated by NVIDIA.
 All datasets go through the data annotation process described in the technical paper to prepare training and evaluation data and annotations.
 
 **Data Collection Method**:
@@ -159,6 +159,7 @@ Modality: Video (mp4) and Text
 
 ## Dataset Quantification
 We release the embodied reasoning data and benchmarks. Each data sample is a pair of video and text. The text annotations include understanding and reasoning annotations described in the Cosmos-Reason1 paper. Each video may have multiple text annotations. The quantity of the video and text pairs is described in the table below.
+**The AV data is currently unavailable and will be uploaded soon!**
 
 | | [RoboVQA](https://robovqa.github.io/) | AV | [BridgeDataV2](https://rail-berkeley.github.io/bridgedata/) | [Agibot](https://github.com/OpenDriveLab/AgiBot-World) | [HoloAssist](https://holoassist.github.io/) | [RoboFail](https://robot-reflect.github.io/) | Total Storage Size |
 |--------------------|---------------------------------------------|----------|------------------------------------------------------|------------------------------------------------|------------------------------------------------|------------------------------------------------|--------------------|
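For readers who want to inspect the released video/text pairs, here is a minimal sketch using the Hugging Face `datasets` library. Only the repo id `nvidia/Cosmos-Reason1-Benchmark` comes from the README above; the config name and split are assumptions, so consult the dataset card for the actual layout.

```python
# Minimal sketch, not an official loader: fetch a few video/text samples
# from the benchmark repo referenced in the README. The config name
# "robovqa" and the "test" split are ASSUMPTIONS; check the dataset card
# on Hugging Face for the real configurations and splits.
from datasets import load_dataset

ds = load_dataset("nvidia/Cosmos-Reason1-Benchmark", "robovqa", split="test")

# Each sample should pair a video with one or more text annotations;
# the exact column names depend on how the benchmark is published.
for sample in ds.select(range(3)):
    print(sorted(sample.keys()))
```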