nielsr (HF Staff) committed (verified)
Commit b1f6c1e · Parent(s): 03d8897

Add transformers metadata, link to project page and paper

Adds the transformers `library_name` to the model card. Also links to the [paper](https://huggingface.co/papers/2506.09930) and the project page at https://ai4ce.github.io/INT-ACT/.

Files changed (1)
  1. README.md +6 -0
README.md CHANGED

```diff
@@ -1,11 +1,17 @@
 ---
 license: mit
 pipeline_tag: robotics
+library_name: transformers
 ---
+
 # Octo Small
 
 See https://github.com/octo-models/octo for instructions for using this model.
 
+Project page: https://ai4ce.github.io/INT-ACT/
+
+This model was used for the following paper: From Intention to Execution: Probing the Generalization Boundaries of Vision-Language-Action Models [https://huggingface.co/papers/2506.09930]
+
 Octo Small is trained with a window size of 2, predicting 7-dimensional actions 4 steps into the future using a diffusion policy. The model is a Transformer with 27M parameters (equivalent to a ViT-S). Images are tokenized by preprocessing with a lightweight convolutional encoder, then grouped into 16x16 patches. Language is tokenized by applying the T5 tokenizer, and then applying the T5-Base language encoder.
 
 Observations and tasks conform to the following spec:
```
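The model card's mention of 16x16 patch tokenization implies a simple token-count calculation per image. As a minimal sketch (the 256x256 input resolution is an assumption, not stated in this card; see the Octo repository for the actual observation spec):

```python
# Sketch: number of image tokens produced by non-overlapping 16x16 patching.
# The 256x256 input resolution is an assumed example, not from the model card.
def num_patch_tokens(height: int, width: int, patch: int = 16) -> int:
    """Return the number of patch tokens for an image of the given size."""
    if height % patch or width % patch:
        raise ValueError("image dimensions must be divisible by the patch size")
    return (height // patch) * (width // patch)

print(num_patch_tokens(256, 256))  # 256 tokens per image under this assumption
```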