Tags: Robotics · Transformers · Safetensors · qwen2 · text-generation · text-generation-inference

AlphaSpace-1.5B

Introduction

"AlphaSpace: (Paper), a novel methodology designed to enhance the spatial reasoning capabilities of language models for robotic manipulation in 3D Cartesian space. AlphaSpace employs a hierarchical semantics-based tokenization strategy that encodes spatial information at both coarse and fine-grained levels. Our approach represents objects with their attributes, positions, and height information through structured tokens, enabling precise spatial reasoning without relying on traditional vision-based embeddings. This approach enables LLMs to accurately manipulate objects by positioning them at specific [x, y, z] coordinates.

Code: https://github.com/AlanDao/AlphaSpace
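
The repository's tokenize_desk utility implements the actual encoding. As a purely illustrative sketch of the coarse-plus-fine idea, a desk of objects could be flattened into structured tokens like the ones below; the token names, grid size, and exact format here are assumptions, not the model's real vocabulary.

# Illustrative sketch only (not the official tokenizer): encode each object as a
# coarse grid-cell token plus a fine-grained offset token and a height token,
# mirroring the hierarchical coarse/fine idea described above.
def encode_object(name, x, y, z, cell=25):
    coarse = f"<cell_{x // cell}_{y // cell}>"   # coarse region of the table
    fine = f"<off_{x % cell}_{y % cell}>"        # fine-grained offset within the region
    height = f"<h_{z}>"                          # height information
    return f"<obj:{name}>{coarse}{fine}{height}"

objects = [
    {"red-cube": [51, 43, 17]},
    {"black-cube": [44, 58, 17]},
]
desk_tokens = "".join(
    encode_object(name, *pos) for obj in objects for name, pos in obj.items()
)
print(desk_tokens)
# <obj:red-cube><cell_2_1><off_1_18><h_17><obj:black-cube><cell_1_2><off_19_8><h_17>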

Model Details

How to Get Started

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
from utils import tokenize_desk, SYSTEM_PROMPT

# Load the model and tokenizer
model_path = "Menlo/AlphaSpace-1.5B"
device = "cuda"
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.bfloat16).to(device)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Define your workspace
objects = [
    {"red-cube": [51, 43, 17]},
    {"black-cube": [44, 58, 17]},
    {"purple-cube": [74, 59, 17]},
    {"green-cube": [65, 82, 17]},
]

# Give a natural language instruction
instruction = "Throw the red cube on top of the green cube"
desk, object_height = tokenize_desk(objects)
final_instruction = SYSTEM_PROMPT.format(object_height=object_height, instruction=instruction, TABLE_MAP=desk)
chat = [
    {"role": "user", "content": final_instruction.strip()}
]
tokenized_chat = tokenizer.apply_chat_template(chat, tokenize=True, add_generation_prompt=True, use_system_prompt=False, return_tensors="pt")
# Generate deterministically (greedy decoding)
generated_ids = model.generate(
    tokenized_chat.to(device),
    max_new_tokens=2048,
    do_sample=False,
)
# Get the solution
result = tokenizer.decode(generated_ids[0][tokenized_chat.shape[1]:], skip_special_tokens=True)
print(result)
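
The decoded result is expected to contain the target [x, y, z] position for the manipulation. The bracketed-triple format assumed below is an illustration, not a documented output contract; inspect your actual generations before relying on it.

import re

# Hedged post-processing sketch: assumes the answer contains integer
# coordinates written as [x, y, z]; extracts every such triple.
def extract_coordinates(text):
    triples = re.findall(r"\[\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\]", text)
    return [tuple(int(v) for v in t) for t in triples]

print(extract_coordinates(result))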

Hardware

GPU Configuration: Cluster of 8x NVIDIA H200-SXM-140GB.

GPU Usage:

  • SFT: 40 mins.

Training Arguments

We use the Llama-Factory library to train the model.

Continual training settings:

  • Epochs: 1
  • Global batch size: 128
  • Learning rate: 1e-4
  • LR scheduler: cosine with warmup
  • Optimizer: AdamW (fused)
  • Warmup ratio: 0.1
  • Max length: 4096
  • Precision: bf16
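
For reference, the hyperparameters above map roughly onto a Llama-Factory / Hugging Face TrainingArguments-style configuration. The key names and the per-device batch split below are assumptions for illustration only; consult the Llama-Factory documentation for the exact schema.

# Illustrative only: key names follow common Llama-Factory / TrainingArguments
# conventions and are an assumption, not the exact config used for this run.
training_config = {
    "stage": "sft",
    "num_train_epochs": 1,
    "per_device_train_batch_size": 16,  # assumed split: 16 per GPU x 8 GPUs = 128 global
    "learning_rate": 1e-4,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.1,
    "optim": "adamw_torch_fused",
    "cutoff_len": 4096,                 # max sequence length
    "bf16": True,
}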

Citation

More Information
