TripoSG-scribble - Fast 3D Shape Prototyping with Scribble and Prompt

TripoSG-scribble converts a scribble image and a text prompt to a 3D shape. TripoSG-scribble is a variant of TripoSG. TripoSG is a state-of-the-art image-to-3D generation foundation model that leverages large-scale rectified flow transformers to produce high-fidelity 3D shapes from single images.

Model Description

Model Architecture

TripoSG utilizes a novel architecture combining:

  • Rectified Flow (RF) based Transformer for stable, linear trajectory modeling
  • Advanced VAE with SDF-based representation and hybrid geometric supervision
  • Cross-attention mechanism for image feature condition
  • 1.5B parameters operating on 2048 latent tokens

For inference efficiency, TripoSG-scribble is different from TripoSG in:

  • TripoSG-scribble is a CFG-distilled model and should be used with CFG=0
  • TripoSG-scribble is trained with 512 latent tokens

Intended Uses

This model is designed for:

  • Converting scribble image and text prompt to high-quality 3D meshes
  • Creative and design applications
  • Gaming and VFX asset creation
  • Prototyping and visualization

Requirements

  • CUDA-capable GPU (>8GB VRAM)

Usage

For detailed usage instructions, please visit our GitHub repository.

About

TripoSG-scribble is developed by Tripo, VAST AI Research, pushing the boundaries of 3D Generative AI. For more information:

Downloads last month
42
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Space using VAST-AI/TripoSG-scribble 1