Prompt-Depth-Anything-Vits-Transparent

Introduction

Prompt Depth Anything is a high-resolution and accurate metric depth estimation method, with the following highlights:

Prompting is used to unleash the power of depth foundation models, inspired by the success of prompting in VLM and LLM foundation models.
The widely available iPhone LiDAR is taken as the prompt, guiding the model to produce accurate metric depth at up to 4K resolution.
A scalable data pipeline is introduced to train the method.
Prompt Depth Anything benefits downstream applications, including 3D reconstruction and generalized robotic grasping.

Installation

git clone https://github.com/DepthAnything/PromptDA.git
cd PromptDA
pip install -r requirements.txt
pip install -e .

Usage

import requests
import torch
from PIL import Image
from transformers import PromptDepthAnythingForDepthEstimation, PromptDepthAnythingImageProcessor

# Load the example RGB image.
url = "https://github.com/DepthAnything/PromptDA/blob/main/assets/example_images/image.jpg?raw=true"
image = Image.open(requests.get(url, stream=True).raw)

# Load the image processor and the model weights from the Hub.
image_processor = PromptDepthAnythingImageProcessor.from_pretrained("depth-anything/prompt-depth-anything-vits-transparent-hf")
model = PromptDepthAnythingForDepthEstimation.from_pretrained("depth-anything/prompt-depth-anything-vits-transparent-hf")

# Load the low-resolution ARKit (LiDAR) depth map that serves as the prompt.
prompt_depth_url = "https://github.com/DepthAnything/PromptDA/blob/main/assets/example_images/arkit_depth.png?raw=true"
prompt_depth = Image.open(requests.get(prompt_depth_url, stream=True).raw)

# Run inference without tracking gradients.
inputs = image_processor(images=image, return_tensors="pt", prompt_depth=prompt_depth)
with torch.no_grad():
    outputs = model(**inputs)

# Resize the prediction back to the original image size.
post_processed_output = image_processor.post_process_depth_estimation(
    outputs,
    target_sizes=[(image.height, image.width)],
)

predicted_depth = post_processed_output[0]["predicted_depth"]
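
The snippet above stops at the raw `predicted_depth` tensor. As a minimal sketch of what one might do next, the helpers below convert a depth map to an 8-bit array for quick viewing and to 16-bit millimeters for storage. They assume the prediction is expressed in meters, which should be verified against the model documentation. The function names `depth_to_uint8` and `depth_to_uint16_mm` are hypothetical and are not part of `transformers` or PromptDA.

```python
import numpy as np


def depth_to_uint8(depth_m: np.ndarray) -> np.ndarray:
    """Min-max normalize a depth map to 0..255 for quick visual inspection."""
    depth_m = np.asarray(depth_m, dtype=np.float32)
    d_min, d_max = float(depth_m.min()), float(depth_m.max())
    if d_max - d_min < 1e-8:
        # Constant depth map: avoid dividing by zero.
        return np.zeros(depth_m.shape, dtype=np.uint8)
    scaled = (depth_m - d_min) / (d_max - d_min)
    return np.round(scaled * 255.0).astype(np.uint8)


def depth_to_uint16_mm(depth_m: np.ndarray) -> np.ndarray:
    """Convert depth in meters (assumed unit) to integer millimeters.

    Values are clipped to the uint16 range, so depths beyond 65.535 m saturate.
    """
    depth_mm = np.round(np.asarray(depth_m, dtype=np.float64) * 1000.0)
    return np.clip(depth_mm, 0, 65535).astype(np.uint16)


# Example with a small synthetic depth map (meters).
example = np.array([[0.5, 1.0], [1.5, 2.5]])
preview = depth_to_uint8(example)
millimeters = depth_to_uint16_mm(example)
```

With the tensor from the usage example, the input would typically be obtained with `predicted_depth.squeeze().cpu().numpy()`, and either result can be passed to `Image.fromarray` to save it as a PNG. The min-max preview is only for viewing because it discards the metric scale; the millimeter array preserves it.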

Citation

If you find this project useful, please consider citing:

@inproceedings{lin2024promptda,
  title={Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation},
  author={Lin, Haotong and Peng, Sida and Chen, Jingxiao and Peng, Songyou and Sun, Jiaming and Liu, Minghuan and Bao, Hujun and Feng, Jiashi and Zhou, Xiaowei and Kang, Bingyi},
  journal={arXiv},
  year={2024}
}


Model size: 25.1M params (Safetensors, tensor type F32)
