Sharp Monocular View Synthesis in Less Than a Second (Core ML Edition)

This software project is a community contribution and is not affiliated with the original research paper:

Sharp Monocular View Synthesis in Less Than a Second by Lars Mescheder, Wei Dong, Shiwei Li, Xuyang Bai, Marcel Santos, Peiyun Hu, Bruno Lecouat, Mingmin Zhen, Amaël Delaunoy, Tian Fang, Yanghai Tsin, Stephan Richter and Vladlen Koltun.

We present SHARP, an approach to photorealistic view synthesis from a single image. Given a single photograph, SHARP regresses the parameters of a 3D Gaussian representation of the depicted scene. This is done in less than a second on a standard GPU via a single feedforward pass through a neural network. The 3D Gaussian representation produced by SHARP can then be rendered in real time, yielding high-resolution photorealistic images for nearby views. The representation is metric, with absolute scale, supporting metric camera movements.

This release includes a fully validated Core ML (.mlpackage) version of SHARP, optimized for CPU, GPU, and Neural Engine inference on macOS and iOS.

Rendered using Splat Viewer

Getting started

📦 Download the Core ML Model Only

pip install huggingface-hub
huggingface-cli download --include sharp.mlpackage/ --local-dir . pearsonkyle/Sharp-coreml

🧰 Clone the Full Repository

This will include the inference and model conversion/validation scripts.

brew install git-xet
git xet install

Clone the model repository:

git clone git@hf.co:pearsonkyle/Sharp-coreml

📱 Run Inference on Apple Devices

Use the provided sharp.swift inference script to load the model and generate 3D Gaussian splats (PLY) from any image:

# Compile the Swift runner (requires Xcode command-line tools)
swiftc -O -o run_sharp sharp.swift -framework CoreML -framework CoreImage -framework AppKit

# Run inference on an image and decimate the output by 50%
./run_sharp sharp.mlpackage test.png test.ply -d 0.5

Inference on an Apple M4 Max takes ~1.9 seconds.

CLI Features:

- Automatic model compilation and caching
- Decimation to reduce point cloud size while preserving visual fidelity
- Input is expected as a standard RGB image; conversion to [0,1] and CHW format happens inside the model
- PLY output compatible with Splat Viewer, MetalSplatter, and Three.js

Usage: run_sharp [OPTIONS] <model> <input_image> <output.ply>

SHARP Model Inference - Generate 3D Gaussian Splats from a single image

Arguments:
    model              Path to the SHARP Core ML model (.mlpackage, .mlmodel, or .mlmodelc)
    input_image        Path to input image (PNG, JPEG, etc.)
    output.ply         Path for output PLY file

Options: 
    -m, --model PATH           Path to Core ML model
    -i, --input PATH           Path to input image
    -o, --output PATH          Path for output PLY file
    -f, --focal-length FLOAT   Focal length in pixels (default: 1536)
    -d, --decimation FLOAT     Decimation ratio 0.0-1.0 or percentage 1-100 (default:  1.0 = keep all)
                                Example: 0.5 or 50 keeps 50% of Gaussians
    -h, --help                 Show this help message
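
The `-d` option accepts either a ratio or a percentage, so it may help to see how such a value could be normalized and what it means for the output size. The following Python sketch is illustrative only: the function names are hypothetical, and the rule that values up to 1.0 are ratios while larger values are percentages is an assumption inferred from the help text, not taken from `sharp.swift`.

```python
# Approximate Gaussian count quoted for the default model.
DEFAULT_GAUSSIAN_COUNT = 1_179_648


def normalize_decimation(value: float) -> float:
    """Map a -d/--decimation argument to a keep ratio in [0, 1].

    Assumption: values <= 1.0 are ratios, values in (1, 100] are percentages.
    """
    if value < 0 or value > 100:
        raise ValueError(f"decimation must be within 0-100, got {value}")
    return value if value <= 1.0 else value / 100.0


def kept_gaussians(total: int, value: float) -> int:
    """Number of Gaussians that remain after decimation."""
    return round(total * normalize_decimation(value))


# Both spellings of "keep 50%" give the same count.
print(kept_gaussians(DEFAULT_GAUSSIAN_COUNT, 0.5))  # 589824
print(kept_gaussians(DEFAULT_GAUSSIAN_COUNT, 50))   # 589824
```

The sketch only computes how many Gaussians survive; which ones the runner keeps is not described here.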

Model Input and Output

📥 Input

The Core ML model accepts two inputs:

image: A 3-channel RGB image in uint8 format with shape (1, 3, H, W).
- Values are expected in range [0, 255] (no manual normalization required).
- Recommended resolution: 1536×1536 (matches training size).
- Aspect ratio is preserved; input will be resized internally if needed.
disparity_factor: A scalar tensor of shape (1,) representing the ratio focal_length / image_width.
- Use 1.0 for standard cameras (e.g., typical smartphone or DSLR).
- Adjust slightly to control depth scale: higher values = closer objects, lower values = farther scenes.
- If using the sharp.swift runner, this input is automatically computed from your image dimensions.
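
As a worked example of the ratio described above, the Python sketch below computes `disparity_factor` from a focal length in pixels and an image width, and derives the horizontal field of view that a given factor implies. The function names are hypothetical, and the field-of-view relation assumes an ideal pinhole camera; neither is taken from the repository's scripts.

```python
import math


def disparity_factor(focal_length_px: float, image_width_px: int) -> float:
    """Ratio focal_length / image_width, as described for the model input."""
    if focal_length_px <= 0 or image_width_px <= 0:
        raise ValueError("focal length and image width must be positive")
    return focal_length_px / image_width_px


def horizontal_fov_degrees(factor: float) -> float:
    """Horizontal field of view implied by a disparity factor.

    Assumption: pinhole camera, so tan(fov / 2) = (width / 2) / focal_length.
    """
    return math.degrees(2.0 * math.atan(0.5 / factor))


# The runner's default focal length (1536 px) on a 1536 px wide image.
factor = disparity_factor(1536.0, 1536)
print(factor)                                    # 1.0
print(round(horizontal_fov_degrees(factor), 2))  # 53.13
```

Under the pinhole assumption, a factor of 1.0 corresponds to a horizontal field of view of about 53°.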

📤 Output

The model outputs five tensors representing a 3D Gaussian splat representation:

| Output | Shape | Description |
| --- | --- | --- |
| `mean_vectors_3d_positions` | `(1, N, 3)` | 3D positions (x, y, z) in Normalized Device Coordinates (NDC). |
| `singular_values_scales` | `(1, N, 3)` | Scale parameters along each principal axis (width, height, depth). |
| `quaternions_rotations` | `(1, N, 4)` | Unit quaternions `[w, x, y, z]` encoding orientation of each Gaussian. |
| `colors_rgb_linear` | `(1, N, 3)` | Linear RGB color values in range `[0, 1]` (no gamma correction). |
| `opacities_alpha_channel` | `(1, N)` | Opacity (alpha) values per Gaussian, in range `[0, 1]`. |

The total number of Gaussians N is approximately 1,179,648 for the default model.
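
To make the rotation and scale outputs more concrete, the Python sketch below builds the 3×3 covariance of a single Gaussian from one `[w, x, y, z]` quaternion and one scale triple. It assumes the construction commonly used in 3D Gaussian splatting, Σ = R·diag(s)²·Rᵀ, with the scales acting as per-axis standard deviations; the model card does not confirm that SHARP's outputs are meant to be combined this way, and the function names are hypothetical.

```python
import math


def quaternion_to_rotation(q):
    """Convert a [w, x, y, z] quaternion to a 3x3 rotation matrix."""
    w, x, y, z = q
    norm = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = (c / norm for c in (w, x, y, z))  # guard against drift
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def covariance(q, scales):
    """Sigma = R * diag(scales)^2 * R^T (assumed 3DGS convention)."""
    r = quaternion_to_rotation(q)
    return [
        [sum(r[i][k] * scales[k] ** 2 * r[j][k] for k in range(3))
         for j in range(3)]
        for i in range(3)
    ]


# Identity rotation: the covariance is diagonal with the squared scales.
for row in covariance([1.0, 0.0, 0.0, 0.0], [1.0, 2.0, 3.0]):
    print(row)
```

With the identity quaternion the result is a diagonal matrix with entries 1, 4, and 9; a 90° rotation about z swaps the first two diagonal entries.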

🌍 These outputs are fully compatible with Splat Viewer and MetalSplatter.

🔍 Model Validation Results

The Core ML model has been rigorously validated against the original PyTorch implementation. Below are the numerical accuracy metrics across all 5 output tensors:

| Output | Max Diff | Mean Diff | P99 Diff | Angular Diff (°) | Status |
| --- | --- | --- | --- | --- | --- |
| Mean Vectors (3D Positions) | 0.000794 | 0.000049 | 0.000094 | - | ✅ PASS |
| Singular Values (Scales) | 0.000035 | 0.000000 | 0.000002 | - | ✅ PASS |
| Quaternions (Rotations) | 1.425558 | 0.000024 | 0.000067 | 9.2519 / 0.0019 / 0.0396 | ✅ PASS |
| Colors (RGB Linear) | 0.001440 | 0.000005 | 0.000055 | - | ✅ PASS |
| Opacities (Alpha) | 0.004183 | 0.000005 | 0.000114 | - | ✅ PASS |

Validation Notes:

Mean absolute differences from PyTorch stay below 0.00005 for every output tensor (the largest, 0.000049, is for the 3D positions).

Quaternion angular errors are below 1° for 99% of Gaussians.
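
For readers who want to compute similar metrics on their own outputs, the Python sketch below shows one way to obtain max, mean, and 99th-percentile absolute differences, plus an angular difference between two quaternions. It is not the repository's validation script: the function names are hypothetical, the percentile uses the nearest-rank method, and the angular formula θ = 2·acos(|q₁·q₂|) is a standard choice assumed here. Because q and −q encode the same rotation, a component-wise difference can be large even when the rotations agree; whether that accounts for the quaternion max diff reported above is not stated in the source.

```python
import math


def abs_diff_stats(reference, candidate):
    """Max, mean and nearest-rank P99 of element-wise absolute differences."""
    diffs = sorted(abs(a - b) for a, b in zip(reference, candidate))
    p99_index = min(len(diffs) - 1, math.ceil(0.99 * len(diffs)) - 1)
    return {
        "max": diffs[-1],
        "mean": sum(diffs) / len(diffs),
        "p99": diffs[p99_index],
    }


def quaternion_angle_degrees(q1, q2):
    """Rotation angle between two quaternions, ignoring their sign."""
    dot = sum(a * b for a, b in zip(q1, q2))
    norm = math.sqrt(sum(a * a for a in q1)) * math.sqrt(sum(b * b for b in q2))
    cos_half = min(1.0, abs(dot) / norm)  # clamp against rounding error
    return math.degrees(2.0 * math.acos(cos_half))


# A quaternion and its negation differ in every component,
# yet describe the same orientation.
q = [0.5, 0.5, 0.5, 0.5]
neg_q = [-c for c in q]
print(abs_diff_stats(q, neg_q)["max"])     # 1.0
print(quaternion_angle_degrees(q, neg_q))  # 0.0
```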

Reproducing the Conversion

To reproduce the conversion from PyTorch to Core ML, follow these steps:

git clone https://github.com/apple/ml-sharp.git
cd ml-sharp
conda create -n sharp python=3.13
conda activate sharp
pip install -r requirements.txt
pip install coremltools
cd ../
python convert.py

Citation

If you find this work useful, please cite the original paper:

@inproceedings{Sharp2025:arxiv,
  title      = {Sharp Monocular View Synthesis in Less Than a Second},
  author     = {Lars Mescheder and Wei Dong and Shiwei Li and Xuyang Bai and Marcel Santos and Peiyun Hu and Bruno Lecouat and Mingmin Zhen and Ama\"{e}l Delaunoy and Tian Fang and Yanghai Tsin and Stephan R. Richter and Vladlen Koltun},
  journal    = {arXiv preprint arXiv:2512.10685},
  year       = {2025},
  url        = {https://arxiv.org/abs/2512.10685},
}
