Sharp Monocular View Synthesis in Less Than a Second (Core ML Edition)
This software project is a community contribution and is not affiliated with the original research paper:
Sharp Monocular View Synthesis in Less Than a Second by Lars Mescheder, Wei Dong, Shiwei Li, Xuyang Bai, Marcel Santos, Peiyun Hu, Bruno Lecouat, Mingmin Zhen, Amaël Delaunoy, Tian Fang, Yanghai Tsin, Stephan Richter and Vladlen Koltun.
We present SHARP, an approach to photorealistic view synthesis from a single image. Given a single photograph, SHARP regresses the parameters of a 3D Gaussian representation of the depicted scene. This is done in less than a second on a standard GPU via a single feedforward pass through a neural network. The 3D Gaussian representation produced by SHARP can then be rendered in real time, yielding high-resolution photorealistic images for nearby views. The representation is metric, with absolute scale, supporting metric camera movements.
This release includes a fully validated Core ML (.mlpackage) version of SHARP, optimized for CPU, GPU, and Neural Engine inference on macOS and iOS.
Rendered using Splat Viewer
Getting started
Download the Core ML Model Only
pip install huggingface-hub
huggingface-cli download --include sharp.mlpackage/ --local-dir . pearsonkyle/Sharp-coreml
Clone the Full Repository
This will include the inference and model conversion/validation scripts.
brew install git-xet
git xet install
Clone the model repository:
git clone git@hf.co:pearsonkyle/Sharp-coreml
Run Inference on Apple Devices
Use the provided sharp.swift inference script to load the model and generate 3D Gaussian splats (PLY) from any image:
# Compile the Swift runner (requires Xcode command-line tools)
swiftc -O -o run_sharp sharp.swift -framework CoreML -framework CoreImage -framework AppKit
# Run inference on an image and decimate the output by 50%
./run_sharp sharp.mlpackage test.png test.ply -d 0.5
Inference on an Apple M4 Max takes ~1.9 seconds.
CLI Features:
- Automatic model compilation and caching
- Decimation to reduce point cloud size while preserving visual fidelity
- Input is expected as a standard RGB image; conversion to [0,1] and CHW format happens inside the model
- PLY output compatible with Splat Viewer, MetalSplatter, and Three.js
Usage: run_sharp [OPTIONS] <model> <input_image> <output.ply>
SHARP Model Inference - Generate 3D Gaussian Splats from a single image
Arguments:
model Path to the SHARP Core ML model (.mlpackage, .mlmodel, or .mlmodelc)
input_image Path to input image (PNG, JPEG, etc.)
output.ply Path for output PLY file
Options:
-m, --model PATH Path to Core ML model
-i, --input PATH Path to input image
-o, --output PATH Path for output PLY file
-f, --focal-length FLOAT Focal length in pixels (default: 1536)
-d, --decimation FLOAT Decimation ratio 0.0-1.0 or percentage 1-100 (default: 1.0 = keep all)
Example: 0.5 or 50 keeps 50% of Gaussians
-h, --help Show this help message
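The `-d` option accepts either a ratio or a percentage. The sketch below shows one way such a value could be normalized to a keep ratio. It is not taken from sharp.swift: the function names are hypothetical, and the rule that values up to 1.0 are ratios while larger values are percentages is an assumption based on the help text (so `1` means "keep all", matching the default).

```python
def parse_decimation(value: float) -> float:
    """Normalize a -d/--decimation value to a keep ratio in (0, 1].

    Assumption: values in (0, 1] are ratios, values in (1, 100] are
    percentages. Zero and negative values are rejected in this sketch.
    """
    if value <= 0 or value > 100:
        raise ValueError("decimation must be in (0, 1] or (1, 100]")
    return value if value <= 1.0 else value / 100.0


def kept_gaussians(total: int, value: float) -> int:
    """Number of Gaussians kept after decimation (at least one)."""
    return max(1, round(total * parse_decimation(value)))


if __name__ == "__main__":
    # 0.5 and 50 both keep half of the Gaussians.
    print(kept_gaussians(1_179_648, 0.5))
    print(kept_gaussians(1_179_648, 50))
```

With the default model size quoted in this README, `-d 0.5` would leave about 590 thousand Gaussians in the PLY file.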
Model Input and Output
Input
The Core ML model accepts two inputs:
- image: A 3-channel RGB image in uint8 format with shape (1, 3, H, W).
  - Values are expected in range [0, 255] (no manual normalization required).
  - Recommended resolution: 1536×1536 (matches training size).
  - Aspect ratio is preserved; input will be resized internally if needed.
- disparity_factor: A scalar tensor of shape (1,) representing the ratio focal_length / image_width.
  - Use 1.0 for standard cameras (e.g., typical smartphone or DSLR).
  - Adjust slightly to control depth scale: higher values = closer objects, lower values = farther scenes.
  - If using the sharp.swift runner, this input is automatically computed from your image dimensions.
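When calling the model without the Swift runner, the disparity_factor input has to be computed by hand. A minimal sketch of the ratio described above, assuming the focal length is given in pixels (as with the `-f` option); the function name is hypothetical:

```python
def disparity_factor(focal_length_px: float, image_width_px: int) -> float:
    """Ratio focal_length / image_width, as described for the model input."""
    if focal_length_px <= 0 or image_width_px <= 0:
        raise ValueError("focal length and image width must be positive")
    return focal_length_px / image_width_px


if __name__ == "__main__":
    # The runner's default focal length (1536 px) at the recommended
    # 1536-pixel width gives the suggested value of 1.0.
    print(disparity_factor(1536.0, 1536))  # 1.0
```

The result would be passed to the model as a one-element tensor of shape (1,).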
Output
The model outputs five tensors representing a 3D Gaussian splat representation:
| Output | Shape | Description |
|---|---|---|
| mean_vectors_3d_positions | (1, N, 3) | 3D positions in Normalized Device Coordinates (NDC): x, y, z. |
| singular_values_scales | (1, N, 3) | Scale parameters along each principal axis (width, height, depth). |
| quaternions_rotations | (1, N, 4) | Unit quaternions [w, x, y, z] encoding orientation of each Gaussian. |
| colors_rgb_linear | (1, N, 3) | Linear RGB color values in range [0, 1] (no gamma correction). |
| opacities_alpha_channel | (1, N) | Opacity (alpha) values per Gaussian, in range [0, 1]. |
The total number of Gaussians N is approximately 1,179,648 for the default model.
These outputs are fully compatible with Splat Viewer and MetalSplatter.
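To illustrate how the scale and rotation outputs relate to each Gaussian's shape, the sketch below builds a 3×3 covariance matrix from one quaternion and one scale triple. It assumes the usual 3D Gaussian splatting convention, in which the scales act as per-axis standard deviations and the covariance is R · diag(s²) · Rᵀ; the source does not spell this out, and the function names are hypothetical.

```python
import math


def quat_to_rotation(q):
    """Convert a quaternion [w, x, y, z] to a 3x3 rotation matrix."""
    w, x, y, z = q
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n  # normalize defensively
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def covariance(q, scales):
    """Covariance R * diag(s^2) * R^T (assumed convention, see lead-in)."""
    r = quat_to_rotation(q)
    return [
        [sum(r[i][k] * scales[k] ** 2 * r[j][k] for k in range(3))
         for j in range(3)]
        for i in range(3)
    ]


if __name__ == "__main__":
    # Identity rotation: the covariance is diagonal with squared scales.
    for row in covariance([1.0, 0.0, 0.0, 0.0], [1.0, 2.0, 3.0]):
        print(row)
```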
Model Validation Results
The Core ML model has been rigorously validated against the original PyTorch implementation. Below are the numerical accuracy metrics across all 5 output tensors:
| Output | Max Diff | Mean Diff | P99 Diff | Angular Diff (°) | Status |
|---|---|---|---|---|---|
| Mean Vectors (3D Positions) | 0.000794 | 0.000049 | 0.000094 | - | PASS |
| Singular Values (Scales) | 0.000035 | 0.000000 | 0.000002 | - | PASS |
| Quaternions (Rotations) | 1.425558 | 0.000024 | 0.000067 | 9.2519 / 0.0019 / 0.0396 | PASS |
| Colors (RGB Linear) | 0.001440 | 0.000005 | 0.000055 | - | PASS |
| Opacities (Alpha) | 0.004183 | 0.000005 | 0.000114 | - | PASS |
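For readers who want to compute comparable numbers, the sketch below derives max, mean, and 99th-percentile absolute differences from two flat lists of values. It is not the repository's validation script: the function name is hypothetical, and the nearest-rank percentile definition is an assumption (other definitions interpolate and give slightly different values).

```python
def diff_metrics(reference, candidate):
    """Max, mean, and P99 of element-wise absolute differences."""
    if len(reference) != len(candidate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = sorted(abs(r - c) for r, c in zip(reference, candidate))
    n = len(diffs)
    # Nearest-rank 99th percentile: rank = ceil(0.99 * n), in integer math.
    rank = (99 * n + 99) // 100
    return {
        "max": diffs[-1],
        "mean": sum(diffs) / n,
        "p99": diffs[rank - 1],
    }


if __name__ == "__main__":
    reference = [0.0] * 100
    candidate = [i / 1000 for i in range(100)]
    print(diff_metrics(reference, candidate))
```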
Validation Notes:
- All outputs match PyTorch within 0.01% mean error.
- Quaternion angular errors are below 1° for 99% of Gaussians.
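Quaternions are compared by angle as well as component-wise because q and -q encode the same rotation, so a large component-wise difference need not indicate a large rotation error. This may explain the quaternion Max Diff in the table, though the source does not say so. The sketch below computes a sign-invariant angular difference in degrees; the function name is hypothetical and this is not the repository's validation code.

```python
import math


def angular_difference_deg(q_ref, q_test):
    """Angle in degrees between two rotations given as [w, x, y, z]."""
    dot = sum(a * b for a, b in zip(q_ref, q_test))
    norm = (math.sqrt(sum(a * a for a in q_ref))
            * math.sqrt(sum(b * b for b in q_test)))
    # abs() removes the q / -q sign ambiguity; min() guards acos against
    # values slightly above 1.0 caused by rounding.
    cos_half = min(1.0, abs(dot) / norm)
    return math.degrees(2.0 * math.acos(cos_half))


if __name__ == "__main__":
    q = [1.0, 0.0, 0.0, 0.0]
    neg_q = [-1.0, 0.0, 0.0, 0.0]
    # Component-wise difference is 2.0, but the rotations are identical.
    print(angular_difference_deg(q, neg_q))  # 0.0
```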
Reproducing the Conversion
To reproduce the conversion from PyTorch to Core ML, follow these steps:
git clone https://github.com/apple/ml-sharp.git
cd ml-sharp
conda create -n sharp python=3.13
conda activate sharp
pip install -r requirements.txt
pip install coremltools
cd ../
python convert.py
Citation
If you find this work useful, please cite the original paper:
@inproceedings{Sharp2025:arxiv,
title = {Sharp Monocular View Synthesis in Less Than a Second},
author = {Lars Mescheder and Wei Dong and Shiwei Li and Xuyang Bai and Marcel Santos and Peiyun Hu and Bruno Lecouat and Mingmin Zhen and Ama\"{e}l Delaunoy and Tian Fang and Yanghai Tsin and Stephan R. Richter and Vladlen Koltun},
journal = {arXiv preprint arXiv:2512.10685},
year = {2025},
url = {https://arxiv.org/abs/2512.10685},
}