moondream/moondream-2b-2025-04-14-4bit

cleanup-inference-code

#2

by snowclipsed - opened 13 days ago

base: refs/heads/main ← from: refs/pr/2

13 days ago • edited 1 day ago

What does this PR do?

The PR aims to:

- Clean up ambiguous and unused code in inference:
  - Remove the weight-type check in weights.py, since we only expect quantized weights.
  - Remove the call to build_text_model in weights.py, since it is already called in moondream.py.
  - Call QuantizedLinear directly in build_text_model and when building the region model, since we only expect quantized weights.
- Pass group size as a parameter to QuantizedLinear.
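
To illustrate the last point, here is a minimal pure-Python sketch of a group-quantized linear layer that receives the group size explicitly instead of inferring it. It is not the code in this PR: `QuantizedLinearSketch`, `quantize_groups`, and `build_layer` are hypothetical names, and the 4-bit min/max scheme is an assumption chosen only to keep the example small.

```python
from dataclasses import dataclass
from typing import List


def quantize_groups(weights: List[float], group_size: int):
    """Quantize one weight row to 4-bit codes, with a scale and offset per group.

    Hypothetical helper: the min/max scheme is an assumption for illustration.
    """
    if len(weights) % group_size != 0:
        raise ValueError("row length must be a multiple of group_size")
    codes, scales, zeros = [], [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        lo, hi = min(group), max(group)
        # 4 bits give 16 levels (0..15); fall back to 1.0 for a constant group.
        scale = (hi - lo) / 15 or 1.0
        codes.extend(round((w - lo) / scale) for w in group)
        scales.append(scale)
        zeros.append(lo)
    return codes, scales, zeros


@dataclass
class QuantizedLinearSketch:
    """Hypothetical stand-in for QuantizedLinear; group_size is an explicit field."""
    codes: List[List[int]]      # one row of 4-bit codes per output feature
    scales: List[List[float]]   # one scale per group, per row
    zeros: List[List[float]]    # one offset per group, per row
    group_size: int

    def forward(self, x: List[float]) -> List[float]:
        out = []
        for row_codes, row_scales, row_zeros in zip(self.codes, self.scales, self.zeros):
            acc = 0.0
            for i, code in enumerate(row_codes):
                g = i // self.group_size  # group index for input position i
                acc += (code * row_scales[g] + row_zeros[g]) * x[i]
            out.append(acc)
        return out


def build_layer(weight_rows: List[List[float]], group_size: int) -> QuantizedLinearSketch:
    """Build the layer directly as quantized, passing group_size through."""
    packed = [quantize_groups(row, group_size) for row in weight_rows]
    return QuantizedLinearSketch(
        codes=[p[0] for p in packed],
        scales=[p[1] for p in packed],
        zeros=[p[2] for p in packed],
        group_size=group_size,
    )
```

Passing `group_size` in from the caller means the layer does not need to guess it from tensor shapes, which is the kind of check the "remove gsize check" commit title suggests was dropped.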

Tests

Tested all functionality on:

- RTX 4050 Mobile
- RTX 3090

remove dtype as input in weights (595a8a6e)

remove any text model initialization in weights.py (9308dfeb)

fix weights.py bfloat16 error (9fd23e4b)

add back add_linear_to_key (fcbc9bf6)

only remove is_quantized (7090fbf3)

add back passing dtype in build text model (fa2446ab)

safely remove text_model loading in weights.py (8f87befa)

remove is_quantized completely (e15c30f5)

remove gsize check, replace with quantized_linear directly (8cb4b5e0)

add group size as a parameter (1db79dda)

fix group size input in build text model (4f9fd1b0)

snowclipsed changed pull request status to open 1 day ago

Ready to merge

This branch is ready to get merged automatically.
