cleanup-inference-code

#2

What does this PR do?

The PR aims to:

  • Clean up ambiguous and unused code in the inference path.
    • Remove the weight-type check in weights.py, since only quantized weights are expected.
    • Remove the build_text_model call in weights.py, since it is already called in moondream.py.
    • Call QuantizedLinear directly in build_text_model and when building the region model, since only quantized weights are expected.
  • Pass the group size to QuantizedLinear as a parameter.
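
To illustrate the last point, here is a minimal pure-Python sketch of a linear layer that takes the group size as a constructor argument rather than fixing it internally. It is not the code in this PR: the class name `QuantizedLinearSketch`, the symmetric 8-bit scheme, and the list-based weights are all assumptions made for illustration, and the real QuantizedLinear may store and unpack weights differently.

```python
from typing import List


class QuantizedLinearSketch:
    """Hypothetical stand-in for QuantizedLinear (assumed name and scheme)."""

    def __init__(self, weight: List[List[float]], *, group_size: int):
        in_features = len(weight[0])
        if in_features % group_size != 0:
            raise ValueError("in_features must be divisible by group_size")
        self.group_size = group_size
        self.q_rows: List[List[int]] = []     # integer codes in [-127, 127]
        self.scales: List[List[float]] = []   # one scale per group, per row
        for row in weight:
            q_row, row_scales = [], []
            for start in range(0, in_features, group_size):
                group = row[start:start + group_size]
                max_abs = max(abs(v) for v in group)
                # Assumed scheme: symmetric quantization, one scale per group.
                scale = max_abs / 127 if max_abs > 0 else 1.0
                row_scales.append(scale)
                q_row.extend(round(v / scale) for v in group)
            self.q_rows.append(q_row)
            self.scales.append(row_scales)

    def forward(self, x: List[float]) -> List[float]:
        # Dequantize each weight with its group's scale, then accumulate.
        out = []
        for q_row, row_scales in zip(self.q_rows, self.scales):
            total = 0.0
            for i, q in enumerate(q_row):
                total += q * row_scales[i // self.group_size] * x[i]
            out.append(total)
        return out


# The caller chooses the group size; the layer makes no assumption about it.
layer = QuantizedLinearSketch([[1.0, -2.0, 0.5, 0.25]], group_size=2)
result = layer.forward([1.0, 1.0, 1.0, 1.0])
```

Making `group_size` a required keyword argument in this sketch means a mismatch between the checkpoint's quantization layout and the layer is raised at construction time instead of silently producing wrong outputs.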

Tests

All functionality was tested on:

  • RTX 4050 Mobile
  • RTX 3090
snowclipsed changed pull request status to open