cleanup-inference-code
#2, opened by snowclipsed
What does this PR do?
This PR aims to:
- Clean up ambiguous and unused code in the inference path.
- Remove the weight-type check in weights.py, since only quantized weights are expected.
- Remove the build_text_model call from weights.py, since moondream.py already calls it.
- Call QuantizedLinear directly in build_text_model and when building the region model, since only quantized weights are expected.
- Pass the group size to QuantizedLinear as a parameter.
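As a rough illustration of the last item, exposing the group size as a constructor argument instead of hard-coding it might look like the sketch below. Only the names `QuantizedLinear` and "group size" come from this PR. The class body, the symmetric per-group scaling scheme, the `bits` argument, and the default of 128 are assumptions made for illustration, and plain Python lists stand in for tensors so the example stays self-contained.

```python
from typing import List


class QuantizedLinear:
    """Hypothetical stand-in for the PR's QuantizedLinear (not the real class)."""

    def __init__(self, weight: List[List[float]], group_size: int = 128, bits: int = 4):
        in_features = len(weight[0])
        if in_features % group_size != 0:
            raise ValueError("in_features must be divisible by group_size")
        self.group_size = group_size  # assumed: passed in by the caller, not hard-coded
        self.qmax = 2 ** (bits - 1) - 1  # e.g. 7 for signed 4-bit values
        self.qweight: List[List[int]] = []
        self.scales: List[List[float]] = []
        for row in weight:
            q_row: List[int] = []
            s_row: List[float] = []
            for start in range(0, in_features, group_size):
                group = row[start:start + group_size]
                # One scale per group; fall back to 1.0 for an all-zero group.
                scale = max(abs(w) for w in group) / self.qmax or 1.0
                s_row.append(scale)
                q_row.extend(round(w / scale) for w in group)
            self.qweight.append(q_row)
            self.scales.append(s_row)

    def __call__(self, x: List[float]) -> List[float]:
        # Dequantize on the fly: each weight uses the scale of its group.
        out = []
        for q_row, s_row in zip(self.qweight, self.scales):
            acc = 0.0
            for i, q in enumerate(q_row):
                acc += x[i] * q * s_row[i // self.group_size]
            out.append(acc)
        return out


# Usage: the caller chooses the group size explicitly.
layer = QuantizedLinear([[1.0, -1.0, 0.5, 0.0]], group_size=2)
result = layer([1.0, 1.0, 1.0, 1.0])
```

With the parameter exposed this way, build_text_model and the region model could pass the same value through from a single place rather than each relying on a fixed constant.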
Tests
Tested all functionality on:
- RTX 4050 Mobile
- RTX 3090
snowclipsed changed pull request status to open