Michael Goin
mgoin
AI & ML interests
LLM inference optimization, compression, quantization, pruning, distillation
Recent Activity
upvoted a paper 7 days ago: SVD-Free Low-Rank Adaptive Gradient Optimization for Large Language Models
updated a model 7 days ago: RedHatAI/DeepSeek-R1-0528-quantized.w4a16
published a model 10 days ago: RedHatAI/granite-3.1-8b-instruct-GGUF
mgoin's activity
How should I input the image? (1) — #3 opened 26 days ago by CyberWolf0
Can't start it with vllm serve (1) — #2 opened 2 months ago by VenomEY
Fix processor_class to match upstream — #4 opened about 2 months ago by zifeitong
Remove image_processor_type — #1 opened 2 months ago by pooya-davoodi-parasail
OSError: nm-testing/Llama-3_1-Nemotron-Ultra-253B-v1-FP8-dynamic does not appear to have a file named decilm.py (2) — #2 opened about 2 months ago by TheDrummer
how to deploy this model without internet connection (1) — #1 opened about 2 months ago by superahn
Why not FP8 with static and per-tensor quantization? 👍 1 (1) — #2 opened about 2 months ago by wanzhenchn
Address discrepancies in the languages supported by the Mistral Small 3.1 2503 🔥 1 (3) — #54 opened 2 months ago by fpaupier
Please update the chat template (1) — #1 opened 2 months ago by stelterlab
FP8 Dynamic/W8A16 Quants Please (4) — #44 opened 2 months ago by rjmehta
Problem hosting the model using vllm ➕ 3 (4) — #45 opened 2 months ago by ShaoServient
Remove image_processor_type — #1 opened 3 months ago by pooya-davoodi-parasail
Remove image_processor_type (1) — #1 opened 3 months ago by pooya-davoodi-parasail
Remove image_processor_type — #2 opened 3 months ago by pooya-davoodi-parasail
Use Qwen2VLImageProcessor for image_processor_type (5) — #2 opened 3 months ago by pooya-davoodi-parasail
Use Qwen2VLImageProcessor for image_processor_type — #3 opened 3 months ago by pooya-davoodi-parasail
when i use vllm v0.7.2 to deploy r1 awq, i got empty content (13) — #10 opened 4 months ago by bupalinyu
MLA is not supported with moe_wna16 quantization. Disabling MLA. (5) — #7 opened 4 months ago by AMOSE
compressed-tensors MLA support requires fp8 activations and weights in group 'group_0', (2) — #1 opened 4 months ago by samos123