
Steve Li

CHNtentes



Recent Activity




New activity in unsloth/DeepSeek-V3.1-GGUF about 2 hours ago

changed tool call format?

#2 opened about 2 hours ago by

New activity in unsloth/DeepSeek-V3.1-GGUF about 10 hours ago

Thanks!

#1 opened about 14 hours ago by

New activity in deepseek-ai/DeepSeek-V3.1 about 10 hours ago

Context length: is it 128K (as mentioned in the model card) or 160K (as specified in config.json)?

#17 opened about 19 hours ago by

New activity in deepseek-ai/DeepSeek-V3.1 about 24 hours ago

bro you forgot to put it under the v3.1 collection

#12 opened 1 day ago by

New activity in deepseek-ai/DeepSeek-V3.1-Base 2 days ago

Why taking so long to add more information?

#12 opened 2 days ago by

New activity in nvidia/NVIDIA-Nemotron-Nano-9B-v2 3 days ago

This just trades general performance for domain specific gains.

#3 opened 3 days ago by

New activity in Qwen/Qwen3-4B-Instruct-2507 6 days ago

1.7b 2507?

#7 opened 6 days ago by

New activity in google/gemma-3-270m-it 6 days ago

Gemma A3B

#3 opened 7 days ago by

New activity in zai-org/GLM-4.5V 11 days ago

Text performance compared to GLM-4.5 Air

#1 opened 11 days ago by

New activity in google/gemma-3n-E4B-it 11 days ago

FP16 version?

#36 opened 14 days ago by

New activity in unsloth/gpt-oss-20b-GGUF 15 days ago

Is the BF16 gguf any different from the F16 one? (speed/accuracy)

#10 opened 16 days ago by

New activity in openai/gpt-oss-120b 16 days ago

How to use different reasoning effort in the example?

#45 opened 16 days ago by

New activity in unsloth/gpt-oss-20b-GGUF 16 days ago

Native FP4 seems to make quantization meaningless

#7 opened 16 days ago by

different parameter from llama.cpp recommended settings?

#6 opened 16 days ago by

New activity in Qwen/Qwen3-30B-A3B-Instruct-2507 17 days ago

Test Scores Can Be Misleading

#8 opened 23 days ago by

New activity in unsloth/Qwen3-Coder-30B-A3B-Instruct-1M-GGUF 21 days ago

Any reason to not use this model over the 256K context model?

#1 opened 22 days ago by

New activity in Wan-AI/Wan2.2-T2V-A14B 22 days ago

why a14b models use wan2.1 vae?

#4 opened 22 days ago by

New activity in zai-org/GLM-4.5 23 days ago

Thinking tokens issue

#9 opened 23 days ago by

New activity in PowerInfer/SmallThinker-4BA0.6B-Instruct-GGUF 25 days ago

Unable to reproduce the smallthinker-4ba0.6b performance data on an 8 Elite phone

#1 opened 25 days ago by

New activity in google/gemma-3n-E4B-it 28 days ago

Gemma3n not working on H20 with bfloat16 data type.

#30 opened about 1 month ago by