'num_hidden_layers': 61, but layer 62 has weights.
#162 opened 2 months ago by xinhe
Upload GTG Breaking every Limit
#161 opened 2 months ago by GTGenesis
support prefix complete
#158 opened 2 months ago by HuggineAllen
Create app.py
#157 opened 2 months ago by SpaceAgeRobotics
Brokersponsor
#155 opened 2 months ago by Brokersponsor
Update README.md
#154 opened 2 months ago by egegvner
Upload IMG_4530.png
#152 opened 2 months ago by Noemie202586
Upload IMG_1745.JPG
#151 opened 2 months ago by Ladib
Create Clara
#150 opened 2 months ago by Clblinks
If I understand correctly, evaluating MATH-500 requires 64*500 model calls?
#149 opened 3 months ago by Rorschaaaach
Request: DOI
#148 opened 3 months ago by Tarush-Appreciate
Update README.md
#147 opened 3 months ago by tekno-power
Update README.md
#146 opened 3 months ago by Ekimnedops6969
Update README.md
#143 opened 3 months ago by MuhammadEhsan
Request for Information on Purchasing Reasoning API Key
#142 opened 3 months ago by brahamaandai
Update model_max_length in tokenizer_config.json
#139 opened 3 months ago by kkokkie2360
Host of the model
#138 opened 3 months ago by henrycwf
Lite version for DeepSeek-R1?
#137 opened 3 months ago by haili-tian
[Bug] assert not self.training
#136 opened 3 months ago by Gaie
Upload IMG_0253.HEIC
#134 opened 3 months ago by rynty
Upload comment-sample.xlsx
#133 opened 3 months ago by faham123
Non-reasoning data
#132 opened 3 months ago by cmgzy
Could you release some 4-bit weights? None of the GPUs we currently have support FP8.
#131 opened 3 months ago by zhnagchenchne
For the universe! DeepPhaser.py DeepCoralX.py and DeepSynapse.py
#129 opened 3 months ago by karmikovic
Request: Create a distill of Mistral Small 24B
#128 opened 3 months ago by Kenshiro-28
Which vision model is R1 using for text extraction from images or PDFs?
#127 opened 3 months ago by ashutoshroy02
Request: DOI
#125 opened 3 months ago by Yungchizzy
Little brother(s) of big DeepSeek-R1?
#124 opened 3 months ago by MrDevolver
Upload gugagagaggagagagga.pdf
#123 opened 3 months ago by HahahhahH
Change quant_method to bitsandbytes_4bit
#121 opened 3 months ago by ngoc24794
Unknown quantization type
#120 opened 3 months ago by Reewaz321
Update config.json
#119 opened 3 months ago by keerthanaOfficial2001
So how much GPU memory does deploying a 671B model require, and what is a baseline hardware configuration?
#118 opened 3 months ago by cena163
Distill Compatibility for PC w/ Ryzen 7 Pro 8840HS w/ 780M Graphics 2x32GB RAM 1TB DDR5 SSD
#115 opened 3 months ago by arzx
Upload gitattributes.txt
#114 opened 3 months ago by SafeerChalil
Introducing Deepseek's TinyZero
#113 opened 3 months ago by DeepSeekModerator
Create Kuch v
#112 opened 3 months ago by gamerdowntown
Request: DOI
#111 opened 3 months ago by Hassanabbas2975
FP8 quantization error occurring while using the pipeline approach or the transformer-based approach
#110 opened 3 months ago by neethuvm
Deepseek-R1
#109 opened 3 months ago by KudanTao
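In the deepseek-r1 source code, the implementation of the compressed KV cache storage strategy used by the MLA architecture seems inconsistent with what the paper describes. Why is that? The code does not appear to implement this major optimization.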
#108 opened 3 months ago by Darkdust
Eating food in a car
#106 opened 3 months ago by Ayinbaby1313
Update README.md
#103 opened 3 months ago by jungvaclav
Error while downloading model
#102 opened 3 months ago by heikhama1982
Upload IMG_20250112_172711.jpg
#101 opened 3 months ago by aamir1
Help from Italy
#100 opened 3 months ago by MMPPIIAA