
Zillow
company
AI & ML interests: none defined yet.
Recent activity

csabakecskemeti posted an update · 13 days ago
DeepSeek R1 0528 Q2, running locally.
(I believe it was overthinking a bit :) )
https://youtu.be/Iqu5s9aFaXA?si=QWZe293iTKf_3ELU
DevQuasar/deepseek-ai.DeepSeek-R1-0528-GGUF
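For anyone curious how such a run is wired up, a minimal llama-cpp-python sketch looks roughly like this (the filename, context size, and offload settings are assumptions, not the exact setup from the video):

```python
# Minimal sketch: run a local GGUF quant with llama-cpp-python.
# Filename, context size, and offload settings are assumptions.
from llama_cpp import Llama

llm = Llama(
    model_path="DeepSeek-R1-0528-Q2_K.gguf",  # hypothetical local filename
    n_gpu_layers=-1,  # offload all layers that fit on the GPU
    n_ctx=4096,       # context window; adjust to taste
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "How many r's are in strawberry?"}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```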
"Is this data set relevant for real estate investors?" — discussion #2, opened 14 days ago by jhuilar


csabakecskemeti posted an update · 2 months ago
Local Llama4 Maverick Q2
https://youtu.be/4F8g_LThli0?si=MGba2SUTHt6xYw3T
Quants uploading now.
Big thanks to @ngxson!

csabakecskemeti posted an update · 2 months ago
Why does the 'how many r's in strawberry' prompt "break" Llama4? :D
Quants: DevQuasar/meta-llama.Llama-4-Scout-17B-16E-Instruct-GGUF

csabakecskemeti posted an update · 3 months ago
I'm collecting llama-bench results for inference with Llama 3.1 8B Q4 and Q8 reference models on various GPUs. The results are the average of 5 executions.
The systems vary (different motherboards and CPUs, but that probably has little effect on inference performance).
https://devquasar.com/gpu-gguf-inference-comparison/
The exact models used are listed on the page.
I'd welcome results from other GPUs if you have access to anything else; everything you need is in the post. Hopefully this is useful information for everyone.
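If you want to contribute numbers, the benchmark behind these tables is llama.cpp's llama-bench; a minimal sketch of driving it from Python (the binary path and model filename are assumptions; `-r 5` makes the tool average 5 repetitions itself):

```python
# Minimal sketch: invoke llama.cpp's llama-bench and capture its results table.
# Binary path and model filename are assumptions.
import subprocess

result = subprocess.run(
    [
        "./llama-bench",
        "-m", "llama-3.1-8b-q8_0.gguf",  # hypothetical local filename
        "-p", "512",                     # prompt-processing test (pp512)
        "-n", "128",                     # token-generation test (tg128)
        "-r", "5",                       # 5 repetitions, averaged by the tool
    ],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout)  # table columns: model, size, params, backend, ngl, test, t/s
```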

csabakecskemeti posted an update · 3 months ago
Managed to get my hands on a 5090 FE; it's beefy!

| model | size | params | backend | ngl | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CUDA | 99 | pp512 | 12207.44 ± 481.67 |
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CUDA | 99 | tg128 | 143.18 ± 0.18 |

Comparison with other GPUs:
http://devquasar.com/gpu-gguf-inference-comparison/

csabakecskemeti posted an update · 3 months ago
New model announcement from Nvidia at GTC:
nvidia/Llama-3_3-Nemotron-Super-49B-v1
GGUFs:
DevQuasar/nvidia.Llama-3_3-Nemotron-Super-49B-v1-GGUF
Enjoy!

csabakecskemeti posted an update · 3 months ago
Cohere Command-a Q2 quant:
DevQuasar/CohereForAI.c4ai-command-a-03-2025-GGUF
6.7 t/s on a 3-GPU setup (4080 + 2x 3090)
(Q3 and Q4 currently uploading)
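Splitting one quant across mismatched cards like this mostly comes down to the tensor-split ratios; a minimal llama-cpp-python sketch (the filename and ratios below are assumptions, not the exact setup from this post):

```python
# Minimal sketch: spread one large GGUF across three mismatched GPUs.
# Filename and split ratios are assumptions.
from llama_cpp import Llama

llm = Llama(
    model_path="c4ai-command-a-03-2025-Q2_K.gguf",  # hypothetical filename
    n_gpu_layers=-1,            # offload everything that fits
    tensor_split=[16, 24, 24],  # rough VRAM ratio: 4080 (16 GB) + two 3090s (24 GB)
)
print(llm("The capital of France is", max_tokens=8)["choices"][0]["text"])
```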

csabakecskemeti posted an update · 3 months ago
-UPDATED- 4-bit inference is working! The blog post is updated with a code snippet and requirements.txt.
I've played around with an MI100 and ROCm and collected my experience in a blog post:
https://devquasar.com/uncategorized/all-about-amd-and-rocm/
Unfortunately I could not make inference or training work with the model loaded in 8-bit, or use BnB, but I did everything else and documented my findings.
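The working snippet and requirements.txt live in the blog post; for flavor, a typical transformers + bitsandbytes 4-bit load looks roughly like this (a sketch, not the post's exact code; the model name is a placeholder):

```python
# Minimal sketch of 4-bit inference with transformers + bitsandbytes.
# Not the blog post's exact snippet; the model name is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16
)

model_id = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder model
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)

inputs = tok("ROCm on an MI100 can", return_tensors="pt").to(model.device)
print(tok.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```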

csabakecskemeti posted an update · 4 months ago
Testing training on AMD/ROCm for the first time!
I've got my hands on an AMD Instinct MI100. Used, it's about the same price as a V100, but on paper it has more TOPS (V100: 14 TOPS vs. MI100: 23 TOPS), and its HBM has a faster clock, giving 1.2 TB/s of memory bandwidth.
For quantized inference it's a beast (the MI50 was also surprisingly fast).
For LoRA training, in this quick test I could not make the bnb config work, so I'm running the fine-tune on the full-size model.
I will share everything I've learned about the install, setup, and settings in a blog post, together with the cooling-shroud 3D design.
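Sidestepping bnb and adapting the full-size model, as described above, is a few lines with PEFT; a minimal sketch with placeholder model and hyperparameters:

```python
# Minimal sketch: LoRA on a full-precision model with PEFT (no bitsandbytes).
# Model name and hyperparameters are placeholders.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
    torch_dtype="bfloat16",
    device_map="auto",
)

lora_cfg = LoraConfig(
    r=16,                                 # adapter rank
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # typical attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # only the small adapters are trainable
```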

csabakecskemeti posted an update · 4 months ago
I found that if we apply the reasoning system prompt (the one published on the NousResearch/DeepHermes-3-Llama-3-8B-Preview model card), other models also react to it and start mimicking reasoning, some better, some worse. I've seen internal monologue and self-questioning.
Here's a blog post about it:
http://devquasar.com/ai/reasoning-system-prompt/
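Reproducing the experiment on another model is just a matter of swapping in the system message; a minimal sketch (the system text below is a stand-in; use the actual prompt from the DeepHermes-3 model card):

```python
# Minimal sketch: apply a reasoning system prompt to an arbitrary chat model.
# The system text is a stand-in; use the DeepHermes-3 model card's prompt.
from transformers import pipeline

chat = pipeline("text-generation", model="meta-llama/Llama-3.1-8B-Instruct")  # placeholder
messages = [
    {"role": "system", "content": "You are a deep-thinking AI; reason step by step inside <think> tags before answering."},  # stand-in
    {"role": "user", "content": "How many r's are in 'strawberry'?"},
]
out = chat(messages, max_new_tokens=512)
print(out[0]["generated_text"][-1]["content"])  # the assistant's reply
```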

csabakecskemeti posted an update · 4 months ago
Check out my idea: LLmaaS - Local LLM as a Service.
With LLmaaS, I propose leveraging locally running LLMs as a service, providing a standardized way for websites to access and utilize them for LLM-powered operations directly on the user's device.
Demo, code, and a more detailed description:
https://devquasar.com/llmaas/
https://github.com/csabakecskemeti/LLmaaS
https://youtu.be/OOWGr8jcP5Q
Call for contributors: join me in developing the LLmaaS proxy to make it a general-purpose tool for leveraging local LLMs on the web, with built-in security measures. I'm looking for help to make the proxy more generic, supporting multiple local LLM services without any change on the HTML side, and for ideas on how to make the HTML part more modular and easy to use.
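To make the shape of the idea concrete, a toy version of such a proxy can be a small CORS-enabled HTTP server that forwards browser requests to a local OpenAI-compatible endpoint; a minimal sketch (not the actual LLmaaS code; the ports and backend URL are assumptions):

```python
# Toy sketch of an LLmaaS-style proxy: a CORS-enabled local HTTP server that
# forwards browser requests to a local OpenAI-compatible LLM endpoint.
# Not the actual LLmaaS code; ports and backend URL are assumptions.
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

BACKEND = "http://localhost:8080/v1/chat/completions"  # e.g. a llama.cpp server

class ProxyHandler(BaseHTTPRequestHandler):
    def do_OPTIONS(self):
        # CORS preflight sent by the website's JavaScript
        self.send_response(204)
        self._cors()
        self.end_headers()

    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        req = urllib.request.Request(
            BACKEND, data=body, headers={"Content-Type": "application/json"}
        )
        with urllib.request.urlopen(req) as resp:
            payload = resp.read()
        self.send_response(200)
        self._cors()
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

    def _cors(self):
        # A real deployment would lock this down: the "security measures" part.
        self.send_header("Access-Control-Allow-Origin", "*")
        self.send_header("Access-Control-Allow-Headers", "Content-Type")

HTTPServer(("localhost", 8765), ProxyHandler).serve_forever()
```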

csabakecskemeti posted an update · 4 months ago
I've made an uncensored version of DeepSeek-R1-Distill-Llama-8B with a merge. It passes the "say f***" censor test.
Tested with lm-evaluation-harness on the standard Open LLM Leaderboard tests + HellaSwag; scores improved on most. Details on the model card.
Model:
DevQuasar/DevQuasar-R1-Uncensored-Llama-8B
Quants:
DevQuasar/DevQuasar-R1-Uncensored-Llama-8B-GGUF

csabakecskemeti posted an update · 5 months ago
I've run the Open LLM Leaderboard evaluations + HellaSwag on deepseek-ai/DeepSeek-R1-Distill-Llama-8B and compared it to meta-llama/Llama-3.1-8B-Instruct; at first glance, R1 does not beat Llama overall.
If anyone wants to double-check, the results are posted here:
https://github.com/csabakecskemeti/lm_eval_results
Did I make a mistake, or is (at least this distilled version of) R1 simply not better than the competition?
I'll run the same on the Qwen 7B distilled version too.
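For anyone double-checking, lm-evaluation-harness exposes this directly from Python; a minimal sketch (the task list is abbreviated to HellaSwag, and the batch size is an assumption):

```python
# Minimal sketch: score a model on HellaSwag with lm-evaluation-harness.
# Task list abbreviated; batch size is an assumption.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=deepseek-ai/DeepSeek-R1-Distill-Llama-8B,dtype=bfloat16",
    tasks=["hellaswag"],  # add the leaderboard tasks here for the full run
    batch_size=8,
)
print(results["results"]["hellaswag"])
```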

csabakecskemeti posted an update · 5 months ago
NVIDIA's new AceInstruct and AceMath models, quantized here:
DevQuasar/nvidia-aceinstruct-and-acemath-678d716f736603ddc8d7cbd4
(Some are still uploading; please be patient.)

csabakecskemeti posted an update · 5 months ago
Managed to run the Q2-quantized DeepSeek V3 Base locally!
The quants are uploading (probably ~10-12 hrs) here: DevQuasar/deepseek-ai.DeepSeek-V3-Base-GGUF

csabakecskemeti posted an update · 5 months ago
Just wondering why the number of parameters shown in the model attributes ("Model size") changed from 685B to 684B after converting deepseek-ai/DeepSeek-V3-Base from FP8 to BF16:
DevQuasar/deepseek-ai.DeepSeek-V3-Base-bf16
And it's not just me:
opensourcerelease/DeepSeek-V3-Base-bf16
??

csabakecskemeti posted an update · 5 months ago
Happy New Year, Hugging Face community!
In 2025, I'll continue my quantization (and some fine-tuning) efforts to support open-source AI and make knowledge free for everyone.
DevQuasar
https://devquasar.com/