ngxson's activity

For around 80 euros I can buy a Raspberry Pi 4 kit, so I would expect a robot kit to cost about the same

He shared an interesting insight which is that agentic capabilities might be more of an alignment problem rather than a foundational capability issue. Similar to the difference between GPT-3 and InstructGPT, some open-source foundation models are simply trained to 'answer everything in one response regardless of the complexity of the question' - after all, that's the user preference in chatbot use cases. Just a bit of post-training on agentic trajectories can make an immediate and dramatic difference.
As a thank you to the community, he shared 100 invite codes, first-come, first-served; just use “HUGGINGFACE” to get access!

Read more on my blog post: https://huggingface.co/blog/ngxson/common-ai-model-formats
| Hardware | GGUF | PyTorch | Safetensors | ONNX |
|-----------------|-----------|------------------------|--------------------------|-------|
| CPU | ✅ (best) | 🟡 | 🟡 | ✅ |
| GPU | ✅ | ✅ | ✅ | ✅ |
| Mobile | ✅ | 🟡 (via executorch) | ❌ | ✅ |
| Apple silicon | ✅ | 🟡 | ✅ (via MLX framework) | ✅ |
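To give a concrete taste of one of the formats above: loading a Safetensors checkpoint into PyTorch tensors is a one-liner, and unlike pickle-based PyTorch checkpoints it can't execute arbitrary code on load. A minimal sketch, assuming the `safetensors` and `torch` packages are installed (the file name is a placeholder):

```python
# Minimal sketch: load a Safetensors checkpoint as a dict of torch tensors.
# "model.safetensors" is a placeholder path.
from safetensors.torch import load_file

state_dict = load_file("model.safetensors")  # no pickle, no arbitrary code execution
for name, tensor in state_dict.items():
    print(name, tuple(tensor.shape))
```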

Been testing all these tools myself and created a searchable collection of the most practical ones - from audio transcription to image generation to document analysis. No coding needed, no expensive subscriptions.
Some highlights I've tested personally:
- Private, on-device transcription with speaker ID in 100+ languages using Whisper
- Website scraping that just works - paste a URL, get structured data
- Local image editing with tools like Finegrain (impressive results)
- Document chat using Qwen 2.5 72B (handles technical papers well)
Sharing this early because the best tools come from the community. Drop your favorite tools in the comments or join the discussion on what to add next!
👉 JournalistsonHF/ai-toolkit

And, believe me, this is 𝗻𝗼𝘁 clickbait❌
GitHub 👉 https://github.com/AstraBert/PapersChat
Demo 👉 as-cle-bert/PapersChat
The app is called 𝐏𝐚𝐩𝐞𝐫𝐬𝐂𝐡𝐚𝐭, and it is aimed at 𝗺𝗮𝗸𝗶𝗻𝗴 𝗰𝗵𝗮𝘁𝘁𝗶𝗻𝗴 𝘄𝗶𝘁𝗵 𝘀𝗰𝗶𝗲𝗻𝘁𝗶𝗳𝗶𝗰 𝗽𝗮𝗽𝗲𝗿𝘀 𝗲𝗮𝘀𝗶𝗲𝗿.
𝐇𝐞𝐫𝐞 𝐢𝐬 𝐰𝐡𝐚𝐭 𝐭𝐡𝐞 𝐚𝐩𝐩 𝐝𝐨𝐞𝐬:
📄 Parses the papers that you upload thanks to LlamaIndex🦙 (either with LlamaParse or with simpler, local methods)
📄 Embeds documents both with a sparse and with a dense encoder to enable hybrid search
📄 Uploads the embeddings to Qdrant
⚙️ Activates an Agent based on mistralai/Mistral-Small-24B-Instruct-2501 that will reply to your prompt
🧠 Retrieves information relevant to your question from the documents
🧠 If no relevant information is found, it searches PubMed and arXiv databases
🧠 Returns a grounded answer to your prompt
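For a sense of how little code the core of such a pipeline needs, here is a minimal sketch (not PapersChat's actual source). It assumes a local Qdrant instance, the llama-index and llama-index-vector-stores-qdrant packages (plus fastembed for the default sparse encoder), and an LLM/embedding model configured via LlamaIndex's Settings:

```python
# Minimal sketch of a hybrid-search RAG flow with LlamaIndex + Qdrant.
# Paths and collection names are placeholders.
import qdrant_client
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.vector_stores.qdrant import QdrantVectorStore

client = qdrant_client.QdrantClient(host="localhost", port=6333)
vector_store = QdrantVectorStore(
    client=client,
    collection_name="papers",
    enable_hybrid=True,  # store both dense and sparse embeddings
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Parse the uploaded papers, embed them, and push everything to Qdrant
documents = SimpleDirectoryReader("./papers").load_data()
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

# Hybrid (dense + sparse) retrieval over the indexed papers
query_engine = index.as_query_engine(vector_store_query_mode="hybrid")
print(query_engine.query("What is the main contribution of this paper?"))
```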
𝐇𝐨𝐰 𝐝𝐢𝐝 𝐈 𝐦𝐚𝐧𝐚𝐠𝐞 𝐭𝐨 𝐦𝐚𝐤𝐞 𝐭𝐡𝐢𝐬 𝐚𝐩𝐩𝐥𝐢𝐜𝐚𝐭𝐢𝐨𝐧 𝐢𝐧 𝟖 𝐡𝐨𝐮𝐫𝐬?
Three key points:
- LlamaIndex🦙 provides countless integrations with LLM providers, text embedding models and vectorstore services, and takes care of the internal architecture of the Agent. You just plug it in, and it works!🔌⚡
- Qdrant is a vector database service that is extremely easy to set up and use: all you need is a one-line Docker command (`docker run -p 6333:6333 qdrant/qdrant`)😉
- Gradio makes frontend development painless and fast, while still providing modern and responsive interfaces (see the sketch after this list)🏗️
And a bonus point:
- Deploying the demo app couldn't be easier if you use Gradio-based Hugging Face Spaces🤗
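To show just how painless the Gradio part is, here's a minimal chat UI sketch; the `answer` function is a hypothetical stand-in for the agent call PapersChat makes behind the scenes:

```python
# Minimal Gradio chat UI sketch; `answer` is a placeholder for the real agent call.
import gradio as gr

def answer(message, history):
    # In the real app this would query the LlamaIndex agent
    return f"(placeholder) You asked: {message}"

gr.ChatInterface(answer, title="PapersChat").launch()
```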
So, no more excuses: build your own AI agent today, and do it fast, (almost) for free, and effortlessly🚀
And if you need a starting point, the code for PapersChat is open and fully reproducible on GitHub 👉 https://github.com/AstraBert/PapersChat

Right now, I’m focusing on educational stuff and getting loads of new people to build open AI models using free and open source tools.
I’ve made a collection of some of the tools I’m building and using for teaching. Stuff like quizzes, code challenges, and certificates.
burtenshaw/tools-for-learning-ai-6797453caae193052d3638e2

I have upgraded both and, using the same settings, ran the same DeepSeek R1 Distill 1.5B model on the same hardware: an apples-to-apples comparison.
| Phase | llama.cpp | ollama | llama.cpp advantage |
|-------------------|---------------------------------|------------------------------|---------------------|
| Total duration | 6.85 sec | 8.69 sec | 26.8% faster |
| Model loading | 241 ms | 553 ms | ~2x faster |
| Prompt processing | 416.04 tokens/s (45.67 ms eval) | 42.17 tokens/s (498 ms eval) | ~10x faster |
| Token generation | 137.79 tokens/s (6.62 sec eval) | 122.07 tokens/s (7.64 sec eval) | 13% faster |
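If you want to reproduce this kind of measurement yourself, here's a rough sketch using llama-cpp-python (the Python bindings for llama.cpp); it's not the exact harness behind the numbers above, and the model path is a placeholder:

```python
# Rough sketch: measure token generation speed with llama-cpp-python.
# The GGUF path is a placeholder.
import time
from llama_cpp import Llama

llm = Llama(model_path="DeepSeek-R1-Distill-Qwen-1.5B-Q4_K_M.gguf", verbose=False)

start = time.perf_counter()
out = llm("Explain how binary search works.", max_tokens=256)
elapsed = time.perf_counter() - start

n_generated = out["usage"]["completion_tokens"]
print(f"{n_generated / elapsed:.2f} tokens/s over {elapsed:.2f} s")
```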
llama.cpp is LLM inference in C/C++; ollama adds abstraction layers and marketing.
Make sure you own your AI. AI in the cloud is not aligned with you; it's aligned with the company that owns it.

ngxson/extracted-lora-mergekit-677d5c3eea0b6a7661201846

Yes, sure!
The first step is to generate a PEFT-compatible LoRA adapter; I used mergekit-extract-lora to do that. Please note that some bigger models (Qwen/Llama 70B) throw errors that I don't know how to fix; hopefully that will be resolved soon. You can find more info about mergekit here: https://github.com/arcee-ai/mergekit
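Before converting, it's worth sanity-checking that the extracted adapter loads cleanly with PEFT. A minimal sketch, with placeholder paths:

```python
# Quick sanity check that the extracted adapter is PEFT-compatible.
# Assumes `transformers` and `peft` are installed; paths are placeholders.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("path/to/base-model")
model = PeftModel.from_pretrained(base, "path/to/extracted-lora")
print(model.peft_config)  # shows rank, alpha, and target modules
```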
The next step is to convert the PEFT adapter to GGUF; I used this Space: https://huggingface.co/spaces/ggml-org/gguf-my-lora
Then it's good to go!
Please note that the Space can convert any PEFT LoRA adapter to GGUF, so if you're using something like Unsloth, it is straightforward to convert to a GGUF LoRA (no need to merge into the base model).