Xuan-Son Nguyen (ngxson)

AI & ML interests

Doing AI for fun, not for profit

Recent Activity

commented on an article about 18 hours ago
Topic 28: What is Mixture-of-Mamba?
updated a model about 18 hours ago
ngxson/hf-blog-podcast

Organizations

Hugging Face, Blog-explorers, Hugging Face TB Research, ggml.ai, Hugging Face Discord Community, Consumer AI Edge Hackathon (Meta, Hugging Face, Pytorch, Scaleway & Unaite), Mistral AI Game Jam

ngxson's activity

reacted to as-cle-bert's post with 🚀👍 3 days ago
I built an AI agent app in less than 8 hours🤯
And, believe me, this is NOT clickbait❌

GitHub 👉 https://github.com/AstraBert/PapersChat
Demo 👉 as-cle-bert/PapersChat

The app is called PapersChat, and it is aimed at making chatting with scientific papers easier.

๐‡๐ž๐ซ๐ž ๐ข๐ฌ ๐ฐ๐ก๐š๐ญ ๐ญ๐ก๐ž ๐š๐ฉ๐ฉ ๐๐จ๐ž๐ฌ:

📄 Parses the papers that you upload thanks to LlamaIndex🦙 (either with LlamaParse or with simpler, local methods)
📄 Embeds documents both with a sparse and with a dense encoder to enable hybrid search
📄 Uploads the embeddings to Qdrant
⚙️ Activates an Agent based on mistralai/Mistral-Small-24B-Instruct-2501 that will reply to your prompt
🧠 Retrieves information relevant to your question from the documents
🧠 If no relevant information is found, it searches the PubMed and arXiv databases
🧠 Returns a grounded answer to your prompt
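The hybrid-search step above blends a sparse (keyword-style) score with a dense (embedding) score. As a toy illustration of the idea in plain Python (this is not PapersChat's actual code, which uses LlamaIndex and Qdrant; the functions and the 50/50 weighting are just a sketch):

```python
import math

def dense_score(q_vec, d_vec):
    """Cosine similarity between dense embedding vectors."""
    dot = sum(a * b for a, b in zip(q_vec, d_vec))
    norm = math.sqrt(sum(a * a for a in q_vec)) * math.sqrt(sum(b * b for b in d_vec))
    return dot / norm if norm else 0.0

def sparse_score(q_terms, d_terms):
    """Term-overlap score standing in for a real sparse encoder (BM25, SPLADE, ...)."""
    return len(set(q_terms) & set(d_terms)) / max(len(set(q_terms)), 1)

def hybrid_score(q_vec, d_vec, q_terms, d_terms, alpha=0.5):
    """Blend dense and sparse relevance; alpha weights the dense side."""
    return alpha * dense_score(q_vec, d_vec) + (1 - alpha) * sparse_score(q_terms, d_terms)
```

Ranking candidate chunks by `hybrid_score` lets exact keyword matches rescue queries where the dense embedding alone misses, which is the main reason to embed with both encoders.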

๐‡๐จ๐ฐ ๐๐ข๐ ๐ˆ ๐ฆ๐š๐ง๐š๐ ๐ž ๐ญ๐จ ๐ฆ๐š๐ค๐ž ๐ญ๐ก๐ข๐ฌ ๐š๐ฉ๐ฉ๐ฅ๐ข๐œ๐š๐ญ๐ข๐จ๐ง ๐ข๐ง ๐Ÿ– ๐ก๐จ๐ฎ๐ซ๐ฌ?

Three key points:

- LlamaIndex🦙 provides countless integrations with LLM providers, text embedding models and vector store services, and takes care of the internal architecture of the Agent. You just plug it in, and it works!🔌⚡
- Qdrant is a vector database service that is extremely easy to set up and use: you just need a one-line Docker command😉
- Gradio makes frontend development painless and fast, while still providing modern and responsive interfaces🏗️

And a bonus point:

- Deploying the demo app couldn't be easier if you use Gradio-based Hugging Face Spaces🤗

So, no more excuses: build your own AI agent today, and do it fast, (almost) for free, and effortlessly🚀

And if you need a starting point, the code for PapersChat is open and fully reproducible on GitHub 👉 https://github.com/AstraBert/PapersChat
reacted to burtenshaw's post with 👍🤗❤️ 5 days ago
Hey, I'm Ben and I work at Hugging Face.

Right now, I'm focusing on educational stuff and getting loads of new people to build open AI models using free and open source tools.

I've made a collection of some of the tools I'm building and using for teaching. Stuff like quizzes, code challenges, and certificates.

burtenshaw/tools-for-learning-ai-6797453caae193052d3638e2
reacted to mitkox's post with 🚀👍 27 days ago
llama.cpp is 26.8% faster than ollama.
I have upgraded both, and using the same settings, I am running the same DeepSeek R1 Distill 1.5B on the same hardware. It's an apples-to-apples comparison.

Total duration:
llama.cpp 6.85 sec <- 26.8% faster
ollama 8.69 sec

Breakdown by phase:
Model loading
llama.cpp 241 ms <- 2x faster
ollama 553 ms

Prompt processing
llama.cpp 416.04 tokens/s with an eval time 45.67 ms <- 10x faster
ollama 42.17 tokens/s with an eval time of 498 ms

Token generation
llama.cpp 137.79 tokens/s with an eval time 6.62 sec <- 13% faster
ollama 122.07 tokens/s with an eval time 7.64 sec
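The headline figures are consistent with the raw measurements; recomputing them in plain Python from the numbers above:

```python
# Recompute the claimed speedups from the reported raw measurements.
total_llama, total_ollama = 6.85, 8.69  # total duration, seconds
overall = (total_ollama - total_llama) / total_llama * 100
print(f"overall: {overall:.1f}% faster")  # ~26.9% (26.8% if truncated)

load_llama, load_ollama = 241, 553  # model loading, ms
print(f"loading: {load_ollama / load_llama:.1f}x faster")  # ~2.3x, i.e. "2x"

pp_llama, pp_ollama = 416.04, 42.17  # prompt processing, tokens/s
print(f"prompt processing: {pp_llama / pp_ollama:.1f}x faster")  # ~9.9x, i.e. "10x"

tg_llama, tg_ollama = 137.79, 122.07  # token generation, tokens/s
tg = (tg_llama - tg_ollama) / tg_ollama * 100
print(f"token generation: {tg:.1f}% faster")  # ~12.9%, i.e. "13%"
```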

llama.cpp is LLM inference in C/C++; ollama adds abstraction layers and marketing.

Make sure you own your AI. AI in the cloud is not aligned with you; it's aligned with the company that owns it.
reacted to onekq's post with 🔥 about 1 month ago
๐Ÿ‹DeepSeek ๐Ÿ‹ is the real OpenAI ๐Ÿ˜ฏ
posted an update about 1 month ago
replied to their post about 1 month ago

Yes, sure!

The first step is to generate the PEFT-compatible LoRA adapter; I used mergekit-extract-lora to do that. Please note that some bigger models (Qwen/Llama 70B) give some errors that I don't know how to fix; hopefully they will fix that soon. You can find more info about mergekit here: https://github.com/arcee-ai/mergekit

The next step is to convert the PEFT adapter to GGUF; I used this space: https://huggingface.co/spaces/ggml-org/gguf-my-lora

Then it's good to go!

Please note that the space can convert any PEFT LoRA adapter to GGUF, so if you're using something like unsloth, it will be straightforward to convert it into a GGUF LoRA (no need to merge into the base model)
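As a rough sketch, step 1 of the workflow above is a single CLI invocation. The argument order and the `--rank` flag below are assumptions, not verified against any particular mergekit version, so check `mergekit-extract-lora --help` before relying on them:

```python
def extract_lora_cmd(finetuned_model: str, base_model: str, out_path: str,
                     rank: int = 32) -> list[str]:
    """Build the mergekit-extract-lora command for step 1 (PEFT adapter).

    NOTE: positional-argument order and flag names are assumptions here;
    verify them with `mergekit-extract-lora --help` for your installed version.
    """
    return ["mergekit-extract-lora", finetuned_model, base_model, out_path,
            f"--rank={rank}"]

# Step 2 (PEFT -> GGUF) is done in the gguf-my-lora Space linked above,
# so there is no CLI command for it: upload the adapter folder, download the .gguf.
```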

replied to their post about 1 month ago
posted an update about 1 month ago
Check out my collection of pre-made GGUF LoRA adapters!

This allows you to use both the normal and abliterated versions of popular models like Llama, Qwen, etc., without having to double the amount of VRAM used.

ngxson/gguf_lora_collection
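To see why adapters avoid doubling VRAM, here is a back-of-the-envelope comparison. The sizes below are hypothetical but representative (an 8B-class model at roughly Q4 versus a small GGUF LoRA adapter); real numbers vary with the model and the adapter rank:

```python
# Hypothetical sizes: ~4.9 GB for an 8B model quantized to ~Q4,
# ~0.2 GB for a GGUF LoRA adapter. Real sizes vary per model/rank.
base_gb, adapter_gb = 4.9, 0.2

two_full_models = 2 * base_gb             # normal + abliterated, both as full models
base_plus_adapter = base_gb + adapter_gb  # base model + abliterated-as-LoRA

print(f"two full models: {two_full_models:.1f} GB")  # ~9.8 GB
print(f"base + adapter:  {base_plus_adapter:.1f} GB")  # ~5.1 GB
```

The base weights are loaded once and the adapter only stores the low-rank delta, which is why switching between the two behaviors is nearly free in memory.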
reacted to bartowski's post with 👀👍 about 1 month ago
Looks like Q4_0_N_M file types are going away

Before you panic, there's a new "preferred" method which is online (I prefer the term on-the-fly) repacking: if you download Q4_0 and your setup can benefit from repacking the weights into interleaved rows (what Q4_0_4_4 was doing), it will do that automatically and give you similar performance (minor losses, I think, due to using intrinsics instead of assembly, but intrinsics are more maintainable)

You can see the reference PR here:

https://github.com/ggerganov/llama.cpp/pull/10446

So if you update your llama.cpp past that point, you won't be able to run Q4_0_4_4 (unless they add backwards compatibility back), but Q4_0 should be the same speeds (though it may currently be bugged on some platforms)

As such, I'll stop making those newer model formats soon, probably end of this week unless something changes, but you should be safe to download Q4_0 quants and use those!

Also, IQ4_NL supports repacking, though not in as many shapes yet, but it should get a respectable speedup on ARM chips. The PR for that can be found here: https://github.com/ggerganov/llama.cpp/pull/10541

Remember, these are not meant for Apple silicon since those use the GPU and don't benefit from the repacking of weights
replied to their post about 1 month ago

For llama.cpp, I'm not sure it would be useful to do so. The problem is that the source code of llama.cpp changes very often, and it doesn't actually parse the template; it just does simple if..else checks.

Ollama, on the other hand, has its own template engine and template language, which I haven't seen implemented anywhere outside of Golang. Testing Ollama templates was always difficult for me when working on the ollama <> Hugging Face integration, so I made this tool to simplify my workflow.
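The "simple if..else checks" approach can be illustrated with a toy sketch in plain Python. This is an illustration of the idea, not llama.cpp's actual code; the marker strings are just two well-known chat-template conventions:

```python
# Toy illustration: instead of parsing the Jinja chat template, sniff for
# telltale substrings and dispatch to a hard-coded formatter.
def detect_chat_format(template: str) -> str:
    """Guess the chat format from markers in the raw Jinja template text."""
    if "<|im_start|>" in template:
        return "chatml"
    elif "[INST]" in template:
        return "llama2"
    else:
        return "unknown"

def format_chatml(messages: list[dict]) -> str:
    """Hard-coded ChatML formatter used when detection says 'chatml'."""
    out = ""
    for m in messages:
        out += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    return out + "<|im_start|>assistant\n"
```

This is robust against template bugs but breaks whenever a model ships a template that matches no known marker, which is why a real template engine (like Ollama's) is the more general, and harder to test, solution.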

posted an update about 1 month ago
reacted to julien-c's post with 🔥 2 months ago
After some heated discussion 🔥, we clarify our intent re. storage limits on the Hub

TL;DR:
- public storage is free, and (unless blatant abuse) unlimited. We do ask that you consider upgrading to PRO and/or Enterprise Hub if possible
- private storage is paid above a significant free tier (1TB if you have a paid account, 100GB otherwise)

docs: https://huggingface.co/docs/hub/storage-limits

We optimize our infrastructure continuously to scale our storage for the coming years of growth in machine learning, to the benefit of the community 🔥

cc: @reach-vb @pierric @victor and the HF team
reacted to cfahlgren1's post with 🤗👀🔥 3 months ago
Why use Google Drive when you can have:

• Free storage with generous limits 🆓
• Dataset Viewer (Sorting, Filtering, FTS) 🔍
• Third Party Library Support
• SQL Console 🟧
• Security 🔒
• Community, Reach, and Visibility 📈

It's a no-brainer!

Check out our post on what you get instantly out of the box when you create a dataset.
https://huggingface.co/blog/researcher-dataset-sharing