Adrian Lucas Malec
adlumal
Recent Activity
posted an update 1 day ago
I benchmarked embedding APIs for speed, compared local vs hosted models, and tuned USearch for sub-millisecond retrieval on 143k chunks using only CPU. The post walks through the results, trade-offs, and what I learned about embedding API terms of service.
The main motivation for using USearch is that CPU compute is cheap and easy to scale.
Blog post: https://huggingface.co/blog/adlumal/lightning-fast-vector-search-for-legal-documents
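For readers new to vector search, here is a minimal sketch of the retrieval step that an index such as USearch speeds up. It is not the setup from the blog post: it is an exact, brute-force cosine search written with the Python standard library only, and the toy corpus (1,000 random 32-dimensional vectors), the `normalize` and `top_k` helpers, and the chunk ids are all hypothetical. An approximate index trades a little recall for much lower latency on larger collections.

```python
import heapq
import math
import random
import time


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def top_k(query, chunks, k=3):
    """Return the k (score, chunk_id) pairs with the highest cosine similarity."""
    q = normalize(query)
    scored = (
        (sum(a * b for a, b in zip(q, vec)), chunk_id)
        for chunk_id, vec in chunks.items()
    )
    return heapq.nlargest(k, scored)


# Hypothetical toy corpus: random unit vectors standing in for chunk embeddings.
rng = random.Random(0)
chunks = {i: normalize([rng.gauss(0, 1) for _ in range(32)]) for i in range(1000)}

# Time a single query; here the query is the embedding of chunk 42 itself.
start = time.perf_counter()
results = top_k(chunks[42], chunks, k=3)
elapsed_ms = (time.perf_counter() - start) * 1000

print(results[0][1])  # id of the best-matching chunk
print(f"{elapsed_ms:.2f} ms for one brute-force query")
```

Because every query scans the whole corpus, the cost of this baseline grows linearly with the number of chunks, which is the motivation for using a dedicated index at the 143k-chunk scale mentioned above.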
published an article 1 day ago
How I Built Lightning-Fast Vector Search for Legal Documents
reacted to abdurrahmanbutler's post with ❤️ 5 days ago
I am excited to share news of a project my brother, Umar Butler, and I have been working on for what feels like an eternity now.
Introducing MLEB: the Massive Legal Embedding Benchmark.
A suite of 10 high-quality English legal IR datasets, designed by legal experts to set a new standard for comparing embedding models.
Whether you're exploring legal RAG on your home computer or running enterprise-scale retrieval, apples-to-apples evaluation is crucial. That's why we've open-sourced everything - including our 7 brand-new, hand-crafted retrieval datasets. All of these datasets are now live on Hugging Face.
Any guesses which embedding model leads on legal retrieval?
Hint: it's not OpenAI or Google - they place 7th and 9th on our leaderboard.
To do well on MLEB, embedding models must demonstrate both extensive legal domain knowledge and strong legal reasoning skills.
https://huggingface.co/blog/isaacus/introducing-mleb