Abdur-Rahman Butler

abdurrahmanbutler

https://isaacus.com/

AI & ML interests

Legal AI

Recent Activity

reacted to umarbutler's post with 🔥 6 days ago

What happens when you annotate, extract, and disambiguate every entity mentioned in the longest U.S. Supreme Court decision in history? What if you then linked those entities to each other and visualized the result as a network?

This is the result of enriching all 241 pages and 111,267 words of Dred Scott v. Sandford (1857) with Kanon 2 Enricher in less than ten seconds, at a cost of 47 cents.

Dred Scott v. Sandford is by far the longest U.S. Supreme Court decision, and it has variously been called "the worst Supreme Court decision ever" and "the Court's greatest self-inflicted wound" because of its denial of the rights of African Americans.

Thanks to Kanon 2 Enricher, we now also know that the case contains 950 numbered paragraphs, 6 footnotes, 178 people mentioned 1,340 times, 99 locations mentioned 1,294 times, and 298 external documents referenced 940 times.

For an American case, it contains a fair number of references to British precedents (27, to be exact), including the Magna Carta (¶ 928). Surprisingly, though, the Magna Carta is not the oldest document cited. That distinction belongs to the Institutes of Justinian (¶ 315), dated to around 533 CE. The oldest city mentioned is Rome (founded 753 BCE) (¶ 311), the oldest person is Justinian (born 527 CE) (¶ 314), and the oldest year referenced is 1371, when 'Charles V of France exempted all the inhabitants of Paris from serfdom' (¶ 370).

All this information and more was extracted in 9 seconds. That is the power of Kanon 2 Enricher, my latest LLM for document enrichment and hierarchical graphitization. If you'd like to try it yourself now that it's available in closed beta, you can apply to the Isaacus Beta Program here: https://isaacus.com/beta.

updated a dataset 13 days ago

isaacus/high-court-of-australia-cases

published a dataset 13 days ago

isaacus/high-court-of-australia-cases

Organizations

Posts 1

Post

🎉 I am excited to share news of a project my brother, Umar Butler, and I have been working on for what feels like an eternity now.

Introducing MLEB: the Massive Legal Embedding Benchmark.

A suite of 10 high-quality English legal information retrieval (IR) datasets, designed by legal experts to set a new standard for comparing embedding models.

Whether you’re exploring legal RAG on your home computer or running enterprise-scale retrieval, apples-to-apples evaluation is crucial. That’s why we’ve open-sourced everything, including our 7 brand-new, hand-crafted retrieval datasets. All of these datasets are now live on Hugging Face.

Any guesses which embedding model leads on legal retrieval?

Hint: it’s not OpenAI or Google; they place 7th and 9th on our leaderboard.

To do well on MLEB, embedding models must demonstrate both extensive legal domain knowledge and strong legal reasoning skills.

https://huggingface.co/blog/isaacus/introducing-mleb

Articles 2

Australian-made LLM beats OpenAI and Google at legal retrieval

Papers 1

models 0

datasets 0