
Abdur-Rahman Butler

abdurrahmanbutler

https://isaacus.com/

AI & ML interests

Legal AI

Recent Activity

upvoted an article about 3 hours ago

Kanon 2 Reranker: the most powerful reranker for legal RAG

published an article about 3 hours ago

Kanon 2 Reranker: the most powerful reranker for legal RAG

posted an update about 3 hours ago

Post

Isaacus just shipped a new state-of-the-art model, this time focused on reranking for legal RAG.

Although Kanon 2 Embedder already represents the frontier of legal-domain retrieval, we know that not everyone is ready to re-embed their entire corpus. We also know there is still accuracy left on the table for teams handling highly sensitive legal work.

Enter Kanon 2 Reranker: the world’s best legal reranking model.

We tested it across both production RAG pipelines and standalone retrieval tasks, and the results were remarkable.

Not only does it outperform the competition in a category where there are still very few serious alternatives, it also delivers major retrieval accuracy gains over our standalone embedder. Those improvements translated into exceptional downstream performance.

In our final test, we compared Voyage AI by MongoDB 2.5 Rerank with Kanon 2 Reranker on Legal RAG Bench, using identical embedding models, generative models, and pipeline hyperparameters. The only difference was the reranker.

The result: Kanon 2 Reranker decisively outperformed Voyage 2.5 Rerank.

On holdout questions, the head-to-head margin was one of the most extreme we have seen: for every 1 question Voyage got right and we got wrong, there were 6 questions we got right and Voyage got wrong.
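The 6-to-1 figure above is a ratio of discordant pairs: questions where exactly one of the two pipelines answered correctly. A minimal sketch of that arithmetic follows, assuming each pipeline's results are available as a list of per-question booleans. The function name and the ten outcomes below are made up for illustration and are not Legal RAG Bench data.

```python
from typing import Sequence


def discordant_counts(a_correct: Sequence[bool], b_correct: Sequence[bool]) -> tuple[int, int]:
    """Count questions only A got right and questions only B got right."""
    if len(a_correct) != len(b_correct):
        raise ValueError("both systems must be scored on the same questions")
    a_only = sum(a and not b for a, b in zip(a_correct, b_correct))
    b_only = sum(b and not a for a, b in zip(a_correct, b_correct))
    return a_only, b_only


# Illustrative (invented) per-question outcomes for two pipelines.
system_a = [True, True, True, True, True, True, True, False, True, False]
system_b = [False, False, False, False, False, False, True, True, True, False]

a_only, b_only = discordant_counts(system_a, system_b)
ratio = a_only / b_only  # questions where both agree do not affect this ratio
```

Questions that both systems get right or both get wrong drop out of the comparison, which is why a head-to-head ratio can be large even when overall accuracies are closer together.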

We share an example in the blog post where Voyage Rerank actually underperforms Kanon 2 Embedder on its own, delivering the wrong context to the LLM. In that case, not using a reranker at all would have led to the correct answer.

All in all, I’m immensely proud of the performance gains we’ve achieved.
But as we always say, the best benchmark is your own data.

So redeem your free credits, give Kanon 2 Reranker a try, and see firsthand the difference our models can make:
https://huggingface.co/blog/isaacus/kanon-2-reranker

reacted to umarbutler's post with ❤️ 4 days ago

Post

1894

This awesome visualization by @abdurrahmanbutler tracks how reliant the High Court of Australia has been on UK precedents over time.

Back in the early 1900s, up to 70% of citations in High Court decisions were from the UK. Today, that number sits around 20%.

This change seems to have happened gradually as Australia gained more and more independence from the UK, culminating in the Australia Acts of 1986, where we see a nice bump in the proportion of Australian cases cited.

These insights would not be possible without our latest legal AI model, Kanon 2 Enricher, which we used to extract dates and citations from High Court decisions in isaacus/open-australian-legal-corpus and categorize citations by jurisdiction. You can learn about Kanon 2 Enricher here: https://isaacus.com/blog/kanon-2-enricher.
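Once each citation has a year and a jurisdiction label, the trend described above reduces to a grouped proportion. The sketch below assumes the extracted citations are available as `(year, jurisdiction)` pairs. That record shape, the `"UK"`/`"AU"` labels, and the sample values are hypothetical and do not reflect Kanon 2 Enricher's actual output format.

```python
from collections import Counter
from typing import Iterable


def uk_share_by_decade(citations: Iterable[tuple[int, str]]) -> dict[int, float]:
    """Return the fraction of citations labelled "UK" for each decade."""
    totals: Counter = Counter()
    uk: Counter = Counter()
    for year, jurisdiction in citations:
        decade = year // 10 * 10  # e.g. 1907 -> 1900
        totals[decade] += 1
        if jurisdiction == "UK":
            uk[decade] += 1
    return {decade: uk[decade] / totals[decade] for decade in sorted(totals)}


# Invented sample records, for illustration only.
sample = [
    (1905, "UK"), (1907, "UK"), (1908, "UK"), (1909, "AU"),
    (2012, "AU"), (2015, "AU"), (2016, "AU"), (2018, "UK"), (2019, "AU"),
]
shares = uk_share_by_decade(sample)
```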

posted an update 7 days ago

Post

2568

🚀 𝗜𝗻𝘁𝗿𝗼𝗱𝘂𝗰𝗶𝗻𝗴 𝗞𝗮𝗻𝗼𝗻 𝟮 𝗘𝗻𝗿𝗶𝗰𝗵𝗲𝗿: 𝘁𝗵𝗲 𝘄𝗼𝗿𝗹𝗱’𝘀 𝗳𝗶𝗿𝘀𝘁 𝗵𝗶𝗲𝗿𝗮𝗿𝗰𝗵𝗶𝗰𝗮𝗹 𝗴𝗿𝗮𝗽𝗵𝗶𝘁𝗶𝘇𝗮𝘁𝗶𝗼𝗻 𝗺𝗼𝗱𝗲𝗹

Today we’re publicly releasing Kanon 2 Enricher, and with it, an entirely new class of AI model that we’re calling a hierarchical graphitization model.
This is fundamentally different from both universal extraction models and generative models.

As a hierarchical graphitization model, Kanon 2 Enricher natively outputs a 𝗸𝗻𝗼𝘄𝗹𝗲𝗱𝗴𝗲 𝗴𝗿𝗮𝗽𝗵 rather than tokens, which makes it architecturally incapable of hallucinating or inventing text that wasn’t present in the input.

What that enables in practice is unlike any other model or ML architecture on the market:

• 𝗡𝗼 𝗵𝗮𝗹𝗹𝘂𝗰𝗶𝗻𝗮𝘁𝗶𝗼𝗻𝘀 🤖
It cannot hallucinate. All references and links are stored as spans, meaning exact character offsets anchored to the original text.

• 𝗛𝗶𝗲𝗿𝗮𝗿𝗰𝗵𝗶𝗰𝗮𝗹 𝘀𝗲𝗴𝗺𝗲𝗻𝘁𝗮𝘁𝗶𝗼𝗻, 𝗻𝗼𝘁 𝗷𝘂𝘀𝘁 𝗲𝘅𝘁𝗿𝗮𝗰𝘁𝗶𝗼𝗻 📑
It deconstructs a document’s full nested hierarchy, down to chapters, sections, clauses, schedules, signatures, and even singular sentences, and classifies each span with dozens of contextual features.

• 𝗘𝗻𝘁𝗶𝘁𝘆 𝗲𝘅𝘁𝗿𝗮𝗰𝘁𝗶𝗼𝗻, 𝗱𝗶𝘀𝗮𝗺𝗯𝗶𝗴𝘂𝗮𝘁𝗶𝗼𝗻, 𝗮𝗻𝗱 𝗹𝗶𝗻𝗸𝗶𝗻𝗴 🔗
It resolves what references actually point to, then links entities, citations, and cross-references into a single coherent graph.

• 𝗚𝗿𝗮𝗽𝗵-𝗳𝗶𝗿𝘀𝘁 𝗲𝗳𝗳𝗶𝗰𝗶𝗲𝗻𝗰𝘆 🏃‍➡️
Small enough to run locally on a consumer PC with sub-second latency, and it stays reliable on long documents where front

To read more about our new model, check out our latest Hugging Face article:
https://huggingface.co/blog/isaacus/introducing-kanon-2-enricher
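The post above describes references stored as spans, meaning character offsets anchored to the original text. A minimal sketch of why that representation cannot introduce new text: a span holds only two integers, and its text is always recovered by slicing the input. The `Span` class and its field names are hypothetical and are not the Isaacus API or schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int  # inclusive character offset into the source document
    end: int    # exclusive character offset into the source document

    def text(self, document: str) -> str:
        """Recover the span's text by slicing the original document."""
        if not (0 <= self.start <= self.end <= len(document)):
            raise ValueError("span falls outside the document")
        return document[self.start:self.end]


document = "Section 1. The Tenant must pay rent to the Landlord."
start = document.index("Tenant")
mention = Span(start=start, end=start + len("Tenant"))
recovered = mention.text(document)  # always a substring of `document`
```

Because the span carries offsets rather than a copy of the text, any consumer can check a reference against the source simply by slicing, and an out-of-range span is rejected rather than silently accepted.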

reacted to umarbutler's post with ❤️ 18 days ago

Post

2183

@abdurrahmanbutler and I just dropped Legal RAG Bench, the first benchmark for legal RAG systems to simultaneously evaluate hallucinations, retrieval failures, and reasoning errors.

Our key takeaways are:
1. Embedding models, not generative models, are the primary driver of RAG accuracy. Switching from a general-purpose embedder like OpenAI's Text Embedding 3 Large to a legal domain embedder like Isaacus' Kanon 2 Embedder can raise accuracy by ~19 points.
2. Hallucinations are often triggered by retrieval failures. Fix your retrieval stack, and, in most cases, you end up fixing hallucinations.
3. Once you have a solid legal retrieval engine like Kanon 2 Embedder, it doesn’t matter as much what generative model you use; GPT-5.2 and Gemini 3.1 Pro perform relatively similarly, with Gemini 3.1 Pro achieving slightly better accuracy at the cost of more hallucinations.
4. Google's latest LLM, Gemini 3.1 Pro, is actually a bit worse than its predecessor at legal RAG, achieving 79.3% accuracy instead of 80.3%.

These findings confirm what we already knew at Isaacus: that information retrieval sets the ceiling on the accuracy of legal RAG systems. It doesn’t matter how smart you are; you aren’t going to magically know what the penalty is for speeding in California without access to an up-to-date copy of the California Vehicle Code.

Even so, to our knowledge, we’re the first to actually show this empirically.

Unfortunately, as we highlight in our write-up, high-quality open legal benchmarks like Legal RAG Bench and our earlier MLEB are few and far between.

In the interests of transparency, we have not only detailed exactly how we built Legal RAG Bench, but we’ve also released all of our data openly on Hugging Face. You can read our write-up [here](https://isaacus.com/blog/legal-rag-bench), noting that we’ll soon be publishing it as a paper.

Kudos to my brother @abdurrahmanbutler for serving as the lead author on this monumental release.

2 replies

reacted to umarbutler's post with 🔥 26 days ago

Post

5073

What happens when you annotate, extract, and disambiguate every entity mentioned in the longest U.S. Supreme Court decision in history? What if you then linked those entities to each other and visualized it as a network?

This is the result of enriching all 241 pages and 111,267 words of Dred Scott v. Sandford (1857) with Kanon 2 Enricher in less than ten seconds at the cost of 47 cents.

Dred Scott v. Sandford is the longest U.S. Supreme Court decision by far, and has variously been called "the worst Supreme Court decision ever" and "the Court's greatest self-inflicted wound" due to its denial of the rights of African Americans.

Thanks to Kanon 2 Enricher, we now also know that the case contains 950 numbered paragraphs, 6 footnotes, 178 people mentioned 1,340 times, 99 locations mentioned 1,294 times, and 298 external documents referenced 940 times.

For an American case, there are a decent number of references to British precedents (27 to be exact), including the Magna Carta (¶ 928).

Surprisingly though, the Magna Carta is not the oldest citation referenced. That would be the Institutes of Justinian (¶ 315), dated around 533 CE.

The oldest city mentioned is Rome (founded 753 BCE) (¶ 311), the oldest person is Justinian (born 527 CE) (¶ 314), and the oldest year referenced is 1371, when 'Charles V of France exempted all the inhabitants of Paris from serfdom' (¶ 370).

All this information and more was extracted in 9 seconds. That's how powerful Kanon 2 Enricher, my latest LLM for document enrichment and hierarchical graphitization, is. If you'd like to play with it yourself now that it's available in closed beta, you can apply to the Isaacus Beta Program here: https://isaacus.com/beta.

reacted to tomaarsen's post with 🚀 5 months ago

Post

4515

🤗 Sentence Transformers is joining Hugging Face! 🤗 This formalizes the existing maintenance structure, as I've personally led the project for the past two years on behalf of Hugging Face! Details:

Today, the Ubiquitous Knowledge Processing (UKP) Lab is transferring the project to Hugging Face. Sentence Transformers will remain a community-driven, open-source project, with the same open-source license (Apache 2.0) as before. Contributions from researchers, developers, and enthusiasts are welcome and encouraged. The project will continue to prioritize transparency, collaboration, and broad accessibility.

Read our full announcement for more details and quotes from UKP and Hugging Face leadership: https://huggingface.co/blog/sentence-transformers-joins-hf

We see an increasing wish from companies to move from large LLM APIs to local models for better control and privacy, reflected in the library's growth: in just the last 30 days, Sentence Transformer models have been downloaded >270 million times, second only to transformers.

I would like to thank the UKP Lab, and especially Nils Reimers and Iryna Gurevych, both for their dedication to the project and for their trust in me, both now and two years ago. Back then, neither of you knew me well, yet you trusted me to take the project to new heights. That choice ended up being very valuable for the embedding & Information Retrieval community, and I think this choice of granting Hugging Face stewardship will be similarly successful.

I'm very excited about the future of the project, and for the world of embeddings and retrieval at large!

1 reply

reacted to umarbutler's post with 👍🚀🔥 5 months ago

Post

3006

I'm excited to announce the release of Kanon 2 Embedder, the world's best legal embedding model, ranked first on the Massive Legal Embedding Benchmark 🎉

This model is the product of quite literally months of painstaking work alongside @abdurrahmanbutler collecting, cleaning, and processing terabytes of data as well as coming up with novel improvements to the standard embedder training recipe to push the limits of what's possible.

Kanon 2 Embedder is my most advanced model to date. On MLEB, it benchmarks as 9% more accurate than OpenAI's best embedding model and 30% faster.

Even when truncated from 1,792 to 768 dimensions, Kanon 2 Embedder continues to hold the number one spot on MLEB.
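The post does not say how the truncation is performed. A common approach for embedders trained to support shorter vectors is to keep the leading components and re-normalize to unit length, and the sketch below assumes that approach. The function name is hypothetical and the toy vector is far shorter than a real 1,792-dimensional embedding.

```python
import math
from typing import Sequence


def truncate_and_normalize(vector: Sequence[float], dims: int) -> list[float]:
    """Keep the first `dims` components, then rescale to unit L2 norm."""
    if not 0 < dims <= len(vector):
        raise ValueError("dims must be between 1 and the vector length")
    head = list(vector[:dims])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]


# Toy 4-dimensional vector standing in for a full-size embedding.
full = [3.0, 4.0, 12.0, 0.5]
short = truncate_and_normalize(full, 2)
```

Re-normalizing matters when similarity is computed as a dot product, since a truncated vector no longer has unit length on its own.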

Importantly, Kanon 2 Embedder is also privacy and security friendly — unlike Voyage, Cohere and Jina, none of your data is used to train our models by default.

Kanon 2 Embedder can also be self-hosted for enterprises with heightened security or reliability requirements.

You can read the full announcement on our blog to learn how we did it and how you can get started using Kanon 2 Embedder to embed your own legal documents: https://isaacus.com/blog/introducing-kanon-2-embedder

2 replies

reacted to their post with 🧠➕😎🤗❤️👀🚀🔥 5 months ago

Post

2549

🎉 I am excited to share news of a project my brother, Umar Butler, and I have been working on for what feels like an eternity now.

𝐈𝐧𝐭𝐫𝐨𝐝𝐮𝐜𝐢𝐧𝐠 𝐌𝐋𝐄𝐁 — 𝐭𝐡𝐞 𝐌𝐚𝐬𝐬𝐢𝐯𝐞 𝐋𝐞𝐠𝐚𝐥 𝐄𝐦𝐛𝐞𝐝𝐝𝐢𝐧𝐠 𝐁𝐞𝐧𝐜𝐡𝐦𝐚𝐫𝐤.

A suite of 10 high-quality English legal IR datasets, designed by legal experts to set a new standard for comparing embedding models.

Whether you’re exploring legal RAG on your home computer, or running enterprise-scale retrieval, apples-to-apples evaluation is crucial. That’s why we’ve open-sourced everything - including our 7 brand-new, hand-crafted retrieval datasets. All of these datasets are now live on Hugging Face.

Any guesses which embedding model leads on legal retrieval?

𝐇𝐢𝐧𝐭: it’s not OpenAI or Google - they place 7th and 9th on our leaderboard.

To do well on MLEB, embedding models must demonstrate both extensive legal domain knowledge and strong legal reasoning skills.

https://huggingface.co/blog/isaacus/introducing-mleb

1 reply

reacted to adlumal's post with 🧠🤯 5 months ago

Post

2471

MLEB is the largest, most diverse, and most comprehensive benchmark for legal text embedding models. https://huggingface.co/blog/isaacus/introducing-mleb
