Accuracy and honesty

How we keep our legal AI honest.

The biggest risk in legal AI is a confident wrong answer. A tool that invents a case is worse than no tool at all, because you have to disprove something that was never real. This page explains, in plain words, how our system stays tied to real authority and what our own Nevada testing shows.

General information for Nevada legal professionals. This is not legal advice. Read the full disclaimer.

Why a made-up case is the real danger

AI writes in a confident voice even when it is wrong. In law, that voice is dangerous. A fabricated citation in a brief is not a small slip. It is a candor problem with the court, and the court remembers.

This is not a rare bug. Stanford research published in 2024 tested AI on legal questions and found that general AI tools gave made-up or wrong answers most of the time. Even the leading paid legal research tools still did it often, in roughly one in six to one in three answers. The takeaway is simple. How a tool is built matters more than how smart it sounds.

What "retrieval" means, in plain words

Most AI answers from memory. It has read a great deal, and it rebuilds an answer from what it remembers. That is exactly where invented cases come from. Memory is fuzzy, and the model fills gaps with things that sound right.

We work the other way around. Before the AI writes a word, our system pulls up the real documents that relate to your question. Then the AI answers only from those documents. It is reading real text and building the answer around it, not guessing from memory. Lawyers know this instinct well. You do not argue a case from memory. You pull the opinion and read it first.

Grounding every answer in real documents is the single biggest reason hallucinations (made-up answers) drop.
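The "documents first, answer second" order can be sketched in a few lines of Python. This is an illustration only, not the actual implementation: the corpus, the document ids, the stopword list, and the function names `retrieve` and `build_prompt` are all hypothetical, and simple word overlap stands in for real search.

```python
import re

# Hypothetical stand-in corpus: ids and text are placeholders, not real authority.
CORPUS = {
    "doc-a": "Placeholder passage about the notice period for ending a lease.",
    "doc-b": "Placeholder passage about filing deadlines for a small claims appeal.",
}

# Small illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"a", "an", "the", "for", "of", "is", "what", "about", "to", "in"}


def words(text):
    """Lowercase the text and return its set of non-stopword terms."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def retrieve(question, corpus):
    """Return (doc_id, text) pairs that share at least one term with the question."""
    q = words(question)
    return [(doc_id, text) for doc_id, text in corpus.items() if q & words(text)]


def build_prompt(question, corpus):
    """Build a prompt containing only retrieved text, or None if nothing was found."""
    passages = retrieve(question, corpus)
    if not passages:
        # Nothing to ground an answer in, so the model is never asked to guess.
        return None
    sources = "\n".join(f"[{doc_id}] {text}" for doc_id, text in passages)
    return f"Answer using only these sources:\n{sources}\n\nQuestion: {question}"
```

The point of the sketch is the order of operations: the model only ever sees a prompt that already contains the retrieved text, and a question with no matching document produces no prompt at all.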

What "reranking" means, and why it matters

Finding the right document takes two steps, not one.

The first step casts a wide net. It gathers many documents that might be relevant, and it does it fast. The second step reads that set with more care and keeps only the documents that truly answer your question. It sets aside the ones that merely share a few words.

That second, careful pass is where a large part of the accuracy comes from. A tool that skips it tends to serve documents that sound related but are not, and a wrong source leads to a wrong answer.
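The two steps can be sketched as follows. This is a toy under stated assumptions: the corpus is made of placeholder text, the names `wide_net`, `careful_score`, and `rerank` are hypothetical, and the "careful" score here is just the share of question words a document covers. A real second pass would typically use a trained relevance model rather than word counts.

```python
import re

# Hypothetical placeholder corpus, not real authority.
CORPUS = {
    "doc-a": "Security deposit return deadline after a tenant moves out.",
    "doc-b": "Deposit of funds with the court clerk.",
    "doc-c": "Jury selection procedure.",
}


def terms(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def wide_net(question, corpus, limit=50):
    """Step 1: fast and loose. Keep any document sharing a word with the question."""
    q = set(terms(question))
    hits = [doc_id for doc_id, text in corpus.items() if q & set(terms(text))]
    return hits[:limit]


def careful_score(question, text):
    """Step 2 score: fraction of the question's distinct words the document covers."""
    q = set(terms(question))
    return len(q & set(terms(text))) / len(q) if q else 0.0


def rerank(question, corpus, candidates, keep=3, min_score=0.5):
    """Step 2: score each candidate, drop weak matches, return the best few."""
    scored = [(careful_score(question, corpus[d]), d) for d in candidates]
    kept = [(s, d) for s, d in scored if s >= min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in kept[:keep]]


question = "security deposit return deadline"
candidates = wide_net(question, CORPUS)            # doc-a and doc-b both mention "deposit"
best = rerank(question, CORPUS, candidates)        # only doc-a covers the whole question
```

Here `doc-b` survives the wide net because it shares the word "deposit", and the second pass sets it aside because it covers only one of the four question words.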

Two rules that keep answers honest

Show the source, or stay silent. Every answer comes with the exact passage from the real document it relied on. If the system cannot find a passage that supports a point, it says so instead of filling the gap. A made-up case has no passage to show, so it has nowhere to hide.

Check whether the case is still good law. Before any authority reaches you, the system checks whether a later court has overruled, criticized, distinguished, or set aside the case, and it flags the case with the reason. This is a signal to guide your review, not a final word, and it is not a substitute for a Shepard's or KeyCite check. You confirm the current status of any case before you rely on it. What this catches is the most dangerous mistake of all: citing a case the Nevada Supreme Court has already overruled.

Together these rules mean every answer is either tied to a real source or held back, and every case has been checked against how later courts treated it.
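The two rules can be pictured as a small gate that every point passes through before it is shown. The sketch below is illustrative only: the `Point` record, the `present` function, the case ids, and the treatment table are all hypothetical, and a real good-law check would draw on a citator-style data source rather than a hard-coded dictionary.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical treatment table: case ids and notes are placeholders.
NEGATIVE_TREATMENT = {
    "case-002": ("overruled", "placeholder reason: overruled by a later decision"),
}


@dataclass
class Point:
    claim: str
    case_id: str
    passage: Optional[str]  # exact supporting text, or None if none was found


def present(point, treatment=NEGATIVE_TREATMENT):
    """Apply both rules to one point and return what the reader would see."""
    # Rule 1: show the source, or stay silent.
    if not point.passage:
        return {"status": "no support found", "claim": None, "passage": None, "flag": None}
    # Rule 2: attach any negative later treatment, with the reason, as a review signal.
    flag = treatment.get(point.case_id)
    return {"status": "supported", "claim": point.claim, "passage": point.passage, "flag": flag}


clean = present(Point("placeholder claim", "case-001", "placeholder quoted passage"))
flagged = present(Point("placeholder claim", "case-002", "placeholder quoted passage"))
silent = present(Point("placeholder claim", "case-003", None))
```

In this sketch a point without a passage never carries its claim forward, and a flagged case still reaches the reader, but with the treatment and reason attached so the reviewer can confirm its status.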

What our testing shows

It rarely invents a citation.

We tested our system on about 200 Nevada questions with known answers. In law, one number matters most: how often a tool makes up a citation. Ours was about 5 in 100. The rest of the time it was right, or it held back rather than guess.

Made-up citations: about 5 in 100

For comparison, Stanford's published research measured hallucination rates of about 17 in 100 for Lexis+ AI and 33 in 100 for Westlaw AI-Assisted Research, on its own separate tests. Those rates count wrong or poorly supported answers, not only made-up citations.

When it was not sure: it held back

About 65 of 100 answers were fully correct. Most of the rest were the model declining to answer or answering partway, rather than inventing something. In law, that caution is the point.

How to read these numbers.

Our number comes from our own test on about 200 Nevada questions. The Stanford figures come from their own separate studies on their own questions, so this is not a controlled head-to-head on the same test. The numbers still point the same way. A tool built to pull real documents and refuse when it cannot find support makes up far less than one that answers from memory.

These are our own internal measurements on a small Nevada sample, not certified results. Testing continues on an expanding Nevada question set, and the figures above will be revised as the data grows.
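To make the "in 100" figures concrete, the short Python sketch below converts them into rough question counts. It assumes exactly 200 questions and treats the rounded rates as exact, so the counts are implied estimates, not reported results. The names are illustrative.

```python
# Assumptions: "about 200" questions is treated as exactly 200,
# and the rounded per-100 rates from the text are treated as exact.
QUESTIONS = 200
MADE_UP_PER_100 = 5
CORRECT_PER_100 = 65


def per_100_to_count(rate_per_100, total):
    """Convert an 'N in 100' rate into an approximate count out of `total`."""
    return round(rate_per_100 * total / 100)


made_up = per_100_to_count(MADE_UP_PER_100, QUESTIONS)   # 10
correct = per_100_to_count(CORRECT_PER_100, QUESTIONS)   # 130
# Everything else: neither fully correct nor a made-up citation.
# Per the text, mostly declined or partial answers.
other = QUESTIONS - made_up - correct                     # 60
```

On a sample of this size, each question moves a rate by half a point in 100, which is one reason the figures are described as small-sample measurements.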

What we claim, and what we do not

We have not run a controlled head-to-head against the big research services on the same questions, and we will not pretend we have. What we can show is a low made-up-citation rate on our own Nevada test, and a process built to keep it that way.

What we claim is a process you can defend. Answers are built from real documents. A source is shown for every point. Every case gets a good-law check. And you review the work before it leaves your firm.

If a question about your work ever comes up, that process is what holds up. Not a number. A method you followed, wrote down, and reviewed.

How the system works

The system pulls from real Nevada authority, shows the source passage for every answer, and runs a good-law check on every case before it reaches you. We tested it on a Nevada question set, and the made-up-citation rate above is what the testing found. We continue testing on an expanding Nevada question set as a standard part of our quality practice, and we update these numbers as we do. The appliance is described in more detail on our security and how-it-works pages.

Common questions about accuracy

What is a legal AI hallucination?

A hallucination is when an AI states a case, quote, or rule that does not exist or does not say what the AI claims. In law this is serious. A made-up citation in a brief is a problem of candor with the court, not a small typo.

How does grounding answers in real documents help?

Most AI answers from memory, and that is where invented cases come from. DilloLex first pulls up the actual documents that relate to your question, then answers only from those documents. Reading real text instead of guessing from memory is the single biggest reason made-up answers drop.

How does DilloLex keep made-up cases out of an answer?

Two rules. First, every answer shows the exact passage from the real document it relied on. If the system cannot find support, it says so instead of guessing. Second, before any case reaches you, the system checks whether a later court has overruled, criticized, distinguished, or set aside the case, and flags it with the reason. You confirm the current status before you rely on it.

What do your accuracy numbers mean?

We tested our system on about 200 Nevada legal questions with known answers. The number that matters most is how often a tool invents a citation. Ours did it about 5 times in 100. Stanford's published research measured hallucination rates for the leading paid legal tools at about 17 in 100 (Lexis+ AI) and 33 in 100 (Westlaw AI-Assisted Research) on its own separate tests. About 65 in 100 of our answers were fully correct, and most of the rest were the model holding back rather than guessing. These are our own internal measurements, not a controlled head-to-head.

Are your accuracy numbers certified?

No. Our testing was done by our own team on a Nevada question set, so the numbers are our own honest measurements, not certified by an outside party. Ask us about our method at a demo, and we will walk you and your malpractice carrier through it.

See it for yourself. Bring your questions.

At every demo, we walk firms through how the system finds authority, how the good-law check works, and where our testing stands. If you need to take it to your malpractice carrier, we can do that too.

Book a demo