Judges Turn to AI — When Speed Meets Hallucination Risk
Judges are piloting generative AI to speed research and drafting, but hallucinations and weak accountability have already produced flawed orders and raised urgent questions about safe use.
Courtrooms Confront AI Mistakes
Recent episodes in the US legal system have shown how brittle AI tools can be in high-stakes settings. Lawyers at well-known firms submitted filings that cited cases that did not exist, and an AI-savvy Stanford professor filed sworn testimony containing hallucinations in a deepfake case. Judges stepped in with reprimands and fines, but the problem did not stop there: judges themselves have begun using generative AI, and some of their outputs have contained errors that went undetected.
Judges Experimenting with AI Assistance
Some judges believe AI can help clear backlogs by handling routine parts of case preparation. Tasks such as summarizing long briefs, generating timelines, drafting routine orders, or suggesting questions for hearings are seen as low-risk uses. Judge Xavier Rodriguez, who began studying AI in 2018, asks models to summarize cases, identify key players, and generate timelines and hearing questions. He treats these outputs like first-draft work that he can check and correct.
Judge Allison Goddard also uses models as a thought partner. She runs 60-page district orders through AI to get concise summaries and to surface questions on technical points. She encourages clerks to use tools that do not train on user conversations by default and relies on law-specific platforms like Westlaw or Lexis for anything requiring verified legal authority.
Defining Safe Boundaries Remains Hard
Researchers and judges warn that the line between safe, rote tasks and those requiring human judgment is fuzzy. Erin Solovey, who studies human-AI interaction, notes that models differ in how they summarize and that AI can produce plausible but factually incorrect timelines. The Sedona Conference issued guidance listing potentially safe uses — legal research assistance, preliminary transcripts, and searching briefings — while emphasizing that hallucinations remain unresolved.
Some judges draw a firm line: using AI to predict bail eligibility or decide custody would be improper because those tasks demand discretion and human judgment. Others worry that falling behind on technology could carry its own risks, as some in the AI sector argue models could be more objective than humans.
Errors, Accountability, and Stakes
Problems are already emerging. This summer a federal judge in New Jersey reissued an order containing errors that may have come from AI, a Georgia appellate judge issued an order based in part on made-up cases, and a Mississippi judge replaced a flawed civil rights decision without explaining the source of its errors. Unlike attorneys, judges are subject to few mechanisms that force transparency when their orders contain mistakes. As Judge Scott Schlegel warns, when a judge makes a mistake the ruling becomes law until it is corrected, and reversals can be difficult and slow.
The stakes are especially high in child custody, bail, and civil rights cases, where a mistaken citation or fabricated precedent can have lasting consequences. Many judges therefore treat AI outputs like junior-associate drafts: useful for speed but always subject to careful human verification.
A Cautious, Practical Approach
Early-adopter judges are trying to extract value from generative AI while containing risk. They use AI for summarization, organization, and ideation, rely on trusted legal databases for authoritative research, and draft guidelines to delineate acceptable uses. The consensus among cautious practitioners is consistent: AI can assist with repetitive or clerical tasks, but final judgment, legal reasoning, and decisions that affect people's rights should remain firmly in human hands.