Friday, December 26, 2025

Show HN: Chaos engineering for LLMs – Making models cross-examine each other

Single-model inference is a single point of failure. I got sick of ChatGPT hallucinating fake citations and having to manually check them in a different tab with Claude.

So I built Council.

The Difference: Shared Context

Most "multi-bot" UIs are just parallel silos. Council uses a sequential backend stream where every response is injected into the context of the next model.

If GPT cites a fake study, Claude sees it and calls it out.

If Gemini misses a logic gap, Grok roasts it.
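The sequential stream described above can be sketched roughly as follows. This is a minimal illustration of the idea, not Council's actual backend; the model functions here are hypothetical stand-ins for real API calls (e.g. the OpenAI or Anthropic SDKs), and the transcript-injection scheme is my assumption about how such a pipeline would work.

```python
from typing import Callable, List

# Hypothetical stand-ins for real model calls. Each one receives the
# question plus the transcript of all prior answers, so later models
# can challenge earlier claims.
def model_a(question: str, transcript: List[str]) -> str:
    return f"A: answer to {question!r}"

def model_b(question: str, transcript: List[str]) -> str:
    # Because the transcript is injected, B can reference A's claim.
    prior = transcript[-1] if transcript else "nothing yet"
    return f"B: challenging {prior!r}"

ModelFn = Callable[[str, List[str]], str]

def council(question: str, models: List[ModelFn]) -> List[str]:
    """Sequential stream: every response is appended to the shared
    context before the next model is called."""
    transcript: List[str] = []
    for model in models:
        transcript.append(model(question, transcript))
    return transcript
```

The key design point is that the transcript grows monotonically: model N+1 always sees everything models 1..N said, which is what lets a later model call out an earlier model's fake citation.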

Adversarial Logic

Instead of "consensus" (which leads to boring, average answers), I'm using model-on-model friction to surface the truth. By forcing GPT-4o, Claude 3.5, Gemini 1.5, and Grok into one adversarial window, you get a "red-teamed" output that's harder to fake.

What I need: It's an MVP. I'm trying to figure out if "Inter-model Cross-Examination" actually kills hallucinations or just creates more expensive ones.

Give it a spin and try to break the logic. No fluff, just testing the architecture.


Comments URL: https://news.ycombinator.com/item?id=46395140

Points: 1

# Comments: 0



from Hacker News: Newest https://ift.tt/Z2hkEeM
via IFTTT

