Friday, December 26, 2025

Show HN: Chaos engineering for LLMs – Making models cross-examine each other

Single-model inference is a single point of failure. I got sick of ChatGPT hallucinating fake citations and having to manually check them in a different tab with Claude.

So I built Council.

The Difference: Shared Context

Most "multi-bot" UIs are just parallel silos. Council uses a sequential backend stream where every response is injected into the context of the next model.

If GPT cites a fake study, Claude sees it and calls it out.

If Gemini misses a logic gap, Grok roasts it.
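The sequential stream described above can be sketched roughly as follows. This is a minimal illustration of the idea, not Council's actual backend; the model functions here are hypothetical stand-ins for real API calls (e.g. the OpenAI or Anthropic SDKs), and the transcript-injection scheme is my assumption about how such a pipeline would work.

```python
from typing import Callable, List

# Hypothetical stand-ins for real model calls. Each one receives the
# question plus the transcript of all prior answers, so later models
# can challenge earlier claims.
def model_a(question: str, transcript: List[str]) -> str:
    return f"A: answer to {question!r}"

def model_b(question: str, transcript: List[str]) -> str:
    # Because the transcript is injected, B can reference A's claim.
    prior = transcript[-1] if transcript else "nothing yet"
    return f"B: challenging {prior!r}"

ModelFn = Callable[[str, List[str]], str]

def council(question: str, models: List[ModelFn]) -> List[str]:
    """Sequential stream: every response is appended to the shared
    context before the next model is called."""
    transcript: List[str] = []
    for model in models:
        transcript.append(model(question, transcript))
    return transcript
```

The key design point is that the transcript grows monotonically: model N+1 always sees everything models 1..N said, which is what lets a later model call out an earlier model's fake citation.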

Adversarial Logic

Instead of "consensus" (which leads to boring, average answers), I'm using model-on-model friction to surface the truth. By forcing GPT-4o, Claude 3.5, Gemini 1.5, and Grok into one adversarial window, you get a "red-teamed" output that's harder to fake.

What I need: It's an MVP. I'm trying to figure out if "Inter-model Cross-Examination" actually kills hallucinations or just creates more expensive ones.

Give it a spin and try to break the logic. No fluff, just testing the architecture.


Comments URL: https://news.ycombinator.com/item?id=46395140

Points: 1

# Comments: 0



from Hacker News: Newest https://ift.tt/Z2hkEeM
via IFTTT

