I built an open-source real-time LLM hallucination guardrail — here are the benchmarks

This content originally appeared on DEV Community and was authored by Miroslav Šotek

What is Director-Class AI?

An open-source Python library that guards LLM output in real time. It watches tokens as they stream and halts generation the moment it detects a hallucination.

It uses natural language inference (NLI) via DeBERTa/FactCG, plus optional RAG knowledge grounding, to score each claim against source documents.
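To make the "score each claim, halt on failure" idea concrete, here is a minimal sketch of a streaming guard. It is not director-ai's implementation: `overlap_score` is a toy word-overlap stand-in for a real NLI model, and `guarded_stream`, the 0.6 threshold, and the sentence-boundary rule are all hypothetical names and choices made for illustration.

```python
import re
from typing import Callable, Iterable, Iterator, List


def overlap_score(claim: str, sources: List[str]) -> float:
    """Toy stand-in for an NLI model (assumption, not the real scorer).

    Returns the fraction of the claim's words that appear in the
    best-matching source document.
    """
    words = set(re.findall(r"[a-z0-9]+", claim.lower()))
    if not words:
        return 1.0
    best = 0.0
    for src in sources:
        src_words = set(re.findall(r"[a-z0-9]+", src.lower()))
        best = max(best, len(words & src_words) / len(words))
    return best


def guarded_stream(
    tokens: Iterable[str],
    sources: List[str],
    scorer: Callable[[str, List[str]], float] = overlap_score,
    threshold: float = 0.6,
) -> Iterator[str]:
    """Buffer tokens per sentence; release a sentence only if it scores well.

    Generation stops at the first sentence whose score falls below the
    threshold, and that sentence is never emitted.
    """
    pending: List[str] = []
    for tok in tokens:
        pending.append(tok)
        sentence = "".join(pending)
        if sentence.rstrip().endswith((".", "!", "?")):
            if scorer(sentence, sources) < threshold:
                return  # halt: unsupported claim detected
            yield from pending
            pending = []
    # Score any trailing text that never reached a sentence boundary.
    if pending and scorer("".join(pending), sources) >= threshold:
        yield from pending


SOURCES = ["The Eiffel Tower is in Paris and was completed in 1889."]
TOKENS = ["The ", "Eiffel ", "Tower ", "is ", "in ", "Paris. ",
          "It ", "was ", "built ", "by ", "aliens. ", "More ", "text."]
SAFE_TEXT = "".join(guarded_stream(TOKENS, SOURCES))
```

Buffering by sentence trades a little latency for never showing the user an unsupported sentence; a guard that emits tokens immediately would need to retract text instead.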

pip install director-ai

Two-line integration:

import openai  # needed for openai.OpenAI() below
from director_ai import guard
client = guard(openai.OpenAI())  # wraps any OpenAI/Anthropic client
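For readers curious how a client wrapper of this kind can work in general, the following is a sketch of the plain proxy pattern, under stated assumptions. It does not show how `guard` is actually implemented: `FakeClient`, `GuardedClient`, and the `complete` method are hypothetical names invented for this example and belong to no real SDK.

```python
from typing import Callable


class FakeClient:
    """Hypothetical stand-in for an LLM client (not a real SDK class)."""

    model = "demo-model"

    def complete(self, prompt: str) -> str:
        return "Paris is the capital of France."


class GuardedClient:
    """Generic proxy: checks text returned by `complete`, forwards the rest."""

    def __init__(self, inner, check: Callable[[str], bool]):
        self._inner = inner
        self._check = check

    def complete(self, prompt: str) -> str:
        text = self._inner.complete(prompt)
        if not self._check(text):
            raise ValueError("response rejected by guard")
        return text

    def __getattr__(self, name):
        # Only reached for attributes the proxy does not define itself,
        # so everything else behaves like the wrapped client.
        return getattr(self._inner, name)


guarded = GuardedClient(FakeClient(), check=lambda text: "Paris" in text)
answer = guarded.complete("What is the capital of France?")
```

The appeal of the pattern is that calling code keeps using the same client interface, while the check runs on every response.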

Benchmarks (measured, not aspirational)

| Metric | Value | Conditions |
|---|---|---|
| Balanced accuracy | 75.8% | FactCG on LLM-AggreFact (29,320 samples) |
| GPU latency | 14.6 ms/pair | GTX 1060, ONNX, batch=16 |
| L40S latency | 0.5 ms/pair | FP16, batch=32 |
| E2E catch rate | 90.7% | Hybrid mode, 600 HaluEval traces |
| Rust BM25 speedup | 10.2x | Over pure Python implementation |
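For anyone unfamiliar with the headline metric: balanced accuracy is the mean of the true-positive rate and the true-negative rate, so it is not inflated when one class dominates the dataset. The sketch below computes it from scratch; the label convention (1 = supported, 0 = unsupported) and the tiny example data are assumptions for illustration, not values from the benchmark.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of recall on the positive class and recall on the negative class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    tpr = tp / (tp + fn)  # recall on supported claims
    tnr = tn / (tn + fp)  # recall on unsupported claims
    return (tpr + tnr) / 2


# Illustrative labels only: 1 = supported, 0 = unsupported (assumed convention).
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
score = balanced_accuracy(y_true, y_pred)  # TPR = 0.75, TNR = 0.5
```

Here plain accuracy would be 4/6, while balanced accuracy is 0.625 because the weaker performance on the smaller class counts equally.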

Framework Integrations

LangChain, LlamaIndex, LangGraph, CrewAI, Haystack, DSPy, Semantic Kernel, and SDK Guard (wraps OpenAI/Anthropic/Bedrock/Gemini/Cohere clients).

Honest Limitations

  • NLI-only scoring needs knowledge-base (KB) grounding for domain-specific use (the medical false-positive rate is 100% without a KB)
  • ONNX on CPU is slow (383 ms/pair); a GPU is recommended
  • Long documents need at least 16 GB of VRAM
  • Summarisation accuracy is the weakest (68.8% on AggreFact-CNN)
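To see what the per-pair latencies mean for a single response, here is a back-of-the-envelope sketch. It uses the three per-pair figures quoted in this post and assumes cost scales linearly with the number of claim/source pairs, which is a simplification: the GPU figures were measured at specific batch sizes. The helper names and the 20-pair, 200 ms example are hypothetical.

```python
# Per-pair latencies in milliseconds, taken from the figures in this post.
PER_PAIR_MS = {
    "ONNX CPU": 383.0,
    "GTX 1060 (ONNX, batch=16)": 14.6,
    "L40S (FP16, batch=32)": 0.5,
}


def added_latency_ms(num_pairs: int, per_pair_ms: float) -> float:
    """Estimated scoring overhead, assuming linear scaling (an assumption)."""
    return num_pairs * per_pair_ms


def max_pairs_within(budget_ms: float, per_pair_ms: float) -> int:
    """How many claim/source pairs fit inside a latency budget."""
    return int(budget_ms // per_pair_ms)


# Hypothetical workload: 20 pairs per response, 200 ms budget.
overhead = {name: added_latency_ms(20, ms) for name, ms in PER_PAIR_MS.items()}
capacity = {name: max_pairs_within(200.0, ms) for name, ms in PER_PAIR_MS.items()}
```

Under these assumptions, a 200 ms budget cannot fit even one pair on CPU, which is the practical reason behind the GPU recommendation.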

Quality

  • 3,545 tests, 91% coverage
  • Sigstore-signed releases, SLSA provenance
  • OpenSSF Best Practices: 100%
  • 19 CI and security health badges

Links

Licensed under AGPL-3.0, with commercial licensing available.

Would love feedback from anyone working on LLM reliability, RAG pipelines, or AI safety!


Miroslav Šotek | Sciencx (2026-03-29T01:17:47+00:00) I built an open-source real-time LLM hallucination guardrail — here are the benchmarks. Retrieved from https://www.scien.cx/2026/03/29/i-built-an-open-source-real-time-llm-hallucination-guardrail-here-are-the-benchmarks/