Guardrails

Protect your AI applications with comprehensive guardrails that ensure safety, security, and compliance across all interactions. Add these as you build your workflows.

Detect PII (Personally Identifiable Information)

Input & OutputPrivacy/Compliance

Definition

Scans input or output for sensitive data like names, addresses, or account numbers. It masks or redacts this data to protect user privacy.

Best Used

Input: Before prompt processing to prevent sensitive data logging. Output: Before display to ensure the model doesn't leak PII, even from its training data.

Hallucination

OutputFactual Accuracy

Definition

Compares the AI's generated response against verified, external sources (grounding) to check factual accuracy. Prevents the model from generating fabricated or incorrect information.

Best Used

In Retrieval-Augmented Generation (RAG) systems or any application generating high-stakes, factual content (e.g., legal, financial, or medical summaries).

Detect Jailbreak

InputSecurity/Integrity

Definition

Analyzes the user's prompt for malicious intent, obfuscation, or manipulation tactics designed to bypass safety filters. Blocks attempts to force the model to perform disallowed or harmful actions.

Best Used

On user-facing applications where the model's core instructions must be protected from external attacks to maintain system integrity and security.

Moderation

Input & OutputSafety/Ethics

Definition

Uses classifiers to flag and filter text (input and output) that is toxic, hateful, explicit, or discriminatory. Ensures all parts of the conversation are safe and compliant with ethical guidelines.

Best Used

Input: To stop harmful prompts before they reach the LLM. Output: To filter any inappropriate content the model may inadvertently generate, protecting brand reputation.

LLM Critique

OutputQuality Assurance

Definition

Employs a separate, often smaller, language model to evaluate the primary LLM's final output against detailed rules for accuracy, style, and safety. Serves as an automated, sophisticated quality assurance layer.

Best Used

As the final check in complex AI agents or pipelines that require the response to strictly adhere to multiple, nuanced constraints (e.g., tone, format, and compliance).