
NL2SQL is an enterprise-grade, multi-agent NL→SQL system that delivers accurate, safe, and deterministic SQL with schema retrieval, validation, and full observability.


nadeem4/nl2sql

NL2SQL Engine

Production-grade Natural Language → SQL runtime with deterministic orchestration.

NL2SQL treats text-to-SQL as a distributed systems problem. The engine compiles a user query into a validated plan, executes via adapters, and aggregates results through a graph-based pipeline.


🧭 What you get

  • Graph-based orchestration (LangGraph) with explicit state (GraphState)
  • Deterministic planning and validation before SQL generation
  • Adapter-based execution with sandbox isolation
  • Observability hooks (metrics, logs, audit events)

🏗️ System Topology

The runtime is organized around a LangGraph orchestration pipeline and supporting registries. It is designed for fault isolation and deterministic execution.

flowchart TD
    User[User Query] --> Resolver[DatasourceResolverNode]
    Resolver --> Decomposer[DecomposerNode]
    Decomposer --> Planner[GlobalPlannerNode]
    Planner --> Router[Layer Router]

    subgraph SQLAgent["SQL Agent Subgraph"]
        Schema[SchemaRetrieverNode] --> AST[ASTPlannerNode]
        AST -->|ok| Logical[LogicalValidatorNode]
        AST -->|retry| Retry[retry_node]
        Logical -->|ok| Generator[GeneratorNode]
        Logical -->|retry| Retry
        Generator --> Executor[ExecutorNode]
        Retry --> Refiner[RefinerNode]
        Refiner --> AST
    end

    Router --> Schema
    Executor --> Router
    Router --> Aggregator[EngineAggregatorNode]
    Aggregator --> Synthesizer[AnswerSynthesizerNode]

1. The Control Plane (The Graph)

Responsibility: Reasoning, Planning, and Orchestration.

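  • Agentic Graph: Implemented as a directed cyclic graph (LangGraph) to enable refinement loops. If a plan fails validation, the retry path sends it through the RefinerNode and back to the ASTPlannerNode, so the system corrects itself instead of failing outright.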
  • State Management: Shared GraphState ensures auditability and reproducibility of every decision.
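
The refinement loop described above can be sketched in plain Python without any graph library. This is a minimal illustration under stated assumptions, not the engine's implementation: `SketchState`, `run_refinement_loop`, and the toy planner and validator are hypothetical names, and the real GraphState fields may differ.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


# Hypothetical stand-in for the engine's GraphState; real fields may differ.
@dataclass
class SketchState:
    query: str
    plan: Optional[str] = None
    errors: List[str] = field(default_factory=list)
    retries: int = 0
    history: List[str] = field(default_factory=list)  # ordered record of steps taken


def run_refinement_loop(
    state: SketchState,
    plan_fn: Callable[[SketchState], str],
    validate_fn: Callable[[str], List[str]],
    max_retries: int = 3,
) -> SketchState:
    """Plan, validate, and re-plan with feedback until valid or out of retries."""
    while True:
        state.history.append("plan")
        state.plan = plan_fn(state)
        state.history.append("validate")
        state.errors = validate_fn(state.plan)
        if not state.errors:
            return state  # valid plan: leave the loop
        if state.retries >= max_retries:
            state.plan = None  # give up; errors stay on the state for inspection
            return state
        state.retries += 1
        state.history.append("refine")  # the next plan_fn call sees state.errors


# Toy planner: the first attempt has a typo; it corrects itself once it sees errors.
def toy_planner(state: SketchState) -> str:
    if state.errors:
        return "SELECT name FROM customers"
    return "SELECT nme FROM customers"


def toy_validator(plan: str) -> List[str]:
    return ["unknown column: nme"] if "nme" in plan else []


final = run_refinement_loop(
    SketchState(query="list customer names"), toy_planner, toy_validator
)
print(final.plan, final.retries, final.history)
```

Because every step is appended to `history` on the shared state, the full decision path can be replayed after the fact, which is the auditability property the State Management bullet refers to.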

2. The Security Plane (The Firewall)

Responsibility: Invariant Enforcement.

  • Valid-by-Construction: The LLM emits an Abstract Syntax Tree (AST) rather than raw SQL text. SQL is generated only from an AST that has passed validation.
  • Static Analysis: The Logical Validator enforces RBAC and schema constraints before SQL generation.
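
To make the validate-then-generate idea concrete, here is a small sketch, assuming a single-table SELECT as the only AST shape. `SelectAST`, `SCHEMA`, `ROLE_TABLES`, `validate`, and `render_sql` are hypothetical names invented for illustration; the engine's actual AST and validator are richer.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


# Hypothetical, deliberately tiny AST: one table, a column list, a row limit.
@dataclass(frozen=True)
class SelectAST:
    table: str
    columns: Tuple[str, ...]
    limit: int = 100


# Assumed schema and role policy, for illustration only.
SCHEMA: Dict[str, Set[str]] = {
    "customers": {"id", "name", "region"},
    "salaries": {"employee_id", "amount"},
}
ROLE_TABLES: Dict[str, Set[str]] = {
    "analyst": {"customers"},
    "hr": {"customers", "salaries"},
}


def validate(ast: SelectAST, role: str) -> List[str]:
    """Return every schema or RBAC violation found; an empty list means valid."""
    if ast.table not in SCHEMA:
        return [f"unknown table: {ast.table}"]
    errors: List[str] = []
    if ast.table not in ROLE_TABLES.get(role, set()):
        errors.append(f"role '{role}' may not read table: {ast.table}")
    if not ast.columns:
        errors.append("no columns selected")
    for col in ast.columns:
        if col not in SCHEMA[ast.table]:
            errors.append(f"unknown column: {ast.table}.{col}")
    if not isinstance(ast.limit, int) or ast.limit <= 0:
        errors.append("limit must be a positive integer")
    return errors


def render_sql(ast: SelectAST, role: str) -> str:
    """Emit SQL only for an AST that passed validation."""
    errors = validate(ast, role)
    if errors:
        raise ValueError("; ".join(errors))
    # Identifiers were checked against the known schema above, so the text
    # assembled here contains only known table and column names.
    return f"SELECT {', '.join(ast.columns)} FROM {ast.table} LIMIT {ast.limit}"


print(render_sql(SelectAST("customers", ("name", "region"), 5), "analyst"))
print(validate(SelectAST("salaries", ("amount",)), "analyst"))
```

The design point is ordering: the string form of the query does not exist until the structured form has been checked, so there is no SQL text for an invalid plan to leak through.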

3. The Data Plane (The Sandbox)

Responsibility: Semantic Search and Execution.

  • Blast Radius Isolation: SQL drivers run in a dedicated Sandboxed Process Pool. A segfault in a driver kills a disposable worker, not the Agent.
  • Partitioned Retrieval: The Schema Store + Retrieval flow injects relevant schema context, preventing context window overflow.
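
The blast-radius idea can be demonstrated with the standard library alone. The sketch below is a simplification: it launches a one-shot child interpreter per task instead of the pooled workers the engine describes, and `run_isolated` plus the inline code strings are hypothetical stand-ins for a real driver call.

```python
import subprocess
import sys
from typing import Tuple


def run_isolated(code: str, timeout_s: float = 10.0) -> Tuple[bool, str]:
    """Run `code` in a fresh Python interpreter and report (ok, output_or_reason)."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,  # the child is killed if it exceeds this
        )
    except subprocess.TimeoutExpired:
        return False, "worker timed out"
    if proc.returncode != 0:
        # A crash in the child surfaces here as a non-zero exit code;
        # the parent process keeps running.
        return False, f"worker exited with code {proc.returncode}"
    return True, proc.stdout.strip()


# A healthy "driver call" and one that dies abruptly, as a crashing driver would.
ok_result = run_isolated("print(1 + 1)")
crash_result = run_isolated("import os; os._exit(139)")

print(ok_result)
print(crash_result)
```

A pool of long-lived workers amortizes interpreter start-up cost, but the isolation property is the same: the failure is reported to the caller as data rather than taking the caller down with it.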

4. The Reliability Plane (The Guard)

Responsibility: Fault Tolerance and Stability.

  • Layered Defense: A combination of Circuit Breakers and Sandboxing keeps the system stable during outages.
  • Fail-Fast: We stop processing immediately if a dependency is unresponsive, preserving resources.
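
A circuit breaker of the kind mentioned above can be sketched in a few lines. This is a generic illustration of the pattern, not the engine's code: `CircuitBreaker`, `CircuitOpenError`, and the threshold and cool-down parameters are hypothetical.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected without contacting the dependency."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 3,
        reset_after_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._failure_threshold = failure_threshold
        self._reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, fn: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_after_s:
                raise CircuitOpenError("dependency marked unavailable; failing fast")
            # Cool-down elapsed: allow one trial call. A single further
            # failure re-opens the circuit immediately.
            self._opened_at = None
            self._failures = self._failure_threshold - 1
        try:
            result = fn()
        except Exception:
            self._failures += 1
            if self._failures >= self._failure_threshold:
                self._opened_at = self._clock()
            raise
        self._failures = 0  # any success fully closes the circuit
        return result


# Demo with a controllable clock so the cool-down can be simulated.
now = [0.0]
breaker = CircuitBreaker(failure_threshold=2, reset_after_s=10.0, clock=lambda: now[0])


def failing_dependency() -> str:
    raise ConnectionError("database unreachable")


for _ in range(2):
    try:
        breaker.call(failing_dependency)
    except ConnectionError:
        pass

try:
    breaker.call(lambda: "never runs")
except CircuitOpenError as exc:
    print("rejected:", exc)

now[0] = 11.0  # pretend the cool-down has passed
print(breaker.call(lambda: "recovered"))
```

While the circuit is open, callers get an immediate error instead of waiting on a timeout, which is what keeps retries from piling up against a dependency that is already down.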

5. The Observability Plane (The Watchtower)

Responsibility: Visibility, Forensics, and Compliance.

  • Full-Stack Telemetry: Native OpenTelemetry integration provides distributed tracing (Jaeger) and metrics (Prometheus) for every node execution.
  • Forensic Audit Logs: A persistent Audit Log records AI decisions for compliance and debugging.
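
As a rough picture of what an audit record might contain, the sketch below writes one JSON object per line to a text sink. The record fields, `write_audit_event`, and the trace identifier format are assumptions made for illustration, and the in-memory buffer stands in for persistent storage; the node names are taken from the topology diagram.

```python
import io
import json
import time
from typing import IO, Any, Dict


def write_audit_event(
    sink: IO[str], node: str, decision: Dict[str, Any], trace_id: str
) -> None:
    """Append one self-describing JSON line per decision."""
    record = {
        "ts": time.time(),     # wall-clock time of the decision
        "trace_id": trace_id,  # ties the record to one end-to-end request
        "node": node,
        "decision": decision,
    }
    sink.write(json.dumps(record, sort_keys=True) + "\n")


# An in-memory buffer stands in for a file or log shipper here.
sink = io.StringIO()
write_audit_event(
    sink,
    "LogicalValidatorNode",
    {"status": "retry", "reason": "unknown column"},
    trace_id="t-1",
)
write_audit_event(sink, "GeneratorNode", {"status": "ok"}, trace_id="t-1")

events = [json.loads(line) for line in sink.getvalue().splitlines()]
print([event["node"] for event in events])
```

One record per line keeps the log append-only and easy to filter by `trace_id` when reconstructing why a particular query produced a particular answer.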

📐 Architectural Invariants

| Invariant | Rationale | Mechanism |
| --- | --- | --- |
| No Unvalidated SQL | Prevent hallucinations & data leaks | All plans pass through LogicalValidator (AST). PhysicalValidator exists but is not wired into the default SQL subgraph. |
| Zero Shared State | Crash Safety | Execution happens in isolated processes; no shared memory with the Control Plane. |
| Fail-Fast | Reliability | Circuit Breakers and Strict Timeouts prevent cascading failures (Retry Storms). |
| Determinism | Debuggability | Temperature-0 generation + Strict Typing (Pydantic) for all LLM outputs. |

🚀 Quick Start

Prerequisites

  • Python 3.10+
  • A configured datasource (configs/datasources.yaml)
  • A configured LLM (configs/llm.yaml)

1. Installation

git clone https://github.com/nadeem4/nl2sql.git
cd nl2sql

# Set up environment
python -m venv venv
source venv/bin/activate

# Install core engine and adapter SDK
pip install -e packages/core
pip install -e packages/adapter-sdk

2. Run a query (Python API)

from nl2sql.context import NL2SQLContext
from nl2sql.pipeline.runtime import run_with_graph

ctx = NL2SQLContext()
result = run_with_graph(ctx, "Top 5 customers by revenue last quarter?")

print(result.get("final_answer"))

📚 Documentation


📦 Repository Structure

packages/
├── core/               # The Engine (Graph, State, Logic)
├── adapter-sdk/        # Interface Contract for new Databases
└── adapters/           # Official Dialects (Postgres, MSSQL, MySQL)
configs/                # Runtime Configuration (Policies, Prompts)
docs/                   # Architecture & Operations Manual
