Security: Offensive Security Case Study

Spectre

An autonomous pentesting platform that orchestrates 39 AI exploit agents, 25+ security scanners, and multi-LLM reasoning across DORA TLPT and NIS2 compliance frameworks — from reconnaissance to executive report.

AI exploit agents

39 agents

Each agent specializes in a vulnerability class — SQLi, XSS, SSRF, LFI, RCE, crypto, auth bypass, and more.

Scanner integrations

25+ tools

ZAP, Nuclei, Nmap, Masscan, Semgrep, Nikto, Subfinder, Amass, Trivy, SQLMap, Metasploit, and more.

Docker services

14 containers

FastAPI, Next.js, PostgreSQL, PgBouncer, Redis, Neo4j, Qdrant, Temporal, workers, and scanner sidecars.

Compliance frameworks

DORA + NIS2

Automated TLPT evidence collection and regulatory mapping for EU financial sector compliance.

The Problem

What this project had to solve

Penetration testing workflows are fragmented across dozens of standalone tools, each producing incompatible output. Analysts waste hours on manual triage, duplicate scanning, and report assembly instead of reasoning about real attack paths. Spectre was designed to unify the entire offensive security lifecycle under AI-driven orchestration.

Spectre grew out of that observation: modern penetration testing is still overwhelmingly manual, and tool wrangling crowds out attack-path reasoning. The goal was to build a platform where AI agents handle the grunt work (scanning, triaging, correlating, exploiting) while humans focus on strategy and judgement.

The system is built around a FastAPI backend orchestrating 39 specialized exploit agents via LangGraph state machines, with Temporal handling long-running campaign workflows. Each agent is a self-contained unit that can reason about its domain (SQLi, XSS, SSRF, crypto weaknesses, etc.), select appropriate tools, and chain findings into attack narratives. A Neo4j knowledge graph connects targets, vulnerabilities, and exploit paths into a queryable attack surface.

The frontend is a Next.js 16 application with GSAP-powered animations, real-time campaign monitoring via SSE, and a DORA TLPT compliance module that maps findings directly to regulatory frameworks. The result is a platform that can run a full penetration test — from subdomain discovery to executive PDF report — with minimal human intervention.

What changed

Instead of manually running scanners, copying output between tools, and assembling reports in Word, Spectre lets operators define a campaign target and watch AI agents autonomously discover, scan, exploit, and report — with full traceability from finding to evidence to compliance mapping.

Why it was hard

The core challenge was orchestrating 39 agents that each use different tools, follow different reasoning patterns, and fail in different ways, while keeping the whole system deterministic enough to produce compliance-grade evidence. Balancing agent autonomy with human controllability required careful state machine design in LangGraph and durable workflow patterns in Temporal.

Constraints

39 exploit agents had to operate autonomously while remaining auditable and controllable by human operators.
25+ scanner integrations (ZAP, Nuclei, Nmap, Semgrep, SQLMap, etc.) needed unified output normalization.
Campaign workflows could run for hours — Temporal was required for durable, resumable execution.
DORA TLPT and NIS2 compliance demanded structured evidence trails, not just raw findings.
Multi-LLM fallback (OpenRouter → Mistral → Ollama) was essential for cost control and availability.
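
The multi-LLM fallback in the last constraint can be pictured as an ordered chain of providers tried in turn. The following is a minimal sketch under stated assumptions, not Spectre's actual code: the `Provider` alias, `complete_with_fallback`, and the three stub functions are hypothetical names, and real providers would call the OpenRouter, Mistral, and Ollama APIs instead of returning canned strings.

```python
from typing import Callable, List, Sequence, Tuple

# A provider is modeled as (name, callable). The callable takes a prompt and
# returns a completion string, or raises on failure. Hypothetical shape.
Provider = Tuple[str, Callable[[str], str]]


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain raised an exception."""


def complete_with_fallback(prompt: str, chain: Sequence[Provider]) -> Tuple[str, str]:
    """Try providers in priority order and return (provider_name, completion)."""
    errors: List[str] = []
    for name, call in chain:
        try:
            return name, call(prompt)
        except Exception as exc:  # any provider error moves on to the next one
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Stub providers standing in for real API clients (assumptions for the demo).
def openrouter_stub(prompt: str) -> str:
    raise TimeoutError("rate limited")


def mistral_stub(prompt: str) -> str:
    return f"[mistral] {prompt}"


def ollama_stub(prompt: str) -> str:
    return f"[ollama] {prompt}"


CHAIN: List[Provider] = [
    ("openrouter", openrouter_stub),
    ("mistral", mistral_stub),
    ("ollama", ollama_stub),
]

# The first provider fails, so the call falls through to the second.
used, text = complete_with_fallback("classify finding", CHAIN)
```

Recording which provider actually answered, as the return value does here, is one way to keep fallback behavior visible in an audit trail.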

My role

Designed and built the full-stack platform: FastAPI backend, Next.js frontend, Docker infrastructure.
Architected the LangGraph-based multi-agent system with 39 specialized exploit agents.
Built the Temporal workflow engine for durable, long-running campaign orchestration.
Integrated 25+ security scanners with unified output normalization into Neo4j.
Implemented DORA TLPT compliance mapping and automated executive report generation.
Designed the real-time campaign monitoring UI with SSE streaming and GSAP animations.

Proof

What the product actually looks like

Real screens from the product — each one supports a specific argument about clarity, control, or observability.

Mission Control

Command Center Dashboard

The central nervous system of Spectre — real-time campaign status, active agent count, finding severity distribution, and system health at a glance.

Designed for SOC-style monitoring where operators need to track multiple concurrent campaigns without context-switching.

Workflow Engine

Campaign Operations

Campaign management with target scope definition, agent selection, scanner configuration, and Temporal workflow orchestration.

Each campaign is a durable Temporal workflow that survives restarts, handles timeouts, and produces a complete audit trail.

Tool Integration

Scanner Arsenal

Unified interface for 25+ security scanners — each with health monitoring, configuration, and normalized output feeding into the knowledge graph.

Scanner outputs are parsed into a common schema before being ingested into Neo4j, enabling cross-tool correlation.

Triage Surface

Vulnerability Findings

AI-prioritized findings with severity scoring, CVSS mapping, exploit chain visualization, and one-click evidence export.

The triage interface reduces analyst fatigue by surfacing the highest-signal findings first with pre-built exploit context.
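
One plausible reading of "highest-signal first" is sketched below: validated exploits rank ahead of unvalidated findings, then higher CVSS scores rank ahead of lower ones. The ordering rule, the `TriageItem` type, and its `validated` field are assumptions made for illustration. The rating bands follow the CVSS v3.x qualitative severity scale.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TriageItem:
    title: str
    cvss: float      # CVSS v3.x base score, 0.0 to 10.0
    validated: bool  # True if an exploit attempt succeeded (assumed field)


def cvss_rating(score: float) -> str:
    """Map a CVSS v3.x base score to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS score out of range: {score}")
    if score == 0.0:
        return "none"
    if score < 4.0:
        return "low"
    if score < 7.0:
        return "medium"
    if score < 9.0:
        return "high"
    return "critical"


def prioritize(items: List[TriageItem]) -> List[TriageItem]:
    """Validated findings first, then by descending CVSS score."""
    return sorted(items, key=lambda item: (not item.validated, -item.cvss))


queue = prioritize([
    TriageItem("Unconfirmed RCE", 9.8, validated=False),
    TriageItem("SQL injection", 7.5, validated=True),
    TriageItem("Stored XSS", 5.0, validated=True),
])
```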

Executive Output

Campaign Report

Automated report generation with executive summary, technical findings, risk matrices, and compliance mapping — ready for stakeholder delivery.

Reports are generated as structured data first, then rendered to PDF with WeasyPrint so that formatting stays consistent from one campaign to the next.
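
The "structured data first" step can be illustrated with a small renderer that turns a report dictionary into HTML. This is a sketch under assumptions: the `report` keys and the `render_report_html` function are hypothetical, and the real templates are certainly richer.

```python
from html import escape


def render_report_html(report: dict) -> str:
    """Render a structured report (assumed keys) into a minimal HTML document."""
    rows = "".join(
        "<tr><td>{}</td><td>{}</td></tr>".format(
            escape(item["title"]), escape(item["severity"])
        )
        for item in report["findings"]
    )
    return (
        "<html><body>"
        "<h1>{}</h1>".format(escape(report["campaign"]))
        + "<p>{}</p>".format(escape(report["summary"]))
        + "<table><tr><th>Finding</th><th>Severity</th></tr>{}</table>".format(rows)
        + "</body></html>"
    )


html_doc = render_report_html({
    "campaign": "Example campaign",
    "summary": "Two findings, one validated.",
    "findings": [
        {"title": "SQL injection in /login", "severity": "high"},
        {"title": "Reflected <script> payload", "severity": "medium"},
    ],
})
```

The resulting string could then be handed to WeasyPrint, for example `HTML(string=html_doc).write_pdf("report.pdf")` after `from weasyprint import HTML`. Escaping every field matters here because finding titles often contain attacker-controlled markup.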

Regulatory

DORA TLPT Compliance

Threat-Led Penetration Testing module that maps findings to DORA and NIS2 regulatory requirements with evidence chains.

Built for EU financial sector compliance — each finding links to specific regulatory articles and control frameworks.

Decisions

Tradeoffs that shaped the product

The strongest work is visible in the choices made under pressure, not just in the final interface.

LangGraph over plain LangChain

Challenge

39 exploit agents needed structured reasoning flows with branching, retries, and tool selection — not just sequential chains.

Decision

Adopted LangGraph state machines where each agent type has its own graph topology: reconnaissance agents use linear flows, exploit agents use cyclic graphs with validation loops, and reporting agents use fan-out/fan-in patterns.

Tradeoff

LangGraph added complexity to agent definitions but made each agent's behavior fully inspectable and debuggable — critical for compliance evidence.
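
The cyclic "attempt, validate, retry" topology described for exploit agents can be shown with a tiny state machine. The sketch is written in plain Python on purpose and does not use the LangGraph API. The node names, the state keys, and the `succeeds_on` knob that simulates when a payload works are all hypothetical.

```python
from typing import Callable, Dict, List

State = Dict[str, object]
END = "END"


def attempt(state: State) -> State:
    """Simulated exploit attempt; succeeds once `attempts` reaches `succeeds_on`."""
    state["attempts"] = int(state["attempts"]) + 1
    worked = state["attempts"] >= int(state["succeeds_on"])
    state["evidence"] = "poc-response" if worked else None
    return state


def validate(state: State) -> State:
    """Mark the finding as validated only if the attempt produced evidence."""
    state["validated"] = state["evidence"] is not None
    return state


def route_after_validate(state: State) -> str:
    """Conditional edge: stop on success or exhausted budget, else loop back."""
    if state["validated"] or int(state["attempts"]) >= int(state["max_attempts"]):
        return END
    return "attempt"


NODES: Dict[str, Callable[[State], State]] = {"attempt": attempt, "validate": validate}
EDGES: Dict[str, Callable[[State], str]] = {
    "attempt": lambda state: "validate",
    "validate": route_after_validate,
}


def run(state: State, entry: str = "attempt") -> State:
    """Walk the graph from `entry` until END, recording every visited node."""
    trace: List[str] = []
    node = entry
    while node != END:
        trace.append(node)
        state = NODES[node](state)
        node = EDGES[node](state)
    state["trace"] = trace
    return state


result = run({"attempts": 0, "succeeds_on": 2, "max_attempts": 3})
```

The recorded `trace` is the kind of artifact that makes an agent's behavior inspectable after the fact: it shows exactly which nodes ran and in what order.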

Neo4j knowledge graph for attack surface

Challenge

Findings from 25+ scanners needed to be correlated into attack paths, not just listed as flat vulnerability tables.

Decision

Built a Neo4j knowledge graph where targets, services, vulnerabilities, and exploits are connected nodes. This enables graph queries like 'show all paths from internet-facing services to the database through exploitable vulnerabilities.'

Tradeoff

Neo4j added operational overhead (another stateful service to manage) but transformed raw scanner output into actionable intelligence that agents could reason about.
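
The path query quoted in the decision above can be illustrated without a database. The sketch below is an in-memory stand-in for what would be a Cypher query in Neo4j: it enumerates simple paths from an entry node to a goal node, passing only through nodes flagged as exploitable. The sample graph, the node names, and the `attack_paths` function are hypothetical.

```python
from typing import Dict, List, Set

# Directed "can reach" edges between assets (hypothetical sample data).
GRAPH: Dict[str, List[str]] = {
    "internet": ["web-frontend", "vpn-gateway"],
    "web-frontend": ["api-server"],
    "api-server": ["database"],
    "vpn-gateway": ["admin-panel"],
    "admin-panel": ["database"],
}

# Assets with at least one exploitable vulnerability (hypothetical).
EXPLOITABLE: Set[str] = {"web-frontend", "api-server", "vpn-gateway", "database"}


def attack_paths(
    graph: Dict[str, List[str]], start: str, goal: str, exploitable: Set[str]
) -> List[List[str]]:
    """Return all simple paths from start to goal through exploitable nodes."""
    paths: List[List[str]] = []

    def walk(node: str, path: List[str]) -> None:
        if node == goal:
            paths.append(path)
            return
        for nxt in graph.get(node, []):
            if nxt in path:             # skip nodes already on this path (no cycles)
                continue
            if nxt not in exploitable:  # an unexploitable hop blocks the path
                continue
            walk(nxt, path + [nxt])

    walk(start, [start])
    return paths


paths = attack_paths(GRAPH, "internet", "database", EXPLOITABLE)
```

In this sample the route through `admin-panel` is dropped because that node has no exploitable vulnerability, which is the kind of filtering a flat vulnerability table cannot express.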

Temporal for campaign orchestration

Challenge

Pentesting campaigns run for hours, involve dozens of parallel scanner tasks, and must survive infrastructure failures without losing progress.

Decision

Used Temporal durable workflows where each campaign phase (recon → scanning → exploitation → reporting) is a workflow step with automatic retry, timeout, and checkpoint semantics.

Tradeoff

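Temporal's Java-influenced SDK required careful Python wrapper design, but durable execution and a full workflow event history made it indispensable for compliance-grade audit trails. Temporal does not run activities exactly once: they are retried with at-least-once semantics, so side-effecting steps need to be idempotent.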
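
The retry and checkpoint behavior of the phase pipeline can be approximated in a few lines. The sketch below does not use the Temporal SDK. It only mimics two of its properties, retrying a failed phase and skipping phases already completed when a campaign resumes. `run_campaign`, `make_handler`, and the handler names are hypothetical.

```python
from typing import Callable, Dict

PHASES = ("recon", "scanning", "exploitation", "reporting")


def run_campaign(
    handlers: Dict[str, Callable[[], str]],
    checkpoint: Dict[str, str],
    max_attempts: int = 3,
) -> Dict[str, str]:
    """Run phases in order, skipping any phase already stored in `checkpoint`."""
    for phase in PHASES:
        if phase in checkpoint:
            continue  # completed before the restart, do not redo it
        last_error = None
        for _ in range(max_attempts):
            try:
                checkpoint[phase] = handlers[phase]()
                break
            except Exception as exc:  # treat every failure as retryable here
                last_error = exc
        else:
            raise RuntimeError(
                f"phase {phase!r} failed after {max_attempts} attempts"
            ) from last_error
    return checkpoint


calls = {phase: 0 for phase in PHASES}


def make_handler(name: str, fail_first: int = 0) -> Callable[[], str]:
    """Build a demo handler that fails its first `fail_first` calls."""
    def handler() -> str:
        calls[name] += 1
        if calls[name] <= fail_first:
            raise ConnectionError("transient scanner failure")
        return f"{name}-done"
    return handler


handlers = {
    "recon": make_handler("recon"),
    "scanning": make_handler("scanning", fail_first=1),
    "exploitation": make_handler("exploitation"),
    "reporting": make_handler("reporting"),
}

# Simulate resuming after a restart: recon finished before the crash.
state = run_campaign(handlers, {"recon": "recon-done"})
```

Because a phase can run more than once under retry, each handler should be safe to repeat.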

Architecture

How data flows through the system

Campaign Definition & Target Scoping

Operator defines campaign scope — target domains, IP ranges, excluded assets, scanner selection, and compliance framework (DORA TLPT or NIS2). The campaign is registered as a Temporal workflow.

→ A durable, resumable campaign workflow is created with full configuration snapshot.

Reconnaissance & Asset Discovery

Subfinder, Amass, and Nmap agents perform subdomain enumeration, port scanning, and service fingerprinting. Discovered assets are ingested into the Neo4j knowledge graph as interconnected nodes.

→ Complete attack surface map with service versions, technologies, and network topology.

Vulnerability Scanning & Correlation

ZAP, Nuclei, Semgrep, Nikto, and Trivy run in parallel against discovered assets. Scanner outputs are normalized into a common schema and correlated in Neo4j to identify overlapping findings and attack chains.

→ Deduplicated, correlated vulnerability inventory with graph-based attack path analysis.

AI-Driven Exploitation & Validation

39 specialized exploit agents receive prioritized findings and attempt automated exploitation — SQLi payloads, XSS chains, SSRF pivots, auth bypasses. Each attempt is logged with full evidence for audit.

→ Validated exploits with proof-of-concept evidence, separated from theoretical findings.

Triage, Scoring & Compliance Mapping

Multi-LLM consensus (OpenRouter + Mistral + Ollama) scores and prioritizes findings. DORA TLPT and NIS2 mapping links each finding to regulatory articles and control frameworks.

→ Risk-ranked findings with regulatory compliance evidence chains ready for audit.

Report Generation & Delivery

Automated executive summaries, technical findings, risk matrices, and remediation recommendations are assembled. WeasyPrint renders the final PDF with consistent formatting.

→ Stakeholder-ready penetration test report with full evidence trail from scan to finding to exploit.

Spectre is a 14-service Docker Compose stack centered on a FastAPI backend that orchestrates 39 LangGraph-based exploit agents through Temporal durable workflows. The frontend is a Next.js 16 application with real-time SSE streaming for campaign monitoring. Security scanners (ZAP, Nuclei, Nmap, Semgrep, etc.) run as sidecar containers with output normalization into a Neo4j knowledge graph. Qdrant provides vector similarity search for vulnerability deduplication. PostgreSQL with PgBouncer handles persistent state, Redis manages pub/sub and caching, and Traefik routes traffic across services. Multi-LLM fallback ensures agent availability across OpenRouter, Mistral, and Ollama providers.
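
Vector-similarity deduplication, the role given to Qdrant above, can be illustrated with plain cosine similarity. This is a stand-in that does not call Qdrant. The greedy strategy, the 0.95 threshold, and the two-dimensional toy vectors are assumptions chosen only to keep the example readable.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def deduplicate(
    findings: List[Tuple[str, Sequence[float]]], threshold: float = 0.95
) -> List[str]:
    """Greedy dedup: keep a finding only if it is not too similar to a kept one."""
    kept: List[Tuple[str, Sequence[float]]] = []
    for finding_id, vector in findings:
        if all(cosine(vector, kept_vec) < threshold for _, kept_vec in kept):
            kept.append((finding_id, vector))
    return [finding_id for finding_id, _ in kept]


unique = deduplicate([
    ("zap-xss-1", [1.0, 0.0]),
    ("nuclei-xss-1", [0.99, 0.01]),  # nearly identical embedding, dropped
    ("trivy-cve-1", [0.0, 1.0]),
])
```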

Product surfaces

The interfaces that carry the experience

Real-Time Campaign Monitoring

Live campaign dashboards with SSE-streamed agent activity, scanner progress, and finding counts. Operators can pause, resume, or redirect campaigns without losing state thanks to Temporal's durable workflow model.
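
On the wire, each SSE message is a few `field: value` lines followed by a blank line. The helper below is a sketch of how a campaign event could be serialized. The `sse_event` function, the event name, and the payload keys are hypothetical.

```python
import json
from typing import Optional


def sse_event(event: str, data: dict, event_id: Optional[str] = None) -> str:
    """Serialize one Server-Sent Events message, terminated by a blank line."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    lines.append(f"event: {event}")
    # Compact JSON contains no raw newlines, so a single data line is enough.
    lines.append("data: " + json.dumps(data, separators=(",", ":")))
    return "\n".join(lines) + "\n\n"


message = sse_event("finding", {"severity": "high", "count": 3}, event_id="7")
```

In a FastAPI backend, strings like this would typically be yielded from a generator wrapped in a `StreamingResponse` with the `text/event-stream` media type. Including an `id` lets a reconnecting browser report the last event it saw.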

Server-Sent Events streaming, Temporal workflow control, Live agent activity feed, Campaign pause/resume

Campaign Target Management

Granular target scoping with domain lists, IP ranges, CIDR notation, and asset exclusion rules. Each target is tracked through the full lifecycle from discovery to exploitation with status indicators.
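
Scope checks with CIDR ranges and exclusion rules map directly onto Python's standard `ipaddress` module. A minimal sketch follows, with the assumption that exclusions always win over inclusions. The `in_scope` function and the sample ranges are hypothetical.

```python
import ipaddress
from typing import Iterable


def in_scope(address: str, included: Iterable[str], excluded: Iterable[str]) -> bool:
    """Return True if address falls in an included CIDR and in no excluded CIDR."""
    ip = ipaddress.ip_address(address)
    if any(ip in ipaddress.ip_network(cidr) for cidr in excluded):
        return False  # exclusions take precedence over inclusions
    return any(ip in ipaddress.ip_network(cidr) for cidr in included)


INCLUDED = ["10.0.0.0/24"]
EXCLUDED = ["10.0.0.5/32"]  # e.g. a fragile production host kept out of scope

allowed = in_scope("10.0.0.7", INCLUDED, EXCLUDED)
blocked = in_scope("10.0.0.5", INCLUDED, EXCLUDED)
outside = in_scope("192.168.1.1", INCLUDED, EXCLUDED)
```

Running a check like this before every scanner or exploit task is a cheap guard against touching assets outside the agreed scope.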

CIDR range support, Asset exclusion rules, Lifecycle tracking, Scope validation

Campaign Detail & Agent Analytics

Deep campaign inspection showing agent performance, scanner results, timing breakdowns, and finding distribution by severity. The detail view provides full operational visibility into what each agent discovered and exploited.

Per-agent performance metrics, Scanner result breakdown, Severity distribution, Timing analytics

Target Intelligence View

Individual target analysis with discovered services, open ports, identified technologies, and linked vulnerabilities rendered as a knowledge graph subview pulled from Neo4j.

Neo4j graph subview, Service fingerprinting, Technology detection, Linked vulnerabilities

Vulnerability Analysis Console

Deep-dive analysis interface for individual findings with CVSS scoring, exploit chain visualization, affected assets, remediation guidance, and one-click evidence export for compliance documentation.

CVSS scoring, Exploit chain visualization, Evidence export, Remediation guidance

Executive Report Builder

Automated report generation with configurable sections — executive summary, technical findings, risk matrices, compliance mapping, and remediation roadmap. Rendered to PDF via WeasyPrint with consistent branding.

Configurable sections, PDF rendering, Compliance mapping, Risk matrices

Platform Settings & Configuration

System-wide configuration for LLM providers, scanner paths, API keys, notification channels, and compliance framework selection. Multi-LLM fallback chain is configured here with priority ordering.

Multi-LLM configuration, Scanner management, API key vault, Notification channels

Tech Stack

Built with

Framework

Next.js 16, TypeScript 5.3, FastAPI 0.115, Python 3.12, Pydantic v2

LangGraph, LangChain Core, Multi-LLM fallback (OpenRouter + Mistral + Ollama), structlog

Data

Temporal, Neo4j, Qdrant, PostgreSQL + PgBouncer, Redis 7, TanStack React Query 5

Security

OWASP ZAP, Nuclei, Nmap + Masscan, Semgrep, Nikto, Subfinder + Amass, Trivy, SQLMap, Metasploit RPC

Tailwind CSS 4, GSAP + ScrollTrigger, Radix UI + shadcn/ui

Infra

Docker Compose, Traefik, Prometheus + Grafana

Languages

Python 3.12, TypeScript, Cypher (Neo4j), SQL, YAML, Dockerfile

What this project proves

Multi-agent AI architecture (39 specialized exploit agents)
LangGraph state machine orchestration
Durable workflow engineering (Temporal)
Graph database modeling (Neo4j knowledge graph)
Security scanner integration and output normalization
Real-time streaming (SSE)
DORA TLPT and NIS2 regulatory compliance
Full-stack platform engineering (FastAPI + Next.js)
Multi-LLM orchestration and fallback
Docker infrastructure design (14 services)
Automated report generation
Offensive security methodology
