
An autonomous pentesting platform that orchestrates 39 AI exploit agents, 25+ security scanners, and multi-LLM reasoning across DORA TLPT and NIS2 compliance frameworks — from reconnaissance to executive report.
AI exploit agents
39 agents
Each agent specializes in a vulnerability class — SQLi, XSS, SSRF, LFI, RCE, crypto, auth bypass, and more.
Scanner integrations
25+ tools
ZAP, Nuclei, Nmap, Masscan, Semgrep, Nikto, Subfinder, Amass, Trivy, SQLMap, Metasploit, and more.
Docker services
14 containers
FastAPI, Next.js, PostgreSQL, PgBouncer, Redis, Neo4j, Qdrant, Temporal, workers, and scanner sidecars.
Compliance frameworks
DORA + NIS2
Automated TLPT evidence collection and regulatory mapping for EU financial sector compliance.
The Problem
Penetration testing workflows are fragmented across dozens of standalone tools, each producing incompatible output. Analysts waste hours on manual triage, duplicate scanning, and report assembly instead of reasoning about real attack paths. Spectre was designed to unify the entire offensive security lifecycle under AI-driven orchestration.
Spectre was born from the observation that modern penetration testing is still overwhelmingly manual. Security teams spend more time wrangling scanner output, writing reports, and context-switching between tools than actually reasoning about attack paths. The goal was to build a platform where AI agents handle the grunt work — scanning, triaging, correlating, exploiting — while humans focus on strategy and judgement.
The system is built around a FastAPI backend orchestrating 39 specialized exploit agents via LangGraph state machines, with Temporal handling long-running campaign workflows. Each agent is a self-contained unit that can reason about its domain (SQLi, XSS, SSRF, crypto weaknesses, etc.), select appropriate tools, and chain findings into attack narratives. A Neo4j knowledge graph connects targets, vulnerabilities, and exploit paths into a queryable attack surface.
The frontend is a Next.js 16 application with GSAP-powered animations, real-time campaign monitoring via SSE, and a DORA TLPT compliance module that maps findings directly to regulatory frameworks. The result is a platform that can run a full penetration test — from subdomain discovery to executive PDF report — with minimal human intervention.
What changed
Instead of manually running scanners, copying output between tools, and assembling reports in Word, Spectre lets operators define a campaign target and watch AI agents autonomously discover, scan, exploit, and report — with full traceability from finding to evidence to compliance mapping.
Why it was hard
The core challenge was orchestrating 39 agents that each rely on different tools and reasoning patterns and fail in different ways, while keeping the whole system deterministic enough to produce compliance-grade evidence. Balancing agent autonomy with human controllability required careful state machine design in LangGraph and durable workflow patterns in Temporal.
Constraints
My role
Proof
Real screens from the product — each one supports a specific argument about clarity, control, or observability.
Mission Control
The central nervous system of Spectre: real-time campaign status, active agent count, finding severity distribution, and system health at a glance.
Designed for SOC-style monitoring where operators need to track multiple concurrent campaigns without context-switching.
Workflow Engine
Campaign management with target scope definition, agent selection, scanner configuration, and Temporal workflow orchestration.
Each campaign is a durable Temporal workflow that survives restarts, handles timeouts, and produces a complete audit trail.
Tool Integration
Unified interface for 25+ security scanners, each with health monitoring, configuration, and normalized output feeding into the knowledge graph.
Scanner outputs are parsed into a common schema before being ingested into Neo4j, enabling cross-tool correlation.
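As a rough illustration of what a common schema can look like, the sketch below maps two scanner-specific records onto one shape. The `Finding` fields, the alias table, and both input dictionaries are assumptions made for this example; they are not Spectre's actual schema or the scanners' exact output formats.

```python
from dataclasses import dataclass

# Hypothetical common schema; field names are illustrative only.
@dataclass(frozen=True)
class Finding:
    scanner: str
    target: str
    title: str
    severity: str  # expected: info, low, medium, high, critical

# Assumed aliases that different tools might use for the same level.
SEVERITY_ALIASES = {"informational": "info", "moderate": "medium"}

def normalize_severity(raw: str) -> str:
    """Lower-case a severity label and fold known aliases together."""
    value = raw.strip().lower()
    return SEVERITY_ALIASES.get(value, value)

def from_nuclei_like(record: dict) -> Finding:
    # Assumed input shape, loosely modelled on a template-based scanner.
    return Finding(
        scanner="nuclei",
        target=record["host"],
        title=record["name"],
        severity=normalize_severity(record["severity"]),
    )

def from_zap_like(record: dict) -> Finding:
    # Assumed input shape, loosely modelled on a proxy-based scanner.
    return Finding(
        scanner="zap",
        target=record["url"],
        title=record["alert"],
        severity=normalize_severity(record["risk"]),
    )
```

Once every tool emits the same record type, cross-tool correlation reduces to comparing fields rather than parsing formats.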
Triage Surface
AI-prioritized findings with severity scoring, CVSS mapping, exploit chain visualization, and one-click evidence export.
The triage interface reduces analyst fatigue by surfacing the highest-signal findings first with pre-built exploit context.
Executive Output
Automated report generation with executive summary, technical findings, risk matrices, and compliance mapping, ready for stakeholder delivery.
Reports are generated as structured data first, then rendered to PDF with WeasyPrint for pixel-perfect output.
Regulatory
Threat-Led Penetration Testing module that maps findings to DORA and NIS2 regulatory requirements with evidence chains.
Built for EU financial sector compliance — each finding links to specific regulatory articles and control frameworks.
Decisions
The strongest work is visible in the choices made under pressure, not just in the final interface.
Challenge
39 exploit agents needed structured reasoning flows with branching, retries, and tool selection — not just sequential chains.
Decision
Adopted LangGraph state machines where each agent type has its own graph topology: reconnaissance agents use linear flows, exploit agents use cyclic graphs with validation loops, and reporting agents use fan-out/fan-in patterns.
Tradeoff
LangGraph added complexity to agent definitions but made each agent's behavior fully inspectable and debuggable — critical for compliance evidence.
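The cyclic "attempt, validate, retry" topology described for exploit agents can be sketched without any framework. This is a plain-Python stand-in, not LangGraph's API; the node names, the state keys, and the rule that validation succeeds on the third attempt are all invented for illustration.

```python
from typing import Callable

State = dict  # shared agent state passed between nodes

def attempt(state: State) -> State:
    # Placeholder for a tool call; here it only counts attempts.
    state["attempts"] += 1
    return state

def validate(state: State) -> State:
    # Hypothetical check: pretend the exploit is confirmed on the third try.
    state["confirmed"] = state["attempts"] >= 3
    return state

def route(state: State) -> str:
    """Pick the next node: stop when confirmed or when the budget is spent."""
    if state["confirmed"] or state["attempts"] >= state["max_attempts"]:
        return "end"
    return "attempt"

NODES: dict[str, Callable[[State], State]] = {"attempt": attempt, "validate": validate}

def run_agent(max_attempts: int) -> State:
    state: State = {"attempts": 0, "confirmed": False,
                    "max_attempts": max_attempts, "trace": []}
    node = "attempt"
    while node != "end":
        state = NODES[node](state)
        state["trace"].append(node)  # the trace makes each run inspectable
        # "attempt" always flows to "validate"; "validate" branches via route().
        node = "validate" if node == "attempt" else route(state)
    return state
```

The recorded `trace` is the kind of artifact that makes a run inspectable after the fact, which is the property the tradeoff above values.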
Challenge
Findings from 25+ scanners needed to be correlated into attack paths, not just listed as flat vulnerability tables.
Decision
Built a Neo4j knowledge graph where targets, services, vulnerabilities, and exploits are connected nodes. This enables graph queries like 'show all paths from internet-facing services to the database through exploitable vulnerabilities.'
Tradeoff
Neo4j added operational overhead (another stateful service to manage) but transformed raw scanner output into actionable intelligence that agents could reason about.
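To make the path query concrete, here is a small in-memory version of "all paths from internet-facing services to the database through exploitable vulnerabilities". It uses a breadth-first search over a hand-written edge list rather than Neo4j or Cypher, and every node name and edge is hypothetical.

```python
from collections import deque

# Hypothetical edges: (source, target, exploitable). Names are made up.
EDGES = [
    ("internet", "web-frontend", True),
    ("web-frontend", "api", True),
    ("api", "database", True),
    ("web-frontend", "cache", False),  # reachable, but no known exploit
    ("cache", "database", True),
]

def attack_paths(edges, start, goal):
    """Return every simple path from start to goal using only exploitable edges."""
    graph: dict[str, list[str]] = {}
    for src, dst, exploitable in edges:
        if exploitable:
            graph.setdefault(src, []).append(dst)

    paths, queue = [], deque([[start]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            paths.append(path)
            continue
        for nxt in graph.get(node, []):
            if nxt not in path:  # skip nodes already on this path to avoid cycles
                queue.append(path + [nxt])
    return paths
```

A flat vulnerability table would list the `cache` to `database` issue alongside the others; the path search shows it is not reachable from the internet in this toy graph.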
Challenge
Pentesting campaigns run for hours, involve dozens of parallel scanner tasks, and must survive infrastructure failures without losing progress.
Decision
Used Temporal durable workflows where each campaign phase (recon → scanning → exploitation → reporting) is a workflow step with automatic retry, timeout, and checkpoint semantics.
Tradeoff
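Temporal's Java-influenced SDK required careful Python wrapper design, but durable execution (workflow state survives crashes and restarts, while activities are retried at-least-once and so should be idempotent) and the full workflow history made it indispensable for compliance-grade audit trails.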
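The resume-and-retry behavior can be illustrated with a toy phase runner. This is not the Temporal SDK: the `checkpoint` dictionary stands in for persisted workflow history, and `run_campaign`, `handlers`, and `max_retries` are names invented for this sketch.

```python
PHASES = ["recon", "scanning", "exploitation", "reporting"]

def run_campaign(handlers, checkpoint, max_retries=2):
    """Run phases in order, skipping any already recorded in the checkpoint."""
    for phase in PHASES:
        if phase in checkpoint["done"]:
            continue  # completed in an earlier run, do not repeat
        for attempt in range(1, max_retries + 2):
            try:
                handlers[phase]()
                checkpoint["done"].append(phase)  # a real system would persist this
                break
            except RuntimeError:
                if attempt > max_retries:
                    raise  # give up; earlier progress stays in the checkpoint
    return checkpoint
```

Because a failed phase may run more than once before it succeeds, each handler has to tolerate being repeated.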
Architecture
Operator defines campaign scope — target domains, IP ranges, excluded assets, scanner selection, and compliance framework (DORA TLPT or NIS2). The campaign is registered as a Temporal workflow.
→ A durable, resumable campaign workflow is created with full configuration snapshot.
Subfinder, Amass, and Nmap agents perform subdomain enumeration, port scanning, and service fingerprinting. Discovered assets are ingested into the Neo4j knowledge graph as interconnected nodes.
→ Complete attack surface map with service versions, technologies, and network topology.
ZAP, Nuclei, Semgrep, Nikto, and Trivy run in parallel against discovered assets. Scanner outputs are normalized into a common schema and correlated in Neo4j to identify overlapping findings and attack chains.
→ Deduplicated, correlated vulnerability inventory with graph-based attack path analysis.
39 specialized exploit agents receive prioritized findings and attempt automated exploitation — SQLi payloads, XSS chains, SSRF pivots, auth bypasses. Each attempt is logged with full evidence for audit.
→ Validated exploits with proof-of-concept evidence, separated from theoretical findings.
Multi-LLM consensus (OpenRouter + Mistral + Ollama) scores and prioritizes findings. DORA TLPT and NIS2 mapping links each finding to regulatory articles and control frameworks.
→ Risk-ranked findings with regulatory compliance evidence chains ready for audit.
Automated executive summaries, technical findings, risk matrices, and remediation recommendations are assembled. WeasyPrint renders the final PDF with consistent formatting.
→ Stakeholder-ready penetration test report with full evidence trail from scan to finding to exploit.
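The multi-LLM consensus step in the pipeline above is not specified in detail, so the following is only one plausible shape for it: take the median of per-model scores and flag findings where the models disagree strongly. The 0 to 10 scale, the `max_spread` threshold, and the function name are assumptions.

```python
from statistics import median

def consensus_score(scores: dict[str, float], max_spread: float = 3.0) -> dict:
    """Combine per-model scores (assumed 0-10) and flag strong disagreement."""
    values = list(scores.values())
    if not values:
        raise ValueError("at least one model score is required")
    spread = max(values) - min(values)
    return {
        "score": median(values),            # robust to a single outlier model
        "needs_review": spread > max_spread,  # route to a human when models diverge
    }
```

A median was chosen here over a mean so that one outlier model cannot drag the score; a weighted scheme would be an equally reasonable alternative.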
Spectre is a 14-service Docker Compose stack centered on a FastAPI backend that orchestrates 39 LangGraph-based exploit agents through Temporal durable workflows. The frontend is a Next.js 16 application with real-time SSE streaming for campaign monitoring. Security scanners (ZAP, Nuclei, Nmap, Semgrep, etc.) run as sidecar containers with output normalization into a Neo4j knowledge graph. Qdrant provides vector similarity search for vulnerability deduplication. PostgreSQL with PgBouncer handles persistent state, Redis manages pub/sub and caching, and Traefik routes traffic across services. Multi-LLM fallback ensures agent availability across OpenRouter, Mistral, and Ollama providers.
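Vector-similarity deduplication can be sketched in a few lines. The example below computes cosine similarity directly in Python instead of calling Qdrant, and it assumes each finding already carries an embedding under a hypothetical `vector` key; the 0.95 threshold is likewise an arbitrary choice for illustration.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def deduplicate(findings: list[dict], threshold: float = 0.95) -> list[dict]:
    """Keep a finding only if no already-kept finding is at least `threshold` similar."""
    kept: list[dict] = []
    for finding in findings:
        if all(cosine(finding["vector"], k["vector"]) < threshold for k in kept):
            kept.append(finding)
    return kept
```

This pairwise loop is quadratic in the number of findings; a vector database replaces it with an indexed nearest-neighbor lookup.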
Product surfaces

Live campaign dashboards with SSE-streamed agent activity, scanner progress, and finding counts. Operators can pause, resume, or redirect campaigns without losing state thanks to Temporal's durable workflow model.

Deep campaign inspection showing agent performance, scanner results, timing breakdowns, and finding distribution by severity. The detail view provides full operational visibility into what each agent discovered and exploited.

Deep-dive analysis interface for individual findings with CVSS scoring, exploit chain visualization, affected assets, remediation guidance, and one-click evidence export for compliance documentation.

Automated report generation with configurable sections — executive summary, technical findings, risk matrices, compliance mapping, and remediation roadmap. Rendered to PDF via WeasyPrint with consistent branding.

System-wide configuration for LLM providers, scanner paths, API keys, notification channels, and compliance framework selection. Multi-LLM fallback chain is configured here with priority ordering.
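A priority-ordered fallback chain might look like the sketch below. The provider callables are placeholders for real client calls, and `complete_with_fallback` is a hypothetical name, not a Spectre function.

```python
def complete_with_fallback(prompt, providers):
    """Try providers in priority order; return (name, reply) from the first that works."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real system would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Returning the provider name alongside the reply keeps a record of which model actually answered, which matters when the output feeds an audit trail.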
Tech Stack
Framework
AI
Data
Security
UI
Infra
Languages
Python 3.12, TypeScript, Cypher (Neo4j), SQL, YAML, Dockerfile
What this project proves