Freelance AI engineer

Freelance AI Engineer in Paris

I build AI systems designed to work past the demo: RAG pipelines, multi-agent architectures, and LLM orchestration. Not slides. Not POCs that collect dust. Production-grade software with guardrails, observability, and cost controls.

3 production-grade AI systems built end-to-end
Paris, France · Remote across Europe
8+ years in data, AI & product engineering

What I build

AI systems that survive contact with real users

Every engagement starts from the same question: what does this system need to do reliably, at scale, for real people? The answer shapes everything: architecture, model selection, evaluation strategy, deployment.

RAG pipelines & AI assistants

Document ingestion, chunking strategies, hybrid retrieval (dense + sparse), reranking, citation grounding, and streaming generation. I build the full retrieval stack, not just a LangChain quickstart. Includes evaluation loops so you know when retrieval quality degrades before your users do.

LangChain · Qdrant · FastAPI · Langfuse
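The fusion step in hybrid retrieval can be sketched with reciprocal rank fusion, a common way to merge dense and sparse result lists before reranking. A minimal sketch; the document ids and the conventional constant k=60 are illustrative:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked result lists (e.g. dense + sparse retrieval) into one.

    Each ranking is a list of document ids ordered best-first.
    RRF score: sum over rankings of 1 / (k + rank), rank starting at 1.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["doc_a", "doc_b", "doc_c"]   # ranked by embedding similarity
sparse = ["doc_b", "doc_d", "doc_a"]  # ranked by BM25
fused = reciprocal_rank_fusion([dense, sparse])
```

A document that appears near the top of both lists outranks one that dominates only a single retriever, which is exactly the behavior hybrid retrieval is after.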

Multi-agent systems

Coordinated agent workflows using LangGraph and PydanticAI: tool-calling, consensus engines, supervisor patterns, and human-in-the-loop checkpoints. I design agent architectures where failure is recoverable, not catastrophic. Every agent gets structured output validation, timeout handling, and trace-level observability.

LangGraph · PydanticAI · OpenRouter · Python
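The "structured output validation" idea reduces to a simple loop: parse the agent's output, validate it against a schema, and re-prompt on failure instead of letting a malformed tool call propagate. A minimal dependency-free sketch; the `action`/`confidence` schema and the retry count are hypothetical, and frameworks like PydanticAI do this with real Pydantic models:

```python
import json

REQUIRED_FIELDS = {"action": str, "confidence": float}  # hypothetical agent schema

def validate_agent_output(raw: str) -> dict:
    """Parse and validate an agent's JSON output against a schema.

    Raises ValueError so the caller can retry or escalate to a human,
    rather than passing a malformed tool call downstream.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"agent returned non-JSON output: {exc}") from exc
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], expected_type):
            raise ValueError(f"{field} must be {expected_type.__name__}")
    return data

def call_agent_with_retry(agent_fn, max_attempts=2):
    """Re-invoke the agent on invalid output; fail loudly after max_attempts."""
    for attempt in range(max_attempts):
        try:
            return validate_agent_output(agent_fn())
        except ValueError:
            if attempt == max_attempts - 1:
                raise

# Simulated agent: garbage on the first call, valid JSON on the second.
outputs = iter(['not json', '{"action": "search", "confidence": 0.9}'])
result = call_agent_with_retry(lambda: next(outputs))
```

This is the "recoverable, not catastrophic" failure mode: the bad first response is caught, retried, and never reaches the rest of the workflow.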

LLM integration & orchestration

Model routing across providers (OpenAI, Anthropic, Mistral, open-source), automatic fallback chains, response caching, token budget caps, and streaming multiplexing. I treat LLM calls like any other infrastructure dependency: with retries, circuit breakers, and cost dashboards.

OpenRouter · LiteLLM · Redis · FastAPI
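The retry-then-fallback pattern can be sketched in a few lines. This is an illustrative skeleton, not a particular library's API; the provider names are placeholders and real code would catch provider-specific exception types:

```python
import time

def call_with_fallback(providers, prompt, retries_per_provider=2, backoff=0.1):
    """Try each provider in order; retry transient failures, then fall back.

    `providers` is a list of (name, callable) pairs, each callable taking
    the prompt. Returns (provider_name, response); raises if all fail.
    """
    last_error = None
    for name, call in providers:
        for attempt in range(retries_per_provider):
            try:
                return name, call(prompt)
            except Exception as exc:  # production code: catch specific errors
                last_error = exc
                time.sleep(backoff * (2 ** attempt))  # exponential backoff
    raise RuntimeError(f"all providers failed: {last_error}")

def unreliable_primary(prompt):
    raise TimeoutError("provider unreachable")

providers = [("primary", unreliable_primary),
             ("secondary", lambda p: f"answer to: {p}")]
name, response = call_with_fallback(providers, "hello", backoff=0)
```

Layering a circuit breaker on top means skipping a provider entirely once its recent failure rate crosses a threshold, instead of paying the retry cost on every request.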

AI product engineering

Taking AI features from prototype to production: prompt injection guardrails, hallucination detection, A/B evaluation frameworks, structured output validation, and full CI/CD with model regression tests. The gap between a working notebook and a production system is where most AI projects die. That is exactly where I operate.

Docker · GitHub Actions · Pytest · Pydantic
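A model regression test in CI can be as simple as a golden set of prompts with required and forbidden substrings, run before any model or prompt change ships. A sketch under assumed data; the golden case shown is invented, and real suites would use semantic or LLM-graded checks alongside string matching:

```python
# Hypothetical golden set checked in CI before a model/prompt change ships.
GOLDEN_CASES = [
    {
        "prompt": "What is our refund window?",
        "must_include": ["30 days"],
        "must_exclude": ["I don't know"],
    },
]

def check_regression(generate, cases=GOLDEN_CASES):
    """Run each golden prompt through the model; return a list of failures."""
    failures = []
    for case in cases:
        output = generate(case["prompt"])
        for needle in case["must_include"]:
            if needle not in output:
                failures.append((case["prompt"], f"missing: {needle}"))
        for needle in case["must_exclude"]:
            if needle in output:
                failures.append((case["prompt"], f"forbidden: {needle}"))
    return failures

def stub_generate(prompt):
    return "Refunds are accepted within 30 days of purchase."

failures = check_regression(stub_generate)
```

Wired into a pytest job, a non-empty `failures` list blocks the deploy, which is what turns "the new prompt feels fine" into an enforced gate.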

Computer vision & NLP pipelines

Document understanding (OCR + layout analysis), image classification, named entity recognition, and custom text classification. When the problem requires specialized models beyond general-purpose LLMs, I build and deploy the appropriate pipeline with proper evaluation and monitoring.

Transformers · spaCy · Tesseract · FastAPI

Production AI, not demos

What separates production-ready AI from impressive demos

Most AI consultants stop at the POC. I start there. Here is what production AI actually requires, and what I build into every system from day one.

Guardrails & safety

Prompt injection detection, output validation against structured schemas, PII filtering, hallucination scoring, and content policy enforcement. Production LLMs need defense in depth, not a single system prompt hoping for the best.
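The first, cheapest layer of that defense in depth can be sketched as a pattern-based screen in front of the model. The patterns below are illustrative only; production systems stack trained classifiers and output-side checks on top of heuristics like this:

```python
import re

# Illustrative deny-list; a real system layers classifiers over heuristics.
INJECTION_PATTERNS = [
    r"ignore (all |any )?(previous|prior) instructions",
    r"you are now",
    r"system prompt",
    r"disregard .* rules",
]

def looks_like_injection(user_input: str) -> bool:
    """Cheap first-pass filter: flag obvious instruction-override attempts."""
    lowered = user_input.lower()
    return any(re.search(pattern, lowered) for pattern in INJECTION_PATTERNS)
```

Flagged inputs can be refused, sandboxed, or routed through a stricter prompt, while the expensive checks run only on the traffic that needs them.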

Observability & tracing

Every LLM call gets traced end-to-end with Langfuse or OpenTelemetry: latency, token usage, retrieval scores, output quality metrics. When something breaks at 2 AM, you need trace-level debugging, not log-level guessing.
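The shape of that tracing can be sketched with a decorator that records a span per call. A minimal stand-in, assuming an in-memory list where Langfuse or an OpenTelemetry exporter would sit; in practice the span would also carry token counts and retrieval scores:

```python
import functools
import time
import uuid

TRACES = []  # stand-in for a Langfuse / OpenTelemetry exporter

def traced(fn):
    """Record a span (id, name, latency, status) for every wrapped call."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        span = {"id": str(uuid.uuid4()), "name": fn.__name__,
                "start": time.time()}
        try:
            result = fn(*args, **kwargs)
            span["status"] = "ok"
            return result
        except Exception as exc:
            span["status"] = f"error: {exc}"
            raise
        finally:
            span["latency_ms"] = (time.time() - span["start"]) * 1000
            TRACES.append(span)
    return wrapper

@traced
def fake_llm_call(prompt):
    """Placeholder for a real provider call."""
    return f"echo: {prompt}"

reply = fake_llm_call("ping")
```

Because the span is appended in `finally`, failed calls are traced too, which is precisely what you need at 2 AM.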

Cost management

Model routing that sends simple queries to fast cheap models and complex ones to capable expensive models. Response caching, semantic deduplication, token budget caps per user/org. I have seen LLM bills go from manageable to catastrophic in a week. I build the controls before that happens.
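Both controls fit in a few lines each. A sketch with placeholder model names and an arbitrary length threshold; real routers score query complexity rather than counting words, and budgets live in Redis rather than in memory:

```python
CHEAP_MODEL = "small-fast-model"      # placeholder name
CAPABLE_MODEL = "large-capable-model"  # placeholder name

def route_model(prompt: str, needs_reasoning: bool = False) -> str:
    """Send short, simple queries to the cheap model; escalate the rest."""
    if needs_reasoning or len(prompt.split()) > 200:
        return CAPABLE_MODEL
    return CHEAP_MODEL

class TokenBudget:
    """Per-user/org token cap: refuse calls once the budget is spent."""

    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.used = 0

    def charge(self, tokens: int) -> bool:
        """Reserve tokens; False means degrade gracefully, not error out."""
        if self.used + tokens > self.max_tokens:
            return False
        self.used += tokens
        return True

budget = TokenBudget(100)
```

The important design choice is that an exhausted budget returns a refusal the product can handle (a cached answer, a smaller model, a friendly message) instead of an unbounded bill.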

Real deployment

Docker containers, CI/CD pipelines with model regression tests, health checks, autoscaling, and rollback procedures. Not a Streamlit app on someone's laptop. Production AI means the system keeps running correctly when you are not watching.

Production-grade projects

AI systems I designed and built end-to-end

These are not theoretical architectures or slide decks. These are production-grade systems I designed, built, and tested under real conditions, with full infrastructure, observability, and security in place.

Frequently asked questions

Straight answers about freelance AI engineering

What actually breaks when AI systems reach production?

The model itself is rarely the problem. What breaks in production is everything around it: retrieval quality that degrades silently when new documents arrive, prompts that behave differently across model versions, costs that spike when usage scales, and hallucinations that slip through when the system lacks proper refusal paths. I build all of those controls from day one because fixing them after launch costs significantly more time and trust.