Freelance AI Engineer in Paris
I build AI systems designed to work past the demo: RAG pipelines, multi-agent architectures, and LLM orchestration. Not slides. Not POCs that collect dust. Production-grade software with guardrails, observability, and cost controls.
What I build
AI systems that survive contact with real users
Every engagement starts from the same question: what does this system need to do reliably, at scale, for real people? The answer shapes everything: architecture, model selection, evaluation strategy, deployment.
RAG pipelines & AI assistants
Document ingestion, chunking strategies, hybrid retrieval (dense + sparse), reranking, citation grounding, and streaming generation. I build the full retrieval stack, not just a LangChain quickstart. Includes evaluation loops so you know when retrieval quality degrades before your users do.
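To make the hybrid retrieval step concrete, here is a minimal sketch of reciprocal rank fusion, one common way to merge dense and sparse result lists; `dense_hits` and `sparse_hits` are hypothetical inputs, not drawn from any specific engagement.

```python
# Minimal sketch: reciprocal rank fusion (RRF) to merge dense and sparse
# retrieval results. Inputs are hypothetical lists of document IDs, best
# match first; a reranker would refine the fused order downstream.

def reciprocal_rank_fusion(dense_hits: list[str],
                           sparse_hits: list[str],
                           k: int = 60) -> list[str]:
    """Merge two ranked lists; k=60 is the constant from the original RRF paper."""
    scores: dict[str, float] = {}
    for hits in (dense_hits, sparse_hits):
        for rank, doc_id in enumerate(hits):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

merged = reciprocal_rank_fusion(["doc3", "doc1"], ["doc1", "doc7"])
print(merged)  # doc1 appears in both lists, so it ranks first
```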
Multi-agent systems
Coordinated agent workflows using LangGraph and PydanticAI: tool-calling, consensus engines, supervisor patterns, and human-in-the-loop checkpoints. I design agent architectures where failure is recoverable, not catastrophic. Every agent gets structured output validation, timeout handling, and trace-level observability.
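As one illustration of recoverable failure, here is a minimal sketch of structured output validation plus timeout handling; `call_agent` is a hypothetical stub standing in for a real agent invocation, and the schema is illustrative.

```python
# Minimal sketch: validate an agent's structured output with Pydantic and
# bound its runtime with a timeout. A failed call returns None instead of
# crashing, so a supervisor can retry or reroute.
import asyncio
from pydantic import BaseModel, ValidationError

class AgentResult(BaseModel):
    action: str
    confidence: float

async def call_agent(task: str) -> str:
    await asyncio.sleep(0.1)  # stand-in for model latency
    return '{"action": "approve", "confidence": 0.92}'

async def run_agent(task: str, timeout_s: float = 30.0) -> AgentResult | None:
    try:
        raw = await asyncio.wait_for(call_agent(task), timeout=timeout_s)
        return AgentResult.model_validate_json(raw)  # schema-checked output
    except (asyncio.TimeoutError, ValidationError):
        return None  # recoverable failure, not a catastrophic one

print(asyncio.run(run_agent("review invoice")))
```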
LLM integration & orchestration
Model routing across providers (OpenAI, Anthropic, Mistral, open-source), automatic fallback chains, response caching, token budget caps, and streaming multiplexing. I treat LLM calls like any other infrastructure dependency: with retries, circuit breakers, and cost dashboards.
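A minimal sketch of a fallback chain with per-provider retries and exponential backoff; `call_openai` and `call_anthropic` are hypothetical stubs, where a real system would wrap the vendor SDKs and add circuit breakers and cost logging.

```python
# Minimal sketch: try each provider in order, retrying with backoff,
# before giving up. The first stub simulates an outage.
import time

def call_openai(prompt: str) -> str:
    raise TimeoutError("simulated outage")   # stand-in for a failing provider

def call_anthropic(prompt: str) -> str:
    return f"answer to: {prompt}"            # stand-in for a healthy provider

def complete(prompt: str, providers=None, retries: int = 2) -> str:
    providers = providers or [call_openai, call_anthropic]
    for provider in providers:
        for attempt in range(retries):
            try:
                return provider(prompt)
            except Exception:
                time.sleep(0.5 * 2 ** attempt)  # exponential backoff
    raise RuntimeError("all providers exhausted")

print(complete("summarize this contract"))
```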
AI product engineering
Taking AI features from prototype to production: prompt injection guardrails, hallucination detection, A/B evaluation frameworks, structured output validation, and full CI/CD with model regression tests. The gap between a working notebook and a production system is where most AI projects die. That is exactly where I operate.
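A minimal sketch of what a model regression test can look like in CI; `generate` and the golden case are hypothetical placeholders for a deployed prompt and the expectations pinned against it.

```python
# Minimal sketch: a golden-answer regression test that runs on every
# deploy. If a prompt or model change breaks a known-good behavior,
# CI fails before users see it.
GOLDEN_CASES = [
    ("What is our refund window?", "30 days"),
]

def generate(question: str) -> str:
    return "Our refund window is 30 days from delivery."  # stub for the real pipeline

def test_golden_answers():
    for question, must_contain in GOLDEN_CASES:
        answer = generate(question)
        assert must_contain in answer, f"regression on: {question}"
```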
Computer vision & NLP pipelines
Document understanding (OCR + layout analysis), image classification, named entity recognition, and custom text classification. When the problem requires specialized models beyond general-purpose LLMs, I build and deploy the appropriate pipeline with proper evaluation and monitoring.
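For the custom text classification case, a minimal sketch of a scikit-learn pipeline; the three training examples are purely illustrative, and a real pipeline adds a held-out evaluation set and drift monitoring.

```python
# Minimal sketch: a custom text classifier as a scikit-learn pipeline,
# trained on toy data for illustration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

docs = ["invoice due in 30 days", "meeting moved to friday", "pay this bill"]
labels = ["finance", "scheduling", "finance"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(docs, labels)
print(clf.predict(["please settle the invoice"]))
```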
Production AI, not demos
What separates production-ready AI from impressive demos
Most AI consultants stop at the POC. I start there. Here is what production AI actually requires, and what I build into every system from day one.
Guardrails & safety
Prompt injection detection, output validation against structured schemas, PII filtering, hallucination scoring, and content policy enforcement. Production LLMs need defense in depth, not a single system prompt hoping for the best.
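One layer of that defense in depth, sketched minimally: regex-based PII redaction applied to model output before it reaches the user. The patterns are deliberately simple illustrations; production filters are broader and sit alongside schema validation and injection detection.

```python
# Minimal sketch: redact obvious PII from model output. Patterns are
# illustrative; real filters cover more identifier types and locales.
import re

PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.\w+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def redact_pii(text: str) -> str:
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label} removed]", text)
    return text

print(redact_pii("Contact me at jane@example.com or +33 6 12 34 56 78."))
```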
Observability & tracing
Every LLM call gets traced end-to-end with Langfuse or OpenTelemetry: latency, token usage, retrieval scores, output quality metrics. When something breaks at 2 AM, you need trace-level debugging, not log-level guessing.
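A minimal sketch of that instrumentation using the OpenTelemetry API; `llm_call` is a hypothetical stub, and the attribute names are illustrative rather than a fixed convention.

```python
# Minimal sketch: wrap an LLM call in an OpenTelemetry span so latency
# and token counts land in whatever backend collects the traces.
from opentelemetry import trace

tracer = trace.get_tracer("llm-service")

def llm_call(prompt: str) -> dict:
    return {"text": "...", "prompt_tokens": 412, "completion_tokens": 128}  # stub

def traced_completion(prompt: str) -> str:
    with tracer.start_as_current_span("llm.completion") as span:
        result = llm_call(prompt)
        span.set_attribute("llm.prompt_tokens", result["prompt_tokens"])
        span.set_attribute("llm.completion_tokens", result["completion_tokens"])
        return result["text"]
```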
Cost management
Model routing that sends simple queries to fast, cheap models and complex ones to capable, expensive models. Response caching, semantic deduplication, token budget caps per user/org. I have seen LLM bills go from manageable to catastrophic in a week. I build the controls before that happens.
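A minimal sketch of routing plus caching; the model names and the word-count heuristic are placeholders, since real routing usually relies on a trained classifier and semantic rather than exact-match caching.

```python
# Minimal sketch: route by a crude complexity heuristic and cache
# repeat queries. Model names are illustrative placeholders.
from functools import lru_cache

CHEAP_MODEL, CAPABLE_MODEL = "small-fast-model", "large-capable-model"

def pick_model(prompt: str) -> str:
    return CHEAP_MODEL if len(prompt.split()) < 50 else CAPABLE_MODEL

@lru_cache(maxsize=4096)  # exact-match cache; semantic dedup goes further
def cached_completion(prompt: str) -> str:
    model = pick_model(prompt)
    return f"[{model}] response"  # stand-in for the real provider call

print(cached_completion("What are your opening hours?"))
```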
Real deployment
Docker containers, CI/CD pipelines with model regression tests, health checks, autoscaling, and rollback procedures. Not a Streamlit app on someone's laptop. Production AI means the system keeps running correctly when you are not watching.
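A minimal sketch of one of those health checks, here as a FastAPI endpoint the orchestrator can probe before routing traffic; `vector_store_ok` is a hypothetical dependency probe.

```python
# Minimal sketch: a health endpoint that reports degraded status when a
# critical dependency is unreachable, so the scheduler stops sending traffic.
from fastapi import FastAPI, Response

app = FastAPI()

def vector_store_ok() -> bool:
    return True  # stand-in for a real ping to the retrieval backend

@app.get("/health")
def health(response: Response) -> dict:
    ok = vector_store_ok()
    response.status_code = 200 if ok else 503
    return {"status": "ok" if ok else "degraded"}
```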
Production-grade projects
AI systems I designed and built end-to-end
These are not theoretical architectures or slide decks. These are production-grade systems I designed, built, and tested under real conditions, with full infrastructure, observability, and security in place.
Frequently asked questions
