CV — David Cockson

Technical Projects

control — Self-Hosted LLM Platform [case study]

Solo-built v2 in 10 days, spec → live: 12 Docker services across 2 hosts plus cloud, 100% IaC (Terraform + Ansible, S3-backed state), 518 pytest tests at cutover
Crash-safe filesystem job queue via atomic shutil.move transitions (_queue → _active → _completed) — no broker, no message loss; worker re-queues stranded jobs on restart
End-to-end SSE streaming from FastAPI through the worker to a TypeScript React 18 UI; hybrid RAG over Qdrant vectors + a hand-rolled Neo4j knowledge graph
Multi-machine Ollama routing (Contabo VPS + homelab GPU) with explicit cloud fallback (Groq, Gemini, Anthropic) via FastMCP; jobs route by the model: field
Zero open inbound ports — Cloudflare Tunnel outbound-only + Tailscale mesh, secrets resolved at runtime from Infisical; full OpenTelemetry → Grafana Cloud (Loki/Mimir/Tempo) telemetry with Discord alerting

maps to → FastAPI, React/TypeScript, SSE, Terraform, Ansible, hybrid RAG, Qdrant, Neo4j, MCP, Cloudflare Zero Trust, OpenTelemetry, Grafana

EvalUI — Dual-Backend LLM Observability [live] [repo]

Next.js 16 dashboard on Vercel fanning a single OpenTelemetry source out to Langfuse and Arize Phoenix, normalised to a shared 4-stage blueprint and raced side-by-side
Independent Claude Haiku 4.5 judge scoring Gemini 2.5 Flash generations on DeepEval Faithfulness, Contextual Precision, Answer Relevancy, and Hallucination
ISR (revalidate=60) + tag-based cache invalidation to stay inside Hobby-tier rate limits; a <canvas> latency replay driven by real span timings
Built solo in a single day; deployed behind Cloudflare Zero Trust

maps to → Next.js 16, OpenTelemetry, Langfuse, Arize Phoenix, DeepEval, LLM-as-judge, Vercel, TypeScript

EARS Specs — VS Code Extension [marketplace] [repo]

Published extension for writing requirements in EARS notation for spec-driven development — live on the VS Code Marketplace and Open VSX (installs in VS Code, Cursor, VSCodium)
Syntax highlighting, an auto-classifying sidebar across the 5 EARS archetypes, a spec scaffolder, and Tab-completion snippets; JavaScript, ~30 KB, CI release pipeline via GitHub Actions

maps to → JavaScript, VS Code extension API, CI/CD, GitHub Actions, spec-driven development

Double Diamond — VS Code Extension [marketplace]

Published extension bringing the four-phase Double Diamond design process into the editor: an idea state machine, a Kanban webview, and Obsidian library export

maps to → TypeScript, VS Code extension API, webviews, design-process tooling

Terminalz — Terminal Multiplexer for AI Agents

Desktop multiplexer for running multiple coding agents at once: a cover-flow layout that keeps every session live — Tauri 2 / Rust, xterm.js v6, portable-pty, TypeScript
Process-type detection via /proc colour-codes Claude Code, Gemini CLI, and SSH panes at a glance; built EARS-spec-first with dedicated QA passes

maps to → Rust, Tauri 2, TypeScript, xterm.js, systems programming, IPC

MapIt + MappitHills — Geospatial Rendering

MapIt: Python CLI + web app rendering OpenStreetMap data (Overpass API) as animated SVG/HTML across 4 aesthetic modes including laser/G-code output; Overpass caching/retry, SSE progress streaming, result caching, Docker, 105 tests — built, deployed, and documented in a day
MappitHills: GPX walking-route renderer over real 3D terrain (MapLibre-GL + Mapzen Terrarium tiles), gradient-coloured by ascent rate with a vertical-exaggeration slider; Flask backend, Docker — built solo in half a day, the day after MapIt

maps to → Python, geospatial, Overpass API, SVG/Canvas, MapLibre-GL, Flask, Docker

vault-runner — Self-Hosted LLM Job Runner

File-driven job queue: Markdown files with YAML frontmatter move through _queue → _active → _completed — Syncthing propagates between laptop and VPS, results land in Obsidian automatically
Multi-machine model routing: Contabo VPS (Qwen2.5 14B local) + Windows gaming laptop (Gemma 4:26b via authenticated Cloudflare Tunnel) — jobs route by model: field; cloud fallback via Groq, Gemini, and OpenRouter
Five job types: text, vision, staged checklist, chain pipeline, chain_planner (LLM generates and executes its own step sequence)
Chain actions: Tavily web search, URL fetch + defuddle, GitLab push + MR creation, CI pipeline polling
FastAPI + HTMX web UI with live SSE output streaming, job cancellation, template picker, and vault search, served via Cloudflare Tunnel
MCP / MemPalace semantic memory: every past job output is indexed; new jobs inject relevant context with one YAML flag (use_memory: true)
76+ pytest tests, GitLab CI/CD pipeline (Bandit SAST, pip-audit), auto-deploy on merge
Full observability: OpenTelemetry spans per job and per LLM call → Tempo; Langfuse @observe() decorators for LLM-specific tracing

maps to → Python, FastAPI, HTMX, SSE, LLM orchestration, multi-machine routing, MCP, OpenTelemetry, Langfuse, pytest, GitLab CI/CD, agentic systems

Monitoring & Observability

Dual observability pipeline: OpenTelemetry → Tempo (infrastructure traces, job duration, token counts) and Langfuse (LLM-specific traces — per-model token usage, chain sessions, per-call latency)
Grafana Alloy → Grafana Cloud (eu-west-2) for metrics shipping
Full stack: Prometheus + Node Exporter + Grafana dashboards across homelab and Contabo VPS
Diagnosed a silent cAdvisor failure (cgroup v2 combined with a non-standard Docker root) and pivoted to Telegraf reading the Docker socket API
Published Grafana community dashboard (ID 25012)
Discord alerting via webhook — job completion and failure notifications from the LLM runner

maps to → monitoring, observability, OpenTelemetry, Tempo, Langfuse, Prometheus, Grafana, incident diagnosis, distributed tracing

Linux & Containerisation

Proxmox hypervisor running Ubuntu VM with 20+ Docker containers
Services: Plex, Immich, Gitea, SearXNG, Nginx Proxy Manager, Portainer, Syncthing, Langfuse stack (ClickHouse, MinIO, PostgreSQL)
Multi-stage Docker builds (Node/Vite → nginx:alpine) for project deployments
systemd service authoring for production services: LLM runner poller (runbook.py) and web UI (web.py) with auto-restart; Tailscale metrics proxy
Syncthing as a file-sync transport layer — propagates job queue files between laptop and VPS, results land in Obsidian automatically
Linux Mint as primary dev environment

maps to → Linux, Docker, containerisation, virtualisation, systemd, process management, Syncthing

Cloud Infrastructure

AWS: self-taught runbook methodology through 23 reps — click-ops → CLI → scripted automation → self-healing fleet (3 EC2 instances)
Static and containerised app deployments to EC2 via multi-stage Docker builds (Node/Vite → nginx:alpine)
Contabo VPS (Ubuntu 24.04, 8 vCPU, 24 GB RAM): production LLM runner stack — Ollama Qwen2.5 14B, FastAPI web UI, OTel agent, systemd-managed services, Cloudflare Tunnel (zero-trust public endpoints without open ports)
Multi-machine Ollama routing: Contabo (Qwen2.5 14B) + Windows gaming laptop (Gemma 4:26b via authenticated Cloudflare Tunnel) — jobs route automatically by model field
Cloud API routing: Groq (Llama 3.3 70B), Gemini 2.5 Flash, OpenRouter (Qwen3 32B) as fallback/specialist runners
Multi-provider experience: AWS, Contabo, Cloudflare, GCP (via Skills Boost)

maps to → AWS, EC2, CLI, multi-cloud, VPS management, Cloudflare Tunnel, zero-trust networking, multi-machine orchestration

Python Development

vault-runner: production LLM job runner — FastAPI + HTMX web UI with live SSE streaming, file-based queue (no broker), multi-machine model routing, job cancellation
Multi-job-type system: text, vision, staged checklist, chain pipeline, chain_planner (LLM generates and executes its own steps)
Chain actions: Tavily web search, URL fetch + defuddle, GitLab push + MR creation, CI pipeline polling
76+ pytest tests with GitLab CI/CD pipeline (Bandit SAST, pip-audit); auto-deploy on merge
OpenTelemetry instrumentation — every job is a trace, every LLM call is a span; Langfuse @observe() decorators for LLM-specific tracing
Multi-provider LLM client: Ollama (local), Groq, Gemini, OpenRouter — unified routing via config
MCP client integration (MemPalace) — semantic memory over all past job outputs injected into prompts
Code kata system: self-designed, multi-domain reps (AWS, Linux, networking); scripting across homelab and cloud automation

maps to → Python, FastAPI, HTMX, SSE, pytest, CI/CD, OpenTelemetry, LLM orchestration, MCP, agentic systems, automation

Terraform & IaC

Runbook self-learning methodology: 4 reps completed and committed to a pinned infra-practice repo
Progression: single EC2 → User Data + IAM + CloudWatch → ALB + HTTPS → full production stack (ASG + Launch Template) in a single terraform apply
GCP Skills Boost badges (Terraform, Kubernetes, Google Cloud Network)

maps to → Terraform, IaC, HCL, ALB, ASG, GCP exposure

Networking & Access Control

Tailscale mesh networking across homelab, laptop, phone, and Contabo VPS
Nginx Proxy Manager as reverse proxy for all services
DNS management across multiple custom domains (Cloudflare)
Labelled physical network infrastructure with structured cabling and surge protection
Tailscale metrics proxy: custom Python service exposing network metrics to Prometheus

maps to → networking, DNS, reverse proxy, VPN/zero-trust, infrastructure documentation

Volunteer — Geeks for Social Change (GFSC)

Drafted initial observability strategy (Uptime Kuma → Node Exporter → Grafana Cloud) collaboratively in HedgeDoc
Iterated infrastructure review based on community maintainer feedback
First external contribution: committed final technical proposals and system documentation to the organisation's public GitHub repo

maps to → infrastructure mapping, Git collaboration, technical documentation, systems discovery, open-source community

Professional Experience

Compliance Manager — Gambling Commission of Great Britain

July 2024 – Present

Assess regulated gambling operators both in person (casinos, betting shops, bingo, AGCs) and online (remote casinos, sportsbooks, bingo, B2B software development companies)
Led all 2025 Software Development Licence assessments, including ISO 27001 reviews, Change Management evaluations, policy and procedure assessments, and governance interviews with C-level executives (CTO/CEO), Heads of Department and SMEs
Managed end-to-end incident response for complex software failures, from real-time triage to root-cause investigation; evaluated active incident reports and sensitive disclosures, gathered critical system context, and escalated high-severity findings to executive committees to enforce the necessary technical remediations
Supported cross-functional technical initiatives and unplanned operational workstreams, managing highly sensitive data escalations and critical internal risk disclosures
Use Microsoft Copilot, SharePoint and proprietary software to streamline regulatory reporting and maintain audit trails for complex assessment workflows

Regulatory Compliance Assurance Manager — William Hill (888 Holdings)

May 2023 – December 2023

Recruited, trained and directed an Assurance team across Marketing Compliance, Technical Compliance and VIP/HVC schemes for 3 business units (UK & Ireland, International, US) spanning 22 regulated markets
Established procedural frameworks and authored the Risk Matrix for the Group Assurance department; planned and executed all non-AML and SG assurance testing for 2023
Coordinated operations for a 5-person international data and monitoring team and a 19-person testing team (Manila), providing task allocation and performance feedback
Integrated Assurance into group regulatory and controls mapping with external entities including KPMG; authored the framework and supported delivery of RTS testing framework automation and the GB annual assurance statement
Developed and delivered specialist training (Assurance Testing, GB Regulatory Actions, 3rd Party Risk Analysis, Marketing Compliance)

Compliance Officer — Allwyn UK (National Lottery)

September 2022 – March 2023

Authored technical Business Requirement Documents (BRDs) and facilitated critical operational handover workshops with legacy service providers
Architected and deployed the foundational system monitoring frameworks and operational control registers for the platform transition
Advised Marketing, Retail and Product Development teams on technical implementation and regulatory alignment as a Subject Matter Expert
Managed tier-one technology vendor relationships (Scientific Games) and coordinated international infrastructure efficiency (Net Zero) initiatives
Directed a joint Machine Learning development project with Oxford University, bridging the gap between academic AI models and commercial product deployment

Previous Roles

Regional Regulatory Compliance Manager — Betway Group (Nov 2021 – Apr 2022): Led cross-organisational incident response on a partner platform (West Ham), mitigating risk by analysing third-party access logs to confirm zero public exposure. Managed an international team of 4 through time-critical operational shifts, including rapid documentation overhauls and emergency resource mobilisation to the Netherlands, to meet strict licensing deadlines and ensure business continuity.
Legal & Compliance Officer — PlayAttack Affiliates (Jul – Sep 2021): Marketing compliance across Sweden, MGA and Romania. Affiliate monitoring, licence applications, UKGC onboarding.
Marketing Compliance Analyst — PokerStars, Flutter International (Nov 2020 – Jul 2021): Marketing compliance across 15+ regulated markets and 3 gaming verticals. Created guidance, training and regulatory risk matrices.
Compliance Officer — Genesis Global Limited (Dec 2018 – Nov 2020): Managed emergency remediation following a mandated operational suspension and severe external review. Executed rapid gap analysis and rebuilt critical monitoring frameworks, achieving the second-fastest operational reinstatement on record and the fastest for an escalation of that severity. Also drove new platform launches and regulatory reporting automation. (Previously Risk, Payment & Fraud Analyst.)
Case Handler — HSBC, PPI & PBA Departments (Aug 2013 – Aug 2017): Complaint investigation, documentation review across proprietary systems, procedure improvement, colleague training.