📐 Solution design — how it works

This application generates AI research reports as a durable workflow. If a pod crashes, a node reboots, or the LLM API times out mid-run, the work resumes from the last recorded step: completed steps are neither lost nor re-run. This page explains the architecture, the components, and the request lifecycle end to end.

☁️ Azure Kubernetes Service ⏱️ Temporal durable execution 🐍 Python · Flask 🤖 OpenAI GPT-4o 🐘 PostgreSQL 📊 Prometheus · Grafana


The problem this solves

Long, multi-step AI jobs are fragile on plain web servers

A single research report requires several steps that each take time and can fail independently: call an LLM (slow, rate-limited, occasionally errors), turn the result into a PDF, persist metadata to a database, and notify the user. On a normal stateless web server, if the process restarts or crashes anywhere in the middle, the whole job is lost — the user gets nothing, and a retry re-runs steps that already succeeded (wasting tokens and money).

Temporal solves this by turning the multi-step job into a workflow whose every step is recorded in an append-only event history. If anything crashes, Temporal replays that history on another worker and continues from the exact point of failure — steps that already completed are not re-executed. The whole thing runs on a managed Azure Kubernetes Service cluster with redundant, self-healing pods.
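The replay idea can be illustrated without Temporal at all. The following is a minimal sketch, assuming a plain in-memory list as a stand-in for the event history; `run_job`, `History` and the two toy steps are hypothetical names for illustration, not the app's or Temporal's API.

```python
from typing import Callable

# Hypothetical stand-in for the event history: an append-only list of
# (step_name, result) records that outlives any single worker process.
History = list[tuple[str, str]]


def run_job(steps: list[tuple[str, Callable[[str], str]]],
            history: History, prompt: str) -> str:
    """Run steps in order, skipping any step whose result is already recorded."""
    recorded = dict(history)
    value = prompt
    for name, step in steps:
        if name in recorded:
            # Replay: reuse the stored result instead of calling the step again.
            value = recorded[name]
            continue
        value = step(value)            # may raise, simulating a crash
        history.append((name, value))  # record only after the step succeeds
    return value


calls = {"llm_call": 0, "create_pdf": 0}
crash = {"armed": True}


def llm_call(prompt: str) -> str:
    calls["llm_call"] += 1
    return f"report on {prompt}"


def create_pdf(text: str) -> str:
    calls["create_pdf"] += 1
    if crash["armed"]:
        crash["armed"] = False
        raise RuntimeError("worker crashed")
    return "report.pdf"


steps = [("llm_call", llm_call), ("create_pdf", create_pdf)]
history: History = []
try:
    run_job(steps, history, "durable execution")
except RuntimeError:
    pass  # the first worker died after llm_call was recorded

# A second run, standing in for another worker, replays the history.
result = run_job(steps, history, "durable execution")
```

After the second run, `llm_call` has executed once and `create_pdf` twice: the step that was interrupted is attempted again, while the step that had already been recorded is not.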

💡 In one sentence: the app is a Flask front end that hands work to a Temporal workflow, which orchestrates durable activities (LLM → PDF → DB → notify) so the job runs to completion once, even across crashes.

Architecture at a glance

Everything runs as pods inside one AKS namespace

flowchart TB
    User([👤 User / Browser])
    subgraph Azure["☁️ Azure"]
      DNS["🌐 Public DNS + TLS<br/>temporal-ai-app...cloudapp.azure.com"]
      subgraph AKS["⎈ Azure Kubernetes Service — namespace: temporal-ai"]
        ING["🔀 ingress-nginx<br/>+ cert-manager (Let's Encrypt)"]
        API["🖥️ API pods ×2<br/>Flask · api_server.py<br/>serves UI + REST"]
        WK["⚙️ Worker pods ×2<br/>worker.py<br/>runs workflows + activities"]
        TS["⏱️ Temporal server<br/>auto-setup"]
        PG[("🐘 PostgreSQL<br/>StatefulSet + PVC")]
        PROM["📈 Prometheus"]
        GRAF["📊 Grafana"]
      end
      ACR["📦 Azure Container Registry"]
    end
    OAI["🤖 OpenAI GPT-4o API"]

    User --> DNS --> ING
    ING -->|"/"| API
    ING -->|"/grafana"| GRAF
    API -->|"start / signal / query"| TS
    TS -->|"dispatch tasks"| WK
    WK -->|"LLM activity"| OAI
    WK -->|"store metadata"| PG
    TS --- PG
    WK -.->|"metrics"| PROM
    TS -.->|"metrics"| PROM
    PROM --> GRAF
    ACR -.->|"pull image"| API
    ACR -.->|"pull image"| WK

Solid arrows = request/data flow · dotted arrows = image pulls & metrics scraping

Components

What each pod does and which source file it maps to

🖥️ API server · api_server.py · 2 replicas

Flask app that serves the web UI and the REST API. It does not run business logic itself — it starts Temporal workflows and reads their state via signals/queries. Stateless, so it scales horizontally.

⚙️ Worker · worker.py · 2 replicas

Polls the research task queue and actually executes the workflow code and activities. This is where the LLM call, PDF creation, DB write and notification run. Any worker can pick up any task.

⏱️ Temporal server · temporalio/auto-setup

The orchestration engine. Stores each workflow's event history, dispatches tasks to workers, manages durable timers, retries and signals. The brain that makes "exactly-once" possible.

🐘 PostgreSQL · StatefulSet + PVC

Persistent store. Backs Temporal's event history and holds the app's own reports metadata table. Data survives pod restarts via an Azure managed-disk volume.

🧩 Workflow & activities · workflow.py · activities.py

The actual job definition. GenerateReportWorkflow orchestrates four activities: llm_call → create_pdf → store_report_metadata → send_notification, each with its own retry policy and timeout.
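To make "its own retry policy" concrete, here is a rough sketch of what a per-activity retry budget with exponential backoff does. In the real app this loop lives inside Temporal rather than in application code. `RetrySpec` and `run_activity` are hypothetical names, the 1 s initial interval and ×2 coefficient are illustrative values, and reading "retry ×3" as three total attempts is an assumption. Timeouts are not modelled.

```python
import time
from dataclasses import dataclass
from typing import Callable, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class RetrySpec:
    max_attempts: int                 # total tries, including the first (assumed meaning)
    initial_interval: float = 1.0     # seconds before the first retry (illustrative)
    backoff_coefficient: float = 2.0  # multiplier applied after each retry (illustrative)


def run_activity(fn: Callable[[], T], spec: RetrySpec,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn until it succeeds or the attempt budget is spent."""
    delay = spec.initial_interval
    for attempt in range(1, spec.max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == spec.max_attempts:
                raise  # budget exhausted: surface the last error
            sleep(delay)
            delay *= spec.backoff_coefficient
    raise ValueError("max_attempts must be at least 1")


attempts = {"llm_call": 0}


def flaky_llm_call() -> str:
    """Toy activity that times out twice before succeeding."""
    attempts["llm_call"] += 1
    if attempts["llm_call"] < 3:
        raise TimeoutError("LLM API timed out")
    return "draft report"


waits: list[float] = []  # records the backoff delays instead of really sleeping
draft = run_activity(flaky_llm_call, RetrySpec(max_attempts=3), sleep=waits.append)
```

Passing `sleep=waits.append` keeps the example instant and makes the backoff schedule visible: the delays double between attempts.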

📊 Prometheus + Grafana · monitoring.yaml

Prometheus scrapes Temporal & worker metrics; Grafana visualizes throughput, latency and message processing. Embedded directly into the app under /grafana.

Request lifecycle

What happens when a user clicks "Generate report"

Browser → API. The UI POSTs the prompt to /api/workflows/generate-report on a Flask API pod.

API → Temporal: start workflow. The API calls client.start_workflow(GenerateReportWorkflow, ...) on the research task queue and immediately returns a workflow_id; it does not wait for the job to finish.

Temporal → Worker: dispatch. Temporal records "workflow started" in history and hands the first task to a free worker pod.

Activity 1: LLM call (retry ×3). The worker calls OpenAI GPT-4o. If the call times out or errors, Temporal retries it automatically with backoff. The successful result is written to history.

Activity 2: Create PDF (retry ×2). The model output is rendered to a PDF file.

Activity 3: Store metadata (PostgreSQL). The report id, filename, user and prompt are persisted to the reports table.

Activity 4: Notify. A completion notification is sent. The workflow records "completed" and returns the final result string.

Browser polls status. Meanwhile the UI polls /api/workflows/<id>/status and /result until the workflow reports COMPLETED, then shows the report.
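The polling step at the end of the lifecycle can be sketched as a small loop. This is only an illustration of the pattern, not the UI's actual code: `wait_for_report` is a hypothetical helper, the `get_status` callable stands in for an HTTP GET against the status endpoint, and `"RUNNING"` as the in-progress value is an assumption (only COMPLETED appears in the text above).

```python
import time
from typing import Callable


def wait_for_report(get_status: Callable[[], str],
                    interval: float = 2.0,
                    timeout: float = 300.0,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the workflow leaves the assumed RUNNING state or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()  # in the app: GET /api/workflows/<id>/status
        if status != "RUNNING":
            return status      # COMPLETED, or any other terminal state
        if time.monotonic() >= deadline:
            raise TimeoutError("workflow still running after timeout")
        sleep(interval)


# Fake status source: two in-progress answers, then the terminal one.
responses = iter(["RUNNING", "RUNNING", "COMPLETED"])
naps: list[float] = []
final = wait_for_report(lambda: next(responses), interval=2.0, sleep=naps.append)
```

Because the API returns the workflow_id immediately, the caller never holds a long-lived request open; it only needs the id and a loop like this one.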

⚡ See it live: the Features playground runs a self-contained version of this, with deliberate failures, so you can watch retries, durable timers, signals and the full event history in real time.

Why it's durable

The Temporal capabilities this design relies on

Capability	What it gives this app
Event history	Every step is appended to a durable log in PostgreSQL. A crashed workflow is replayed from this log on another worker and continues from the exact failure point — completed steps never re-run.
Automatic retries	Each activity has a `RetryPolicy`. Transient LLM/DB/network failures are retried with exponential backoff without any custom retry code.
Durable timers	Sleeps and timeouts survive process restarts — a workflow can wait minutes, hours or days and resume correctly even if every pod was replaced in the meantime.
Signals	External events (e.g. a human approval) are delivered durably into a running workflow, which can pause until the signal arrives.
Queries	The current state of a running workflow can be read at any time without affecting it — this powers the live progress UI.
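Exactly-once	The combination above means the end-to-end job completes once and only once, even under crashes, redeploys or node failures. An activity interrupted before its result is recorded is attempted again, so activities with side effects should be idempotent.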
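The signal and query rows describe two different directions of communication: a signal pushes an event into a running workflow, a query reads its state without changing it. The toy below shows that shape with plain asyncio. It is not durable and does not use the Temporal SDK; `ApprovalGate` and its method names are hypothetical.

```python
import asyncio


class ApprovalGate:
    """Toy stand-in for a workflow that pauses until an approval signal arrives."""

    def __init__(self) -> None:
        self._approved = asyncio.Event()
        self.stage = "waiting_for_approval"

    def signal_approve(self) -> None:
        """Signal: an external event that changes what the workflow does next."""
        self._approved.set()

    def query_stage(self) -> str:
        """Query: a read-only view of the current state."""
        return self.stage

    async def run(self) -> str:
        await self._approved.wait()  # paused here until the signal is delivered
        self.stage = "approved"
        return "report released"


async def demo() -> tuple[str, str, str]:
    gate = ApprovalGate()
    task = asyncio.create_task(gate.run())
    await asyncio.sleep(0)           # let run() start and block on the event
    before = gate.query_stage()      # querying does not unblock anything
    gate.signal_approve()
    result = await task
    return before, gate.query_stage(), result


observed = asyncio.run(demo())
```

In the real system the waiting state survives worker restarts because it is reconstructed from the event history; this in-memory version loses it as soon as the process exits.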

High availability

How the deployment stays up during failures and upgrades

Redundancy & self-healing

2 replicas each for API and worker — one pod can die with no outage.
Liveness & readiness probes restart unhealthy pods and keep traffic off pods that aren't ready.
PodDisruptionBudgets (minAvailable: 1) keep at least one pod serving during node drains and cluster upgrades.
Kubernetes reschedules any failed pod automatically.

Zero-downtime deploys

Rolling updates with maxUnavailable: 0, maxSurge: 1 — a new pod becomes healthy before an old one is removed.
Workers are stateless against the task queue, so a redeploy never loses in-flight workflows — Temporal simply redispatches their next task to a surviving worker.
Right-sized CPU/memory requests so replicas always schedule.
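The numbers in the two lists above can be checked with a little arithmetic. The sketch below assumes absolute values (not percentages) for maxUnavailable and maxSurge; the function names are hypothetical helpers, not part of any Kubernetes client.

```python
def rollout_bounds(replicas: int, max_unavailable: int, max_surge: int) -> tuple[int, int]:
    """Return (minimum available pods, maximum total pods) during a rolling update."""
    if max_unavailable == 0 and max_surge == 0:
        # Kubernetes rejects this combination: the rollout could never progress.
        raise ValueError("maxUnavailable and maxSurge cannot both be 0")
    return replicas - max_unavailable, replicas + max_surge


def allowed_disruptions(healthy_pods: int, min_available: int) -> int:
    """How many pods a PodDisruptionBudget lets a node drain evict right now."""
    return max(0, healthy_pods - min_available)


# Values from this page: 2 replicas, maxUnavailable: 0, maxSurge: 1, minAvailable: 1.
min_ready, max_total = rollout_bounds(replicas=2, max_unavailable=0, max_surge=1)
evictable_when_healthy = allowed_disruptions(healthy_pods=2, min_available=1)
evictable_when_degraded = allowed_disruptions(healthy_pods=1, min_available=1)
```

With these settings a rollout never drops below two ready pods and briefly runs three, while a drain may evict one pod when both are healthy and none when only one is left.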

⚠️ Known single points of failure: the cluster currently runs on a single AKS node, and postgres / temporal are single-replica. Scaling the node pool to 2+ nodes is the main step to also survive a node failure. See the Costs page for the trade-offs.

Observability

How you can see what the system is doing

Temporal and the workers expose Prometheus metrics; Prometheus scrapes them and Grafana visualizes throughput, task latency and message-processing rates. The dashboard is embedded straight into the app so you never leave the experience.

Live metrics — the home page embeds the Grafana dashboard under /grafana.
Workflow history — the playground renders the raw append-only event log of any workflow.
Health endpoint — /health backs the Kubernetes probes.

Security & configuration

How secrets and traffic are handled

TLS everywhere — public traffic is HTTPS, terminated at ingress-nginx with a Let's Encrypt certificate auto-renewed by cert-manager.
Secrets, not code — the OpenAI key and DB password live in a Kubernetes Secret (injected as env vars), never baked into the image or source.
Config separation — non-secret settings (hosts, ports, model name) live in a ConfigMap.
Image supply chain — the app image is built and stored in a private Azure Container Registry and pulled by the pods.
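On the application side, the Secret and ConfigMap split usually shows up as environment variables read at startup. The following is a minimal sketch of that pattern; the variable names (`OPENAI_API_KEY`, `DB_PASSWORD`, `DB_HOST`, `MODEL_NAME`), the defaults and the `Settings` class are assumptions for illustration, not taken from the app's source.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    openai_api_key: str  # from the Kubernetes Secret (assumed variable name)
    db_password: str     # from the Kubernetes Secret (assumed variable name)
    db_host: str         # from the ConfigMap (assumed variable name)
    model_name: str      # from the ConfigMap (assumed variable name)


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration from the environment and fail fast if a secret is missing."""
    required = ("OPENAI_API_KEY", "DB_PASSWORD")
    missing = [key for key in required if not env.get(key)]
    if missing:
        # Report only the variable names, never the secret values.
        raise RuntimeError("missing required secrets: " + ", ".join(missing))
    return Settings(
        openai_api_key=env["OPENAI_API_KEY"],
        db_password=env["DB_PASSWORD"],
        db_host=env.get("DB_HOST", "postgres"),      # illustrative default
        model_name=env.get("MODEL_NAME", "gpt-4o"),  # illustrative default
    )


# Usage with an explicit mapping, as a test or local run might do.
settings = load_settings({"OPENAI_API_KEY": "sk-test", "DB_PASSWORD": "pw"})
```

Failing at startup when a secret is absent lets the readiness probe keep traffic away from a misconfigured pod instead of surfacing the error on the first request.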