This application generates AI research reports as a durable workflow. Even if a pod crashes, a node reboots, or the LLM API times out mid-run, the work resumes where it left off: completed steps are neither lost nor re-run. This page explains the architecture, the components, and the request lifecycle end to end.
Long, multi-step AI jobs are fragile on plain web servers
A single research report requires several steps that each take time and can fail independently: call an LLM (slow, rate-limited, occasionally errors), turn the result into a PDF, persist metadata to a database, and notify the user. On a normal stateless web server, if the process restarts or crashes anywhere in the middle, the whole job is lost: the user gets nothing, and a retry re-runs steps that already succeeded (wasting tokens and money).
Temporal solves this by turning the multi-step job into a workflow whose every step is recorded in an append-only event history. If anything crashes, Temporal replays that history on another worker and continues from the point of failure; steps that already completed are not re-executed. The whole thing runs on a managed Azure Kubernetes Service cluster with redundant, self-healing pods.
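The replay idea can be illustrated with a toy model in plain Python. This is only a sketch of the principle, not Temporal's implementation: `run_with_history`, `make_step` and the simulated crash are hypothetical helpers, and the step names are borrowed from the workflow described later on this page.

```python
from typing import Callable, Optional

# Toy model of history-based replay. This is NOT Temporal's implementation,
# only an illustration of why completed steps are not executed twice.
History = list[tuple[str, str]]

def run_with_history(
    steps: list[tuple[str, Callable[[], str]]],
    history: History,
    crash_before: Optional[str] = None,
) -> dict[str, str]:
    """Run steps in order, reusing any result already recorded in history."""
    recorded = dict(history)
    results: dict[str, str] = {}
    for name, fn in steps:
        if name in recorded:
            results[name] = recorded[name]  # replayed, not re-executed
            continue
        if name == crash_before:
            raise RuntimeError(f"simulated crash before {name}")
        results[name] = fn()
        history.append((name, results[name]))  # append-only log
    return results

calls: list[str] = []  # records every real execution

def make_step(name: str) -> tuple[str, Callable[[], str]]:
    def fn() -> str:
        calls.append(name)
        return f"{name}-done"
    return name, fn

names = ["llm_call", "create_pdf", "store_report_metadata", "send_notification"]
steps = [make_step(n) for n in names]
history: History = []

try:
    run_with_history(steps, history, crash_before="store_report_metadata")
except RuntimeError:
    pass  # the first "worker" died after two completed steps

final = run_with_history(steps, history)  # a second "worker" resumes
print(calls)  # each step name appears exactly once
```

The second call skips `llm_call` and `create_pdf` because their results are already in the log, which is the property the paragraph above attributes to Temporal's event history.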
Everything runs as pods inside one AKS namespace
flowchart TB
User([User / Browser])
subgraph Azure["Azure"]
DNS["Public DNS + TLS<br/>temporal-ai-app...cloudapp.azure.com"]
subgraph AKS["Azure Kubernetes Service, namespace: temporal-ai"]
ING["ingress-nginx<br/>+ cert-manager (Let's Encrypt)"]
API["API pods ×2<br/>Flask · api_server.py<br/>serves UI + REST"]
WK["Worker pods ×2<br/>worker.py<br/>runs workflows + activities"]
TS["Temporal server<br/>auto-setup"]
PG[("PostgreSQL<br/>StatefulSet + PVC")]
PROM["Prometheus"]
GRAF["Grafana"]
end
ACR["Azure Container Registry"]
end
OAI["OpenAI GPT-4o API"]
User --> DNS --> ING
ING -->|"/"| API
ING -->|"/grafana"| GRAF
API -->|"start / signal / query"| TS
TS -->|"dispatch tasks"| WK
WK -->|"LLM activity"| OAI
WK -->|"store metadata"| PG
TS --- PG
WK -.->|"metrics"| PROM
TS -.->|"metrics"| PROM
PROM --> GRAF
ACR -.->|"pull image"| API
ACR -.->|"pull image"| WK
What each pod does and which source file it maps to
API pods (api_server.py): Flask app that serves the web UI and the REST API. It does not run business logic itself; it starts Temporal workflows, sends them signals, and reads their state via queries. Stateless, so it scales horizontally.
Worker pods (worker.py): Poll the research task queue and execute the workflow code and activities. This is where the LLM call, PDF creation, DB write and notification run. Any worker can pick up any task.
Temporal server: The orchestration engine. Stores each workflow's event history, dispatches tasks to workers, manages durable timers, retries and signals. The brain that makes "exactly-once" possible.
PostgreSQL: Persistent store. Backs Temporal's event history and holds the app's own reports metadata table. Data survives pod restarts via an Azure managed-disk volume.
GenerateReportWorkflow: The actual job definition. It orchestrates four activities, llm_call → create_pdf → store_report_metadata → send_notification, each with its own retry policy and timeout.
Prometheus + Grafana: Prometheus scrapes Temporal and worker metrics; Grafana visualizes throughput, latency and message processing. Embedded directly into the app under /grafana.
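To make the four-activity chain in GenerateReportWorkflow more concrete, here is a minimal stand-in written with plain `asyncio` rather than the Temporal SDK. The activity bodies, the values passed between them, and the `TIMEOUTS` numbers are all assumptions for illustration; the page names the activities but does not show their signatures or settings.

```python
import asyncio

# Placeholder activities: names come from the page, bodies are invented.
async def llm_call(prompt: str) -> str:
    return f"report text for: {prompt}"

async def create_pdf(text: str) -> bytes:
    return text.encode("utf-8")

async def store_report_metadata(pdf: bytes) -> int:
    return len(pdf)

async def send_notification(size: int) -> str:
    return f"notified ({size} bytes)"

# Hypothetical per-activity timeouts in seconds (not taken from the app).
TIMEOUTS = {
    "llm_call": 5.0,
    "create_pdf": 2.0,
    "store_report_metadata": 1.0,
    "send_notification": 1.0,
}

async def generate_report(prompt: str) -> str:
    """Run the four steps in order, each bounded by its own timeout."""
    text = await asyncio.wait_for(llm_call(prompt), TIMEOUTS["llm_call"])
    pdf = await asyncio.wait_for(create_pdf(text), TIMEOUTS["create_pdf"])
    size = await asyncio.wait_for(
        store_report_metadata(pdf), TIMEOUTS["store_report_metadata"]
    )
    return await asyncio.wait_for(
        send_notification(size), TIMEOUTS["send_notification"]
    )

result = asyncio.run(generate_report("solar storage"))
print(result)
```

In the real workflow the timeout and retry policy would be attached to each activity invocation through the Temporal SDK; this sketch only shows the sequential shape and the per-step time bound.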
What happens when a user clicks "Generate report"
1. The browser POSTs the prompt to /api/workflows/generate-report on a Flask API pod.
2. The API calls client.start_workflow(GenerateReportWorkflow, ...) on the research task queue and immediately returns a workflow_id; it does not wait for the job to finish.
3. A worker picks up the task and runs the four activities in order; store_report_metadata writes the report's metadata to the reports table.
4. The UI polls /api/workflows/<id>/status and /result until the workflow reports COMPLETED, then shows the report.
The Temporal capabilities this design relies on
| Capability | What it gives this app |
|---|---|
| Event history | Every step is appended to a durable log in PostgreSQL. A crashed workflow is replayed from this log on another worker and continues from the exact failure point; completed steps never re-run. |
| Automatic retries | Each activity has a RetryPolicy. Transient LLM/DB/network failures are retried with exponential backoff without any custom retry code. |
| Durable timers | Sleeps and timeouts survive process restarts: a workflow can wait minutes, hours or days and resume correctly even if every pod was replaced in the meantime. |
| Signals | External events (e.g. a human approval) are delivered durably into a running workflow, which can pause until the signal arrives. |
| Queries | The current state of a running workflow can be read at any time without affecting it; this powers the live progress UI. |
| Exactly-once | The combination above means the end-to-end job completes once and only once, even under crashes, redeploys or node failures. An activity interrupted mid-run is retried, so activities with side effects should be idempotent. |
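The exponential backoff mentioned in the "Automatic retries" row can be worked through numerically. The sketch below is a plain-Python calculation, not SDK code; `backoff_delays` is a hypothetical helper. Its default arguments mirror Temporal's documented default retry policy (1 second initial interval, coefficient 2.0, maximum interval of 100 times the initial interval), and the values this app actually configures are not shown on this page.

```python
def backoff_delays(
    retries: int,
    initial: float = 1.0,      # seconds before the first retry
    coefficient: float = 2.0,  # multiplier applied after each failure
    maximum: float = 100.0,    # cap on any single delay, in seconds
) -> list[float]:
    """Delay in seconds before each of the first `retries` retries."""
    return [min(initial * coefficient**i, maximum) for i in range(retries)]

delays = backoff_delays(8)
print(delays)  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 100.0]
```

The cap matters for slow recoveries such as a rate-limited LLM API: after the seventh retry the delay stops doubling and stays at the maximum.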
How the deployment stays up during failures and upgrades
PodDisruptionBudgets (minAvailable: 1) keep at least one pod serving during node drains and cluster upgrades.
Rolling updates use maxUnavailable: 0, maxSurge: 1, so a new pod becomes healthy before an old one is removed.
postgres / temporal are single-replica. Scaling the node pool to 2+ nodes is the main step to also survive a node failure. See the Costs page for the trade-offs.
How you can see what the system is doing
Temporal and the workers expose Prometheus metrics; Prometheus scrapes them and Grafana visualizes throughput, task latency and message-processing rates. The dashboard is embedded straight into the app so you never leave the experience.
The Grafana dashboard is served under /grafana.
/health backs the Kubernetes probes.
How secrets and traffic are handled
Sensitive values live in a Kubernetes Secret (injected as env vars), never baked into the image or source.
Non-secret configuration lives in a ConfigMap.
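Because secrets arrive as environment variables, application code can read them at startup and fail fast when one is missing. The following is a minimal sketch: `require_env` is a hypothetical helper, and the variable name `OPENAI_API_KEY` is an assumption, since this page does not list the variables the app actually uses.

```python
import os

def require_env(name: str) -> str:
    """Return a required setting from the environment, or fail at startup."""
    value = os.environ.get(name, "")
    if not value:
        raise RuntimeError(f"missing required environment variable: {name}")
    return value

# Demo only: in the cluster this value would be injected from the Secret.
os.environ.setdefault("OPENAI_API_KEY", "demo-value")

api_key = require_env("OPENAI_API_KEY")  # assumed variable name
```

Failing at startup rather than on first use means a misconfigured pod never passes its readiness probe, so the problem surfaces during rollout instead of in the middle of a report.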