Durable AI Research: Temporal on Azure Kubernetes Service

Generate a durable AI research report

Submit a topic and Temporal orchestrates a multi-step workflow on Azure Kubernetes Service that survives pod crashes and transient failures, retrying failed steps automatically.

⚡ Try the Temporal features playground: interactively explore durable timers, automatic retries, queries, signals, cancellation and the event-history log, with no API key needed.

📊 Temporal message processing

Real-time view of activity execution, task-queue polling and gRPC traffic, scraped from Temporal and the worker by Prometheus and visualised in Grafana. Auto-refreshes every 10s.

πŸ—οΈ Solution design β€” how it works

This is a durable AI research pipeline. A request flows through a Temporal workflow that orchestrates four activities. Because Temporal records every step in a persisted event history, the process survives pod crashes, restarts, and transient API failures; it resumes from where it left off instead of starting over.
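Resuming "from where it left off" can be modelled in a few lines. The sketch below is a deliberately simplified, in-memory model of event-history replay, not Temporal's API: `EventHistory`, `run_workflow`, `WorkerCrash` and the step names are hypothetical. A step's result is recorded only after the step completes, so on re-execution completed steps return their recorded result, while a step that never finished runs again.

```python
from typing import Any, Callable, Dict, List, Optional


class WorkerCrash(Exception):
    """Simulates the worker pod dying mid-run (hypothetical)."""


class EventHistory:
    """Hypothetical in-memory stand-in for Temporal's persisted event history."""

    def __init__(self) -> None:
        self.completed: Dict[str, Any] = {}  # step name -> recorded result
        self.executions: List[str] = []      # log of real (non-replayed) executions

    def run_step(self, name: str, fn: Callable[[], Any]) -> Any:
        # Replay: a step already recorded as completed returns its stored
        # result and is not executed again.
        if name in self.completed:
            return self.completed[name]
        self.executions.append(name)
        result = fn()
        self.completed[name] = result  # recorded only once the step finishes
        return result


STEPS = ["llm_call", "create_pdf", "store_metadata", "send_notification"]


def run_workflow(history: EventHistory, crash_before: Optional[str] = None) -> Dict[str, Any]:
    # The workflow only decides the order; every side effect goes through history.
    for step in STEPS:
        if step == crash_before:
            raise WorkerCrash(f"pod died before {step}")
        history.run_step(step, lambda s=step: f"{s}: ok")
    return dict(history.completed)


history = EventHistory()
try:
    run_workflow(history, crash_before="create_pdf")  # first worker dies
except WorkerCrash:
    pass  # Kubernetes restarts the pod and the workflow is picked up again
run_workflow(history)
print(history.executions)
# ['llm_call', 'create_pdf', 'store_metadata', 'send_notification']
```

The LLM call appears once in the execution log even though the workflow function ran twice.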

Architecture

🌐 Browser UI (this page)
▼  HTTP (LoadBalancer · port 80)
⚙️ API server (Flask · serves UI + REST)
▼  gRPC (port 7233): start workflow
⏱️ Temporal server (schedules & persists state)
▼ ▲  polls task queue · records every step
🛠️ Worker (runs workflow + activities)
▼
🧠 OpenAI (LLM content)
📄 ReportLab (PDF)
🗄️ PostgreSQL (metadata)
🔔 Notify (completion)
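The "polls task queue" arrow is the key inversion in this design: the Temporal server does not call the worker; the worker asks for work. A minimal stdlib model, assuming a single in-process `queue.Queue` in place of Temporal's server-side task queue and its long-poll gRPC calls (`start_workflow` and `worker_loop` are hypothetical names):

```python
import queue
import threading
from typing import List, Optional


def start_workflow(tasks: "queue.Queue[Optional[str]]", workflow_id: str) -> None:
    # Stand-in for the API server's gRPC "start workflow" call: it only
    # places a task on the queue and never calls the worker directly.
    tasks.put(workflow_id)


def worker_loop(tasks: "queue.Queue[Optional[str]]", done: List[str]) -> None:
    # The worker pulls work by polling the task queue.
    while True:
        try:
            task = tasks.get(timeout=1.0)  # wait up to 1s for a task
        except queue.Empty:
            continue  # no work yet: poll again
        if task is None:  # sentinel used only to stop this sketch
            break
        done.append(f"ran {task}")


tasks: "queue.Queue[Optional[str]]" = queue.Queue()
done: List[str] = []
worker = threading.Thread(target=worker_loop, args=(tasks, done))
worker.start()
start_workflow(tasks, "report-1")
start_workflow(tasks, "report-2")
tasks.put(None)
worker.join()
print(done)  # ['ran report-1', 'ran report-2']
```

Because workers only pull, adding replicas to the worker Deployment spreads the load across pollers without reconfiguring the API server or Temporal.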

The workflow (4 durable activities)

1. LLM call: sends your prompt to OpenAI (GPT-4o) to generate research content. Retries 3× with exponential backoff · 60s timeout
2. Create PDF: renders the content into a formatted PDF report with ReportLab. Retries 2× · 30s timeout
3. Store metadata: writes report ID, user, prompt & file path to PostgreSQL. Retries 2× · non-fatal if it fails
4. Send notification: signals completion (e.g. an email hook). Lenient retry · optional
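The per-activity settings above can be read as data plus one generic retry loop. This is a stdlib-only sketch, not the project's code: `StepPolicy`, `POLICIES`, `run_step` and `generate_report` are hypothetical; "Retries 3×" is read as three attempts in total (how Temporal's `maximum_attempts` counts); "lenient" is assumed to mean five attempts; the same 1s/2s/4s backoff is applied to every step; and timeouts are recorded but not enforced. If the worker uses the Temporal Python SDK, these settings would instead be passed as `retry_policy` and `start_to_close_timeout` to `workflow.execute_activity`.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass(frozen=True)
class StepPolicy:
    max_attempts: int                  # total attempts, first one included
    timeout_s: Optional[float] = None  # documented only; not enforced here
    fatal: bool = True                 # False: workflow continues after final failure


# Values from the list above; 5 attempts for "lenient" is an assumption.
POLICIES: Dict[str, StepPolicy] = {
    "llm_call": StepPolicy(max_attempts=3, timeout_s=60),
    "create_pdf": StepPolicy(max_attempts=2, timeout_s=30),
    "store_metadata": StepPolicy(max_attempts=2, fatal=False),
    "send_notification": StepPolicy(max_attempts=5, fatal=False),
}


def run_step(name: str, fn: Callable[[], Any],
             sleep: Callable[[float], None] = time.sleep) -> Any:
    policy = POLICIES[name]
    for attempt in range(1, policy.max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == policy.max_attempts:
                if policy.fatal:
                    raise  # fails the whole workflow
                return None  # non-fatal step: give up and continue
            sleep(2 ** (attempt - 1))  # exponential backoff: 1s, 2s, 4s, ...
    return None


def generate_report(topic: str, acts: Dict[str, Callable[[Any], Any]],
                    sleep: Callable[[float], None] = time.sleep) -> str:
    # Order matters: each step consumes the previous step's output.
    content = run_step("llm_call", lambda: acts["llm_call"](topic), sleep)
    pdf_path = run_step("create_pdf", lambda: acts["create_pdf"](content), sleep)
    run_step("store_metadata", lambda: acts["store_metadata"](pdf_path), sleep)
    run_step("send_notification", lambda: acts["send_notification"](pdf_path), sleep)
    return pdf_path
```

Injecting `sleep` lets the retry path be exercised without waiting. Steps 3 and 4 swallow their final failure, so a database outage still returns the PDF path, whereas a PDF failure aborts the run.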

Why Temporal (durable execution)

  • Crash-proof: if the worker pod dies mid-run, Temporal replays the workflow's history and continues after the last completed activity, so finished work such as a completed LLM call is not repeated; only an activity that was still in flight is retried.
  • Automatic retries: transient errors (timeouts, 5xx) are retried per-activity with backoff, without custom retry code.
  • Visibility: every step is recorded; you can query status & results by workflow_id.
  • Separation of concerns: the workflow defines what happens and in what order; activities do the actual I/O (LLM, DB, files).

Deployment (Azure Kubernetes Service)

API: Deployment + LoadBalancer (public IP)
Worker: Deployment (executes the workflow)
Temporal: Deployment (orchestration engine)
PostgreSQL: StatefulSet + persistent disk
Config: ConfigMap (hosts) + Secret (keys)
Registry: Azure Container Registry image
Monitoring: Prometheus scrape + Grafana dashboard
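The Config row implies two Kubernetes objects. A sketch that builds them as Python dicts and prints JSON, which `kubectl apply -f` accepts; the object names, keys, host values and placeholder key are all hypothetical. Secret `data` values must be base64-encoded, which is an encoding, not encryption.

```python
import base64
import json
from typing import Any, Dict


def config_map(name: str, hosts: Dict[str, str]) -> Dict[str, Any]:
    # ConfigMap: non-secret settings such as service host names.
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name},
        "data": dict(hosts),
    }


def secret(name: str, keys: Dict[str, str]) -> Dict[str, Any]:
    # Secret: values under "data" must be base64-encoded strings.
    encoded = {k: base64.b64encode(v.encode("utf-8")).decode("ascii")
               for k, v in keys.items()}
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "Opaque",
        "data": encoded,
    }


# Hypothetical names and values; replace them with the deployment's own.
manifests = {
    "apiVersion": "v1",
    "kind": "List",
    "items": [
        config_map("research-config", {
            "TEMPORAL_HOST": "temporal:7233",
            "POSTGRES_HOST": "postgres:5432",
        }),
        secret("research-secrets", {"OPENAI_API_KEY": "sk-placeholder"}),
    ],
}
print(json.dumps(manifests, indent=2))
```

Redirecting the output to a file and running `kubectl apply -f` on it creates both objects; the API and worker Deployments can then read them as environment variables.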

REST API: POST /api/workflows/generate-report · GET /api/workflows/<id>/status · GET /api/workflows/<id>/result · GET /health
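A minimal client for these endpoints, using only the standard library. `BASE_URL` (a documentation-range IP standing in for the LoadBalancer's public IP), the request body fields `prompt` and `user`, and the `workflow_id` response field are assumptions, not the API's documented schema.

```python
import json
import urllib.parse
import urllib.request
from typing import Any

BASE_URL = "http://203.0.113.10"  # hypothetical: the LoadBalancer's public IP


def generate_report_request(prompt: str, user: str) -> urllib.request.Request:
    # Body field names are assumptions; check the API server for the real schema.
    body = json.dumps({"prompt": prompt, "user": user}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/workflows/generate-report",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def workflow_request(workflow_id: str, action: str) -> urllib.request.Request:
    # action is "status" or "result"; the ID is percent-encoded for the path.
    wid = urllib.parse.quote(workflow_id, safe="")
    return urllib.request.Request(f"{BASE_URL}/api/workflows/{wid}/{action}", method="GET")


def send(req: urllib.request.Request, timeout: float = 10.0) -> Any:
    # Performs the HTTP call and decodes the JSON response body.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Against a live deployment (response field "workflow_id" is assumed):
# wid = send(generate_report_request("Durable execution", "alice"))["workflow_id"]
# print(send(workflow_request(wid, "status")))
```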