National Early Warning Centre · Republic of Seloria

Five Days to Impact
You don't forecast the weather. You decide what to do about it.

A deep Atlantic low is five days out. You run Seloria's warning authority. Over six briefings you turn uncertain ensemble forecasts into warnings that are real objects — colour, area, message, audience, channel and cost — and you must get the whole value chain to respond: the public, the agencies who act on warnings, and the infrastructure that can fail underneath them.

A warning is not a colour, and a brilliant warning still fails if shelters aren't open, buses aren't running, or the bridge to the hospital floods. Beneath the desk sits real weather — separate physical drivers (river vs surface water, gust vs surge, flash vs prolonged rain) that can disagree — and a forecast that is sometimes badly behaved.

OBJECTIVE   Minimise total loss = hazard damage + action cost + warning burden + avoidable deaths + lost trust + inequity.
THE CHAIN   Protective outcome = public response × agency readiness × feasibility × lead time. Every link can break.
SCORING   Judged on expected-value reasoning at the moment of each call — across the ensemble, never on hindsight.
Expert mode: the forecast pathology is not named — only operational clues. Trigger probabilities and figures are shown. Training mode instead names the pathology and shows qualitative bands.

Mini tutorial starts with one region and one question: will flooding cut hospital access? Random storm deals a fresh seed from the full climatology and takes you straight into the first briefing, five days out. Daily storm gives everyone the same seed for the day, so scores are comparable; it changes at midnight UTC.

Built on real doctrine: the WMO/UNDRR warning value chain and Early Warnings for All, impact-based forecasting (hazard × exposure × vulnerability), probabilistic cost–loss decision theory, the Common Alerting Protocol, the Protective Action Decision Model, and anticipatory action. Each storm is generated from a seed you can share and replay.

Field manual — the warning desk

Everything the duty officer is assumed to already know. Read once; replay forever.

1 · What you actually do

You are not the forecaster. The ensemble arrives already computed. Your job is the harder, less glamorous one: decide what to do about it. Over six briefings, from five days out (T−120 h) to six hours out (T−6 h), you convert an uncertain forecast into warnings and anticipatory actions, then live with what verifies. You are scored on the quality of each decision at the moment you made it — given what you could have known — never on hindsight.

The objective is to minimise total loss: hazard damage + the cost of actions + the burden your warnings place on the public and economy + avoidable deaths + lost public trust + inequity in who was protected.

2 · The value chain (why a colour is not enough)

A warning only saves anyone if every link holds:

protective outcome = public response × agency readiness × feasibility × lead time

A perfectly judged Red warning protects no one if shelters aren't open, buses aren't running, the message never reached the islands, or the bridge to the hospital is already under water. Each of those is a separate link you can strengthen — or neglect. This is the WMO/UNDRR warning value chain and the Protective Action Decision Model, made mechanical.
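A minimal sketch of why the chain is a product rather than a sum. It assumes each link is expressed as a fraction between 0 and 1; the function name and the example numbers are illustrative and are not taken from the game engine.

```python
from math import prod


def protective_outcome(public_response, agency_readiness, feasibility, lead_time):
    """Multiply the four links of the chain; each is a fraction in [0, 1]."""
    links = (public_response, agency_readiness, feasibility, lead_time)
    if not all(0.0 <= link <= 1.0 for link in links):
        raise ValueError("each link must be a fraction between 0 and 1")
    return prod(links)


# Illustrative numbers only (hypothetical, not engine values).
strong_chain = protective_outcome(0.8, 0.9, 0.9, 0.8)
# Same warning, but the hospital bridge is under water: feasibility is zero.
flooded_bridge = protective_outcome(0.8, 0.9, 0.0, 0.8)
```

Because the links multiply, a single zero wipes out the outcome regardless of how strong the other three are, which is the sense in which every link can break.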

3 · Reading the forecast

The ensemble. Fifty members, each a plausible version of the storm. The spread is the uncertainty — it is not noise to be averaged away. Early on you should ask "what would hurt us?", not "what is most likely?".

The map is a clock. The cyclone symbol is the low's analysed position, advancing toward Seloria each briefing; the solid line behind it is the observed track so far (a kinked observed track is itself a clue that the guidance has been jumping). Ahead of it, each faint line is one ensemble member's remaining track: a smooth path to its own destination plus a correlated cross-track wobble that grows with forecast range, which is why the bundle reads as spaghetti rather than a family of tidy curves. Genuinely offshore members run north parallel to the coast and exit at sea; only tracks past the cluster boundary actually cross the coastline.

The shaded cone is derived from those members: at every downstream station it spans the 10–90% spread of member positions across the mean track. It is literally the presented probabilities made spatial: widening with lead time, narrowing as briefings pass, and physically shrinking as the low advances. The dashed line is the ensemble-mean track. When the members split into distinct track families the cone is drawn as separate dashed lobes: plan against more than one scenario, not the average of them.

Crucially, the cone is the possible path of the low centre, not the hazard area. Severe seas are fetch-driven and can devastate the North Isles while the centre tracks elsewhere, which is why regions also carry diagonal hatching scaled to the ensemble's chance of severe impacts there, with small ▲/▼ arrows on each impact line showing how the probabilities moved since the last briefing. Hatching is the threat; the colour fill is your decision. A heavily hatched region with no warning colour is the gap you are paid to notice. Click the storm symbol for the cluster split and whether confidence has narrowed since the last briefing.

The featured diagnostic shows the 50-member spread of whichever driver currently carries the most impact salience — the River Aven hydrograph in a river event, the surge-height plume on a tide-window case, the flash-flood rainfall index when Greyspine is live, the wave or lee-wind index when the isles or Eastmarch are the story — with the current observed level on the left climbing as the event approaches. The runner-up driver sits one click away.

Confidence reflects track spread. It usually rises as impact nears — but not always (see pathologies).

Forecast pathologies. The guidance is sometimes badly behaved: a jumpy track, an under-dispersive ensemble that looks more confident than it should, a model with a known mountain wet bias, under-resolved convection, or an observation outage where confidence falls near impact. Training mode names the pathology; Expert mode gives only the operational clue.

4 · Storm regimes (no two events are alike)

Each storm's seed first picks a regime, then draws the physics inside it. You are not told which; you have to read it. The regimes are deliberately spread across the country, so "it's probably a Westshore wind event" is not a safe assumption:

  • Offshore wind miss — high uncertainty, often a near-miss. Mostly false-alarm management.
  • Westshore wind / surge — damaging gusts, coastal surge near high tide, ports and power.
  • Vale river flood — prolonged rain on saturated ground; the Aven responds slowly but severely.
  • Vale surface-water trap — a convective burst floods urban routes before the river peaks.
  • Greyspine flash flood — orographic convection, fast water in steep valleys, very short lead time.
  • North Isles evacuation and isolation — severe seas close the last safe ferry window; island power and telecom fail.
  • Compound event — two or more regions reach severe impact at once. Your budget will not stretch to everything.
  • Phantom threat: the guidance locks onto a coastal monster for the first three briefings, then collapses. The whole event is false-alarm management: hedge proportionately, then de-escalate visibly and with reasons.
  • Black-swan compound: a once-in-a-century hit on several regions at once. It deliberately exceeds the resources available: the test is triage, not coverage, and the debrief judges prioritisation rather than the bases you could never have covered.

Crucially, some storms are meant to be boring. Some miss. Some are mainly disruption. Some look frightening and verify weakly. That is the point: over-warning has a real cost, so a calm, well-targeted response to a near-miss is a good decision, not a timid one.

5 · Warnings are objects

A warning is colour + area + message + audience + channel — composed in Common Alerting Protocol style. Every part matters:

  • Colour (Yellow / Amber / Red) should match the severity you actually expect.
  • Area — a targeted warning to the floodplain burdens fewer people than a whole-region one, and reads as more credible.
  • Message — clarity, a concrete protective action, and naming specific groups all raise the chance people act.
  • Channel — reach depends on it, and cell broadcast can fail if the islands' telecom mast goes down.
  • Certainty / urgency — claiming "Observed" or "Immediate" days before the event reads as crying wolf and costs credibility.

De-escalation is a skill, not a retreat. Whenever you step a live warning down — including the common Yellow → no-warning case — the desk asks you to record a public reason ("risk shifted away", "gauge below forecast", and so on). A reasoned, visible downgrade preserves trust and is the professional way out of a false alarm; an unexplained one costs more trust if the hazard still verifies. Walking warnings down well is graded just as seriously as putting them up on time.

Warnings can pull actions forward (emergency activation). Sectoral actions have natural phase windows — island resupply, shelters and hospital continuity normally open from T−72. But a Red warning, or an Amber+ carrying a Prepare to leave / Evacuate now message, on the region an action protects pulls that action's availability one briefing forward at a 40% budget surcharge plus extra political cost. The warning is the commitment; the system scrambles to catch up. This mirrors real practice — a formal Red obliges agencies to stand up early — and it means an early, honest danger-to-life warning for (say) the North Isles is never a dead letter just because the ferry action hasn't opened yet.

6 · Readiness, missions and infrastructure

Warnings are received by institutions that must convert them into action. Each region's life-safety depends on a mission (e.g. floodplain evacuation, hospital continuity) delivered by several agencies. Readiness is a weakest-link chain, not an average: one unprepared agency (say, police) drags the whole mission down even if everyone else is ready. Actions raise specific agencies' readiness.

Infrastructure can fail underneath you. Substations, the Aven bridge, the hospital access road, island power and the telecom mast each have a failure risk that rises with the hazard. A failure cascades — losing the bridge undermines both evacuation and hospital continuity. You can harden some of it in advance. The map nodes are clickable and show current failure risk.

7 · Actions, resources and the warning burden

You spend four currencies: preparedness budget, political capital, public trust and public fatigue. Actions cost budget and capital; warnings impose a burden on the public and economy each briefing they're held.

Why issue a high warning early? Immediate disruption scales with proximity to impact — the costly closures bite near landfall, so a Red five days out causes far less disruption than one issued at T−6. But early warning is not free: it consumes public attention, credibility and agency bandwidth, and those costs fall away much more slowly than the disruption does. So "early Red just in case" is a real expense, not a freebie. Early warning buys lead time only if the message is specific, conditional and updated coherently — and lead time is the single biggest multiplier on whether people can actually act. If you run the budget to zero you can no longer fund sectoral actions, so spend it where the marginal value is highest.

8 · How you're scored

The debrief grades three distinct skills, because they fail independently:

  • Forecast diagnosis — did you read the hazard and the uncertainty correctly?
  • Warning decision — were your colours, areas and messages well-judged against the probabilities at the time?
  • Response chain — did you actually build the readiness and protect the people who needed it, including equitably?

An overall expected-value grade rewards decisions that were right given the ensemble. Calibration (a Brier-style score), the broken links that cost lives, and the equity of who was protected are all reported.

Beyond the £ total. National loss is dominated by the big-exposure regions (Vale, Westshore), so the debrief also reports a people-centred salience dimension — per-capita catastrophe, isolation, equity failure and infrastructure loss — and a tail-risk figure (the mean of the worst 10% of ensemble outcomes). A small region like the North Isles can produce the worst warning-chain failure of an event without ever topping the national bill, and a low-probability tail can be the call that mattered most. The run is given a title, and you earn lesson-based achievements.

9 · Modes

Expert (default) hides the named pathology and shows the underlying numbers: probabilities, marginal expected value in £m, raw readiness. Training names the pathology, softens uncertainty into bands, and gives qualitative read-outs (worth-it / costly) instead of figures. Switch modes on the loading screen.

10 · The mathematics under the hood

Nothing in the engine is hand-waved; every number you see is computed from a small set of hidden physical latents. If you want to play it as a calibration exercise, this is the machinery.

The storm. Each seed first draws a regime (the archetype list above, with fixed climatological frequencies), then samples six latents inside it: track, intensity, soil saturation, tide phase, convective fraction and onshore-flow strength. Regional hazard drivers are smooth functions of these, mostly Gaussian "peak" responses such as gust ∝ intensity × exp(−(track − 0.40)² / (2w²)), where w sets the width of the peak, so damaging wind needs a genuinely coastal track, orographic flash rain needs a deep inland one, and island seas need strong onshore flow with an offshore-leaning track. Severity per region is the maximum of its drivers.
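The Gaussian "peak" response can be sketched in a few lines. The peak centre of 0.40 comes from the gust formula above; the width value, the function names and the driver dictionary are assumptions made for illustration, and the proportionality constant is taken as 1.

```python
import math


def gaussian_peak(x, centre, width):
    """Bell-shaped response: 1.0 at the centre, falling off with distance."""
    return math.exp(-((x - centre) ** 2) / (2.0 * width ** 2))


def gust_driver(intensity, track, width=0.10):
    """Gust response peaking at track = 0.40 (width is an assumed value)."""
    return intensity * gaussian_peak(track, 0.40, width)


def region_severity(drivers):
    """Severity per region is the maximum of its drivers."""
    return max(drivers.values())


on_peak = gust_driver(intensity=1.0, track=0.40)    # track right on the peak
off_peak = gust_driver(intensity=1.0, track=0.80)   # same storm, track far from the peak
westshore = region_severity({"gust": on_peak, "surge": 0.6})  # surge value is hypothetical
```

The same storm intensity produces almost no gust response once the track sits several widths away from the peak, which is why reading the track matters as much as reading the intensity.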

The ensemble. Each of the 50 members is truth + bias(pathology, briefing) + noise(briefing), with noise shrinking from σ≈0.42 at T−120 to 0.03 at T−6. Every probability on the desk is a simple member count: P(impact) = fraction of members whose driver exceeds that impact's threshold. There is no second model — the map cone, the hatching, the trigger probabilities and the river plume are all the same 50 members, presented differently.
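A sketch of the member-count probability, assuming Gaussian noise and a single scalar driver per member. The σ endpoints are the ones quoted above; the truth value, threshold, seed and function names are hypothetical.

```python
import random

N_MEMBERS = 50


def make_ensemble(truth, bias, sigma, rng):
    """Each member is truth + bias + noise (noise assumed Gaussian here)."""
    return [truth + bias + rng.gauss(0.0, sigma) for _ in range(N_MEMBERS)]


def impact_probability(members, threshold):
    """P(impact) = fraction of members whose driver exceeds the threshold."""
    return sum(m > threshold for m in members) / len(members)


rng = random.Random(2024)  # hypothetical seed, fixed so the run is repeatable
early = make_ensemble(truth=0.70, bias=0.0, sigma=0.42, rng=rng)  # T-120 spread
late = make_ensemble(truth=0.70, bias=0.0, sigma=0.03, rng=rng)   # T-6 spread

p_early = impact_probability(early, threshold=0.50)
p_late = impact_probability(late, threshold=0.50)
```

With 50 members the probability can only move in steps of 0.02, so a change of one or two members between briefings is within the resolution of the count rather than a real signal.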

The decision standard. Actions are graded by probabilistic cost–loss logic: act when expected avoided loss exceeds cost, i.e. when the trigger probability p > C/L. The debrief replays each decision with the marginal expected value it had at the time — marginal, because benefits exhibit diminishing returns once a risk is already covered — and |EV| ≤ £3m is treated as a defensible judgement call either way.
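The cost–loss rule and the ±£3m tolerance band can be written as a small sketch. It treats the expected value as p × L − C in £m and ignores the diminishing-returns adjustment that makes the engine's figure marginal; the function names and the example actions are hypothetical.

```python
def trigger_threshold(cost, loss_avoided):
    """Cost-loss threshold p* = C / L."""
    if loss_avoided <= 0:
        raise ValueError("loss_avoided must be positive")
    return cost / loss_avoided


def expected_value(p, cost, loss_avoided):
    """Expected avoided loss minus cost, in the same units (assumed: GBP millions)."""
    return p * loss_avoided - cost


def verdict(p, cost, loss_avoided, band=3.0):
    """Classify a call; |EV| <= band is a defensible judgement call either way."""
    ev = expected_value(p, cost, loss_avoided)
    if abs(ev) <= band:
        return "judgement call"
    return "act" if ev > 0 else "hold"


# Hypothetical actions: cheap pumps against a large loss, costly evacuation.
pumps = verdict(p=0.20, cost=2.0, loss_avoided=40.0)        # EV = +6
evacuation = verdict(p=0.30, cost=30.0, loss_avoided=60.0)  # EV = -12
marginal = verdict(p=0.50, cost=30.0, loss_avoided=60.0)    # EV = 0
```

The cheap action clears its threshold of 0.05 at a 20% trigger probability, while the costly one needs p above 0.5 before it is anything more than a judgement call.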

The response model. A warning's per-briefing effect on one region is

response = base(colour) × trust × lead(phase) × reach(channels) × (0.35 + 0.65·clarity) × feasibility × (1 − 0.3·fatigue)

and responses accumulate across briefings as

acted = 1 − ∏(1 − responseᵢ)

so an early warning keeps paying even if later refined. Reach is channel-weighted (cell broadcast strongest, and it fails to the isles if the telecom mast is lost) and is also group-specific: naming an audience whose channels you have not selected discounts reach, and the composer flags the gap (elderly residents listen to local radio and community leaders, tourists get cell broadcast, fishers follow the harbour channels). Clarity rewards concrete actions, named groups and honest certainty; feasibility collapses if you tell mobility-constrained groups to leave with nothing in place.
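The response formula and its accumulation rule translate directly into code. This sketch assumes every factor has already been reduced to a number between 0 and 1; the lookups behind base(colour), lead(phase) and reach(channels) are not given in this manual, so the example inputs are hypothetical.

```python
from math import prod


def briefing_response(base, trust, lead, reach, clarity, feasibility, fatigue):
    """Per-briefing response for one region, following the manual's formula."""
    return (base * trust * lead * reach
            * (0.35 + 0.65 * clarity)
            * feasibility
            * (1.0 - 0.3 * fatigue))


def fraction_acted(responses):
    """Accumulate across briefings: acted = 1 - product of (1 - response_i)."""
    return 1.0 - prod(1.0 - r for r in responses)


# Hypothetical inputs: a vague early warning followed by a clear later one.
early = briefing_response(base=0.5, trust=0.8, lead=1.0, reach=0.7,
                          clarity=0.2, feasibility=0.9, fatigue=0.0)
later = briefing_response(base=0.5, trust=0.8, lead=0.8, reach=0.7,
                          clarity=0.9, feasibility=0.9, fatigue=0.2)
acted = fraction_acted([early, later])
```

Two properties fall out of the formula: a message with zero clarity still retains 35% of its effect, and full fatigue removes at most 30% of it.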

Burden and attention. Holding warnings costs burden ∝ population warned × colour × BURDEN(phase), where BURDEN rises from 0.30 at T−120 to 1.0 near impact (disruption bites near landfall). The attention/credibility cost is discounted far less at long range: its multiplier runs from 0.55 at T−120 to 1.0 near impact. A region-wide Red in the first two briefings also carries an extra credibility charge, which is exactly why "early Red just in case" is not free.
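To see how the two multipliers compare across the six briefings, here is a sketch that assumes a straight-line ramp between the quoted endpoints. The manual gives only the endpoints, so the intermediate values, the colour weight and the function names are all assumptions.

```python
N_BRIEFINGS = 6  # T-120 h down to T-6 h


def ramp(briefing, start, end=1.0):
    """Linear ramp from `start` at briefing 0 to `end` at the last briefing.

    ASSUMPTION: only the endpoints come from the manual; the straight line
    between them is a guess made for illustration.
    """
    if not 0 <= briefing < N_BRIEFINGS:
        raise ValueError("briefing must be between 0 and 5")
    return start + (end - start) * briefing / (N_BRIEFINGS - 1)


def disruption_multiplier(briefing):
    """BURDEN(phase): 0.30 at T-120 rising to 1.0 near impact."""
    return ramp(briefing, 0.30)


def attention_multiplier(briefing):
    """Attention/credibility multiplier: 0.55 at T-120 rising to 1.0."""
    return ramp(briefing, 0.55)


def disruption_burden(population, colour_weight, briefing):
    """Burden is proportional to population x colour x BURDEN(phase)."""
    return population * colour_weight * disruption_multiplier(briefing)


for b in range(N_BRIEFINGS):
    print(b, round(disruption_multiplier(b), 2), round(attention_multiplier(b), 2))
```

Under this assumption the attention multiplier sits above the disruption multiplier at every briefing before the last, so an early warning is discounted on disruption far more than it is on credibility.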

Readiness. Each life-safety mission is a weighted harmonic mean of its agencies' readiness: a weakest-link aggregate, so one unprepared agency caps the whole mission no matter how ready the rest are. Life-saving response is then capped at 0.4 + 0.6 × mission readiness.
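A sketch of the weighted harmonic mean and the cap it feeds. The agency names, weights and readiness values are hypothetical; treating a zero-readiness agency as a zero-readiness mission is an assumption made to avoid dividing by zero.

```python
def mission_readiness(readiness, weights):
    """Weighted harmonic mean of agency readiness values in (0, 1]."""
    if any(r <= 0.0 for r in readiness.values()):
        return 0.0  # assumption: a fully unready agency zeroes the mission
    total_weight = sum(weights[agency] for agency in readiness)
    return total_weight / sum(weights[agency] / r for agency, r in readiness.items())


def life_saving_cap(mission):
    """Life-saving response is capped at 0.4 + 0.6 x mission readiness."""
    return 0.4 + 0.6 * mission


# Hypothetical mission: two ready agencies and one unprepared one.
agencies = {"fire": 0.9, "health": 0.9, "police": 0.1}
equal_weights = {"fire": 1.0, "health": 1.0, "police": 1.0}

harmonic = mission_readiness(agencies, equal_weights)
arithmetic = sum(agencies.values()) / len(agencies)
cap = life_saving_cap(harmonic)
```

In this example the arithmetic mean is about 0.63 but the harmonic mean is below 0.25, which shows how far a single unprepared agency drags the mission down.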

Verification. The guidance itself is scored with Brier scores on the headline impacts (mean squared error of each briefing's probability against the 0/1 outcome), and your tail exposure is reported as CVaR: the mean loss of the worst 10% of ensemble outcomes under your final posture. The debrief also applies a regret frame to each pivotal call, pairing the decision (judged on what was knowable) with how the hazard verified, so a good decision with a bad outcome is named as bad luck and a negative-EV commitment that never met a hazard is named as a lucky escape. Black-swan runs report a triage score weighted toward the response chain, because covering every base in those events is impossible by design.

Evidence on the desk. From T−24 the observation desk arrives with stated confidence and siting caveats (rating curves above flood stage, radar beam blockage, a single tide gauge for a whole coast): observations are evidence, never an oracle. Rumours reaching the desk are decided in the dialogue itself (verify, correct or ignore), and mishandling them costs real trust. The forecast-diagnosis card features one ensemble plume, chosen each briefing from the driver with the most impact salience, with the runner-up one click away, all built from the same 50 members.

The headline score. The headline run score blends the three skill grades with decision quality (the fraction of action calls that were defensible at the EV available at the time), so heavy spending cannot buy a medal that the action ledger does not support. Eastmarch counts against you only as a false focus: a strong warning there when the lee-wind/dust hazard never verified. When the rare wake event is real, recognising it is credited like any other region.
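The two verification measures named in this section, the Brier score and CVaR, can each be sketched in a few lines. The function names and the sample numbers are illustrative; the 10% tail fraction is the one stated in the manual.

```python
def brier_score(probabilities, outcomes):
    """Mean squared error of forecast probabilities against 0/1 outcomes."""
    if len(probabilities) != len(outcomes) or not probabilities:
        raise ValueError("need equal-length, non-empty sequences")
    return sum((p - o) ** 2 for p, o in zip(probabilities, outcomes)) / len(probabilities)


def cvar(losses, tail=0.10):
    """Mean loss of the worst `tail` fraction of outcomes (at least one)."""
    worst_first = sorted(losses, reverse=True)
    k = max(1, round(tail * len(worst_first)))
    return sum(worst_first[:k]) / k


# Hypothetical briefing probabilities for an impact that did verify (outcome 1).
score = brier_score([0.2, 0.4, 0.8], [1, 1, 1])

# Hypothetical losses for 50 members: 1, 2, ..., 50 (arbitrary units).
tail_loss = cvar(list(range(1, 51)))
```

A lower Brier score is better: 0 is a perfect forecast, and always saying 50% scores 0.25. With 50 members, the 10% tail is the mean of the five worst outcomes.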

11 · The theory this game is built on

Early Warnings for All. The WMO/UNDRR framework defines four pillars (risk knowledge, monitoring & forecasting, dissemination & communication, and preparedness to respond), and its central finding is that warnings fail at the weakest pillar, almost never at the forecast alone. The game's value chain (reach × clarity × belief × feasibility × readiness) is that doctrine made mechanical.

Impact-based forecasting. Modern services warn for what the weather will do, not what it will be: risk = hazard × exposure × vulnerability, summarised in the likelihood-by-impact matrix shown in the warning composer. The matrix frames the call (a low-likelihood/severe-impact cell is legitimately Amber), but it never makes it.

Cost–loss decision theory. For a protective action costing C that prevents loss L, the rational threshold is p* = C/L: cheap actions against large losses justify acting at low probabilities (pre-positioning pumps), while costly disruption (evacuation, rail closure) demands higher confidence or a tighter target. Almost every "when should I commit?" tension on the desk reduces to this ratio plus lead-time decay.

The cry-wolf effect, with nuance. Repeated false alarms do erode response, but the research consistently shows the erosion is driven by unexplained or poorly targeted alarms rather than by false alarms per se. A near-miss managed with targeted warnings, honest uncertainty and a reasoned public de-escalation can leave trust intact. Hence the downgrade-reason mechanic, and hence the Phantom-threat regime, where the entire skill is the walk-down.

Anticipatory action. Acting on a forecast, before impact, is systematically cheaper per life and per pound than response after, but only inside the lead-time window, which is why action effectiveness decays with the phase you commit in, and why "lead time left on the table" is called out in the debrief.

Protective Action Decision Model & CAP. People act when a message reaches them, they understand it, believe it, and can act on it: four separately breakable links. Warnings are composed in Common Alerting Protocol style (certainty / urgency / severity / instruction / audience / channel) because that structure is what real dissemination systems consume.

The good miss. Verification practice distinguishes a forecast bust from a good decision with an unlucky outcome. You are graded on the expected value of each call given the probabilities in front of you. A defensible Red that verifies quiet, walked down with reasons, scores better than a lucky silence.
