Reading Risk Numbers (NNT)

Healthcare · §640

A drug cuts your risk by 50% — true sentence, useless number. Cutting a 20% risk in half is one of the biggest favours modern medicine can do you. Cutting a 0.2% risk in half is theatre. The arithmetic that tells the difference is the number-needed-to-treat: how many people take the pill for one to actually benefit. It fits on a napkin, and once it's in your head every drug ad, screening recommendation, and clinic conversation reads differently.

Know · As-needed · Evidence: Moderate · Chapter: Healthcare

Rare territory: the substance is a way of reading numbers, not something you take or do. Free, learned in about half an hour, useful for the rest of your life every time a treatment comes up. The headline-framing problem it fixes is one of the most replicated findings in risk-communication research: people consistently agree to drugs and screenings whose actual benefit, in real numbers, is much smaller than they thought.

Three numbers describe what a treatment does in a trial. The first is the relative one, the one in the headline. If 20 people out of 100 have a heart attack without the drug and 15 out of 100 have one with it, the drug cuts the rate by a quarter. That quarter, 25%, is the relative risk reduction. It does not care what the underlying rates were; it measures the gap only as a fraction of the untreated rate.

The second is the absolute one, and it does the actual work. The drug took the rate from 20% to 15%. That is a five-percentage-point drop. The same 25% relative reduction in a population where only 2 in 100 would have had a heart attack takes the rate from 2% to 1.5% — half a percentage point. Same drug, same biology, same trial result. Different decision.

The third number is the number-needed-to-treat, abbreviated NNT. Flip the absolute drop upside down. A five-percentage-point reduction means 1 in 20 people who took the drug avoided the heart attack — NNT 20. A half-percentage-point reduction means 1 in 200 — NNT 200. That ratio is what you carry into a clinical conversation. The first scenario is one of the biggest favours preventive medicine offers; the second is borderline at best, and that is before you have looked at the side effects.
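The arithmetic behind the three numbers can be sketched in a few lines of Python. This is a minimal illustration using the two heart-attack scenarios above; the function name `risk_summary` is made up for this example and does not come from any library.

```python
import math

def risk_summary(untreated_rate, treated_rate):
    """Return (relative reduction, absolute reduction, NNT) for two event rates.

    Rates are fractions between 0 and 1. Illustrative helper, not a library API.
    """
    absolute = untreated_rate - treated_rate   # the percentage-point drop, as a fraction
    relative = absolute / untreated_rate       # share of the untreated risk removed
    # round(..., 6) absorbs floating-point noise before rounding up to a whole person
    nnt = math.ceil(round(1 / absolute, 6))
    return relative, absolute, nnt

for untreated, treated in [(0.20, 0.15), (0.02, 0.015)]:
    rrr, arr, nnt = risk_summary(untreated, treated)
    print(f"{untreated:.1%} -> {treated:.1%}: relative {rrr:.0%}, "
          f"absolute {arr * 100:.1f} points, NNT {nnt}")
```

Both rows report the same 25% relative reduction; only the absolute drop and the NNT (20 versus 200) separate them.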

What happens when the framing changes

The reason this matters is that the relative number and the absolute number move people in different directions. The same trial, presented two ways, produces two different decisions about whether to take the drug.

The largest pooled look at this is the Hoffmann and Del Mar systematic review in JAMA Internal Medicine. They gathered 35 studies of patients asked to estimate the benefit of common interventions — statins, mammograms, colonoscopies, chemotherapy, blood-pressure drugs — and compared those estimates to the absolute reductions published in the trials. Most patients overshot, often by a factor of ten or more. They believed the drugs and the screenings were doing much more for them than the actual numbers said Hoffmann & Del Mar 2015. The mirror finding held for harms: patients undershot how often side effects happen.

A Cochrane review by Akl and colleagues pooled 35 randomised trials of presentation format. When the same trial result was shown as a relative reduction, perceived benefit was higher than when it was shown as an absolute reduction or as a number-needed-to-treat. The effect held across clinicians and patients, across education levels, and was not erased by showing both formats side-by-side — the relative number kept dominating Akl et al. 2011.

The Stacey Cochrane meta-analysis of 105 trials of patient decision aids — leaflets and tools that include absolute risk alongside the relative number — found higher accurate-risk-perception scores, lower decisional conflict, and no increase in anxiety in patients who got the absolute framing Stacey et al. 2017. The fear that you'd scare patients by handing them real numbers turned out not to be the failure mode.

A few NNTs worth carrying in your head.

Statin after a heart attack: roughly 1 in 30 people who take it for five years avoid an early death (4S, Lancet 1994) 4S 1994.
Statin in low-risk people with elevated inflammation: roughly 1 in 95 over about two years (JUPITER, NEJM 2008) JUPITER 2008.
Daily aspirin for primary prevention in people with diabetes: 1 in 91 avoid a serious vascular event over seven years — but 1 in 112 have an extra major bleed (ASCEND, NEJM 2018) ASCEND 2018.
Intensive blood-pressure control: 1 in 61 avoid a cardiac event over three years (SPRINT, NEJM 2015) SPRINT 2015.
Colonoscopy screening: 1 in 455 invited to screen avoid a colorectal cancer over ten years (NordICC, NEJM 2022) NordICC 2022.
An SSRI for moderate-to-severe depression: 1 in 7 respond who wouldn't have on placebo (Cipriani, Lancet 2018) Cipriani et al. 2018. One of the better numbers in everyday medicine.

The same drug class — statins — ranges from NNT 30 to NNT 95 depending on whose population is taking it. That is the baseline-risk effect, in one example. Nothing about the drug changed.

What the headline number isn't telling you

Four errors are common enough to be near-universal.

"A 50% reduction means half the people benefit." No. It means the treated group's event rate is half the untreated group's. If the untreated rate is 2%, the treated rate is 1%, and 1 person in 100 benefits in absolute terms. The other 99 took the pill for no measurable gain on the trial's endpoint.

"The headline number applies to me." Trials measure groups, not individuals. The number-needed-to-treat tells you that 1 person in N benefits across the whole group; it does not tell you which person. The honest individual translation is probabilistic: "taking this for the trial's duration, I have a 1-in-N chance of avoiding the event."

"NNT alone tells me whether to take the drug." Every NNT has a shadow: the number-needed-to-harm, or NNH, the count of people taking the drug for one extra serious side effect. The decision is the comparison. An NNT of 50 paired with an NNH of 200 is favourable; an NNT of 50 paired with an NNH of 30 is not. Aspirin for primary prevention sits near break-even, much closer to the second case than the first: roughly one extra major bleed for every vascular event prevented ASCEND 2018.

"The endpoint in the headline is the endpoint I care about." Trials often measure composite endpoints (heart attack or stroke or cardiovascular death) to gain statistical power, or surrogate endpoints (cholesterol lowered, blood pressure lowered, tumour shrunk) because the hard endpoint takes too long to measure. NNT against a composite or a surrogate is not the same as NNT against the outcome you would notice in your life. Look at the per-component breakdown when the trial reports one.

The three questions

The working procedure for any treatment claim (a drug ad, a screening recommendation, your doctor mentioning a prescription) is three questions: what is the baseline risk without the treatment, what is the absolute reduction, and over what time horizon. They take seconds to ask and reframe almost every preventive-medicine decision you will meet.

Run the harms version of the loop in parallel: what's the absolute increase in serious side effects, over what horizon, and how does the number-needed-to-harm compare to the number-needed-to-treat. Both numbers belong in the same conversation.
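One way to hold both numbers in the same conversation is to restate them per 1,000 people treated. The sketch below does that; `per_thousand` is an illustrative name, and the calculation assumes the NNT and NNH come from the same trial and the same time horizon. The aspirin figures are the ASCEND numbers quoted earlier in this entry.

```python
def per_thousand(nnt, nnh, treated=1000):
    """Expected events prevented and extra harms among `treated` people.

    Assumes NNT and NNH were measured in the same trial over the same horizon.
    """
    prevented = treated / nnt
    harmed = treated / nnh
    return prevented, harmed

for label, nnt, nnh in [("NNT 50, NNH 200", 50, 200),
                        ("NNT 50, NNH 30", 50, 30),
                        ("ASCEND aspirin (NNT 91, NNH 112)", 91, 112)]:
    prevented, harmed = per_thousand(nnt, nnh)
    print(f"{label}: about {prevented:.0f} events prevented, "
          f"{harmed:.0f} extra serious harms per 1,000 treated")
```

The counts say nothing about severity. A prevented stroke and a treatable bleed are not interchangeable, so the comparison is where the judgment starts rather than where it ends.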

For shared decision-making with a clinician, the operational ask is one sentence: "What's my absolute chance of this outcome over the next few years if I do nothing, and how much does this treatment change that?" It is a question your doctor is trained to answer; in many clinics it is one nobody asks. The conversation pulls onto numerate ground in roughly the time it takes to say it.

Where the metric itself gets in the way

NNT is a summary, not a verdict. Three cases warrant care.

Rare catastrophic outcomes. A vaccine with an NNT of 5,000 to prevent a fatal infection is not the same call as a statin with an NNT of 5,000 to prevent a single heart attack. Severity weights the calculation. When the endpoint is catastrophic and the intervention is cheap and one-off, a large NNT can still be a clear yes.

Average NNT hides subgroups. An NNT of 60 across a whole trial might be NNT 10 for the high-risk slice of that population and effectively pointless for the low-risk slice. When the trial publishes stratified numbers — by age, sex, baseline risk, or a blood marker — those are the figures that match your situation. The headline NNT is the population average; your number is the relevant subgroup's.

Time doesn't behave linearly. Some treatments load benefit early (blood-thinning right after a heart attack); others build it over years (statins for primary prevention). Multiplying or dividing NNT across time horizons is a guess unless the trial actually measured at those horizons.
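The time-horizon caveat can be made concrete. The sketch below projects an NNT to a longer horizon under the explicit assumption that benefit accrues linearly with time. The trial numbers are hypothetical and `linear_projection_nnt` is an illustrative name.

```python
def linear_projection_nnt(arr_observed, years_observed, years_target):
    """Project NNT to another horizon ASSUMING benefit accrues linearly in time.

    Linearity is a modelling assumption, not something a trial guarantees.
    """
    arr_projected = arr_observed * (years_target / years_observed)
    return 1 / arr_projected

# Hypothetical trial: 1.2-percentage-point absolute reduction after 2 years.
observed_nnt = 1 / 0.012
five_year_guess = linear_projection_nnt(0.012, 2, 5)
print(f"observed over 2 years: NNT about {observed_nnt:.0f}")
print(f"linear guess for 5 years: NNT about {five_year_guess:.0f}")
```

If benefit keeps accumulating, a longer horizon gives a smaller NNT, not a larger one. If the benefit is front-loaded, the linear guess flatters the treatment; if it builds slowly, the guess undersells it.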

What it costs to skip this

The cost shows up at both ends of the spectrum.

On one end: people take preventive drugs whose absolute benefit, if they knew it, they would have declined, and live with the side effects, the cost, the doctor's-visit drag, and the daily pill habit for years. Most adults overshoot the benefit of statins, mammograms, colonoscopies, and routine blood-pressure prescriptions, often by ten times or more Hoffmann & Del Mar 2015. They are not being deceived; they are reading the literature the way the literature is written.

On the other end: people decline interventions they would have benefited from. A 60-year-old after a heart attack who reads "statins only reduce risk by 25%" and quietly shrugs is reading the same sentence as the healthy 30-year-old, with no way to tell that the same drug is a one-in-thirty lifesaver in the first case and a one-in-a-hundred maybe in the second 4S 1994, JUPITER 2008. Headline framing erases that difference. Absolute framing puts it back.

And the chronic background: anxiety from scary relative-risk headlines that, in absolute terms, describe small things. "Doubles your risk of X" lands in the brain as this will happen to me when the underlying numbers say it will happen to 1 in 1,000 instead of 1 in 2,000. The defensive posture — second-guessing every food, every screen, every supplement — has a price the person paying it does not always see. People around you start to notice the hedging before you do; the household conversation about "should we be worried about X" arrives more often than it used to. The fix is not to care less. It is to read the number that tells you how much to care.

Adjacent to this: shared decision-making protocols, overdiagnosis as a systemic issue in screening medicine, the broader topic of statistical literacy in everyday life, and the individual-drug or individual-screening entries — each of which carries its own NNT discussion when the evidence is rich enough to warrant one. The Drug Facts Box format pioneered by Schwartz and Woloshin Schwartz et al. 2007 is worth looking up if you want to see what a clean ARR/NNT/NNH presentation looks like in one table.

Related in the handbook

— The PREVENT number is the kind of risk figure these literacy skills help you interpret.
— These drugs are a case study in reading a modest benefit honestly against real risk and cost.
— Breast screening's headline benefit is a perfect case for reading the absolute numbers, not just the percentage.
— Reading the real benefit numbers is exactly what an annual medication review needs: it's how you spot the pills no longer earning their place.
— Both are numeracy skills for your own care — one reads the benefit of a treatment, the other reads whether your labs actually mean anything.
— The multi-cancer blood test is a perfect case for running the numbers — impressive detection claims, unproven benefit, real false-positive harm.
— Want a real example of why the baseline number matters? PSA screening's modest absolute benefit is it.
— Asking your doctor for the number-needed-to-treat is one of the most useful questions you can bring to the visit.
— Knowing the real number-needed-to-treat is exactly the kind of question a second opinion should answer about a proposed treatment.

Substance and claimed effects

The substance of this entry is a cognitive skill: the ability to translate a relative-risk-reduction (RRR) claim into the absolute risk reduction (ARR) and the number-needed-to-treat (NNT) that it implies. NNT is the reciprocal of ARR, expressed as a count of people; if a treatment cuts a 10% baseline event rate to 8%, ARR is 2 percentage points and NNT is 50 — fifty people must take the treatment for the trial's stated duration for one to avoid the endpoint Laupacis 1988, Cook & Sackett 1995. The claimed effects of having this skill are: (1) better-calibrated decisions about preventive medications, screening tests, and elective treatments; (2) reduced susceptibility to headline relative-risk framing in news, drug advertising, and patient leaflets; (3) more substantive shared decision-making conversations with clinicians; (4) lower rates of decisional regret and lower rates of accepting low-yield interventions. The entry covers the metric itself, its arithmetic, the empirical evidence that framing changes choices, the protocol for applying it in a clinical conversation, and the failure modes where NNT is itself misleading. Scope explicitly excludes deep dives into any individual drug or screening test — those belong in their own entries — and excludes general numeracy training.

Evidence by addressing question

Mechanism

The mechanism is arithmetic, not biology. Two event rates, and four quantities derived from them, describe the effect of a binary intervention on a binary endpoint over a fixed time window. Control event rate (CER) is the proportion of untreated people who experience the endpoint; experimental event rate (EER) is the proportion of treated people who experience it. Relative risk (RR) is EER/CER. Relative risk reduction (RRR) is 1 − RR, usually expressed as a percentage. Absolute risk reduction (ARR) is CER − EER, expressed in percentage points. Number needed to treat (NNT) is 1/ARR, rounded up to the next integer Laupacis 1988, Cook & Sackett 1995.
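Written out as formulas, with CER and EER as defined above (a restatement of the definitions, nothing new):

```latex
\begin{aligned}
\mathrm{RR}  &= \frac{\mathrm{EER}}{\mathrm{CER}} \\
\mathrm{RRR} &= 1 - \mathrm{RR} = \frac{\mathrm{CER} - \mathrm{EER}}{\mathrm{CER}} \\
\mathrm{ARR} &= \mathrm{CER} - \mathrm{EER} = \mathrm{CER} \times \mathrm{RRR} \\
\mathrm{NNT} &= \left\lceil \frac{1}{\mathrm{ARR}} \right\rceil
\end{aligned}
```

For CER = 0.10 and EER = 0.08 this gives RR = 0.8, RRR = 20%, ARR = 0.02 (2 percentage points), and NNT = 50.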

The clinically critical property: RRR is invariant to baseline risk; ARR and NNT are not. A drug that cuts event probability by 25% relative produces ARR of 5 percentage points (NNT 20) in a population whose baseline is 20%, but ARR of 0.25 percentage points (NNT 400) in a population whose baseline is 1%. Same drug, same biological mechanism, a twentyfold difference in clinical utility. This is why the relative number is publication-grade but not decision-grade Gigerenzer et al. 2010. The Laupacis paper that proposed NNT did so explicitly to put effect-size into a unit that maps onto clinical action: a single person who either does or does not need to take a pill, a single procedure that either does or does not need to be booked Laupacis 1988.
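The baseline dependence is easy to see by holding the relative reduction fixed and sweeping the baseline. A minimal sketch, with `nnt_from_relative` as an illustrative name; the intermediate baselines of 10% and 5% are added here only to fill in the range between the two cases in the text.

```python
import math

def nnt_from_relative(baseline, rrr):
    """NNT implied by a relative risk reduction at a given baseline risk."""
    arr = baseline * rrr                    # ARR = CER x RRR
    return math.ceil(round(1 / arr, 6))     # round() absorbs floating-point noise

RRR = 0.25  # the same relative effect in every row
for baseline in (0.20, 0.10, 0.05, 0.01):
    print(f"baseline {baseline:.0%}: ARR {baseline * RRR * 100:.2f} points, "
          f"NNT {nnt_from_relative(baseline, RRR)}")
```

Every row would carry the same "cuts risk by 25%" headline, while the NNT runs from 20 to 400.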

NNT inherits one further property worth naming: it is time-bounded. An NNT of 50 over five years is not the same statistic as an NNT of 50 over six months. The Laupacis convention is that NNT is reported for the duration of the trial that produced it; extrapolating beyond that horizon is inference, not arithmetic Cook & Sackett 1995.

Evidence

The empirical case for NNT literacy rests on three observations from the risk-communication literature.

People systematically overestimate benefits and underestimate harms. Hoffmann and Del Mar's 2015 systematic review pooled 35 studies in which patients were asked to estimate the benefit of common interventions (statins, screening mammography, chemotherapy, screening colonoscopy, antihypertensives) and compared their estimates to published trial-derived ARRs. Across the studies, the majority of participants overestimated benefit, often by an order of magnitude; only a small fraction came within the published confidence interval Hoffmann & Del Mar 2015. The mirror finding held for harms: participants systematically underestimated the absolute frequency of adverse events.

The framing format changes the decision. The Cochrane review by Akl et al. pooled 35 randomised trials comparing presentation formats (RRR, ARR, NNT, frequencies, graphical aids). RRR consistently produced higher perceived benefit and greater willingness to accept treatment than the same trial result expressed as ARR or NNT, in both clinicians and patients Akl et al. 2011. The effect is robust across cultures and education levels and is not eliminated by adding ARR alongside RRR: RRR dominates when both are presented.

Decision aids that include ARR/NNT improve knowledge without increasing anxiety. The Stacey Cochrane review of 105 trials of patient decision aids found higher accurate-risk-perception scores, lower decisional conflict, and no increase in distress when patients were given absolute-risk presentations and short structured probability summaries Stacey et al. 2017. The Schwartz–Woloshin "Drug Facts Box" — a black-box-style tabular display of ARR for benefit and ARR for harm, side-by-side — produced higher comprehension and more discriminating choices in a US adult sample Schwartz et al. 2007.

Concrete NNT anchors from major trials, useful for calibration:

Aspirin for primary prevention in diabetes (ASCEND, 7.4-year follow-up). Serious vascular events: 8.5% on aspirin vs 9.6% on placebo. ARR ≈ 1.1 percentage points; NNT ≈ 91. Major bleeding: 4.1% on aspirin vs 3.2% on placebo. ARR for harm ≈ 0.9 percentage points; NNH ≈ 112. Roughly one prevented event for every one extra major bleed ASCEND 2018.
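Rosuvastatin for primary prevention with elevated CRP (JUPITER, median 1.9 years). Primary composite endpoint: 1.6% on rosuvastatin vs 2.8% on placebo. ARR ≈ 1.2 percentage points over ~1.9 years; NNT ≈ 84 over that horizon from these crude rates. The trial's own time-to-event estimate was NNT 95 over two years. Projecting to five years would lower the NNT, not raise it, and is a modelling assumption rather than a measurement JUPITER 2008.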
Simvastatin for secondary prevention post-MI (4S, 5.4 years). All-cause mortality: 8.2% on simvastatin vs 11.5% on placebo. ARR ≈ 3.3 percentage points; NNT ≈ 30. The same drug class as JUPITER, but a control-arm event rate roughly four times higher (11.5% vs 2.8%, on a harder endpoint and over a longer horizon), so NNT is about one-third the value 4S 1994.
Intensive blood-pressure control (SPRINT, median 3.3 years). Composite cardiovascular endpoint: 1.65%/year on intensive vs 2.19%/year on standard. NNT ≈ 61 over the trial's median follow-up SPRINT 2015.
Colonoscopy screening (NordICC, 10-year intention-to-treat). Colorectal cancer incidence: 0.98% in the invited-to-screening arm vs 1.20% in usual care. ARR ≈ 0.22 percentage points; NNT to prevent one colorectal cancer over ten years ≈ 455. NNT to prevent one CRC death over the same horizon is larger still NordICC 2022.
SSRIs vs placebo in moderate-to-severe major depression (Cipriani network meta-analysis). Pooled response rates roughly 50% on active vs 35% on placebo. NNT for response ≈ 7. One of the lower NNTs in everyday medicine, and a useful counter-anchor: not every preventive intervention is high-NNT Cipriani et al. 2018.
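The anchors above can be rechecked directly from the event rates quoted in each entry. The sketch below reports the unrounded 1/|difference| so the arithmetic is visible; `people_per_event` is an illustrative name. SPRINT is left out because its rates are quoted per year rather than as cumulative proportions.

```python
# (trial, outcome, comparison-arm rate, treated-arm rate), as quoted in this entry
TRIALS = [
    ("ASCEND",   "serious vascular event", 0.096,  0.085),
    ("ASCEND",   "major bleed (harm)",     0.032,  0.041),
    ("JUPITER",  "primary composite",      0.028,  0.016),
    ("4S",       "all-cause death",        0.115,  0.082),
    ("NordICC",  "colorectal cancer",      0.0120, 0.0098),
    ("Cipriani", "response",               0.35,   0.50),
]

def people_per_event(rate_a, rate_b):
    """1 / |absolute difference|: an NNT for a benefit, an NNH for a harm."""
    return 1 / abs(rate_a - rate_b)

for name, outcome, comparison, treated in TRIALS:
    print(f"{name:9s}{outcome:24s}1 in {people_per_event(comparison, treated):.1f}")
```

The results land close to the figures quoted in each entry. Small differences come from rounding conventions (rounding up to a whole person versus quoting the nearest round figure) and from using crude proportions rather than a trial's own time-to-event estimate.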

Protocol

The reader's working procedure when encountering a treatment claim — in a news headline, a leaflet, a drug ad, or a clinician's recommendation — is a three-question loop.

Question 1: What's the baseline risk? Without the CER, RRR is uninterpretable. Sometimes the baseline is in the article; more often it's in the trial abstract; sometimes it's in a guideline document. If the baseline cannot be located, the claim cannot be evaluated.

Question 2: What's the absolute reduction? Either compute ARR = baseline × RRR, or look up the trial's published ARR / NNT directly. The clinician-curated database at thennt.com aggregates published trial NNTs for common interventions and is the fastest single source for major drugs and screening tests, though it is one source and not a substitute for primary literature.

Question 3: Over what time horizon? NNT is meaningful only when paired with its duration. A statin's NNT of 80 over five years is not the same as NNT of 80 over one year. Patients converting trial NNTs to personal expectations should hold the time window explicit.
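The three questions can be sketched as a small check that refuses to produce a number until the baseline and the horizon are known. The `Claim` class and `evaluate` function are illustrative names, and the example risks are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Claim:
    """A headline treatment claim; field names are illustrative."""
    rrr: float                          # relative risk reduction, e.g. 0.50
    baseline: Optional[float] = None    # Question 1: untreated risk over the horizon
    years: Optional[float] = None       # Question 3: time horizon of the trial

def evaluate(claim: Claim) -> str:
    if claim.baseline is None:
        return "cannot evaluate: baseline risk unknown"
    if claim.years is None:
        return "cannot evaluate: time horizon unknown"
    arr = claim.baseline * claim.rrr    # Question 2: absolute reduction
    nnt = 1 / arr
    return (f"ARR {arr * 100:.2f} points, NNT about {nnt:.0f} "
            f"over {claim.years:g} years")

print(evaluate(Claim(rrr=0.50)))
print(evaluate(Claim(rrr=0.50, baseline=0.20, years=5)))
print(evaluate(Claim(rrr=0.50, baseline=0.002, years=5)))
```

The same "cuts risk by 50%" headline is uninterpretable in the first call, an NNT of about 10 in the second, and an NNT of about 1,000 in the third.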

The same procedure runs for harms: number-needed-to-harm (NNH) from the trial's adverse-event ARRs. A treatment with NNT 50 and NNH 200 is favourable; NNT 50 and NNH 30 is not. Both numbers must be looked at together.

For shared decision-making conversations, the operational ask is: "What's my absolute risk of this outcome over the next [N] years if I do nothing, and how much does [intervention] change that?" The question forces the conversation onto an absolute footing and pulls clinical judgment into a numerate format Gigerenzer et al. 2010.

Misconceptions

Several errors are common enough to be near-universal among non-specialists.

"50% reduction means half the people on this drug benefit." No. A 50% RRR means the treated group's event rate is half the untreated group's event rate. If the untreated baseline is 2%, then 1% benefit absolutely; the other 99% derive no measurable benefit from the trial endpoint.

"The headline number applies to me personally." Population-level NNT does not identify which specific person in a treated cohort is the one who benefits; the benefit is distributed probabilistically across the group. The expected-value framing — "if I take this for five years, I have a 1-in-N chance of avoiding the event" — is the honest individual translation, with N equal to NNT.

"NNT alone tells me whether to take the treatment." NNT must be paired with NNH and with the relative severity of benefit and harm. A high-NNT intervention with a high-NNH for a trivial harm may be worth it; a low-NNT intervention with a low NNH for a serious harm may not be.

"The composite endpoint is the same as the outcome I care about." Trials commonly use composite endpoints (e.g., "major adverse cardiovascular events" combining MI, stroke, and CV death) to gain statistical power. NNT computed against the composite may be driven by the least-severe component; readers should check the per-component breakdown when the trial reports it.

"Surrogate endpoints are the same as hard endpoints." NNT against a surrogate (LDL lowered, blood pressure lowered, tumour shrunk) is not equivalent to NNT against the patient-relevant endpoint (heart attack avoided, stroke avoided, survival extended). The gap is sometimes large; the protocol that ignores it inflates apparent benefit.

Failure modes

NNT is a powerful summary but not universally appropriate. The principal failure modes:

Catastrophic-outcome rare events. When the endpoint is extremely severe and the intervention is short and cheap (childhood vaccines, anaphylaxis epinephrine), even a very large NNT can be a clear "yes." The decision calculus weights severity, not just frequency. A vaccine with NNT 5,000 to prevent a fatal infection is not equivalent to a statin with NNT 5,000 to prevent one MI.

Heterogeneous treatment effects. Average NNT across a trial population can mask large subgroup differences. A drug with population NNT 60 might be NNT 10 for a high-risk subgroup and effectively non-beneficial for a low-risk subgroup. Stratified NNTs, when reported, are more informative than the headline figure.

Time-varying effects. Some interventions front-load benefit (acute anti-coagulation post-MI); others back-load it (statins for primary prevention, where benefit grows with cumulative exposure). Linear extrapolation of NNT across years is a modelling assumption, not a fact.

All-cause vs disease-specific mortality. Many screening trials show reduced disease-specific mortality without a matching reduction in all-cause mortality, raising the possibility of competing risks, overdiagnosis, or screening-related harm offsetting the apparent benefit. An NNT computed against disease-specific mortality alone hides these dynamics and can overstate the net benefit.

Composite-endpoint inflation. Already named above; recurrent enough to warrant emphasis. Always look at component NNTs when the trial reports them.
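The heterogeneous-treatment-effects case above can be illustrated with a population made of two subgroups. The mix is hypothetical and `pooled_nnt` is an illustrative name. The one design point is that absolute reductions average across subgroups in proportion to their size, while NNTs do not.

```python
def pooled_nnt(subgroups):
    """NNT for a whole population from (share, ARR) pairs; shares must sum to 1."""
    pooled_arr = sum(share * arr for share, arr in subgroups)
    return 1 / pooled_arr

# Hypothetical mix: one person in six is high-risk with ARR of 10 points (NNT 10);
# the other five in six get no benefit at all (ARR 0).
population = [(1 / 6, 0.10), (5 / 6, 0.0)]
print(f"population NNT about {pooled_nnt(population):.0f}")
```

A headline NNT of 60 is therefore compatible with a population in which most people gain nothing and a minority gains a great deal.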

Stakes

The reader cost of not having this skill is documented in the same Hoffmann-Del Mar review: systematic overestimation of benefit means systematic over-consent to interventions whose absolute upside is small, plus systematic acceptance of side-effect risk that, if the absolute benefit were known, the reader would have declined Hoffmann & Del Mar 2015. The mirror-image failure is also real: a reader frightened by a headline relative-risk number for a condition whose absolute risk is low can be moved into screening or treatment of marginal utility, generating downstream cost, anxiety, and procedure-related harm.

The asymmetry between RRR and ARR in commercial communications — DTC drug advertising in the US, patient leaflets globally — means the default information environment is biased toward overstatement of benefit Gigerenzer et al. 2010. Without the literacy, the default is to absorb the bias.

Out of scope

This entry does not cover: deep dives into specific drugs or screening tests (which warrant their own entries); the broader topic of shared decision-making protocols; statistical literacy at the level of Bayesian reasoning or base-rate calibration; overdiagnosis and overtreatment as systemic phenomena; the philosophical question of what counts as a "good" endpoint.

Credibility range

Optimist case

The metric itself is mathematically defined and uncontroversial; NNT and ARR are universally taught in evidence-based-medicine curricula at the medical-school level and recommended by every major guidelines body. The empirical case that framing changes decisions is one of the most replicated findings in risk communication, with a Cochrane meta-analysis (Akl et al.) and a JAMA Internal Medicine systematic review (Hoffmann & Del Mar) underpinning the central claim that patients given RRR-only information consent to interventions they would refuse when given ARR. Decision aids incorporating ARR/NNT have a Cochrane meta-analysis behind them (Stacey et al., 105 trials) showing improved knowledge and reduced decisional conflict with no signal of distress. Operationally, the three-question protocol is trivial to learn — minutes of instruction, demonstrated retention in patient-education studies — and the cost of being wrong is documented in the over-consent literature. The optimist's strongest framing: this is one of the highest-leverage cognitive interventions available, with massive expected effect on personal healthcare decisions for cents of effort.

Skeptic case

NNT itself has known limitations: composite endpoints, surrogate endpoints, heterogeneous treatment effects, and time-bounding all complicate the headline number. A naive reader could misapply NNT just as readily as they misapply RRR, for example by treating a single trial's NNT as transferable to their own situation when the trial's population doesn't generalise, or by comparing NNTs across trials with different endpoints. Some statisticians argue NNT obscures rather than reveals: it is sample-derived, has confidence intervals that are often skipped, and behaves badly when ARR is near zero or negative. Real-world uptake of NNT literacy has been disappointing despite three decades of advocacy; clinicians often know the concept but skip it in time-pressured visits, and patients exposed to ARR/NNT in a decision aid frequently revert to relative framing days later. Without sustained reinforcement, literacy decays. And the protocol's "ask your clinician for absolute risk" step assumes a clinician who has the number, has the time to discuss it, and is incentivised to give it, assumptions that fail in much of routine practice.
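The statisticians' objection about confidence intervals can be sketched numerically. The example assumes a simple Wald interval for the difference of two proportions, inverted to give an interval for NNT. That choice of interval, the sample sizes, and the name `nnt_interval` are all assumptions made for illustration.

```python
import math

def nnt_interval(cer, eer, n_control, n_treated, z=1.96):
    """Approximate 95% interval for NNT by inverting a Wald interval for ARR.

    Returns (low, high), or None when the lower ARR bound is not above zero,
    in which case the NNT "interval" is no longer a single finite range.
    """
    arr = cer - eer
    se = math.sqrt(cer * (1 - cer) / n_control + eer * (1 - eer) / n_treated)
    arr_low, arr_high = arr - z * se, arr + z * se
    if arr_low <= 0:
        return None
    return 1 / arr_high, 1 / arr_low

# Hypothetical trial: 10% vs 8% event rates (point-estimate NNT 50).
print(nnt_interval(0.10, 0.08, 1_000, 1_000))      # 1,000 per arm
print(nnt_interval(0.10, 0.08, 10_000, 10_000))    # 10,000 per arm
```

With 1,000 people per arm the ARR interval crosses zero and no finite NNT range exists. With 10,000 per arm the same point estimate of 50 sits inside a range of roughly 36 to 83, which is the uncertainty a bare headline NNT leaves out.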

Author's call

The literacy is unambiguously net-positive at the individual reader level even if real-world implementation gaps remain. The metric's known limitations (composite endpoints, surrogate endpoints, heterogeneous treatment effects) are not arguments against learning it — they are arguments for learning it more thoroughly, since each limitation has a corresponding protocol fix (look at component endpoints, ask about hard outcomes, look for stratified NNTs). The skeptic's strongest point — that literacy decays without reinforcement — is real but not invalidating: the entry exists in part to be the reinforcement. Meta scores reflect this: high evidence (Cochrane-backed across two distinct meta-analyses), low controversy (no serious dispute about the underlying metric or the framing-effect findings), modest scores on the consequence dimensions because the substance acts indirectly via downstream decisions rather than producing a direct biological effect.

Stakeholder and incentive map

Patients and the public — Variable uptake; numeracy-low subpopulations benefit most but are least likely to encounter the literacy.
EBM educators and academic clinicians — Strong push since Sackett's group at McMaster formalised NNT. Universally taught in medical school. The protocol is institutional consensus.
Pharmaceutical industry and direct-to-consumer drug advertising — Strong incentive to publish and promote relative numbers. Headline RRR is more compelling than ARR for the same trial. US regulators require fair balance of benefit and risk in DTC ads but do not require ARR specifically. The Schwartz-Woloshin Drug Facts Box has been recommended by the FDA's Risk Communication Advisory Committee but has not been mandated.
Medical journalism — Publishes RRR by default; ARR adds words and reduces story impact. Headlines like "drug cuts risk by 50%" are common; the same headline reframed as "drug cuts risk by 0.05 percentage points" rarely runs.
Guidelines bodies (USPSTF, NICE, AHA/ACC) — Have moved toward publishing ARR/NNT alongside RRR in patient-facing summaries. USPSTF screening recommendation grades are explicitly tied to absolute-benefit estimates.
Clinicians in routine practice — Mixed. Awareness is high; time-pressured visit structure suppresses use. Decision aids reduce the friction but require system-level adoption.

Population variability

The cognitive skill is universally applicable, but the magnitude of benefit varies with starting conditions.

Numeracy. Low-numeracy readers benefit most from being moved to absolute-risk framing; they are the population in which RRR most reliably misleads. Decision aids using ARR with frequency-format expressions ("8 out of 1,000") improve comprehension more than percent-format expressions in this group Akl et al. 2011.

Baseline risk. Higher-baseline-risk patients see the same RRR translate into a lower NNT, often dramatically. The same statin that is borderline-justifiable in a low-risk 35-year-old (NNT in the hundreds) is high-yield in a post-MI 60-year-old (NNT ~30) 4S 1994.

Decision context. One-off decisions (a single screening test, a single elective surgery) have higher cognitive return on literacy than chronic-dosing decisions where the same protocol is rehearsed monthly. But the chronic decisions accumulate value over years; the analysis is symmetric in expected utility, asymmetric in effort.

Trial-to-patient extrapolation. NNT is computed in the trial's sampled population; transferring it to a reader whose age, sex, comorbidities, or baseline risk differ from the trial cohort introduces inference error. Literate readers handle this explicitly; the protocol's third question (time horizon) generalises naturally to a fourth (is my baseline risk like the trial's?).

Knowledge gaps

Three gaps are worth naming.

Persistence of literacy. Most decision-aid trials measure outcomes immediately after exposure. Few measure whether literacy persists, whether downstream decisions remain ARR-anchored months or years later, or whether decay can be slowed by repeated brief reinforcements. The Stacey Cochrane meta-analysis acknowledges this; outcome timeframes in the included trials skew short.

Long-run outcomes of NNT-informed decisions. The chain from "patient saw ARR/NNT" through "patient made a different decision" through "patient experienced a different long-run outcome" has been established at each link separately but not end-to-end in a single cohort with long follow-up.

System-level implementation. What it would take to move the default information environment — drug ads, news coverage, patient leaflets, electronic medical records — from RRR-first to ARR-first is a policy question that the clinical evidence base cannot answer.

Substance framing. The brief asked for the metric, its role in preventive interventions, its role in shared decision-making, and its relationship to headline relative-risk claims. All four are covered. The article is structured as a literacy entry rather than a substance entry; the action verb is know and the cadence is as-needed (triggered when a treatment claim is encountered).

Score calls worth flagging.

Longevity 2. Held back from 3 because the mortality effect is mediated entirely through downstream behaviour change — the literacy itself produces nothing biological. The magnitude depends on which interventions the reader subsequently accepts or declines, and the population-level distribution of those decisions has not been measured end-to-end (see research §6 knowledge gaps).
Mood 2. Defended on the Akl Cochrane + Stacey Cochrane findings of reduced decisional conflict and the absence of distress-increase. Felt-experience anchor in the stakes section: reduced anxiety from scary relative-risk headlines. Could be argued at 1 if a reviewer wants to be conservative about whether literacy-induced calm persists.
Evidence 4 (not 5). The metric is mathematically defined and universally taught, but the literature is about framing effects on decisions rather than end-outcome RCTs of literacy interventions. 5 would require trials showing literacy translates into measured long-run health outcomes; those do not exist with the rigour required.
Health (short-term) 1. Conservative call. The mechanism — reduced exposure to side effects of low-yield drugs — is real but downstream and small at the individual level.

Section choices. No payoff section. The stakes section already covers the symmetric "what you gain when you adopt this," and a separate payoff would have been repetitive in a literacy entry where the substance produces nothing biological. Decision was to put the symmetric framing inside stakes rather than splitting it.

Separate-entry candidates surfaced during the write.

Shared decision-making — broader than NNT literacy; covers preference elicitation, decision aids, the communication side. Warrants its own medical entry.
Overdiagnosis & overtreatment — adjacent to NNT but a distinct phenomenon. Mammography, PSA, thyroid nodule, low-risk skin cancer all carry the topic separately.
Statin literacy — primary-prevention vs secondary-prevention distinction has enough internal substance to warrant a standalone entry; this entry uses statins as the calibration example, not as the topic.
Drug Facts Box / risk-communication tooling — Schwartz & Woloshin's structured presentation format deserves its own treatment.

Future-link candidates. Once written, link forward to: any specific drug entry (statins, aspirin, antihypertensives), any screening entry (mammography, colonoscopy, PSA), the shared-decision-making entry, the overdiagnosis entry.

Hard call on tone. Stakes section walks a line between motivating the literacy and slipping into wellness-skeptic territory ("everyone is overtreated"). Chose to keep the symmetric framing — overshoot and undershoot are both real, anxiety from over-framing is a third — to avoid reading as anti-medicine. The cited 60-year-old-post-MI example is there specifically to demonstrate that absolute framing is not a covert argument against statins; it is an argument for matching intervention to baseline risk.

Citations not used in the article but kept in research. The Gigerenzer 2010 BMJ paper informs the research dossier's framing-effect synthesis but the article cites the larger meta-analyses (Akl, Stacey, Hoffmann & Del Mar) instead — Gigerenzer is the conceptual antecedent rather than the load-bearing study.