Lab Ranges: Normal vs. Optimal

Screening · §127

Your lab report shows a number, a "normal" range, and a green check or a red flag. The range was built mechanically: take a few hundred apparently-healthy people, run the test, draw a line at the 2.5th and 97.5th percentiles. By construction, 5% of healthy people fall outside it, and on a 14-test panel about half of healthy people will show at least one "abnormal" result. The reference range is a screening filter, not a verdict on your health. A "narrower" optimal range — popular in functional-medicine circles — is sometimes right and sometimes a mechanism for converting healthy people into patients.

Know · As-needed · Evidence: Moderate · Chapter: Screening

The frame matters more than any single number. Read well, you catch the iron deficiency hiding in a "normal" ferritin of 35 and ignore the borderline TSH that would have sent you down a rabbit hole. Read badly, you spend years either dismissing real symptoms because the lab said fine, or chasing every flag through retests, supplements, and specialists who profit from the cascade. Most of the work is interpretive — almost no cost, mostly the discipline of holding off on action when a single number near a cutoff doesn't carry the weight the chart's clean dichotomy suggests.

The "normal" range your lab prints is not a thoughtful doctor's idea of health. It is the central 95% of values from a few hundred people the lab recruited as "apparently healthy," mostly by self-report rather than by deep screening. The lower bound is the 2.5th percentile of that group; the upper bound is the 97.5th. Anything between them gets a green check.

Three things follow from that. First, the reference group includes plenty of people with low-grade inflammation, occult deficiencies, and other modern-life background noise that nobody screened them for, so the range is built from a population that includes the very dysfunctions you might be trying to detect. Second, by construction, 5% of genuinely healthy people fall outside the range. Third, on a multi-analyte panel the math compounds: the comprehensive metabolic panel runs 14 tests and a typical wellness panel runs 30+, and with 14 independent tests at 5% each, the probability of at least one flagged result in a healthy person is about 51%. Run enough tests and something will turn red.
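The compounding is easy to check. The sketch below assumes, as the paragraph above does, that every test independently flags 5% of healthy people; real panels contain correlated analytes, so the true figure will differ somewhat. The function name is illustrative only.

```python
def prob_at_least_one_flag(n_tests: int, per_test_flag_rate: float = 0.05) -> float:
    """Chance that a healthy person sees at least one flagged result.

    Assumes the tests are statistically independent (a simplification).
    """
    prob_all_clear = (1.0 - per_test_flag_rate) ** n_tests
    return 1.0 - prob_all_clear


if __name__ == "__main__":
    # 14 = comprehensive metabolic panel, 30 = a typical wellness panel.
    for n in (1, 14, 30):
        print(f"{n:>2} tests: {prob_at_least_one_flag(n):.1%}")
```

For 14 tests this gives 1 − 0.95¹⁴, which is the "about 51%" quoted above.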

The "optimal" range is built the other way around. Instead of looking at where everyone's values sit, you look at where the values sit in the subset of people with the best outcomes — lowest disease rates, fewest symptoms, longest follow-up survival. That window is usually narrower than the reference range, sometimes much narrower, and the two camps can disagree by a factor of two or more on the same biomarker. Whether the narrower window is real or marketing depends entirely on whether anyone has actually tested it.

Where the narrower window pays — and where it doesn't

The argument lives or dies analyte by analyte. Here is what the trial evidence actually shows for the four cases that come up most.

Ferritin — the strongest case for the narrower window

The lab calls ferritin "low" at about 15–30 ng/mL. The optimal range pushes that floor up to 50 or even 100. This is the one where the narrower window has real trial support: menstruating women with ferritin sitting between 15 and 50 and unexplained fatigue get a substantial improvement from iron supplementation, despite a normal lab report.

If you are a menstruating woman with persistent fatigue and a ferritin between 15 and 50, the report saying "normal" is the report being wrong about you. Push for a trial of iron.

TSH — the case where the narrower window collapses

Standard thyroid range is roughly 0.4–4.5 mIU/L; functional-medicine practice often targets under 2.5. The 2.5 cutoff has surface appeal, since 80% of disease-free Americans in the big national survey (NHANES III) did sit below it. But TSH also rises naturally with age (the 97.5th percentile is about 3.5 for under-50s and climbs to 7.5 for over-70s), and TRUST (Stott et al. 2017), the trial that tested whether treating mildly raised TSH in adults 65 and over actually helps, came back empty.

About 21 million Americans take levothyroxine; roughly 90% probably don't need it (Brito et al. 2021). The pill is cheap, but iatrogenic low TSH from overtreatment carries real cardiovascular risk in older adults. The narrower TSH window is the textbook case of "in range on the wider scale" being the right call.

Vitamin D — the case where the optimal window quietly reversed

In 2011 the Endocrine Society said the floor for sufficiency was 30 ng/mL and the target was 40–60 (Holick 2011). The IOM the same year said 20 was enough for skeletal health in 97.5% of the population (IOM 2011) and refused to endorse non-skeletal targets without trial data. Vitamin D testing and supplementation tripled in the following decade. Then the actual trial landed: VITAL (Manson et al. 2019) randomized 25,871 adults to 2,000 IU/day of vitamin D or placebo and found no reduction in cardiovascular events or invasive cancer over five years.

Vitamin D matters in genuinely deficient people — housebound elderly, dark-skinned residents of northern latitudes, exclusively breastfed infants. For the average healthy adult, chasing your 25(OH)D from 28 to 45 with daily supplements turned out not to do anything the trial could measure.

HbA1c — the case in the middle

The ADA cutoffs are clean: under 5.7% is normal, 5.7–6.4 is prediabetes, 6.5+ is diabetes (ADA 2024). But cardiovascular and mortality risk track HbA1c continuously well below the prediabetes line — risk starts climbing around 5.5. The "optimal" target you hear is ≤5.4 or even ≤5.0. The risk gradient is real, but the recommendation either way is the same: sleep, food, exercise, weight. The label "prediabetic at 5.6 vs healthy at 5.4" doesn't change what to do. Skip the catastrophizing, watch the trajectory.
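To make the point about labels concrete, the ADA cutoffs quoted above can be written as a three-way classifier. This is a sketch only: the function name is made up for this example, and it encodes nothing beyond the three bands in the text.

```python
def classify_hba1c(percent: float) -> str:
    """Map an HbA1c value (in %) to the ADA label quoted in the text."""
    if percent < 5.7:
        return "normal"
    if percent < 6.5:
        return "prediabetes"
    return "diabetes"


if __name__ == "__main__":
    for value in (5.0, 5.4, 5.6, 5.7, 6.4, 6.5):
        print(f"HbA1c {value:.1f}% -> {classify_hba1c(value)}")
```

Note what the labels hide: 5.0 and 5.6 both come back "normal" even though risk rises continuously between them, while 5.6 and 5.7 land in different bands despite being a tenth of a point apart.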

Three things people get wrong

"In range" means healthy. "Out of range" means sick. Neither. In range can mean you are the 51st percentile of a population that is, on average, modestly unwell — and your "normal" ferritin of 22 is what is making you tired. Out of range can mean you are the 2.5% tail of healthy variation, or that the assay drifted, or that you ate before a fasting test, or that you had a virus two weeks ago. The range is a screening filter, not a diagnosis.

Narrower means more accurate. Narrower means more sensitive — catches more genuinely low or high values — at the cost of more healthy people getting flagged. Whether that trade pays depends entirely on whether intervention at the narrower threshold actually helps. TRUST and VITAL are the two clean tests of that question, and both came back saying the narrower window didn't earn its keep.

One reading near a cutoff tells you something. Not much. Lab numbers move around for reasons other than your underlying state — analytical noise plus normal biological variation — and the reference change value (Fraser 2011) for many common tests is 15–25%. Two readings that differ by less than that have not meaningfully changed. A single TSH of 4.2 versus 3.8 last year, with no symptoms and no antibodies, is not a story.
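The reference change value is commonly computed as RCV = √2 × Z × √(CV_A² + CV_I²), where CV_A is the assay's analytical variation and CV_I is the within-person biological variation, both in percent. The sketch below assumes Z = 1.96 (95%, two-sided). The CV inputs of 3% and 7% are hypothetical placeholders chosen for illustration, not published figures for TSH or any particular assay.

```python
import math


def reference_change_value(cv_analytical: float, cv_within_subject: float,
                           z: float = 1.96) -> float:
    """RCV in percent: smallest change between two results unlikely to be noise."""
    return math.sqrt(2.0) * z * math.hypot(cv_analytical, cv_within_subject)


def percent_change(previous: float, current: float) -> float:
    """Absolute change relative to the earlier result, in percent."""
    return abs(current - previous) / previous * 100.0


def is_meaningful_change(previous: float, current: float,
                         cv_analytical: float, cv_within_subject: float) -> bool:
    """True only if the observed change exceeds the RCV."""
    rcv = reference_change_value(cv_analytical, cv_within_subject)
    return percent_change(previous, current) > rcv


if __name__ == "__main__":
    # Hypothetical CVs; the 3.8 -> 4.2 pair is the TSH example from the text.
    rcv = reference_change_value(3.0, 7.0)
    change = percent_change(3.8, 4.2)
    print(f"RCV {rcv:.1f}%, observed change {change:.1f}%")
    print("meaningful:", is_meaningful_change(3.8, 4.2, 3.0, 7.0))
```

With these placeholder inputs the RCV lands inside the 15–25% band mentioned above, and a move from 3.8 to 4.2 falls well short of it.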

How to read your own report

A workable order of operations, in plain steps.

1. Read the printed range first. Values clearly outside it are signal; bring them to your clinician. A flagged result deep into red is not the same as a flagged result a hair past the cutoff.
2. For values near a boundary, ask whether you have symptoms or a defensible reason to worry. A ferritin of 30 in a tired menstruating woman is a different number than a ferritin of 30 in an asymptomatic 45-year-old man.
3. Apply the narrower target only where the trial evidence supports it. Ferritin in symptomatic menstruating women, yes. Vitamin D in clearly deficient or housebound populations, yes. TSH below 2.5 in an asymptomatic adult, no. Testosterone "optimization" in mid-range men, no.
4. Before retesting, ask whether a real change would even register. If the test's normal noise is wider than the change you are looking for, the retest will just produce more noise. Wait for a meaningful interval (3–6 months for most slow-moving biomarkers) and check the trajectory, not the wobble.
5. On a multi-test panel, expect at least one flag in a healthy person. The math says about half the time on a 14-test panel. Triage by clinical context: a mild flag with no symptoms and no related red flags usually gets a "watch it next year," not a workup.

The discipline this protects is not "ignore your labs." It is "let the lab be one input among several" — symptoms, trajectory, the related labs that would corroborate a finding, your prior probability of actually having the thing.

Both directions fail badly

The two failure modes look opposite from the outside but cost about the same in lived time.

"Everything is in range, you're fine." This is how iron deficiency goes uncorrected for years in menstruating women whose ferritin sits at 25, how the genuinely hypothyroid 35-year-old with TSH 4.0 and positive antibodies gets sent home, how the housebound elderly woman's 25(OH)D of 18 gets a green check. The lab said normal; the doctor was busy; the symptoms get re-attributed to stress or aging. Months pass. The energy / focus / mood symptoms compound. The person stops trusting their own perception that something is wrong.

"Optimize every number into the narrow band." This is the wellness-clinic playbook: large panel, narrow optimal targets on every line, supplement protocol to push each value into the window, retest in three months. The diffuse cost shows up in five places. (1) Iatrogenic harm — most visibly the levothyroxine over-prescription pattern (Brito et al. 2021), where ~21M Americans take a drug that probably is not helping them and risks atrial fibrillation and bone loss in older adults. (2) Cascade-of-care from incidental findings — a national survey of 991 US internists found 99% had experienced cascades after incidental findings, 94% with no clinically important outcome at the end, with patient psychological harm in 68% of cases (Ganguli et al. 2019). (3) Screening-driven over-diagnosis, of which the Korean thyroid-cancer episode is the cleanest example — ultrasound screening drove a 15-fold increase in diagnosis between 1993 and 2011 with no change in mortality (Ahn, Kim & Welch 2014). (4) Money — retesting cadence, branded supplements, follow-up visits. (5) The opportunity cost of attention — the hours spent on biomarker chasing are hours not spent on sleep, training, food, social connection, which is where the larger effect sizes actually live.

The pattern that prevents both failures is the same: read the number in context, ask what intervention follows from the answer, and only act when intervention has been shown to help.

What this looks like if you keep getting it wrong

The cost lands in two completely different ways and you can pick which one you want.

If you default to trusting the green checks on your lab report, here is what the next decade looks like. You feel a low background hum of tired you cannot explain, and every doctor visit ends with "labs look fine." You start drinking more coffee. Friends notice you have less spark than you used to and stop suggesting things. By your mid-40s you have built a story about yourself that you are just not the energy person you were — when in fact your ferritin has been sitting at 22 for years and a $20 box of iron tablets would have given you the back half of your day back. Same pattern for the borderline thyroid, the missed B12, the vitamin D at 18 in a Boston winter.

If you default the other way, where every flag deserves a workup and every number must hit the narrow band, the decade looks different but no better. There is always one number on the report that needs attention. You add the supplement, retest in three months, the number moves a tenth of a unit, your clinician suggests another panel. You spend Saturday mornings on healthcare logistics. The incidental nodule on a scan you didn't need turns into three more scans and a biopsy that comes back benign, the cascade pattern that 99% of surveyed internists report having experienced (Ganguli et al. 2019). You spend hundreds of dollars a year on supplements no trial has shown do anything. The people around you notice you have become the friend who is always working on a health thing.

The version of you who reads lab reports well looks like neither. You catch the iron deficiency at 28 because you noticed the symptoms and pushed for a trial of treatment despite the green check. You ignore the borderline TSH the supplement clinic wanted to put you on thyroid medication for. You spend the freed-up time and money on the things that actually move the needle — sleep, training, food, the people you love. The friend test for whether you are reading labs well is not whether you have an opinion on every number. It is whether the people around you can tell you have an opinion on every number.

Closely related topics worth a look: the case for and against running an annual blood panel at all when you have no symptoms; iron and ferritin testing specifically; vitamin D supplementation in genuinely deficient populations; the cardiovascular biomarker stack (ApoB, Lp(a), coronary calcium score), which carries its own ranges-and-optimal logic; how to evaluate a direct-to-consumer wellness panel before you order it.

Related in the handbook

— B12 is a prime example: inside the 'normal' range, your cells can still be starving.
— Ferritin is the poster child — a result inside 'normal' can still mean your iron is depleted.
— Iron deficiency hiding inside a 'normal' ferritin is the headline example of reading past the green check.
— A borderline-high TSH is the classic case for reading your own labs by value, not just the flag.
— Low T is the classic case of a borderline lab driving a lifelong decision. Confirm it on two fasted morning draws first.
— Learning normal-vs-optimal is what turns an advanced cardiac panel from a page of numbers into actual decisions.
— ApoB is a prime example of why normal on a lab isn't the same as optimal.
— A CGM in a healthy person throws off 'spikes' that look alarming against an optimal-range mindset but mean nothing — read them with the same skepticism.
— Like an MRI report, a lab result can read scarier than your body actually is — learn to read your own.
— eGFR is a prime example of a lab number that can read normal while something's actually wrong.
— A1c is a prime example of why a lab value needs context before you act on it.
— Both are ways healthy people get turned into patients — a borderline 'flag' or a meaningless positive sending you down a needless cascade.
— Reading these two anemia numbers well is exactly the past-the-green-check interpretation this is about.
— Reading your own labs pairs with knowing how to read a treatment's real benefit — together they make you a sharper patient.
— Pulse oximetry is the bedside version of this problem — a 'normal' SpO2 can hide a real low, especially on darker skin.
— Liver panels are a perfect case of why 'normal' on the report isn't the whole story — read by pattern, not just the reference range.
— Restless legs is a perfect example: the lab calls your ferritin normal, but for this you need it much higher.

Substance + claimed effects

The substance here is not a molecule but an interpretive frame: how to read a number on a lab report. Two frames compete. The "normal" (or population reference) range is what clinical laboratories print next to your result — by international convention (CLSI EP28-A3c) it is the central 95% of values from a defined reference population, with the lower limit at the 2.5th percentile and the upper at the 97.5th. The "optimal" range is a narrower window, usually proposed by functional-medicine practitioners, integrative clinicians, or specific research groups, defined by where the analyte sits in people with the best long-term health outcomes rather than where it sits in everybody. The two can disagree by a factor of two or more (Surks & Hollowell 2007; Holick 2011; Krayenbuehl 2011). The claimed consequences run in both directions: optimal-range thinking can surface real deficiency hidden inside "normal" (iron, thyroid, vitamin D in some populations); it can also generate false-positive alarm, supplement cascades, treatment of non-disease (Welch et al. 2011; Ganguli et al. 2019; Brito et al. 2021), and a steady drift toward biomarker chasing that crowds out interventions with larger effect sizes. The entry covers the mechanism of how reference ranges get built, the analytes where the normal/optimal gap is biggest and most consequential, the downstream effects (more retesting, more supplementation, more medication, more anxiety), and how a thoughtful reader should triage which "abnormal-on-paper" results actually warrant action. Score-relevant consequences: longevity (avoiding both under- and over-treatment), short-term health (catching real deficiency early), focus and energy (correcting iron / thyroid / vitamin D where genuinely low), mood (less health anxiety from cascade-driven retesting), cost (test bills, supplement spend, follow-up visits), effort (the discipline of not chasing every flag).

Evidence by addressing question

Mechanism — how a "normal" range gets made

The standard procedure is mechanical, not clinical. The lab recruits a "reference population" of apparently-healthy people (minimum 120 by CLSI guidance, often a few hundred), runs the assay, takes the central 95% of the distribution, and prints the 2.5th and 97.5th percentiles as the lower and upper limit of normal (CLSI EP28-A3c). By construction, 5% of healthy people fall outside this range — 2.5% on each tail. The range is descriptive of the sampled population, not prescriptive of health.
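The procedure is mechanical enough to fit in a few lines. The sketch below uses a simple nonparametric percentile method from the Python standard library; the simulated log-normal "analyte" and its parameters are invented for illustration, and the 120-subject minimum is the CLSI figure quoted above. It is not a reproduction of any lab's actual workflow.

```python
import random
import statistics
from typing import List, Tuple

MIN_REFERENCE_SUBJECTS = 120  # minimum sample size per the CLSI guidance cited above


def reference_interval(values: List[float]) -> Tuple[float, float]:
    """Return the 2.5th and 97.5th percentiles of a reference sample."""
    if len(values) < MIN_REFERENCE_SUBJECTS:
        raise ValueError("reference sample is too small")
    # n=40 yields 39 cut points spaced 2.5% apart: the first is the
    # 2.5th percentile and the last is the 97.5th.
    cuts = statistics.quantiles(values, n=40)
    return cuts[0], cuts[-1]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    # Hypothetical skewed analyte measured in 300 "apparently healthy" people.
    sample = [rng.lognormvariate(4.0, 0.6) for _ in range(300)]
    low, high = reference_interval(sample)
    outside = sum(1 for v in sample if v < low or v > high)
    print(f"reference range: {low:.1f} to {high:.1f}")
    print(f"flagged in the reference group itself: {outside / len(sample):.1%}")
```

Whatever the distribution looks like, roughly 5% of the people who defined the range end up outside it, which is the descriptive-not-prescriptive point in miniature.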

This creates several structural artefacts. First, the reference population is "apparently healthy" by self-report and minimal screening, so it includes people with subclinical disease, occult deficiency, low-grade inflammation, suboptimal sleep, and any other modern-life background that nobody screened them for. The range is built from a population that includes the very dysfunctions you are trying to detect. Second, on a multi-analyte panel — the comprehensive metabolic panel runs 14 tests, a typical wellness panel runs 30+ — the probability of at least one flagged result in a healthy person rises sharply: with 14 independent tests each having a 5% false-positive rate, P(at least one flag) = 1 − 0.95¹⁴ ≈ 51%. Third, reference ranges are population- and method-specific: a hormone immunoassay and a mass-spectrometry measurement of the same hormone give numerically different results with different ranges, and a Japanese reference population gives different limits than a Scandinavian one. Fourth, serial measurements on the same person move around for reasons other than true biological change — analytical noise plus within-subject biological variation define the reference change value (RCV), the minimum difference between two tests on the same person that is unlikely to be noise alone (Fraser 2011). For many common analytes the RCV is large enough (often 10–30%) that two "different" results inside the reference range may not mean anything has changed.

The "optimal" range is constructed differently. The idea: instead of looking at where everyone's values sit, look at where the values sit in the subset of people with the best outcomes — lowest disease incidence, best symptom profile, longest follow-up survival. For ferritin, that pushes the lower bound up from ~15–30 ng/mL (the conventional cutoff for iron deficiency anaemia) toward 50–100 ng/mL, since trials show that symptomatic improvement in fatigue, exercise capacity, and restless legs syndrome occurs in non-anaemic women with ferritin ≤50 ng/mL when treated with iron (Krayenbuehl et al. 2011). For TSH, NHANES III data showed 80% of disease-free Americans sit below 2.5 mIU/L; the proposed optimal upper bound of 2.5 thus disagrees with the lab-printed upper bound of ~4.5 by almost 2× (Surks & Hollowell 2007). For 25(OH) vitamin D, the IOM set 20 ng/mL as the threshold sufficient for skeletal health in 97.5% of the population (IOM 2011); the 2011 Endocrine Society guideline called for 30 ng/mL minimum and 40–60 ng/mL as the target for "sufficiency" (Holick 2011); the 2024 Endocrine Society guideline walked this back and no longer endorses the 30 ng/mL target for healthy adults under 75 (Demay et al. 2024). The optimal-range proponents are not making one move; they are making analyte-by-analyte arguments of variable rigour.

Evidence — where the normal/optimal gap is real, where it is decoration

TSH. The most superficially persuasive case for a narrower "optimal" range is also the one that most cleanly illustrates the trap. Surks & Hollowell 2007 showed that the upper TSH limit shifts with age: the 97.5th percentile is about 3.5 mIU/L for under-50s and rises to 7.5 mIU/L for those 70 and over. Naively applying a single 2.5 cutoff to older adults would label more than 35% of them hypothyroid. The TRUST trial (Stott et al. 2017), a randomized double-blind placebo-controlled trial of 737 adults 65+ with subclinical hypothyroidism (mean TSH 6.4 mIU/L), showed that levothyroxine produced no improvement in symptoms, tiredness, or quality of life and no reduction in cardiovascular events. It is the textbook example that "labelling abnormal" and "treatment helps" are different questions. And yet about 21 million Americans take levothyroxine, roughly 90% of them without clear benefit, with about 240,000 new cases of iatrogenic low TSH from overtreatment each year (Brito et al. 2021). The optimal-range argument here cuts both ways: lowering the upper limit might catch a small number of truly hypothyroid people earlier, but the dominant population effect of doing so would be more overtreatment, not less.

Vitamin D. The 2011 Endocrine Society guideline (Holick 2011) defined sufficiency as 25(OH)D > 30 ng/mL and "optimal" as 40–60 ng/mL, drawing on observational links between low 25(OH)D and a long list of outcomes (cardiovascular disease, cancer, autoimmune disease, mortality). The IOM, the same year (IOM 2011), set 20 ng/mL as adequate for skeletal health in 97.5% of the population and explicitly declined to endorse non-skeletal thresholds in the absence of RCT evidence. The VITAL trial (Manson et al. 2019), the definitive RCT of 25,871 adults randomized to 2,000 IU/d vitamin D vs placebo, showed no reduction in cardiovascular events or invasive cancer over five years. The 2024 Endocrine Society guideline (Demay et al. 2024) reversed course, recommending that healthy adults under 75 follow the IOM's 600 IU/d RDA without 25(OH)D testing and no longer endorsing the 30 ng/mL sufficiency threshold. The story is that an optimal range built on observational associations did not survive contact with RCT evidence; the population-percentile threshold turned out to be approximately right.

Ferritin. Here the optimal-range argument has actual trial support. The conventional cutoff for iron deficiency is ferritin < 15 ng/mL in women, or < 30 ng/mL under a more inclusive definition. But menstruating women with ferritin between 15 and 50 ng/mL and unexplained fatigue derive substantial benefit from iron supplementation: Krayenbuehl et al. (2011) showed IV iron in non-anaemic premenopausal women with ferritin ≤50 cut fatigue scores roughly in half from baseline, with a ~19% advantage over placebo. There is also a measurement reason a "normal" value can mislead: ferritin is an acute-phase reactant and rises with inflammation, which is why the iron-deficiency cutoff is conventionally raised to 100 ng/mL when CRP is elevated. Here the conventional reference range is genuinely too permissive on the low end; symptomatic iron deficiency is missed in millions of menstruating women whose values are "normal."

HbA1c. The ADA defines diabetes at HbA1c ≥6.5% and prediabetes at 5.7–6.4% (ADA 2024); below 5.7% is "normal." But cardiovascular and mortality risk track HbA1c continuously well below this cutoff — the elevation in risk of major adverse cardiovascular events starts around HbA1c 5.5%, and observational cohorts find risk gradients within what the lab calls normal. The "optimal" HbA1c is often cited as ≤5.4 or even ≤5.0%, on the grounds that lifestyle changes (diet, exercise, sleep) can push HbA1c into the lower normal range and that lower is better across the curve. This one sits in the middle: the gradient is real and not controversial, but it is a graded risk factor like LDL — the question is whether labelling the 5.5–5.6 range earns its keep, given that the recommendation either way is the same lifestyle nudge.

Testosterone. The harmonized adult-male reference range from four large cohorts is 264–916 ng/dL for men 19–39 (Travison et al. 2017), with a wider distribution after age 40. The optimal-range literature in men's-health circles pushes the lower bound toward 500–600 ng/dL and treats anything below as candidate for testosterone replacement therapy. Trial evidence does not support this — symptomatic late-onset hypogonadism is not reliably defined by a number above the conventional cutoff, and the testosterone-replacement market has expanded faster than the evidence for benefit in men with intermediate-range testosterone.

Practice / clinical consensus

Mainstream endocrinology, primary care, and laboratory medicine treat the population reference range as the default and look for context (symptoms, trajectory, related labs) before acting on a borderline number. USPSTF declines to recommend routine CBC, comprehensive metabolic panel, or vitamin D screening in asymptomatic average-risk adults — not because testing is harmful in itself, but because the net effect of running a panel in low-prior-probability adults is to surface false positives at high rates and generate cascades without commensurate health gain. The Ganguli et al. (2019) national survey of internists found 99.4% had experienced cascades of testing after an incidental finding, and 94% had experienced cascades with no clinically important outcome — causing patient psychological harm (68%), physical harm (16%), and financial burden (58%). Functional-medicine and integrative-medicine practice runs the opposite playbook: extensive panels, narrow optimal ranges, supplement protocols aimed at moving values into the optimal window. Specialty bodies (Endocrine Society, AACE, ATA) sit between these poles and shift their recommendations as RCT data accumulates — the 2024 Endocrine Society reversal on vitamin D thresholds is the clearest recent example (Demay et al. 2024).

Misconceptions

The most common misreading among readers who have absorbed wellness content: "in range" means "healthy" and "out of range" means "diseased." Neither is true. In range can mean a number that is the 51st percentile of a population that is, on average, modestly sick; out of range can mean the patient is the 2.5% tail of healthy variation, or that the assay drifted, or that they fasted differently than last time, or that they had a viral infection two weeks ago. A related misconception flows from the optimal-range side: that narrower means more accurate. It does not. Narrower means more sensitive (catches more true low/high values) at the cost of specificity (more healthy people flagged as off-target). Whether the trade-off pays depends entirely on whether intervention at the narrower threshold actually helps — which is an RCT question, not a percentile question. The TRUST trial and VITAL trial are the two reference cases where the narrower threshold was tested directly and the intervention did not pan out (Stott 2017; Manson 2019).
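The sensitivity-for-specificity trade can be seen in a toy model. In the sketch below, both distributions and both cutoffs are invented for illustration (an analyte where higher values indicate disease); none of the numbers come from real TSH or vitamin D data.

```python
from statistics import NormalDist
from typing import Tuple

# Hypothetical populations for an analyte where high values indicate disease.
HEALTHY = NormalDist(mu=2.0, sigma=1.0)
AFFECTED = NormalDist(mu=5.0, sigma=1.5)


def rates_at_upper_cutoff(cutoff: float) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when values above `cutoff` are flagged."""
    sensitivity = 1.0 - AFFECTED.cdf(cutoff)  # affected people correctly flagged
    specificity = HEALTHY.cdf(cutoff)         # healthy people correctly left alone
    return sensitivity, specificity


if __name__ == "__main__":
    for label, cutoff in (("wide", 4.5), ("narrow", 2.5)):
        sens, spec = rates_at_upper_cutoff(cutoff)
        print(f"{label:>6} cutoff {cutoff}: sensitivity {sens:.1%}, "
              f"specificity {spec:.1%}, healthy flagged {1 - spec:.1%}")
```

Tightening the cutoff raises sensitivity and lowers specificity every time. What the model cannot say is whether treating the newly flagged people helps, which is the part only a trial can answer.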

A third misconception: that "your" personal optimal is knowable from one or two readings. Within-subject biological variation plus assay imprecision combine to define the reference change value (Fraser 2011); for many analytes the RCV is 15–25%, meaning two readings that differ by less than that have not meaningfully changed. Single readings near a cutoff (just above, just below) carry less information than the chart's clean dichotomy suggests.

Protocol — how to actually read a panel

The workable interpretive frame is graded rather than binary. (1) Use the lab-printed reference range as the screening filter — values far outside it are signal. (2) For values inside the reference range but near the boundary, ask whether symptoms or a defensible prior exist; if not, don't act. (3) For analytes where an "optimal" range has RCT-grade support (ferritin in symptomatic menstruating women; vitamin D in genuinely deficient northern-latitude / dark-skinned / housebound populations), allow the narrower target to influence the work-up. (4) For analytes where the optimal-range claim is observational only (most hormones, most micronutrients beyond iron and vitamin D), default to the conventional range and don't supplement to chase a number. (5) Before retesting, ask: do I expect the change to exceed the reference change value? If not, retesting will produce noise. (6) For multi-analyte panels: expect at least one flagged result in a healthy person and triage by clinical context, not by alarm.
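Steps (1) through (4) can be sketched as a small decision function. Everything here is illustrative: the class, field names, and return strings are hypothetical, the ferritin upper limit of 150 in the usage example is a placeholder, and steps (5) and (6) are left out because they concern retest timing and whole-panel expectations rather than a single result. It is a way to see the order of the checks, not a clinical tool.

```python
from dataclasses import dataclass
from typing import Optional

OUTSIDE = "outside reference range: bring to clinician"
RAISE_IT = "in range, but off a trial-supported target with symptoms: raise it"
NO_ACTION = "in range: no action, watch the trajectory"


@dataclass
class LabResult:
    value: float
    ref_low: float
    ref_high: float
    has_symptoms: bool = False
    optimal_low: Optional[float] = None   # narrower target, if one is proposed
    optimal_high: Optional[float] = None
    optimal_has_rct_support: bool = False


def triage(r: LabResult) -> str:
    # Step 1: the printed reference range is the screening filter.
    if r.value < r.ref_low or r.value > r.ref_high:
        return OUTSIDE
    # Steps 2-3: inside the range, a narrower target matters only when it
    # has trial support AND there are symptoms to explain.
    off_target = (
        (r.optimal_low is not None and r.value < r.optimal_low)
        or (r.optimal_high is not None and r.value > r.optimal_high)
    )
    if off_target and r.optimal_has_rct_support and r.has_symptoms:
        return RAISE_IT
    # Step 4: observational-only targets do not change the plan.
    return NO_ACTION


if __name__ == "__main__":
    ferritin = LabResult(30, 15, 150, has_symptoms=True,
                         optimal_low=50, optimal_has_rct_support=True)
    tsh = LabResult(3.8, 0.4, 4.5, optimal_high=2.5)
    print("ferritin 30:", triage(ferritin))
    print("TSH 3.8:", triage(tsh))
```

The two usage cases mirror the text: a symptomatic ferritin of 30 gets raised despite the green check, while an asymptomatic TSH of 3.8 does not.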

Contraindications / failure modes

The dominant failure mode of "everything in range is fine": missing iron deficiency in menstruating women, missing B12 deficiency at the lower end of normal in vegetarians and metformin users, missing genuine hypothyroidism in a middle-aged adult whose TSH sits at 4.0 with positive thyroid antibodies, missing low ferritin in chronic blood donors. The dominant failure mode of "optimize every number into the narrow band": iatrogenic thyrotoxicosis from overprescribed levothyroxine (Brito et al. 2021), unnecessary biopsies and procedures from incidentalomas (Ahn, Kim & Welch 2014 on the Korean thyroid-cancer epidemic: 15× increase in diagnosis, no change in mortality), expensive supplement regimens with no measurable benefit, and the diffuse harm of health anxiety. Cascade events specifically, where one flagged number begets a confirmatory test, then imaging, then a specialist visit, sometimes a biopsy, were reported by 99% of US internists in a national survey, and 68% reported cascades that caused patients psychological harm (Ganguli et al. 2019).

Stakes — what happens if you don't think carefully about ranges

Two stakes, one in each direction. (1) Real deficiency goes unaddressed when symptoms are dismissed because the number sits inside a wide reference range — the iron-deficient woman with ferritin 35, the genuinely hypothyroid 35-year-old with TSH 3.8 and antibodies, the dark-skinned northern-latitude resident with 25(OH)D 22. (2) Real harm accumulates when "optimization" becomes an end in itself — repeated panels, iatrogenic thyrotoxicosis, biopsy cascades from incidentalomas, supplements at doses that have failed in RCTs, anxiety, money spent on retesting in place of money spent on the high-yield levers (sleep, training, food, social connection). The cost ledger is real on both sides; the asymmetry is that the "test more, treat to optimal" side carries the louder marketing voice.

Out-of-scope

Adjacent topics the reader may want to look into: which specific panels are worth running (separate entry); the case for / against full-body MRI and direct-to-consumer screening; specific deficiency entries (iron, vitamin D, B12, magnesium); the cardiovascular biomarker stack (ApoB, Lp(a), CAC score) which has its own ranges-and-optimal logic; how to interview a lab-driven clinician.

The credibility range

Optimist case

Conventional reference ranges are a screening filter, not a wellness target — they were built to catch disease, not to define health. Functional-medicine practitioners and the optimal-range literature have surfaced real things mainstream medicine missed: iron deficiency in non-anaemic menstruating women with ferritin in the 15–50 range (the Krayenbuehl trial is hard to argue with); vitamin D inadequacy in northern-latitude, dark-skinned, and elderly populations that the 20 ng/mL skeletal threshold misses; symptomatic hypothyroidism in middle-aged women with TSH in the upper "normal" range and positive antibodies. The information-loss from collapsing a continuous biomarker into a binary in/out is real, and a thoughtful clinician adds value by treating the number as one input rather than a verdict. The optimal-range frame also encourages serial monitoring — tracking your own trajectory rather than comparing to a static population cutoff — which is genuinely useful for biomarkers where the within-person change matters more than the cross-sectional position (Fraser 2011).

Skeptic case

Most "optimal ranges" published by functional-medicine sources are not RCT-derived; they are expert-opinion thresholds based on observational associations, mechanistic plausibility, and (sometimes) the commercial structure of the practice that promotes them. The two strongest test cases, TSH and vitamin D, both went the way of the conventional range. TRUST showed that levothyroxine in the 4.5–10 TSH band produces no measurable benefit in older adults (Stott 2017). VITAL showed that 2,000 IU/d of vitamin D produces no cardiovascular or cancer benefit, and the 2024 Endocrine Society guideline walked back the 30 ng/mL threshold (Manson 2019; Demay 2024). Meanwhile, the cost side of "optimize everything" is well documented: ~21M Americans on levothyroxine they probably don't need (Brito 2021); 99% of physicians reporting cascade events from incidental findings (Ganguli 2019); a 15-fold increase in Korean thyroid-cancer diagnoses from ultrasound screening with zero change in mortality (Ahn 2014); and the broader Welch-Schwartz-Woloshin case (Welch 2011) that lower cutoffs systematically generate overdiagnosis. The wellness-industrial complex has a financial incentive to convert healthy people into patients, and "optimal range" is a major mechanism by which that conversion happens.

Author's call

The frame is right; the execution depends on the analyte. The information-loss critique of population ranges is correct: an in-range value is not automatically reassuring, and the same number can mean different things in different people. But the universalized "optimize every number" practice is mostly not evidence-based; the two best-tested narrower thresholds (TSH 2.5, vitamin D 30) did not pan out in RCTs. The defensible synthesis is analyte-specific: where RCT evidence exists for benefit at a narrower threshold (ferritin in symptomatic women; vitamin D in clearly deficient populations), apply the narrower threshold. Where it does not, default to the conventional range and pair it with clinical context: symptoms, trajectory, related labs, prior probability. The reader's main protection is interpretive discipline: treat a single near-boundary value as low-information, expect false positives on multi-analyte panels, and don't supplement to chase a number when an RCT has shown the intervention doesn't help. Evidence: 4 for the meta-claim that population ranges and "optimal" ranges diverge for real reasons; controversy: 4 because the field is split between mainstream and functional camps with both holding real ground.
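The "expect false positives on multi-analyte panels" step is simple arithmetic. A minimal sketch, assuming each test flags 5% of healthy people and that the tests are independent (real analytes are correlated, so treat the output as a rough guide rather than an exact rate); the function name is hypothetical:

```python
def prob_any_flag(n_tests, flag_rate=0.05):
    """Chance that a healthy person shows at least one out-of-range result.

    Assumes every test flags `flag_rate` of healthy people and that the
    tests are statistically independent, which is a simplification.
    """
    return 1 - (1 - flag_rate) ** n_tests

for n in (1, 14, 30):
    print(f"{n:>2} tests: {prob_any_flag(n):.0%} chance of at least one flag")
```

Under those assumptions a 14-test panel gives roughly a 51% chance of at least one flag and a 30-test panel roughly 79%, so a lone red mark on a large panel carries little information by itself.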

Stakeholder + incentive map

Reference labs (Quest, Labcorp, regional hospital labs) print the population reference range per CLSI EP28-A3c. Method-specific, occasionally population-specific. No incentive to define "optimal."
Guideline bodies (USPSTF, Endocrine Society, AACE, ADA, AHA/ACC) issue cutoffs that drive billing and standard-of-care lawsuits. Historically conservative; have moved in both directions as RCT data lands (Endocrine Society on vitamin D the canonical recent example).
Functional / integrative medicine (IFM, A4M, the supplement-prescribing wing of naturopathy) advance narrower optimal ranges as a clinical-and-business model. Mixed signal: occasionally ahead of mainstream on real findings, frequently profit-motivated and undertested.
Direct-to-consumer testing (InsideTracker, Function Health, Levels, etc.) commercializes panel testing with branded "optimal" zones; growth depends on retesting cadence and supplement attach.
Supplement industry benefits from any narrowing of the "good enough" threshold, since the gap between current value and target is what sells product. Vitamin D supplementation roughly tripled in the US after the 2011 Endocrine Society guideline.
Pharmaceutical industry benefits similarly when prescription drugs sit downstream — levothyroxine being the leading example, statin thresholds being another.
Skeptics — Welch's overdiagnosis school, the Choosing Wisely campaign, the USPSTF tradition — counter that lowered thresholds systematically generate iatrogenic harm and divert attention from interventions with larger demonstrated effect.

Population variability

Reference ranges vary by age, sex, ancestry, assay method, time of day, fasting state, pregnancy, and recent illness. A few that matter most:

Age. TSH rises with age: the 97.5th percentile moves from ~3.5 mIU/L in under-50s to ~7.5 mIU/L in over-70s (Surks & Hollowell 2007). Fasting glucose, blood pressure, and creatinine all drift too. Applying young-adult cutoffs to older adults systematically over-diagnoses.
Sex. Ferritin reference ranges differ between men and menstruating women (the women's range is often 10–120 ng/mL vs 20–250 ng/mL for men). Testosterone differs by sex by definition; the harmonized male range is 264–916 ng/dL for ages 19–39 (Travison 2017) and shifts with age.
Ancestry / skin pigmentation. 25(OH)D is systematically lower in dark-skinned populations at temperate latitudes; whether the lower value carries the same disease implication is contested (Black Americans have lower 25(OH)D than white Americans but better bone density). The 30 ng/mL cutoff did not account for these population-level differences in what a given value implies.
Assay method. Hormone immunoassays vs LC-MS/MS produce numerically different results for the same hormone. Comparing testosterone or estradiol across labs without method alignment is unreliable.
Acute physiology. Ferritin spikes with inflammation; HbA1c can read falsely high or low with hemoglobin variants, depending on the assay; cortisol depends on collection time. A "high" or "low" value in the wrong context is not what it looks like.
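The sex-specific ferritin example above shows how the same number can earn a different flag depending on which reference group it is compared against. A minimal sketch, using only the illustrative ranges quoted in this section (units assumed to be ng/mL; the group labels and function name are hypothetical, and real labs print their own method-specific ranges):

```python
# Illustrative ferritin ranges quoted in this section, assumed to be in ng/mL.
# These are not clinical cutoffs; actual lab ranges vary by method and population.
FERRITIN_RANGES = {
    "menstruating_woman": (10, 120),
    "man": (20, 250),
}

def classify(value, group, ranges=FERRITIN_RANGES):
    """Return the flag a lab would print for `value` against the group's range."""
    low, high = ranges[group]
    if value < low:
        return "below range"
    if value > high:
        return "above range"
    return "in range"

# The same ferritin value gets a different flag depending on the reference group.
print(classify(15, "menstruating_woman"))
print(classify(15, "man"))
```

The flag is a property of the comparison group as much as of the person, which is why an "in range" result does not settle the question when symptoms point the other way.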

Knowledge gaps

The biggest unknown is the per-analyte question: for which biomarkers does intervention at the narrower "optimal" threshold actually change outcomes? TRUST and VITAL settled TSH and vitamin D in older adults; the analogous trial has not been run for most other analytes proposed for optimal-range targeting (ferritin RCTs exist in symptomatic premenopausal women but not in broader populations; testosterone RCTs in mid-range men are sparse). The "optimize every number" position needs to demonstrate, analyte by analyte, that pushing values into the narrower band changes hard outcomes. Until then, the default should be the conventional range plus context, with explicit narrower thresholds only where the RCT evidence supports them. A second gap: the cost of overdiagnosis from optimal-range thinking has been documented qualitatively (Welch 2011; Ganguli 2019) but not quantified at population scale. What fraction of the ~$200B/year in cascade-of-care spending traces specifically to narrowed reference cutoffs?

Scope choice. The brief named the topic at the level of the interpretive frame ("normal vs. optimal") with three named consequences (screening, follow-up testing, intervention thresholds). I covered all three through the analyte-by-analyte evidence section and the failure-modes section rather than giving each its own section; they interlock so tightly that separating them produced redundant prose. The entry covers the frame as a whole; no consequence from the brief was dropped.

Analyte selection in the evidence section. Picked TSH, vitamin D, ferritin, and HbA1c because they are the four cases where the normal/optimal debate is most active, the trial evidence is best, and the reader is most likely to encounter the question in their own life. Testosterone and the cardiovascular biomarker stack (ApoB, Lp(a), CAC) were considered and deferred — testosterone gets covered in research §3b under evidence but not in the article body because the case is similar in shape to TSH (narrower-window claim, thin RCT support) and adding it would have made the section repetitive. Pointed at the cardio stack in out-of-scope as the natural next entry.

Action type. Settled on "know" because the entry is teaching an interpretive frame, not prescribing a behavior. The "test" type would have been wrong because we are explicitly arguing against reflex retesting; "do" would have implied a prescribed behavior.

Evidence score 4, not 5. The structural claim (population reference intervals = central 95% per CLSI) is rock-solid, and the two best-tested narrower-threshold claims have definitive RCT answers (TSH via TRUST, which tested the 4.5–10 band; vitamin D 30 via VITAL). But per-analyte calls for biomarkers without those trials remain mixed: ferritin in non-anaemic women has trial support, testosterone "optimization" does not, and many other proposed optimal ranges sit on observational data only. A score of 4 captures "strong frame, mixed details."

Controversy score 4. Real foundational disagreement between mainstream guideline medicine (USPSTF, 2024 Endocrine Society) and the functional-medicine / integrative camp, with significant downstream economic interests (lab companies, supplement makers, direct-to-consumer testing services) pulling on each side. The 2024 Endocrine Society vitamin D reversal is the clearest recent demonstration that the field is still actively churning.

Future-link candidates (separate entries that don't yet exist; this entry should cross-link to them once they do):

Iron deficiency without anaemia — the ferritin 15–50 case is significant enough to deserve its own entry, ideally for menstruating women specifically.
Subclinical hypothyroidism — TRUST and the levothyroxine over-prescription pattern.
Direct-to-consumer wellness panels — Function Health, InsideTracker, the branded-optimal-zone playbook.
ApoB and the cardiovascular biomarker stack — clear case where the narrower target has RCT support, opposite of the vitamin D story.
Cascade of care after an incidental finding — could justify its own entry on how to push back when a workup spirals.

What I deliberately did not include. Did not give specific number-by-number "optimal ranges" for a long list of biomarkers — that would have turned the article into a checklist of contestable cutoffs and undermined the main point, which is interpretive discipline, not memorization. The protocol section gives the decision procedure; specific cutoffs belong in per-biomarker entries.