External research

This page lists the research used to back the content on this site. Every claim on every topic page links back here. For scientific terms used below, see the glossary.

Research selection and evaluation

Research selection follows a human-evidence-first hierarchy:

  1. Pre-registered RCTs and meta-analyses of RCTs: highest confidence. Grade A requires pre-registered primary outcomes. Unregistered trials and post-hoc outcome analyses are treated as exploratory.
  2. Well-designed RCTs: strong individual-trial evidence. A single RCT, regardless of quality, is capped at Grade B. Grade A requires independent replication by a separate research group with different funding.
  3. Prospective cohort studies: capped at Grade B. Observational findings in nutrition have a documented history of failing to replicate in controlled trials; they are treated as hypothesis-generating, not claim-confirming.
  4. Mechanistic, animal, and in-vitro studies: supporting context only, always labeled as lower certainty. Animal and cell-model results routinely fail to translate to humans in nutritional science.

Grading rules — all must pass for Grade A

  • Grade A: strong human evidence. Wording: “improves”, “increases”, “enhances”, “directly affects”, “well-established”. Requires: pre-registered primary outcomes (prospective or Registered Report), ≥2 independent research groups, hard endpoints (not surrogate-only), a clinically meaningful effect size, and fewer than 2 red flags.
  • Grade B: moderate evidence. Wording: “associated with”, “suggests”, “likely”, “tends to”, “indicates”, “supports”. Applies to: single RCTs, observational findings (hypothesis-generating only), post-hoc outcomes, surrogate-endpoint-only evidence, claims whose sole support is industry-funded, narrative reviews, attrition >20% without ITT, meta-analyses without risk-of-bias (RoB) assessment, crossover trials with inadequate washout, very wide CIs, non-primary outcomes without multiple-comparison correction, retrospectively registered trials, unregistered post-2000 RCTs, time-horizon mismatch (short-term evidence for a long-term claim), unverified blinding with subjective outcomes, and unverified compliance (long or subjective-outcome studies).
  • Grade C: limited, early, or mixed evidence. Wording: “may”, “might”, “possible”, “emerging”, “preliminary”, “uncertain”, “limited evidence”. Applies to: animal and in-vitro studies, single observational studies, mechanistic-only evidence, preprints (until peer-reviewed), and post-hoc subgroup findings (regardless of parent study quality). Mechanism alone, however plausible, does not upgrade the grade.
  • Grade A language with Grade C evidence = overclaim — blocked.
  • Grade A language for observational-only or surrogate-only claims = blocked.
  • Preprints are Grade C until published in a peer-reviewed journal.
  • Post-hoc (unregistered) subgroup findings are Grade C regardless of parent study type.
  • Retrospectively registered trials and unregistered post-2000 RCTs are capped at Grade B.
  • Null results from underpowered studies cannot be cited as evidence of absence.
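The ceilings above can be read as a small decision procedure. The Python sketch below is one hypothetical way to encode them; the `Evidence` fields, the design labels, and the simplified wording check are illustrative assumptions, not a tool this site publishes. It returns the highest grade a body of evidence can reach, so reviewer judgement can still lower the final grade.

```python
from dataclasses import dataclass

# Hypothetical design labels; a real reviewer would use a richer taxonomy.
C_DESIGNS = {"animal", "in_vitro", "mechanistic", "single_observational"}
A_ELIGIBLE_DESIGNS = {"rct", "rct_meta_analysis"}


@dataclass
class Evidence:
    design: str               # e.g. "rct", "rct_meta_analysis", "cohort", "animal"
    registration: str         # "registered_report", "prospective", "retrospective", "none"
    independent_groups: int   # groups with separate institutions and separate funding
    hard_endpoint: bool       # disease events, mortality, physical function
    clinically_meaningful: bool
    red_flags: int
    preprint: bool = False
    post_hoc_subgroup: bool = False


def max_grade(ev: Evidence) -> str:
    """Highest grade the evidence can reach; every Grade A requirement must pass."""
    # Grade C regardless of anything else.
    if ev.preprint or ev.post_hoc_subgroup or ev.design in C_DESIGNS or ev.red_flags >= 3:
        return "C"
    # Any single failed Grade A requirement caps the evidence at Grade B.
    b_caps = [
        ev.design not in A_ELIGIBLE_DESIGNS,          # observational: hypothesis-generating only
        ev.registration not in {"registered_report", "prospective"},
        ev.independent_groups < 2,                    # no independent replication
        not ev.hard_endpoint,                         # surrogate-only
        not ev.clinically_meaningful,
        ev.red_flags >= 2,
    ]
    return "B" if any(b_caps) else "A"


# Simplified check: Grade A verbs are treated as allowed only on Grade A evidence.
GRADE_A_WORDS = ("improves", "increases", "enhances", "directly affects", "well-established")


def is_overclaim(claim: str, grade: str) -> bool:
    text = claim.lower()
    return grade != "A" and any(word in text for word in GRADE_A_WORDS)
```

A retrospectively registered RCT replicated by three groups still tops out at Grade B, and a preprint stays at Grade C however strong its design.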

Additional grading rules applied to every citation:

  • Effect size: statistical significance alone does not qualify. An effect must be clinically meaningful in magnitude. Small-but-significant results receive Grade B language with the effect size stated explicitly.
  • Outcome type: claims based solely on surrogate biomarkers — cholesterol levels, inflammatory markers, hormone concentrations — are capped at Grade B. Grade A requires hard outcomes (disease events, mortality, physical function) or surrogates with validated predictive value.
  • Replication: Grade A requires evidence from at least two independent research groups (different institutions, different funding). Single-group or single-lab findings are capped at Grade B.
  • Funding: industry-only evidence is capped at Grade B regardless of study quality. Grade A requires independent corroboration confirming both direction and magnitude.
  • Trial registration: for RCTs published after 2000, outcome switching (reporting endpoints that were not pre-specified in the original registration) makes those endpoints exploratory, and claims based on them are capped at Grade B.
  • Pre-registration quality tiers: Registered Reports (journal commits to publish before data collection) carry the highest weight. Prospective registration (before data collection) is standard. Retrospective registration (after data collection began) is treated as near-equivalent to no registration — outcomes are capped at Grade B. Unregistered post-2000 RCTs: all outcomes are treated as potentially post-hoc.
  • Publication bias: meta-analyses are assessed for signs of publication bias. Positive results are published at higher rates than null results; a meta-analysis of a biased literature produces a biased estimate. When bias is detectable or likely, the pooled effect is interpreted conservatively.
  • Absolute vs relative effects: relative risk reductions (e.g., ‘50% lower risk’) are always reported alongside absolute numbers. A 50% relative reduction may mean an absolute change from 2% to 1% — meaningful context that changes interpretation. When only relative effects are available and the absolute baseline is unknown, this limitation is noted and claims use cautious language.
  • Confidence interval precision: a statistically significant result with a very wide confidence interval is imprecise — the true effect could be negligible or very large. Imprecise estimates receive cautious language regardless of the point estimate.
  • Statistical heterogeneity: when a meta-analysis pools studies with very different results (high heterogeneity), the pooled average is unreliable. High-heterogeneity meta-analyses receive cautious language and note the inconsistency.
  • Time horizon: claims implying long-term or ongoing benefit require evidence matching that timeframe. A 4-week trial cannot support claims of lasting benefit. When study duration is shorter than the claimed effect horizon, claims use cautious language.
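Several of these rules come down to simple arithmetic. The Python sketch below is a hedged illustration, not the site's actual tooling: it reports relative and absolute risk reductions together, flags an interval as imprecise when it spans a reviewer-chosen `minimal_important` threshold (a hypothetical parameter; the rules above set no number), and computes Higgins' I², one common measure of heterogeneity, from fixed-effect inverse-variance weights.

```python
def risk_summary(control_risk: float, treated_risk: float) -> dict:
    """Relative and absolute risk reduction, always reported side by side."""
    absolute = control_risk - treated_risk        # change in percentage points
    relative = absolute / control_risk            # share of baseline risk removed
    return {"relative_reduction": relative, "absolute_reduction": absolute}


def is_imprecise(ci_low: float, ci_high: float, minimal_important: float) -> bool:
    """True if the interval includes both negligible and meaningful effects.

    `minimal_important` is a hypothetical threshold chosen by the reviewer.
    """
    return ci_low < minimal_important < ci_high


def i_squared(effects: list[float], std_errors: list[float]) -> float:
    """Higgins' I^2: share of between-study variation beyond chance (0 to 1)."""
    weights = [1 / se ** 2 for se in std_errors]  # inverse-variance weights
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))  # Cochran's Q
    df = len(effects) - 1
    return max(0.0, (q - df) / q) if q > 0 else 0.0
```

For the example in the list, `risk_summary(0.02, 0.01)` gives a 50% relative reduction but only a 1-percentage-point absolute reduction, which is exactly the context the rule asks to be stated.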

Study quality markers

Beyond study type, every cited study is evaluated for design quality. Red flags include:

  • small sample size (n < 20 per arm)
  • short duration for long-term claims
  • population mismatch
  • unblinded subjective outcomes
  • post-hoc outcome changes
  • single-lab findings
  • high dropout (>20% differential between arms) without intention-to-treat analysis
  • inadequate washout in crossover trials
  • unverified blinding when the intervention has known unblinding properties (taste, smell, side effects) and the outcome is subjective
  • unverified compliance (self-report only) for studies longer than 4 weeks with subjective outcomes

Any study with 2 or more red flags is capped at Grade B; 3 or more caps at Grade C.
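Some red flags can be checked mechanically from a study's reported design; others (population mismatch, post-hoc outcome changes, single-lab findings, crossover washout, unblinding risk) need reviewer judgement. The sketch below is hypothetical: it detects a subset, accepts the rest as `manual_flags`, then applies the 2-flag and 3-flag caps. The field names, measuring "short duration" against the claim's time horizon, and reading the dropout rule as a gap of more than 20 percentage points between arms are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Study:
    n_smallest_arm: int
    duration_weeks: float
    claim_horizon_weeks: float       # how long the claimed benefit is said to last
    subjective_outcome: bool
    blinded: bool
    differential_dropout: float      # assumed: 0.25 means a 25-point gap between arms
    itt_analysis: bool
    compliance_self_report_only: bool
    manual_flags: set = field(default_factory=set)  # reviewer-assessed flags


def red_flags(s: Study) -> set:
    """Combine reviewer-assessed flags with the ones derivable from design data."""
    flags = set(s.manual_flags)
    if s.n_smallest_arm < 20:
        flags.add("small sample")
    if s.duration_weeks < s.claim_horizon_weeks:
        flags.add("short duration for claim")
    if s.subjective_outcome and not s.blinded:
        flags.add("unblinded subjective outcome")
    if s.differential_dropout > 0.20 and not s.itt_analysis:
        flags.add("high dropout without ITT")
    if s.subjective_outcome and s.duration_weeks > 4 and s.compliance_self_report_only:
        flags.add("unverified compliance")
    return flags


def quality_ceiling(flags: set) -> str:
    """2+ red flags cap at Grade B, 3+ at Grade C; fewer than 2 adds no cap."""
    if len(flags) >= 3:
        return "C"
    if len(flags) >= 2:
        return "B"
    return "A"
```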

Subjective outcomes

Claims based solely on self-reported outcomes — sleep quality, mood, perceived energy, pain scales — require stronger evidence for the strongest language. Grade A for subjective outcomes requires a double-blind RCT with a validated measurement instrument, at least 50 participants per arm, and independent replication. Missing any of these caps the claim at Grade B.
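The four conditions combine with a logical AND, so a single miss caps the claim. A minimal sketch, with hypothetical parameter names; `smallest_arm` is assumed to mean the size of the smallest arm so that "per arm" holds for every arm.

```python
def subjective_outcome_ceiling(double_blind: bool, validated_instrument: bool,
                               smallest_arm: int, independently_replicated: bool) -> str:
    """Grade A eligibility for claims resting solely on self-reported outcomes.

    Returns "A" only when every condition holds; all other grading rules still apply.
    """
    eligible = (double_blind and validated_instrument
                and smallest_arm >= 50 and independently_replicated)
    return "A" if eligible else "B"
```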

Guidelines and consensus statements

Guidelines, consensus statements, and position papers are committee outputs — interpretive documents, not primary evidence. They reflect a committee’s reading of the evidence at a point in time, filtered through institutional incentives and member conflicts of interest.

  • Guidelines are capped at Grade B regardless of the issuing body’s prestige. Only the underlying primary data (RCTs, meta-analyses, strong cohorts) can support Grade A claims.
  • “WHO recommends X” or “EFSA approved Y” is not evidence — it is an authority citation. The underlying studies are evaluated independently.
  • Many guidelines have been substantially revised or reversed over the decades — dietary cholesterol limits (reversed ~2015), saturated fat recommendations (weakened), low-fat dietary advice (revised), hormone replacement therapy (reversed after WHI trial). These reversals are not cited to undermine science but to demonstrate that consensus is provisional and must be evaluated against current primary data.
  • When a guideline committee has majority-industry ties or undisclosed composition, this is noted alongside the citation.

Review types

Not all review articles carry the same weight:

  • Systematic reviews — pre-registered search protocol, explicit inclusion criteria, quality assessment of included studies (e.g., Cochrane, PRISMA-compliant). Treated similarly to meta-analyses for grading.
  • Narrative reviews — single-author or small-group expert opinion summarizing literature without systematic methodology. Treated as expert opinion — Grade B maximum. Narrative reviews can inform topic context but cannot be the sole support for evidence-grade claims.

Individual response variability

Research reports population-mean effects. Many compounds show substantial responder and non-responder distributions — a positive average can obscure the fact that a meaningful subset of participants saw no benefit. When published studies document that 20% or more of participants showed no meaningful response, content acknowledges this: “for most people, X improves Y; a significant minority see little to no effect.” When the non-responder mechanism is identified — genetic variation, baseline status, or population-specific factors — it is disclosed.
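The 20% threshold and the caveat wording can be applied mechanically once a study reports how many participants did not respond. A hedged sketch: the function name is hypothetical, and whether a participant counts as a non-responder is taken from each study's own definition rather than computed here.

```python
from typing import Optional


def add_responder_caveat(claim: str, non_responders: int, participants: int,
                         mechanism: Optional[str] = None) -> str:
    """Append the non-responder caveat when 20% or more saw no meaningful response."""
    if participants <= 0:
        raise ValueError("participants must be positive")
    if non_responders / participants < 0.20:
        return claim
    text = f"For most people, {claim}; a significant minority see little to no effect"
    if mechanism:  # disclose the non-responder mechanism when it has been identified
        text += f", which has been linked to {mechanism}"
    return text + "."
```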

Adversarial review

Every substantive claim is actively challenged — counter-evidence is searched for, including evidence that claims are wrong, overstated, or incomplete. When contradictory evidence exists, it is disclosed alongside the claim. When a claim is found overstated, it is rewritten to match the actual evidence. Claims are assessed for aggregate bias patterns: healthy user bias, reverse causation, publication bias, common confounders, and funder homogeneity. If two or more of these concerns apply to a claim, Grade A language is blocked regardless of individual study quality.
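The aggregate-bias rule is a count over five named patterns. A minimal sketch, assuming the reviewer records which patterns apply as plain strings; the string labels are illustrative.

```python
BIAS_PATTERNS = frozenset({
    "healthy user bias", "reverse causation", "publication bias",
    "common confounders", "funder homogeneity",
})


def grade_a_language_allowed(concerns: set) -> bool:
    """Grade A wording is blocked when two or more aggregate bias patterns apply."""
    unknown = set(concerns) - BIAS_PATTERNS
    if unknown:  # catch typos instead of silently under-counting concerns
        raise ValueError(f"unrecognized bias pattern(s): {sorted(unknown)}")
    return len(set(concerns)) < 2
```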

Scope and disclosure rules

Every claim is limited to the population actually studied. Clinical findings in patients with deficiencies or disease are not applied to healthy adults without an explicit qualifier. Results in elderly, trained, or clinical populations do not automatically extend to the general population.

Contradictory evidence of equal or higher quality is disclosed, not omitted. If published Grade A or B research contradicts a stated claim, the contradiction is noted alongside the claim.

Citations older than 10 years are marked (older evidence). For fast-moving fields — gut microbiome, epigenetics, emerging mechanisms — the threshold drops to 5 years. All links are periodically re-verified for dead links and retraction status. Retracted papers are removed immediately and any claims they solely supported are downgraded. If a correction changes the direction or magnitude of an effect, the claim wording is updated.
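The age threshold depends on the topic. A sketch of the check, assuming "older than N years" means published before the same calendar date N years ago; the topic labels are hypothetical and would in practice be maintained by editors.

```python
from datetime import date
from typing import Optional

# Fields the text names as fast-moving.
FAST_MOVING_TOPICS = {"gut microbiome", "epigenetics", "emerging mechanisms"}


def years_before(day: date, years: int) -> date:
    """Same calendar day `years` earlier; Feb 29 falls back to Feb 28."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:
        return day.replace(year=day.year - years, day=28)


def is_older_evidence(published: date, topic: str, today: Optional[date] = None) -> bool:
    today = today or date.today()
    threshold = 5 if topic in FAST_MOVING_TOPICS else 10
    return published < years_before(today, threshold)
```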

Where a conflict of interest exists — for example, a senior author affiliated with a company that profits from the studied product — this is flagged inline.

Deficiency vs adequacy

Evidence showing benefit from correcting a deficiency does not support claims of benefit in people who already have adequate levels. If the supporting evidence comes primarily from deficient populations, the claim specifies this — ‘in those with low levels’ or equivalent. Supplementation benefits above adequate intake require evidence specifically from non-deficient populations.

Formulation specificity

Evidence for one form of a nutrient does not automatically extend to other forms. For example, creatine monohydrate and creatine HCl are not interchangeable in evidence terms; magnesium glycinate and magnesium oxide have very different absorption rates. Claims specify the studied formulation. When a product uses a form not directly studied, claims are scoped to cautious language.

Safety claim standards

‘No adverse effects reported’ in a study does not mean safe. Safety claims require prospective safety data with adverse event monitoring, adequate sample size and duration, and population match. Without this, qualified language is used — ‘no serious adverse events reported in studies to date,’ not ‘safe.’ Short-term trial safety does not support long-term safety assertions.
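The safety rule is another all-conditions-must-hold check, plus a time-horizon test. A hedged sketch: the field names are hypothetical, and "adequate sample size and duration" is left as a reviewer judgement because the text sets no fixed numbers. When the check fails, the qualified wording from the text is used instead of "safe".

```python
from dataclasses import dataclass

QUALIFIED_WORDING = "no serious adverse events reported in studies to date"


@dataclass
class SafetyEvidence:
    prospective_ae_monitoring: bool    # adverse events actively collected, not just "none reported"
    adequate_size_and_duration: bool   # reviewer judgement
    population_match: bool
    longest_followup_weeks: float


def may_assert_safety(ev: SafetyEvidence, claim_horizon_weeks: float) -> bool:
    """True only if every requirement holds and follow-up covers the claimed horizon."""
    return (ev.prospective_ae_monitoring and ev.adequate_size_and_duration
            and ev.population_match and ev.longest_followup_weeks >= claim_horizon_weeks)
```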


Research terms glossary

Common terms used on this site and in the citations below.

  • RCT — Randomized controlled trial — participants are randomly assigned to treatment or control groups. The gold standard for measuring cause and effect.
  • Meta-analysis — A study that pools data from multiple trials to produce a combined estimate. Stronger than any single study, but only as good as the studies it includes.
  • Systematic review — A structured search and evaluation of all available evidence on a question. Often paired with a meta-analysis.
  • Cohort study — An observational study that follows a group over time to see who develops an outcome. Cannot prove causation — only associations.
  • Grade A — Strong human evidence — multiple pre-registered RCTs from independent groups with hard outcomes and clinically meaningful effects.
  • Grade B — Moderate evidence — single RCTs, observational findings, surrogate-only outcomes, or industry-funded claims awaiting independent replication.
  • Grade C — Limited or early evidence — animal studies, in-vitro research, mechanistic-only data, preprints, or post-hoc subgroup findings.
  • Surrogate endpoint — A biomarker used as a stand-in for a hard outcome. Example: LDL cholesterol instead of heart attack events. Claims based only on surrogates are capped at Grade B.
  • Hard outcome — A directly meaningful clinical result — disease events, mortality, physical function — as opposed to biomarker changes.
  • Pre-registration — When researchers publicly declare their study design and primary outcomes before data collection. Prevents cherry-picking results after the fact.
  • Post-hoc analysis — An analysis not planned before the study started. Higher risk of false positives — treated as exploratory evidence regardless of p-value.
  • Publication bias — Positive results get published more often than null results. A body of literature may overstate an effect because the contradictory studies were never published.
  • ITT — Intention-to-treat analysis — includes all participants as originally assigned, even if they dropped out. Prevents bias from selective dropout.
  • Dose-response — When higher doses produce larger effects in a predictable pattern. Supports a causal relationship but does not guarantee one.
  • Effect size — The magnitude of a result, not just whether it is statistically significant. A tiny but statistically significant effect may not matter in practice.
  • Confidence interval — A range showing how precise an estimate is. Wide intervals mean uncertain results; narrow intervals mean more reliable estimates.
  • COI — Conflict of interest — when a study’s funder profits from a positive result. Industry-funded evidence without independent replication is capped at Grade B.
  • Older evidence — Citation published more than 10 years ago (5 years in fast-moving fields). Flagged because newer research may have changed the picture.

Protein & amino acids

Glycine

Fasting & meal timing

Insulin & metabolic health

Fructose & liver

Fats & cooking

Oxalates & lectins

Processed meat & nitrosamines

Gut health & microbiome

Muscle mass & longevity

Creatine

Sleep

Caffeine

Vitamin D & K2

Polyphenols & olive oil

Cosmetics & endocrine disruptors

Bile & digestion

Thermic effect of food

Electrolytes & minerals

Water & microplastics

Training & exercise

Digestion

Carbohydrates

Nutrition

Nervous system

Circadian rhythm

Stress recovery

Nutrition myths

ATP metabolism

NAFLD

TMAO

Aging

Betaine

Bioavailability

Blood donation

Blood pressure

Body composition

Cancer

Cardiovascular

Exercise performance

Homocysteine

Hyaluronic acid

Inflammation

Insulin sensitivity

Iron

Joint health

Liver health

Metabolic syndrome

Oxidative stress

Safety

Skin health

Sweeteners

Taurine