// Editorial Methodology

Our Evidence Standard.

Every claim on Vitae Health is traced back to peer-reviewed research. Here is exactly how we evaluate evidence, because your health decisions deserve nothing less.

// Contents

Evidence Hierarchy
Confidence Ratings (GRADE)
How We Write About Claims
How We Choose Research
What We Never Cite
Human Consultants
AI & Transparency
Report an Error
// Evidence Hierarchy

Not all evidence is created equal.

A single observational study can be confounded by hundreds of variables. Here is the hierarchy we follow, and we always tell you where each claim sits on this ladder.

1
Highest evidence

Systematic Reviews & Meta-Analyses

The highest form of evidence. These systematically identify, critically appraise, and (where appropriate) quantitatively pool all available studies on a topic, giving us the most reliable picture of what the science actually says.

Our primary citation source. When a systematic review exists, we lead with it.

2
Gold standard

Randomized Controlled Trials

The gold standard of individual studies. Random assignment to treatment or control isolates cause from effect by balancing known and unknown confounders across groups.

We cite RCTs when no meta-analysis exists, or to highlight a landmark finding.

3

Cohort & Observational Studies

Can reveal associations across large populations and long timeframes, but cannot prove causation. Confounding variables are always present.

We cite these to contextualize and always note the evidence grade. Never as sole proof.

4

Case Reports & Expert Opinion

Individual clinical observations and professional consensus. Useful for generating hypotheses, but not for confirming them.

Mentioned only for historical context or to flag emerging areas lacking RCTs.

5

Anecdotal & Traditional Knowledge

Personal testimonials, folk remedies, and cultural practices. May contain useful signals but cannot be relied upon for health claims.

We never cite anecdotes as evidence. If mentioned, it is clearly labeled as such.

We primarily cite tiers 1–2; tier 3 only with disclosure.
// GRADE Certainty System

Confidence ratings, defined.

GRADE (Grading of Recommendations Assessment, Development and Evaluation) was developed by Guyatt et al. and is adopted by the WHO, Cochrane Collaboration, and over 100 health organizations worldwide. It is the international standard for rating how much confidence we can place in a scientific finding. Each rating means something specific: High Certainty means the evidence is so robust that further research is very unlikely to change the conclusion. Very Low Certainty means the true effect may be substantially different from what current studies show.

HIGH
High Certainty
Begins at: well-conducted RCTs

Further research is very unlikely to change this conclusion. Typically requires multiple well-conducted RCTs with consistent results, adequate sample sizes, low risk of bias, and no detectable publication bias. This is the ceiling. Most health topics do not reach it.

MODERATE
Moderate Certainty
Begins at: single pre-registered RCT or moderate-quality meta-analysis

The true effect is probably close to the estimate, but there is a real possibility it could be substantially different. A single well-powered RCT, or a meta-analysis with moderate heterogeneity (I² 25–50%), typically sits here.

LOW
Low Certainty
Begins at: observational studies (cohort, case-control)

Our confidence in the estimate is limited. Observational studies start here by default: not because they are useless, but because confounding and bias can substantially distort even large, well-run cohort studies.

VERY LOW
Very Low Certainty
Begins at: case reports, cross-sectional, expert opinion, animal data

Very little confidence in the estimate. The true effect is likely substantially different from what current data suggests. Cross-sectional data, case reports, expert opinion, and most animal studies sit here when discussing human applications.

// Claim Tiers

Three tiers. Three different voices.

The language we use to describe a finding is calibrated to the evidence behind it. We will never use the vocabulary of certainty when the evidence only justifies caution — and we will never hide behind vague hedging when the evidence is solid.

Tier 1 · High Confidence

Requirements

  • Multiple independent RCTs with consistent results
  • Systematic reviews or meta-analyses of high-quality RCTs (low I², GRADE High/Moderate, no significant publication bias)
  • Mendelian randomization findings consistent with RCT data (MR uses genetic variants assigned at birth as proxies for exposures — providing causal inference without randomizing people's diets or behaviors)
  • For exposures that cannot be randomized: multiple large prospective cohorts across diverse populations with strong, consistent associations (RR >2), supported by a dose-response gradient and plausible biological mechanism

Language we use

"Research consistently shows"
"Multiple trials demonstrate"
"The evidence strongly suggests"
Tier 2 · Moderate Confidence

Requirements

  • Single well-powered, pre-registered RCT not yet independently replicated
  • Meta-analysis with moderate heterogeneity (I² 25–50%) or GRADE Moderate certainty
  • Strong, consistent observational evidence across multiple large cohorts for a biologically plausible association — explicitly framed as association, never causation

Language we use

"A study published in [journal] found"
"Research suggests"
"Evidence indicates"
Tier 3 · Preliminary

Requirements

  • Single underpowered, non-pre-registered RCT, or one with methodological concerns
  • Single prospective cohort study
  • Mendelian randomization without corroborating clinical trial evidence
  • Meta-analysis with high heterogeneity (I² >50%) — pooled estimate should be treated with caution

Language we use

"Early research suggests"
"A preliminary study found"
"One study indicates — replication needed"
// The Replication Problem

Why a single study is never enough.

Science has a reproducibility problem. Large-scale replication projects have shown that a substantial proportion of published findings — even those in top journals — fail to replicate when independent researchers attempt to reproduce them. This is not a fringe critique. It is a documented, quantified feature of how modern science works. It is why Vitae Health requires converging evidence from multiple independent studies before treating any finding as established.

36%
of 100 psychology experiments replicated successfully

97% of original studies were statistically significant. Only 36% of replications achieved significance, with average effect sizes roughly half those in the originals.

Open Science Collaboration, Science, 2015
11%
of published preclinical cancer findings reproduced

Amgen scientists could not reproduce results in 47 of 53 landmark cancer biology papers — a failure rate of approximately 89%.

Begley & Ellis, Nature, 2012
20–25%
of published preclinical findings replicated in internal reviews

Bayer scientists failed to substantially replicate 75–80% of published preclinical findings when attempting to build on them internally.

Prinz et al., Nature Reviews Drug Discovery, 2011
// Hard Limits

What we will never cite as evidence.

These are not judgment calls. They are absolute disqualifiers — evidence types that cannot support a health claim regardless of how they are framed or how prestigious the journal that published them.

Never

Animal or in vitro studies as human evidence

Only 5% of animal-tested interventions obtain regulatory approval for humans (Mak et al., PLOS ONE, 2014 — systematic analysis of 376 therapies across 54 disease areas). A mouse result is a hypothesis, not a finding. In vitro cell studies are even further removed — a cell in a dish behaves nothing like a cell in a living human.

Never

Cross-sectional studies for causal claims

Cross-sectional data captures a single point in time. Because exposure and outcome are measured simultaneously, there is no way to know which came first. Cross-sections establish co-occurrence — never cause.

Never

Predatory journal publications

Papers from journals not indexed in PubMed/MEDLINE carry near-zero evidentiary weight. Red flags: peer review in days, vague journal names like "Global Health Review," no Clarivate impact factor, solicitation by cold email.

Never

Industry-funded studies without independent replication

Industry-funded nutrition reviews are 5–17× more likely to produce favorable conclusions than independent reviews examining the same data: SSB-industry reviews were 5× more likely to find insufficient evidence against sugar-sweetened beverages (Bes-Rastrollo et al., PLOS Medicine, 2013); artificial-sweetener industry reviews were 17× more likely to report favorable findings (Mandrioli et al., PLOS ONE, 2016). Funding source is a systematic predictor of outcome — it must be accounted for.

Never

Studies with unverified or hallucinated PMIDs

AI hallucination of author names, journal names, and statistics is a documented, recurring failure mode. Every citation must be confirmed on PubMed: the PMID must exist, authors/year/journal must match, and the specific claim must appear in the paper.
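As a concrete illustration, a PMID check against NCBI's public E-utilities summary endpoint can be sketched as below. The endpoint and JSON fields (`pubdate`, `fulljournalname`, `authors`) are PubMed's; the function names and matching rules are our illustration, not the exact verifier we run:

```python
import json
import urllib.request

ESUMMARY = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def fetch_pubmed_summary(pmid: str) -> dict:
    """Fetch the PubMed summary record for a PMID via NCBI E-utilities."""
    url = f"{ESUMMARY}?db=pubmed&id={pmid}&retmode=json"
    with urllib.request.urlopen(url, timeout=10) as resp:
        data = json.load(resp)
    result = data["result"]
    if pmid not in result:
        raise ValueError(f"PMID {pmid} not found on PubMed")
    return result[pmid]


def citation_matches(record: dict, first_author: str, year: str, journal: str) -> bool:
    """Check a fetched record against the citation we intend to print.

    A mismatch on any field means the citation is rejected, never silently fixed.
    """
    authors = [a.get("name", "") for a in record.get("authors", [])]
    return (
        any(first_author.lower() in a.lower() for a in authors[:1])
        and year in record.get("pubdate", "")
        and journal.lower() in record.get("fulljournalname", "").lower()
    )
```

Confirming the PMID exists is only half the check; the field comparison catches the subtler failure mode where a real PMID is attached to the wrong paper.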

Never

Observational associations stated as proven causation

The phrase "X causes Y" requires RCT or Mendelian randomization evidence. Cohort studies — however large and well-conducted — establish association, not causation. The canonical example: observational cohort studies (including the Nurses' Health Study) consistently suggested HRT reduced cardiovascular risk; the Women's Health Initiative trial showed the opposite.

Never

Papers where statistical significance is treated as clinical significance

A study of 100,000 people can find a supplement reduces LDL by 0.3 mg/dL at p<0.001. That is statistically significant and clinically irrelevant. Effect size, absolute risk reduction, and NNT matter — relative risk reductions in press releases rarely tell the full story.
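The arithmetic behind that distinction fits in a few lines. A sketch (illustrative only), showing how the same trial yields both an impressive relative number and a sobering absolute one:

```python
def effect_summary(control_rate: float, treatment_rate: float) -> dict:
    """Absolute vs. relative framing of the same trial result.

    Rates are event probabilities (e.g. 0.02 means 2% of the group had the event).
    """
    arr = control_rate - treatment_rate         # absolute risk reduction
    rrr = arr / control_rate                    # relative risk reduction
    nnt = 1 / arr if arr > 0 else float("inf")  # number needed to treat
    return {"ARR": arr, "RRR": rrr, "NNT": nnt}


# Same trial, two framings: control 2%, treatment 1% is a
# "50% risk reduction" in the press release, yet roughly 100 people
# must be treated for one to benefit.
summary = effect_summary(0.02, 0.01)
```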

// Our Editorial Process

Five steps. Zero shortcuts.

From topic selection to publication, every piece of content passes through a rigorous pipeline designed to ensure accuracy and integrity.

01

Topic Selection

Every topic starts with one question: does this genuinely matter for human health? We select based on biological significance and evidence availability. Trends, virality potential, and advertiser interest play no role. If a topic is already well-understood by the public, we skip it.

02

Literature Review

We search PubMed, Cochrane Library, and major peer-reviewed journals (Lancet, NEJM, BMJ, JAMA, Nature, Cell). We read full papers, not just abstracts. We check funding sources, sample sizes, methodology, and conflict-of-interest disclosures.

03

Evidence Synthesis

Claude AI assists with literature search, data extraction, and initial synthesis across hundreds of papers. But every biological claim, every mechanism, and every conclusion is verified by a human against the original source papers. AI accelerates the process. Humans make every call.

04

Quality Gate

Before anything is published, automated checks verify: every claim has a citation, no banned health buzzwords are present, medical claims are properly qualified, caption length and structure meet standards, and a clear call-to-action exists.
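A gate like this can be sketched in a few lines. The word list and rules below are illustrative stand-ins, not our production checks:

```python
import re

# Illustrative rules only; the real gate has more checks and a longer list.
BANNED_BUZZWORDS = {"detox", "miracle", "superfood", "cure-all", "toxin-free"}
CITATION_PATTERN = re.compile(r"PMID:\s*\d{7,8}")


def quality_gate(text: str) -> list[str]:
    """Return a list of violations; an empty list means the draft passes."""
    violations = []
    if not CITATION_PATTERN.search(text):
        violations.append("no PMID citation found")
    lowered = text.lower()
    for word in BANNED_BUZZWORDS:
        if word in lowered:
            violations.append(f"banned buzzword: {word}")
    # Crude proxy for "medical claims are properly qualified":
    # a bare causal verb with no mention of trial evidence gets flagged.
    if re.search(r"\bcauses\b", lowered) and "randomized" not in lowered:
        violations.append("unqualified causal claim without RCT context")
    return violations
```

Returning every violation at once, rather than failing on the first, lets a writer fix a draft in one pass.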

05

Publication & Updates

Content goes live only after passing all checks. But publication is not the end — when new meta-analyses, retractions, or landmark RCTs emerge, we update existing content. Science evolves. Our content evolves with it.

// Human Oversight

AI accelerates. Researchers decide.

Every biological claim on this site goes through human review by people with formal research training. Our process is not just automated quality checks — it includes specialist consultation at the verification stage.

🎓
Research Consultants

Claims in specialized domains are reviewed by domain experts before publication. We consult researchers and academics with relevant postgraduate training to flag errors, misrepresentations, or misapplied findings.

🔬
Independent Verification

Our science verifier cross-checks every PMID against PubMed before a post is written. But human consultants go further: they read the full paper, assess methodology, and flag conclusions that exceed what the data supports.

Ongoing Error Correction

When a consultant or reader identifies an error in published content, we correct it publicly and disclose what changed. Getting it right matters more than appearing infallible.

// Red Lines

What we will never do.

These are not aspirations. They are non-negotiable commitments that define the boundary between education and exploitation.

Never

Recommend without evidence

If we can't point to a peer-reviewed study, we don't make the claim. Full stop.

Never

Use fear-based messaging

We inform with empathy and urgency — never guilt, shame, or manufactured panic.

Never

Sell products we don't believe in

We have no products, no affiliates, no sponsors. When that changes, this principle won't.

Never

Hide behind vague "studies show"

Every claim links to the actual paper. PMID, journal, year. You can check our work.

Never

Claim to replace medical advice

We are educators, not clinicians. We help you ask better questions — your doctor provides the answers.

// Transparency

How we use AI.

We use Claude AI by Anthropic as a research assistant. Here's exactly what that means — and what it doesn't.

AI + Human

Literature search & synthesis

AI scans thousands of papers across PubMed, Cochrane, and major journals in minutes — surfacing relevant studies that would take weeks to find manually.

AI + Human

Human verification of every claim

No AI-generated statement reaches publication without a human verifying it against the original source paper. AI suggests — humans confirm.

AI + Human

Broader coverage, no shortcuts

AI allows us to cover more of health science, faster — sleep, gut, hormones, toxins, longevity — without sacrificing depth or accuracy on any single topic.

AI + Human

Full transparency

We openly disclose that AI assists our research process. We believe transparency about methodology is itself a form of evidence-based thinking.

// Our Sources

Where our evidence comes from.

We always link to the original paper. You can check every claim we make.

PubMed / MEDLINE
Cochrane Library
The Lancet
NEJM
BMJ
JAMA
Nature
Cell
// Foundational References

The papers that underpin this framework.

These primary sources form the scientific basis of our evidence evaluation methodology.

Guyatt G et al. GRADE: an emerging consensus on rating quality of evidence and strength of recommendations
BMJ · 2008
Open Science Collaboration. Estimating the reproducibility of psychological science
Science · 2015
Begley CG & Ellis LM. Drug development: Raise standards for preclinical cancer research
Nature · 2012
Prinz F et al. Believe it or not: how much can we rely on published data on potential drug targets?
Nature Reviews Drug Discovery · 2011
Mak IW et al. Lost in translation: animal models and clinical trials in cancer treatment
PLOS ONE · 2014
Bes-Rastrollo M et al. Financial conflicts of interest and reporting bias regarding the association between sugar-sweetened beverages and weight gain
PLOS Medicine · 2013
Mandrioli D et al. Relationship between research outcomes and risk of bias, study design, and industry involvement in studies of health effects of artificial sweeteners
PLOS ONE · 2016
Higgins JPT et al. Measuring inconsistency in meta-analyses
BMJ · 2003
Sterne JAC et al. RoB 2: A revised tool for assessing risk of bias in randomised trials
BMJ · 2019
Hill AB. The environment and disease: association or causation?
Proceedings of the Royal Society of Medicine · 1965

Found an error?

Help us stay accurate. If you spot a misquoted study, an outdated finding, or a claim that doesn't hold up — we want to know. Corrections make us better.

corrections@vitaehealthco.com
All corrections credited · Responses within 48h
// See it in action

Want to see this rigor applied every week?

The Weekly Dose breaks down one peer-reviewed study every Tuesday — with evidence grade, mechanism, and plain-language takeaway. No fluff, no selling.

Subscribe to The Weekly Dose

Free. Every Tuesday. One click to unsubscribe.