
A/B Testing Charity Emails the Right Way

5 min read | Published 05/03/2026 | Updated 21/05/2026

Most charity A/B tests are not really tests. They are two emails sent to small lists, with results read too early and conclusions drawn too confidently. This article sets out a disciplined approach that actually moves opens, clicks and donations over a year.

Most charity email A/B testing produces noise dressed up as insight. Two subject lines, sent to small samples, results read at four hours, winner declared, lesson logged, nothing changes. Done at scale over a year, this kind of testing actively makes the email programme worse: every false positive teaches the team something untrue.

Disciplined A/B testing is a different exercise. Done well, it compounds: tests inform each other, the team builds a real understanding of what their audience responds to, and the email programme genuinely improves campaign after campaign.

The five rules of disciplined charity email testing

1. Test one thing at a time

Subject line OR preview text OR send time OR layout. Not three of them in the same test. If the winning variant changes three things, you do not know which change drove the result, and your next test starts from confusion.

2. Use proper sample sizes

5,000 contacts per variant for open rate tests, 10,000 for click rate, 20,000 plus for donation conversion. Below those thresholds, observed differences are usually within the margin of natural variance. Smaller charities are not excluded; they just need to test repeatedly over multiple campaigns and aggregate results.
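
A rough way to sanity-check thresholds like these against your own numbers is a standard two-proportion sample size calculation. The sketch below is illustrative only: the function name, the 25 percent baseline open rate, the 10 percent relative lift, and the 95 percent confidence and 80 percent power settings are all assumptions, not figures from any particular email platform.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variant(baseline: float, relative_lift: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Contacts needed per variant to detect a relative lift in a rate.

    Uses the normal-approximation formula for comparing two proportions
    with a two-sided test.
    """
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95 percent
    z_power = NormalDist().inv_cdf(power)          # about 0.84 for 80 percent
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance / (p1 - p2) ** 2)


# Assumed example: 25 percent open rate, detecting a 10 percent relative lift.
open_rate_n = sample_size_per_variant(baseline=0.25, relative_lift=0.10)
print(open_rate_n)  # roughly 4,900 contacts per variant
```

Lower baseline rates and smaller expected lifts push the required sample up sharply, which is why click and donation tests need far larger lists than open rate tests.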

3. Wait for results

Open data settles within 48 hours, click data within 72 hours, donation data within a week. Reading and acting on partial results is the most common testing mistake in charity email, and it produces false confidence in moves that did not actually work.
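
One way to enforce these read windows is a small guard that refuses to report a result early. This is a minimal sketch: the windows come from the rule above, while the names `READ_WINDOWS` and `ready_to_read` and the send time are hypothetical.

```python
from datetime import datetime, timedelta

# Read windows taken from the rule above; the names are illustrative.
READ_WINDOWS = {
    "open": timedelta(hours=48),
    "click": timedelta(hours=72),
    "donation": timedelta(days=7),
}


def ready_to_read(metric: str, sent_at: datetime, now: datetime) -> bool:
    """Return True once the read window for this metric has fully elapsed."""
    return now - sent_at >= READ_WINDOWS[metric]


sent = datetime(2026, 3, 3, 9, 0)  # hypothetical send time
four_hours_in = sent + timedelta(hours=4)
print(ready_to_read("open", sent, four_hours_in))             # False
print(ready_to_read("open", sent, sent + timedelta(days=2)))  # True
```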

4. Document the hypothesis, not just the result

Every test should have a written hypothesis explaining what you expect to happen and why. Without it, results are just trivia. With it, the team builds a body of evidence about what the audience values.

5. Test things that compound

Subject line tests compound less than expected because every campaign has a different subject. Preview text patterns, layout patterns, CTA wording patterns and journey timing all compound across many campaigns. Prioritise the latter.

What to actually test

Subject line patterns, not single subjects

Rather than testing two individual subjects, test patterns: "question vs statement", "named beneficiary vs cause-led", "urgency framing vs invitation framing". Run the same pattern test on several campaigns and look at the pattern-level effect.
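
To read a pattern-level effect across several campaigns, one option is to pool the per-campaign differences, weighting each by the inverse of its variance (a fixed-effect approach). The sketch below assumes the campaigns are broadly comparable; the function name and all the numbers are made up for illustration.

```python
from math import sqrt


def pooled_effect(campaigns):
    """Combine several A/B results into one pattern-level estimate.

    Each campaign is (opens_a, sent_a, opens_b, sent_b). Differences are
    weighted by the inverse of their variance (fixed-effect pooling).
    Returns the pooled difference in open rate (B minus A) and its z-score.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for opens_a, sent_a, opens_b, sent_b in campaigns:
        rate_a = opens_a / sent_a
        rate_b = opens_b / sent_b
        variance = rate_a * (1 - rate_a) / sent_a + rate_b * (1 - rate_b) / sent_b
        weight = 1 / variance
        weighted_sum += weight * (rate_b - rate_a)
        total_weight += weight
    pooled_diff = weighted_sum / total_weight
    z_score = pooled_diff / sqrt(1 / total_weight)
    return pooled_diff, z_score


# Made-up numbers: statement subject (A) vs question subject (B) on three sends.
results = [
    (500, 2000, 550, 2000),
    (480, 2000, 540, 2000),
    (510, 2000, 545, 2000),
]
diff, z = pooled_effect(results)
print(f"pooled lift: {diff:.3f}, z-score: {z:.2f}")
```

A z-score beyond about 1.96 in either direction corresponds to 95 percent two-sided confidence. A pattern can clear that bar in aggregate even when no single campaign does.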

Preview text discipline

Test whether writing a deliberate preview text (the snippet shown next to the subject line in the inbox), rather than letting the inbox default to the first line of the email, lifts open rate. The answer is almost always yes, often by 10 percent or more.

Send time clusters

Test broad windows (Tuesday morning vs Sunday evening, etc.) rather than precise minutes. Within a window the difference is noise; between windows it can be substantial.

CTA placement and wording

Test single CTA vs three CTAs. Test wording ("Donate now" vs "Help fund a place at the centre"). Test button colour only if you must, but expect the effect to be tiny.

Journey timing

In automated sequences, test the gap between emails (1 day vs 3 days vs 7 days). The right gap is supporter-dependent and worth testing because it compounds across every contact in the journey forever.

What not to test

Emergency appeals

An appeal launched in response to a disaster is not the time to discover that variant B converts worse. Use battle-tested patterns and ship.

One-off campaigns

Annual gala invitations and one-off matched funding announcements fall into this group. They offer too few repeat opportunities to learn from the test, and the cost of getting it wrong outweighs the marginal learning.

Cosmetic changes with no hypothesis

Blue button vs green button without a reason produces noise. Save your testing budget for changes that, if confirmed, would change something material in your programme.

The testing calendar

Run one disciplined test per month, focused on a single learning question. Document the hypothesis, the variants and the result. Quarterly, review the cumulative learnings and convert the confirmed patterns into defaults for future campaigns. Annually, re-run the tests behind the documented patterns to check the audience has not moved.

Twelve tests a year, with proper sample sizes and proper read times, produce more learning than fifty rushed tests. The compounding effect over two years is significant.

Reporting tests honestly

Show the confidence level

Most email platforms display a confidence indicator on tests. Anything below 95 percent should be reported as inconclusive, not as a win.
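
If your platform does not show a confidence indicator, or you want to check one, a two-proportion z-test is one common way to estimate it. This is a sketch under that assumption; platforms may calculate their indicators differently, and the function names and counts below are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def confidence_level(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided confidence (1 minus p-value) that variants A and B differ."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return 1 - p_value


def verdict(confidence: float, threshold: float = 0.95) -> str:
    """Apply the reporting rule: below the threshold is inconclusive."""
    return "significant" if confidence >= threshold else "inconclusive"


# The same 25.0 vs 27.5 percent open rates at two (made-up) list sizes.
small = confidence_level(250, 1000, 275, 1000)
large = confidence_level(1250, 5000, 1375, 5000)
print(verdict(small), verdict(large))  # inconclusive significant
```

The same observed difference is inconclusive at 1,000 contacts per variant and significant at 5,000, which is why the result cannot be reported without the sample size behind it.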

Show the absolute lift, not just the relative

A 20 percent uplift on a 5 percent click rate is a change of 1 percentage point, from 5 percent to 6 percent. Report both numbers so trustees and team members understand what the test actually demonstrated.
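
The arithmetic is simple enough to build into whatever produces the report. A minimal sketch, with a hypothetical helper name:

```python
def lift_summary(rate_a: float, rate_b: float) -> str:
    """Describe a result with both the absolute and the relative change."""
    absolute_points = (rate_b - rate_a) * 100
    relative_percent = (rate_b - rate_a) / rate_a * 100
    return (f"{rate_a:.1%} -> {rate_b:.1%}: "
            f"{absolute_points:+.1f} points absolute, "
            f"{relative_percent:+.0f}% relative")


print(lift_summary(0.05, 0.06))
# 5.0% -> 6.0%: +1.0 points absolute, +20% relative
```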

Log the failures

Tests that show no significant difference are useful evidence: they tell you the change does not matter. Log them with the same discipline as the wins.

The point of testing is not to prove every campaign is improving. It is to build, over a year, a small library of patterns the team can rely on without arguing.

The one-page testing template

  1. Hypothesis (what you expect to happen and why).
  2. Variant A and Variant B definitions, with only one variable changed.
  3. Sample size per variant and total send size.
  4. Primary metric, secondary metrics and read window.
  5. Result, confidence level, and decision (adopt / reject / re-test).
  6. Implication for future campaigns.

Six lines per test. Stored in a shared document. Reviewed quarterly. That is the entire system the average UK charity needs to convert email testing from theatre into compounding programme improvement.
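
For teams that prefer a spreadsheet export or a script to a shared document, the same six lines map onto a simple record. This is one possible shape, not a prescribed schema; the class name, field names and example values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EmailTestRecord:
    """One entry in the testing log, mirroring the six template lines."""
    hypothesis: str                      # 1. what you expect and why
    variant_a: str                       # 2. only one variable differs
    variant_b: str
    sample_size_per_variant: int         # 3. total send size is derived below
    primary_metric: str                  # 4. metrics and read window
    read_window_hours: int
    secondary_metrics: List[str] = field(default_factory=list)
    result: str = ""                     # 5. filled in after the read window
    confidence: Optional[float] = None
    decision: str = "pending"            # adopt / reject / re-test
    implication: str = ""                # 6. what changes in future campaigns

    @property
    def total_send_size(self) -> int:
        return 2 * self.sample_size_per_variant


# Made-up example entry.
record = EmailTestRecord(
    hypothesis="Question subjects lift opens because they invite a reply.",
    variant_a="Statement subject line",
    variant_b="Question subject line",
    sample_size_per_variant=5000,
    primary_metric="open rate",
    read_window_hours=48,
    secondary_metrics=["click rate"],
)
print(record.total_send_size, record.decision)  # 10000 pending
```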

Further reading

Charity Attribution Without Overengineering | Google Analytics 4 Setup for Charities (Without the Pain) | Thank-You Emails That Actually Feel Thankful

Frequently asked questions

How big does our list need to be to run meaningful A/B tests?

For open rate tests, at least 5,000 contacts per variant. For click rate, around 10,000. For donation conversion, usually 20,000 plus, or you need to test repeatedly over several campaigns to accumulate signal.

How long should a test run before reading results?

Wait until at least 80 percent of opens have happened, which typically takes about 48 hours, and allow a full week before drawing conclusions about donations. Reading results at four hours is a recipe for false positives.

Should we test on every campaign?

No. Testing on emergency appeals and time-critical sends is a bad trade. Test on regular newsletter cadence and standing welcome journeys, where the audience is large, predictable and consequence-light.

