Explainer
Polling Aggregation Explained: What Is a Poll of Polls & How Does It Work?
No single poll should be read in isolation. Individual polls have margins of error of around three percentage points and are subject to house effects (systematic differences between polling firms' results). Aggregating multiple polls into a single “poll of polls” reduces these uncertainties and gives a more reliable picture of where public opinion actually sits.
Why averages outperform individual polls
Suppose ten polling firms each independently survey 1,000 people with a margin of error of ±3 points. Some will overestimate Labour by chance; others will underestimate. If the errors are genuinely independent, the margin of error of the average shrinks with the square root of the number of polls, so averaging the ten polls reduces it to approximately ±1 point (3 ÷ √10 ≈ 0.95).
More practically, any given poll might have had an unusual sample, a data quality issue, or anomalous fieldwork timing (e.g. conducted during a major news event that distorts responses). The average is more robust to any single poll being an outlier.
This is why political scientists and sophisticated poll watchers almost always use aggregated data rather than any single release.
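The ±3 to ±1 arithmetic above can be checked with a short sketch. It assumes each poll is a simple random sample and uses the standard 95% formula; real polls use quotas and weighting, so their effective margins are usually somewhat wider. The function names are illustrative only.

```python
import math

def margin_of_error(sample_size: int, share: float = 0.5, z: float = 1.96) -> float:
    """95% margin of error in percentage points, assuming a simple random sample."""
    return 100 * z * math.sqrt(share * (1 - share) / sample_size)

def margin_of_average(single_margin: float, n_polls: int) -> float:
    """Margin of error of an unweighted average of n independent polls
    that each have the same margin of error."""
    return single_margin / math.sqrt(n_polls)

single = margin_of_error(1000)             # one poll of 1,000 people
averaged = margin_of_average(single, 10)   # average of ten such polls

print(f"single poll: +/-{single:.1f}, average of 10: +/-{averaged:.1f}")
# prints "single poll: +/-3.1, average of 10: +/-1.0"
```

The reduction only holds if the polls' errors really are independent; shared methodological errors do not shrink this way.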
Simple average
The simplest aggregation method is an unweighted average: add up all the figures from n polls and divide by n. If three polls show Labour at 33%, 35%, and 34%, the simple average is 34%.
The simple average treats all polls equally regardless of:
- How recent they are (a three-week-old poll gets the same weight as yesterday’s)
- Their sample size (a poll of 500 gets the same weight as a poll of 2,000)
- The quality or track record of the pollster
For a slow-moving political environment, simple averages work reasonably well. In periods of rapid change, they lag reality.
Weighted aggregation
More sophisticated aggregators apply weights to address the limitations of the simple average. The main weighting dimensions are:
Sample size weighting
A poll of 2,000 has a narrower margin of error than a poll of 500. Sample size weighting gives larger polls more influence in the average. The relationship between sample size and precision is not linear: the margin of error shrinks with the square root of the sample size, so it takes four times the sample to halve it. A 2,000-person poll therefore has half the margin of error of a 500-person poll, not a quarter.
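As a rough sketch of sample size weighting, the snippet below weights each poll's figure by its number of respondents. Weighting in direct proportion to sample size is only one possible choice and is assumed here for illustration; the poll figures and sample sizes are hypothetical.

```python
def weighted_average(shares, weights):
    """Weighted mean of poll shares; weights must sum to a positive number."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(s * w for s, w in zip(shares, weights)) / total

# Hypothetical Labour shares (%) and sample sizes for three polls.
shares = [33.0, 35.0, 34.0]
sample_sizes = [500, 2000, 1000]

simple = sum(shares) / len(shares)                 # every poll counts equally
by_size = weighted_average(shares, sample_sizes)   # larger polls count for more

print(f"simple: {simple:.1f}, weighted by sample size: {by_size:.1f}")
# prints "simple: 34.0, weighted by sample size: 34.4"
```

The weighted figure is pulled towards 35% because the largest poll reported it.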
Recency weighting
Public opinion changes over time. A poll from four weeks ago is less informative about current opinion than one from yesterday. Recency weighting reduces the influence of older polls.
A common approach is exponential decay: weight = e^(−k×d), where d is the number of days since the poll’s fieldwork and k is a decay parameter. With k = 0.1, a poll from 10 days ago has weight 0.37 relative to a poll from today; a poll from 30 days ago has weight 0.05.
The choice of k reflects a judgement about how quickly opinion changes. In a fast-moving campaign, high k (rapid decay) is appropriate. In a quiet inter-election period, lower k is better.
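The decay weights quoted above can be reproduced in a few lines. This is a minimal sketch; the helper names are illustrative, and the half-life conversion (half-life = ln 2 ÷ k) is included only as a convenient way to read a given k.

```python
import math

def recency_weight(days_old: float, k: float = 0.1) -> float:
    """Exponential decay weight relative to a poll conducted today."""
    return math.exp(-k * days_old)

def half_life_days(k: float) -> float:
    """Number of days after which the weight has fallen to one half."""
    return math.log(2) / k

print(round(recency_weight(10), 2))   # prints 0.37
print(round(recency_weight(30), 2))   # prints 0.05
print(round(half_life_days(0.1), 1))  # prints 6.9
```

So k = 0.1 corresponds to a poll losing half its weight roughly every week.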
Pollster quality weighting
Some aggregators weight polls by the historical accuracy of the polling firm. A firm with a strong track record of accuracy gets more weight. This requires a substantial historical record to estimate reliably and is controversial because past accuracy is an imperfect guide to future performance.
House effects and correction
House effects are systematic biases in individual firms’ results. If Firm A consistently shows Labour 3 points higher than the industry average, we say Firm A has a house effect of +3 for Labour.
There are two approaches to house effects in aggregation:
- Ignore them: Average the raw figures. House effects partially cancel out if you have enough firms in the average. This is the simpler approach.
- Correct for them: Estimate each firm’s house effect and adjust their figures before averaging. This requires estimating house effects from historical data, which introduces its own uncertainty.
Some high-profile aggregators (e.g. 538 in the US) correct for house effects as a core part of their methodology. Others, including many UK trackers, take the simpler approach and let averaging handle the issue.
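The "correct for them" approach can be sketched as follows. This is a deliberately simplified illustration, not any aggregator's actual method: it assumes a firm's house effect is its average deviation from the mean of all historical polls, and it ignores timing and sample size. The firm names and figures are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def estimate_house_effects(history):
    """history: list of (firm, share). Returns each firm's mean deviation
    from the average of all historical polls."""
    overall = mean([share for _, share in history])
    deviations = defaultdict(list)
    for firm, share in history:
        deviations[firm].append(share - overall)
    return {firm: mean(devs) for firm, devs in deviations.items()}

def corrected_average(polls, house_effects):
    """Subtract each firm's estimated house effect, then average.
    Firms with no estimate are left unadjusted."""
    return mean([share - house_effects.get(firm, 0.0) for firm, share in polls])

# Hypothetical historical Labour shares (%) used to estimate house effects.
history = [("Firm A", 37.0), ("Firm A", 38.0),
           ("Firm B", 34.0), ("Firm B", 35.0),
           ("Firm C", 34.0), ("Firm C", 32.0)]
effects = estimate_house_effects(history)   # Firm A: +2.5, B: -0.5, C: -2.0

# Hypothetical new polls, dominated by Firm A.
new_polls = [("Firm A", 36.0), ("Firm A", 37.0), ("Firm B", 33.0)]
raw = mean([share for _, share in new_polls])
adjusted = corrected_average(new_polls, effects)

print(f"raw: {raw:.1f}, house-effect corrected: {adjusted:.1f}")
# prints "raw: 35.3, house-effect corrected: 33.8"
```

The example shows why correction matters most when one firm supplies a large share of recent polls: the raw average inherits that firm's lean.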
Rolling averages
A rolling average uses only polls conducted within a fixed window, e.g. the last 30 days. As the window moves forward, new polls enter and polls older than the window drop out. This gives a continuously updated, current estimate without requiring explicit recency weights.
The choice of window length involves a trade-off:
- Shorter window (7 days): Very current, but only a handful of polls; high remaining noise
- Medium window (30 days): Good balance of currency and noise reduction
- Longer window (90 days): Very smooth, but can lag genuine opinion shifts by weeks
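The window trade-off can be illustrated with a minimal sketch of an unweighted rolling average. The dates and figures are hypothetical, and treating a poll's fieldwork end date as its date is an assumption made for the example.

```python
from datetime import date, timedelta

def rolling_average(polls, as_of, window_days=30):
    """polls: list of (fieldwork_end_date, share).
    Returns the mean share of polls dated within the last `window_days`
    days up to and including `as_of`, or None if there are none."""
    cutoff = as_of - timedelta(days=window_days)
    in_window = [share for day, share in polls if cutoff < day <= as_of]
    if not in_window:
        return None
    return sum(in_window) / len(in_window)

# Hypothetical polls showing a party drifting down from 38% to 33%.
polls = [
    (date(2025, 1, 15), 38.0),
    (date(2025, 3, 5), 35.0),
    (date(2025, 3, 20), 34.0),
    (date(2025, 3, 29), 33.0),
]
today = date(2025, 3, 31)

print(rolling_average(polls, today, window_days=7))    # prints 33.0
print(rolling_average(polls, today, window_days=30))   # prints 34.0
print(rolling_average(polls, today, window_days=90))   # prints 35.0
```

The 7-day figure rests on a single poll, while the 90-day figure still reflects a reading from January, which is the lag described above.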
[Chart: Visualising aggregation: smoothing over noise]
How BritPolls builds its tracker
Our Voting Intention Tracker uses a 30-day rolling window. Each poll is weighted by sample size, and more recent polls receive higher weight using a 14-day half-life decay function (weight falls to 50% at 14 days old, 25% at 28 days old).
We do not apply house effect corrections in the main tracker, but our pollster profiles document estimated house effects for each firm based on their recent history. Where a single pollster dominates a period’s data, we note this in the tracker.
We include all polls that:
- Are conducted by a BPC-member or BPC-affiliated pollster
- Have publicly available data tables
- Have a sample of at least 500 respondents
- Use GB or UK adults/registered voters as their population
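The tracker rules described above might be combined along the following lines. This is a sketch under stated assumptions, not the tracker's actual code: it assumes the sample-size and recency weights are multiplied together, that the sample-size weight is directly proportional to the number of respondents, and that a poll exactly 30 days old is still inside the window. The function names and poll figures are hypothetical, and the pollster-membership and data-table criteria are not modelled.

```python
def tracker_weight(sample_size, days_old, half_life=14.0,
                   window_days=30, min_sample=500):
    """Weight for one poll: zero if outside the window or below the
    minimum sample, otherwise sample size times half-life decay."""
    if days_old < 0 or days_old > window_days or sample_size < min_sample:
        return 0.0
    return sample_size * 0.5 ** (days_old / half_life)

def tracker_estimate(polls):
    """polls: list of (share, sample_size, days_old).
    Returns the weighted average share, or None if no poll qualifies."""
    weights = [tracker_weight(n, age) for _, n, age in polls]
    total = sum(weights)
    if total == 0:
        return None
    return sum(p[0] * w for p, w in zip(polls, weights)) / total

# Hypothetical polls: (share %, sample size, age in days).
polls = [
    (34.0, 1000, 0),    # weight 1000
    (36.0, 2000, 14),   # weight 2000 * 0.5  = 1000
    (30.0, 1500, 28),   # weight 1500 * 0.25 = 375
    (40.0, 400, 3),     # excluded: fewer than 500 respondents
    (40.0, 2000, 45),   # excluded: outside the 30-day window
]

print(round(tracker_estimate(polls), 1))  # prints 34.2
```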
Limitations of polling aggregation
Aggregation reduces random error but cannot eliminate systematic error. If all polling firms share the same methodological flaw, as they did in 2015, the average will be wrong even with 20 polls. This is why polling methodology, covered in our opinion polls explainer, matters as much as aggregation technique.
Aggregators should always publish their uncertainty ranges, not just point estimates. A figure of Labour 35%, Conservatives 24%, Reform 27% should come with confidence intervals showing the range of likely true values.
Frequently Asked Questions
What is a poll of polls?
A poll of polls combines multiple individual polls into a single average. Random sampling errors tend to cancel out across polls, giving a more reliable estimate than any single survey.
Why is a polling average better than a single poll?
Individual polls have margins of error of around plus or minus 3 points. Averaging ten independent polls can reduce this to around plus or minus 1 point. The average is also less affected by any one rogue result.
What is recency weighting in polling aggregation?
Recency weighting gives more weight to recent polls, since older data is less informative about current opinion. Exponential decay is a common approach, under which a poll’s weight halves after each fixed number of days (its half-life).
How are house effects handled in polling aggregation?
Either by ignoring them (and relying on averaging to partially cancel them out) or by estimating each firm’s systematic bias and correcting for it before averaging. Both approaches have trade-offs.
How does BritPolls calculate its tracker?
We use a 30-day rolling window with sample-size weighting and a 14-day half-life recency decay. We include BPC-affiliated polls of 500+ respondents with public data tables. House effects are documented in pollster profiles but not corrected in the main tracker.
What is herding in polling and how does aggregation help detect it?
Herding occurs when polling firms converge on each other’s results, consciously or unconsciously, for fear of publishing a rogue outlier. It makes collective errors invisible: if all firms herd to the same wrong number, the polling average gives false confidence. Aggregation helps detect herding because collecting the polls in one place makes it possible to test whether the variance between them is lower than sampling error alone would predict. The 2015 UK polling failure, when the final polls pointed to a near Conservative-Labour tie and the Conservatives won by about 7 points, is the clearest modern example.
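The variance comparison mentioned above can be sketched as a simple ratio. This is an illustrative diagnostic rather than a formal significance test: it assumes simple random sampling, treats the polls as measuring the same underlying share, and uses hypothetical figures.

```python
from statistics import mean, variance

def dispersion_ratio(shares, sample_sizes):
    """Observed variance between polls divided by the variance expected
    from sampling error alone. Shares are in percentage points.
    Values well below 1 mean the polls agree more closely than chance
    would suggest, which is consistent with herding."""
    p = mean(shares) / 100
    observed = variance(shares)  # sample variance, in points squared
    expected = mean([100 ** 2 * p * (1 - p) / n for n in sample_sizes])
    return observed / expected

# Hypothetical final-week polls, each of 1,000 people.
shares = [34.0, 34.0, 33.0, 34.0, 35.0, 34.0]
sample_sizes = [1000] * 6

ratio = dispersion_ratio(shares, sample_sizes)
print(round(ratio, 2))  # prints 0.18
```

With only six polls the ratio is itself noisy, so a low value is a prompt for closer scrutiny rather than proof of herding.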