Pictured above are the results of the national vote from 2016 to 2022 compared to an average of the last 3 polls by Marist from those cycles, a pollster generally regarded as a “High-Quality Pollster.” Since RCP’s blatant fuck-up in 2022, where they dropped polls conducted less than 2 weeks before election day so their aggregates only included right-leaning polls, there’s been a lot of talk about “High-Quality Polling Averages” being the best way to predict elections. In theory, this sounds correct. If you aggregate the pollsters who have been the most accurate in the past, you should get a relatively accurate result. However, there are two problems with this in practice: first, the polls generally considered to be “high quality” are just well-funded media polls with a far-from-perfect track record, and second, pollsters are inconsistent.
If you follow me on Twitter, you’ve probably seen this before. Recently, I ranked pollsters based on the average of the polls they conducted in the month before each election since 2016, with greater weight given to more recent elections. Looking first at the average error ranking, the bottom half includes 2 highly regarded university polls and 3 mainstream media polls. I don’t think I’ve ever seen a high-quality polling average that doesn’t include NBC News, Marist, and Quinnipiac. Yet, statistically, these are all national polls that miss by over 3.5 points on average. That’s a bad national poll.
I think there are 2 major reasons people keep including them anyway, one of them malicious. The first is that we generally associate well-known brands with being high quality. We all know NBC News and CNN, so when they release a poll, we give it more attention than “HarrisX,” a relatively unknown poll, despite HarrisX having a much better track record than NBC and CNN.
The second is the malicious one: it’s really easy to lie with polling data. Because polls have different methodologies and inherent biases, we get large differences from poll to poll. For example, the RCP average currently includes Harvard/Harris, which has Trump leading by 6, and Reuters/Ipsos, which has Biden leading by 2. If I group the pollsters that have been more pro-Trump throughout the cycle, I get Trump+2.8, but if I group the pro-Biden ones, I get Biden+1.2. This means that as long as there are a few polls with your candidate in the lead, you can easily build an average that has your candidate ahead and call it a “High-Quality Polling Average” when it’s actually just an average of the pollsters most favorable to your candidate.
Although the largest problem with these exclusive averages is that they often include the wrong pollsters, there’s another very important one.
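To make the cherry-picking point concrete, here is a rough Python sketch. Only the Harvard/Harris and Reuters/Ipsos margins come from the polls mentioned above; "Pollster A" through "Pollster D" are placeholders I made up, so the output will not match the Trump+2.8 and Biden+1.2 figures. It also splits pollsters by the sign of a single poll, which is a simplification of grouping them by how they have leaned all cycle.

```python
from statistics import mean

# Margins are Trump minus Biden, in points.
# Only the first two rows come from the post; Pollster A-D are invented
# placeholders used purely for illustration.
polls = {
    "Harvard/Harris": 6.0,
    "Reuters/Ipsos": -2.0,
    "Pollster A": 3.0,
    "Pollster B": 1.0,
    "Pollster C": -1.0,
    "Pollster D": 0.0,
}


def label(margin):
    """Format a signed margin the way polling averages are usually quoted."""
    if margin > 0:
        return f"Trump+{margin:.1f}"
    if margin < 0:
        return f"Biden+{-margin:.1f}"
    return "Tie"


def average(names):
    """Plain unweighted average of the named pollsters' margins."""
    return mean(polls[name] for name in names)


everyone = average(polls)
# Simplification: split on the sign of this one poll rather than on each
# pollster's lean over the whole cycle.
trump_friendly = average([n for n, m in polls.items() if m > 0])
biden_friendly = average([n for n, m in polls.items() if m <= 0])

print("All polls:          ", label(everyone))
print("Trump-friendly only:", label(trump_friendly))
print("Biden-friendly only:", label(biden_friendly))
```

Same set of polls, three different headlines, and the only thing that changed is which pollsters got invited into the average.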
Polling Errors Are Never The Same
For this point, I’m going to use the Quinnipiac University poll as an example, though I could do the same with several others. In 2016 and 2018, Quinnipiac was a relatively accurate national poll. The average of their final 3 polls in 2016 was Clinton+4, a miss easily inside the margin of error, and in 2018 they missed by even less, just 1.6 points. With this in mind, it would be common sense to put Quinnipiac in a high-quality polling average for 2020. They’d been within 2 points on the national vote in the last 2 cycles, making them a good pollster, right? Nope. In 2020, the average of Quinnipiac’s final 3 polls was Biden+10.3, a nearly 6-point miss, making them the second-worst pollster I scored in 2020 despite their previous success. Fundamentally, an accurate poll requires luck. Polls have a larger margin of error than many of us realize, which can lead to a pollster having success one cycle and failing the next. We can use past polling errors to get a relatively good idea of whether or not a pollster will be accurate, but there are always going to be surprises. Polling is a far-from-perfect science.
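Here is a small Python sketch of how one bad cycle changes a recency-weighted error score, using the Quinnipiac numbers above. The 2016 and 2020 misses are measured against the official national popular-vote margins, rounded (Clinton+2.1 and Biden+4.5), and the 2018 miss is the 1.6 quoted above. The linear weights (1 for the oldest cycle, 2 for the next, and so on) are an assumption for illustration, not the exact weights behind my rankings.

```python
# Absolute misses on the national margin, in points, for Quinnipiac:
#   2016: final-3 average Clinton+4 vs. a Clinton+2.1 result
#   2018: miss of 1.6, as quoted in the post
#   2020: final-3 average Biden+10.3 vs. a Biden+4.5 result
misses = {
    2016: abs(4.0 - 2.1),
    2018: 1.6,
    2020: abs(10.3 - 4.5),
}


def recency_weighted_error(misses_by_year):
    """Weighted mean of misses where later cycles count more.

    Weights are 1, 2, 3, ... from oldest to newest cycle. This linear
    scheme is an assumption made for the example.
    """
    years = sorted(misses_by_year)
    weights = {year: rank for rank, year in enumerate(years, start=1)}
    total = sum(weights[year] * misses_by_year[year] for year in years)
    return total / sum(weights.values())


# What you would have known going into 2020, and what you know afterwards.
before_2020 = recency_weighted_error({y: m for y, m in misses.items() if y < 2020})
after_2020 = recency_weighted_error(misses)

print(f"Score going into 2020: {before_2020:.2f} points")
print(f"Score after 2020:      {after_2020:.2f} points")
```

Under these assumed weights, a pollster that looked like a sub-2-point miss going into 2020 more than doubles its score after a single cycle, which is exactly why past accuracy is a useful signal but a shaky guarantee.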
My Solution
I personally believe the best kind of polling average is a weighted average that includes just about every poll released within the last month. A weighted polling average gives greater importance to polls that were released recently and to pollsters with a smaller average error in the past. While this doesn’t completely account for pollsters having inconsistent cycles, it still gives pollsters who’ve struggled in the past an impact on the aggregate, albeit a smaller one, without allowing major outliers to skew the entire thing. Overall, we have no clue which pollsters will get it right this time. But that’s not going to stop partisans from skewing their averages.
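For anyone who wants to see the idea in code, here is a minimal Python sketch of a weighted polling average. Everything specific in it is an assumption for illustration: the `Poll` fields, the 30-day window, the 10-day half-life for recency, the inverse-error accuracy weight with a 0.5-point floor, and the four made-up polls. It is not the exact formula I use, just one reasonable way to combine "recent" and "historically accurate" into a single weight.

```python
from dataclasses import dataclass


@dataclass
class Poll:
    pollster: str
    margin: float          # candidate A minus candidate B, in points
    days_old: int          # days since the poll was released
    past_avg_error: float  # pollster's historical average miss, in points


def poll_weight(poll, window_days=30, half_life_days=10.0):
    """Weight = recency factor * accuracy factor (both assumed forms)."""
    if poll.days_old > window_days:
        return 0.0  # older than the last month: left out entirely
    # Recency: the weight halves every `half_life_days` days.
    recency = 0.5 ** (poll.days_old / half_life_days)
    # Accuracy: inverse of past error, floored so no pollster dominates.
    accuracy = 1.0 / max(poll.past_avg_error, 0.5)
    return recency * accuracy


def weighted_average(polls):
    """Weighted mean of poll margins using poll_weight."""
    weights = [poll_weight(p) for p in polls]
    total = sum(weights)
    if total == 0:
        raise ValueError("no polls inside the window")
    return sum(w * p.margin for w, p in zip(weights, polls)) / total


# Hypothetical polls, invented for this example.
polls = [
    Poll("Pollster A", margin=2.0, days_old=0, past_avg_error=2.0),
    Poll("Pollster B", margin=-1.0, days_old=10, past_avg_error=1.0),
    Poll("Pollster C", margin=8.0, days_old=20, past_avg_error=4.0),  # old outlier
    Poll("Pollster D", margin=5.0, days_old=45, past_avg_error=1.0),  # outside window
]

in_window = [p for p in polls if p.days_old <= 30]
simple = sum(p.margin for p in in_window) / len(in_window)

print(f"Simple average:   {simple:+.2f}")
print(f"Weighted average: {weighted_average(polls):+.2f}")
```

The older outlier from the historically shaky pollster still counts, but it moves the weighted number far less than it moves the simple average, and no pollster has to be thrown out by hand.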