The folks over at Pew have put up a blog post of sorts that starts what will no doubt be a long, torturous, and, if recent history is any guide, ultimately forgettable series of investigations aimed at trying to determine why, once again, the polls were wrong. The Pew post lays out three potential sources of the error:
- The nonresponse bias inherent in (very) low response rates
- The “silent Trumper” phenomenon, formerly known as “the Bradley effect”
- Flawed likely voter models
The second of these, the “silent Trumper,” has been widely debunked both in general and in the context of this particular election. So let’s put that aside. The sampling process, including likely voter selection, is the heart of the problem. And there we need to look separately at telephone polls relying on probability samples versus online polls that rely primarily on nonprobability samples, if we can even call them that. Let’s start with probability samples.
Probability sampling, with its grounding in basic probability theory, is very robust, but only as long as it meets three assumptions:
- The existence of a sample frame containing all or nearly all of the target population.
- Random selection of a sample from that frame.
- A high response rate when interviewing that sample.
Violate one of these assumptions and you run the risk of introducing biases that are devilishly difficult to measure and correct.
In the US it is relatively easy to assemble a sample frame that covers the entire US population by using a dual frame that combines both landlines and cell phones. It obviously is important to avoid duplicate respondents when the same household is sampled from both frames. This is not rocket science and is done pretty routinely.
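In case it helps to see how routine that deduplication is, here is a minimal sketch. The field names and the household identifier are hypothetical; in practice the matching key might be an address or phone-number linkage built during fieldwork.

```python
# Minimal sketch of dual-frame deduplication (hypothetical data layout).
# Each completed interview records which frame it came from and a household ID.

interviews = [
    {"household_id": "H001", "frame": "landline", "respondent": "A"},
    {"household_id": "H002", "frame": "cell", "respondent": "B"},
    {"household_id": "H001", "frame": "cell", "respondent": "C"},  # same household, second frame
]

seen = set()
deduplicated = []
for record in interviews:
    if record["household_id"] not in seen:
        seen.add(record["household_id"])
        deduplicated.append(record)

print(len(deduplicated))  # 2 households retained; the duplicate is dropped
```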
The bigger challenge is the response rate. The long-term decline in respondent cooperation is well known and it afflicts a broad spectrum of research methods that require members of the general public to participate in research. Electoral pollsters and market researchers typically make it worse because their constituencies drive them to field periods of just a few days, making it impossible to thoroughly work the sample to maximize response. And so it’s not unusual to end up with a response rate in the single digits, a perfectly good sample spoiled.
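To make that concrete, here is a rough, back-of-the-envelope illustration. The dispositions are invented, and real response-rate calculations (the AAPOR formulas, for instance) distinguish many more outcome categories, but the arithmetic shows how a short field period leaves most of the sample unworked and the rate in single digits.

```python
# Hypothetical call dispositions after a short (few-day) field period.
completes = 600
refusals = 1400
never_reached_or_unworked = 5000  # numbers the interviewers never got to

eligible = completes + refusals + never_reached_or_unworked
response_rate = completes / eligible
print(f"{response_rate:.1%}")  # 8.6%
```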
We all understand that for the data we’ve generated to be reliable we need to make a fourth assumption: that those who did not respond are not different in important ways from those who did. Or put another way, that if we had achieved a 100% response rate our survey results would be no different than they are with the 10% response rate we have in hand.
None of us is so naïve as to believe that, and so we make some effort to correct the likely bias through weighting. In this we have come to ascribe magical properties to demographics, seeming to believe that if we can weight the data so that it approximates the known distribution (from the Census, for example) of the target population across age, gender, race, and Census region, then we have eliminated bias. As if every person who ticks the same box in each of those categories were just like every other person who ticks those same boxes. A serious investigation to identify all of the potential bias caused by nonresponse is seldom undertaken.
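For readers who have not looked under the hood, the weighting step is often no more than this. The sketch below is a bare-bones post-stratification example with made-up cells and made-up Census shares; production weighting schemes typically rake across several margins rather than weighting one fully crossed table, but the logic is the same.

```python
from collections import Counter

# Hypothetical respondents, classified into age-by-gender cells.
respondents = [
    {"cell": "18-34/F"}, {"cell": "18-34/M"},
    {"cell": "35-64/F"}, {"cell": "35-64/F"},
    {"cell": "65+/M"},
]

# Hypothetical population shares for the same cells (e.g., from the Census).
population_share = {
    "18-34/F": 0.15, "18-34/M": 0.15,
    "35-64/F": 0.25, "35-64/M": 0.25,
    "65+/F": 0.10, "65+/M": 0.10,
}

n = len(respondents)
sample_share = {cell: count / n
                for cell, count in Counter(r["cell"] for r in respondents).items()}

# Post-stratification weight: population share divided by sample share.
for r in respondents:
    r["weight"] = population_share[r["cell"]] / sample_share[r["cell"]]

print([round(r["weight"], 2) for r in respondents])
```

Note what the weight does and does not do: it makes the weighted sample match the Census on the chosen boxes, and nothing more.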
As the final step, we use the fig leaf of margin of error to describe the accuracy of our estimates, a calculation that is heavily influenced by sample size and totally ignores nonresponse. Worse yet, the media may genuflect to the MOE and then go on to breathlessly report one- and two-percentage-point differences as proof of who is winning the horse race.
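For reference, the margin of error being quoted is essentially the textbook formula sketched below. Sample size is the only survey-design input it sees, so a poll with a 9% response rate and one with a 90% response rate report the same MOE.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Textbook 95% margin of error for a simple random sample of size n.
    Depends only on sample size (and an assumed proportion p);
    nonresponse never enters the calculation."""
    return z * math.sqrt(p * (1 - p) / n)

print(round(margin_of_error(1000) * 100, 1))  # ~3.1 percentage points
```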
Defenders of contemporary telephone surveys that rely on probability samples argue that, despite the obvious challenges, these surveys continue to be pretty accurate across a wide variety of benchmarks drawn from non-survey sources. There are two problems with that. First, it gives up the high ground of theory in favor of an empirical justification. It works until it doesn't. Second, consumers of these data expect a precision that is increasingly elusive. More on that later.
Online surveys that rely on panels, river, exchanges, etc. present even greater challenges. More on that in my next post.
Reg Baker is Executive Director of MRII.