It’s the sampling, stupid! (Part 2)

In yesterday’s post I offered a quick overview of the potential pitfalls of telephone polls that rely on probability samples. The major worry is bias created by extraordinarily high nonresponse. When 90% or more of your sample chooses not to do your survey, you should assume bias and have some plan to correct it. Most pollsters rely on simple corrections like demographic weighting, but that might not be enough to produce a reasonably representative sample. Already I have seen pollsters explain their struggle to accurately predict Trump’s victory by saying, “We didn’t have enough white males without a college degree in our samples.” That seems pretty astounding given that it was widely understood that this was a key demographic for Trump.

But I digress and need to get back to the even more worrisome issue of online sampling. At least with telephone surveys you have a shot at a reasonably complete sample frame from which to draw your sample. With online you seldom have such a list. Mostly you have either a large number of volunteers with lots of bias already cooked in or, with river samples and exchanges, no frame at all. (There are panels that use traditional probability sampling methods but they are few in number.)

The lack of a sample frame that includes all or nearly all of the target population makes the job of the sampler exponentially more difficult. In this context random selection makes no sense. Rather, the sampler must build a model that specifies exactly what the sample should look like if it is going to be a reasonable representation of the target population. This is more than getting the demographics right. Ideally, the model also would specify the distribution of those behaviors and attitudes that correlate with voting preferences and control them in sample selection so that they are distributed within the sample in the same way as in the target population. That is a very tall order because precise knowledge about the “correct” distributions is exceedingly hard to come by. Few online polling organizations do this well. Most opt for relatively simple models that use quota sampling to control only a few key variables, mostly demographics, in their samples.
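
To make the simple quota approach concrete, here is a minimal Python sketch of filling demographic quotas from a volunteer pool. Everything in it is an assumption made for illustration: the `quota_sample` function, the volunteer pool, and the quota targets are hypothetical and do not describe any particular pollster's procedure.

```python
import random
from collections import Counter


def quota_sample(volunteers, quotas, cell_of, seed=0):
    """Accept volunteers in arbitrary order until each quota cell is full."""
    rng = random.Random(seed)
    pool = list(volunteers)
    rng.shuffle(pool)  # arrival order is arbitrary, not a probability design
    filled = Counter()
    sample = []
    for person in pool:
        cell = cell_of(person)
        if filled[cell] < quotas.get(cell, 0):
            sample.append(person)
            filled[cell] += 1
    return sample


# Hypothetical volunteer pool in which college graduates are over-represented.
volunteers = (
    [{"sex": "M", "college": True}] * 300
    + [{"sex": "F", "college": True}] * 300
    + [{"sex": "M", "college": False}] * 200
    + [{"sex": "F", "college": False}] * 200
)

# Hypothetical targets for a sample of 100 (illustrative numbers only).
quotas = {("M", True): 15, ("F", True): 17, ("M", False): 33, ("F", False): 35}

sample = quota_sample(volunteers, quotas, lambda p: (p["sex"], p["college"]))
cell_counts = Counter((p["sex"], p["college"]) for p in sample)
print(cell_counts)
```

The sample matches the targets on sex and education by construction, but nothing in the procedure constrains any attitude or behavior that was not used to define the quota cells.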

The beauty of probability sampling is that all of this happens more or less automatically through random selection from a high quality frame. With online panels we have no such guarantee.

Online polls also face the same nonresponse challenges as telephone polls. Quota sampling is a convenient operational control, but it has no effect on bias in the variables that are not controlled. Post-survey weighting is sometimes used to bring the sample in line with what the pollster thinks is the right distribution. As with telephone surveys, this can be as much art as science. People who study the behavior of polls over time often claim to see evidence of herding, that is, the tendency for the polls to converge as we get closer to the election. Weighting is one way for a pollster to join the herd. This is true for both telephone and online polls. How widely it is practiced is anyone’s guess.
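
For readers who have not seen post-survey weighting in practice, the following Python sketch shows the simplest version, weighting on a single demographic variable. The function names, cell labels, and all numbers are assumptions chosen for illustration. Real polls typically weight on several variables at once, and the "right" population shares are themselves a judgment call.

```python
def poststratification_weights(sample_counts, population_shares):
    """Weight for each cell = target population share / observed sample share."""
    n = sum(sample_counts.values())
    weights = {}
    for cell, target_share in population_shares.items():
        count = sample_counts.get(cell, 0)
        if count == 0:
            # An empty cell cannot be weighted up: there is nobody to weight.
            raise ValueError(f"no respondents in cell {cell!r}")
        weights[cell] = target_share / (count / n)
    return weights


def weighted_share(sample_counts, support_by_cell, weights):
    """Weighted proportion supporting a candidate across all cells."""
    total = sum(weights[c] * sample_counts[c] for c in sample_counts)
    backing = sum(
        weights[c] * sample_counts[c] * support_by_cell[c] for c in sample_counts
    )
    return backing / total


# Hypothetical sample that over-represents college graduates.
sample_counts = {"college": 60, "no_college": 40}
# Hypothetical population shares the pollster believes to be correct.
population_shares = {"college": 0.35, "no_college": 0.65}
# Hypothetical candidate support observed within each cell.
support_by_cell = {"college": 0.40, "no_college": 0.60}

weights = poststratification_weights(sample_counts, population_shares)
n = sum(sample_counts.values())
unweighted = sum(sample_counts[c] * support_by_cell[c] for c in sample_counts) / n
weighted = weighted_share(sample_counts, support_by_cell, weights)
print(weights, unweighted, weighted)
```

With these made-up numbers the estimate moves from 48% unweighted to 53% weighted, purely as a result of the population shares the pollster chose. That is the sense in which weighting is as much art as science: a different assumption about the population produces a different headline number from the same interviews.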

In the 2015 UK national elections the polls were, in the words of a report issued by the British Polling Council, “the most inaccurate since election polling first began in the UK in 1945.” The report pointed to unrepresentative samples as the key culprit. I expect we will see more of the same from inquiries into the recent US results.

But there is more to the story than bad frames, high nonresponse, and inadequate sample selection and adjustment models. After all, electoral polls do not rely on general population surveys. Some confine themselves to registered voters while others aim to interview only those most likely to vote. More on that in my next post.

Reg Baker is Executive Director of MRII.

One thought on “It’s the sampling, stupid! (Part 2)”

  1. Hi Reg
    The theoretical challenges of convenience sampling (my name for online panels) and random probability samples are, as you say, exponentially different. But in practice, the difference between what online samples and telephone researchers achieve appears to be marginal. In the report you quote about the British polling debacle, the authors note, “Considering the final poll estimates (see Table 1 section 3), there is no difference in accuracy between online and telephone polls.”

    Why do random probability polls do no better than convenience samples? The jury is out on that, but here are a couple of suggestions:
    1) the telephone samples are so far away from being random probability samples that they lose any benefit accruing from that approach.
    2) the online panel people know they are working with dodgy material, so they put in place more modelling to get their predictions closer to the result.
