Way back in 2009, I attended an AAPOR Conference that featured a plenary session with Paul Donato, then Chief Research Officer at Nielsen, and Ken Prewitt, former Director of the US Census Bureau. Their topic carried the somewhat ponderous title “The Role of Traditional Survey Research in a World of Electronic Measurement and Changing Information Needs.” While both agreed that surveys would remain important for the foreseeable future, there was some disagreement on the issue of sampling. Paul gave a nice overview of the productive and effective use of nonprobability online panels by market researchers, especially, I recall, in pharma. Ken acknowledged that such an approach might work for market research purposes but took a different view when it came to official statistics and public opinion surveys. I remember him saying, “In a democracy, you need to count everybody.” (To state the obvious, he did not mean that literally but rather as a defense of probability sampling.)
I was reminded of that as I read Simon Chadwick’s excellent piece in Research World about bias and especially how it finds its way into AI systems. Others have discussed the bias that results from roughly 80% of AI professionals being male, and mostly white men at that. But Simon goes on to point out the equally serious problem of bias in the data that AI systems learn from, and here the finger points directly at market research (MR), where, to quote Simon, “In an age of convenience samples, digital sample sourcing, automated scripting, and DIY platforms, discussion of bias has all but fallen away.”
Of course, a lack of concern about bias is not just an issue in MR. The data scientists who construct and analyze large databases are often just as unconcerned about bias or representativity. They sometimes describe big data as “N=All,” when in fact it often is not. Ask a data scientist whether the data they are working with are representative, and you just might get an answer like, “You don’t have to worry about sampling because these data have hundreds of thousands and even millions of rows.” The belief that bias disappears once you have a large number of records is an argument we have heard before from online panel enthusiasts. A larger sample shrinks random error, but it does nothing to correct systematic error in who ends up in the data.
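To make that concrete, here is a minimal Python sketch under assumptions of my own, not taken from any real panel or database: a hypothetical population in which 50% of people hold some trait, and a convenience source in which trait-holders are twice as likely to be included. The function name `simulate` and its parameters are illustrative only.

```python
import random


def simulate(n, true_share=0.5, inclusion_ratio=2.0, seed=0):
    """Estimate a population share from n records drawn two ways.

    Hypothetical setup: in the convenience source, members with the trait
    are `inclusion_ratio` times as likely to be selected as members without it.
    Returns (probability-sample estimate, convenience-sample estimate).
    """
    rng = random.Random(seed)

    # Probability sample: everyone has the same chance of selection,
    # so each draw holds the trait with probability true_share.
    prob_hits = sum(rng.random() < true_share for _ in range(n))

    # Convenience sample: selection favours trait-holders, so each draw
    # holds the trait with probability p*r / (p*r + (1 - p)).
    p, r = true_share, inclusion_ratio
    biased_p = p * r / (p * r + (1 - p))
    conv_hits = sum(rng.random() < biased_p for _ in range(n))

    return prob_hits / n, conv_hits / n


for n in (1_000, 10_000, 100_000, 1_000_000):
    prob_est, conv_est = simulate(n)
    print(f"n={n:>9,}  probability={prob_est:.3f}  convenience={conv_est:.3f}")
```

Under these assumptions the probability-sample estimate settles near the true 0.5 as n grows, while the convenience estimate settles near 0.667 at every size: adding rows makes the wrong answer more precise, not more correct.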
The point here is the fundamental difference between the data we might need to sell more product to a well-defined audience and the data we need to teach an automated decision-making system how to make the right decision, free of the sorts of bias that are regrettably common in today’s society. We are awash in data, but much of it is dirty and incomplete, and we need to stop taking it at face value. Or, to paraphrase Ken Prewitt, when it comes to AI, we need to count everybody.
Interested in learning more? Check out our affordable online Principles Express course, Sampling in Market Research. This course is self-paced and easy to complete in 9 hours or less for only $359.
We are grateful to Full Circle, a leading global provider of seamless, productive online survey experiences, for sponsoring this course. Such sponsorships have funded the development of our new line of Principles Express courses.