Often we’re asked to test ideas, claims, brand names, slogans, product packaging, etc. Sometimes it’s just a few, and sometimes 100 or more!
If it’s just A vs. B, a simple choice question suffices: “Which do you prefer, A or B?”
To test five different ideas, sometimes a ranking question does the trick: “Please rank these from 1 (best) to 5 (worst).”
But, what if it’s a dozen ideas? Use the dreaded ratings grid on a 5-point scale? What if it’s 60 ideas? 500? What then?
The 5-point ratings scale (even if fully labeled and broken out as separate questions) has problems for contrasting ideas: lack of discrimination (lots of 4s and 5s) and scale-use bias (people using the scale differently, which throws a major monkey wrench into cross-cultural research). Grids with 5-point scales are not nearly as effective at identifying the truly best ideas as a newer approach: best-worst scaling.
Best-worst scaling (geeky name: MaxDiff) captures dramatically more info than standard ratings scales and eliminates scale-use bias (a silver bullet for cross-cultural comparisons). It’s widely used in market research, and also finds some use in economics and psychology.
How does best-worst scaling work? If you are measuring, say, a dozen flavors of ice cream, you could show four at a time per set (screen). For each set you ask which flavor the respondent likes most and which one they like least. Across multiple sets, each flavor is covered, typically multiple times.
Each respondent could see six sets of four ice cream flavors, covering all 12 flavors 2x each. Twelve clicks (two per set), and done! Analyze using logistic regression (or similar), and you obtain metric-scaled MaxDiff scores that are VERY much better than ratings scales.
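To make the ice cream example concrete, here is a minimal Python sketch. It builds six sets of four so that each of 12 flavors appears twice, then scores a simulated respondent. The function names, the flavor labels, and the simulated respondent are all hypothetical. The scoring is a simple best-minus-worst count, used here as a rough stand-in for the logistic regression mentioned above, not a replacement for it.

```python
import random
from collections import Counter

def build_design(items, set_size, times_shown, rng):
    """Build sets so that every item appears exactly `times_shown` times."""
    if len(items) % set_size != 0:
        raise ValueError("number of items must be divisible by set_size")
    sets = []
    for _ in range(times_shown):
        order = items[:]          # copy, then shuffle one full pass over the items
        rng.shuffle(order)
        sets.extend(order[i:i + set_size] for i in range(0, len(order), set_size))
    return sets

def count_scores(responses):
    """Best-minus-worst count per item, divided by times shown (range -1 to 1)."""
    shown, best, worst = Counter(), Counter(), Counter()
    for items, b, w in responses:
        shown.update(items)
        best[b] += 1
        worst[w] += 1
    return {item: (best[item] - worst[item]) / shown[item] for item in shown}

flavors = [f"flavor_{i}" for i in range(1, 13)]   # hypothetical labels
design = build_design(flavors, set_size=4, times_shown=2, rng=random.Random(0))

def answer(shown):
    """Simulated respondent (assumption): always prefers lower-numbered flavors."""
    ranked = sorted(shown, key=flavors.index)
    return ranked[0], ranked[-1]                  # (best, worst)

responses = [(s, *answer(s)) for s in design]
scores = count_scores(responses)
for flavor in sorted(scores, key=scores.get, reverse=True):
    print(flavor, scores[flavor])
```

Two shuffled passes over the 12 flavors, cut into groups of four, give the six sets. A production design would also balance which flavors appear together, which this sketch does not attempt.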
But, can best-worst scaling (MaxDiff) scale up to 60 items? Indeed! Twelve screens of five at a time cover all 60 items 1x per respondent. Twenty-four clicks and done! The coverage is a bit sparse, but the data are still far superior to ratings scales.
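The coverage arithmetic is easy to check for any design you have in mind. The helper below is a hypothetical convenience function, and it assumes a balanced design (every item shown equally often) with two clicks per screen, one for best and one for worst.

```python
def coverage(n_items, items_per_screen, n_screens):
    """Exposures per item and clicks per respondent for a balanced design."""
    exposures = items_per_screen * n_screens
    return {
        "times_each_item_shown": exposures / n_items,
        "clicks": 2 * n_screens,  # one "best" and one "worst" click per screen
    }

print(coverage(12, 4, 6))    # the ice cream example
print(coverage(60, 5, 12))   # the 60-item example
```

The first call gives 2.0 exposures per item and 12 clicks. The second gives 1.0 exposure per item and 24 clicks, matching the figures above.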
What about 500 ideas? Now we’re raising the bar!
Spending an equal amount of time asking about both stars (great ideas) and dogs (bad ideas) across 500 ideas would waste a lot of effort asking about ideas that aren’t going to compete for any top spots. But how to recognize the dogs ahead of time? A new adaptive approach called Bandit MaxDiff offers a clever and efficient solution.
The Bandit MaxDiff algorithm examines previous respondents’ answers to learn which items are likely to be the stars and the dogs. For each subsequent respondent, it oversamples the stars and rarely asks about the dogs. The last respondents are mainly just comparing stars to stars, leading to excellent precision on the top few items!
What is the bandit in Bandit MaxDiff? One-armed bandit is a slang term for slot machines in casinos. They have one arm (the lever you pull), and they usually take your money (like bandits). For at least the last 60 years, statisticians have been interested in what they have called multi-armed bandit problems. For example, if you want to invest your resources over multiple time periods across multiple activities, each with uncertain outcomes (like pulling different arms across multiple slot machines), how should you allocate the resources (your bets) to maximize the long-term payoff? In the case of Bandit MaxDiff, the invested resources are MaxDiff questions answered and respondents surveyed.
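To give a feel for how learning from earlier respondents can steer later questions, here is a toy Python simulation. It uses Thompson sampling, one well-known strategy for multi-armed bandit problems. Treat it as a sketch under loud assumptions, not as the actual Bandit MaxDiff algorithm: the item utilities are made up, only "best" picks are used for learning, and each item gets a simple Beta distribution over how often it is picked best when shown. All names are hypothetical.

```python
import math
import random

def simulate_bandit(true_utils, n_respondents, per_set, sets_per_resp, seed=0):
    """Toy adaptive item selection. Returns (times shown, times picked best)."""
    rng = random.Random(seed)
    n = len(true_utils)
    shown = [0] * n
    best = [0] * n
    n_to_show = per_set * sets_per_resp
    for _ in range(n_respondents):
        # Thompson step: draw one plausible "pick rate" per item from its Beta
        # posterior, then show this respondent the items with the highest draws.
        draws = [rng.betavariate(1 + best[i], 1 + shown[i] - best[i]) for i in range(n)]
        chosen = sorted(range(n), key=lambda i: draws[i], reverse=True)[:n_to_show]
        rng.shuffle(chosen)
        for start in range(0, n_to_show, per_set):
            screen = chosen[start:start + per_set]
            # Simulated respondent (assumption): logit choice of the best item.
            weights = [math.exp(true_utils[i]) for i in screen]
            pick = rng.choices(screen, weights=weights, k=1)[0]
            for i in screen:
                shown[i] += 1
            best[pick] += 1
    return shown, best

# Hypothetical setup: 40 items, item 0 is the biggest star, item 39 the biggest dog.
n_items = 40
utils = [2.0 - 4.0 * i / (n_items - 1) for i in range(n_items)]
shown, best = simulate_bandit(utils, n_respondents=300, per_set=4, sets_per_resp=3, seed=1)

print("exposures, top 5 items:   ", shown[:5])
print("exposures, bottom 5 items:", shown[-5:])
```

Items nobody has seen yet still get wide random draws, so they are explored early. Items that keep losing get narrow, low draws and fade from view, which is the oversampling of stars described above.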
For 120 or more items, Bandit MaxDiff is 4x more efficient than standard MaxDiff at identifying the stars. It does with 250 respondents what would have taken 1000 standard MaxDiff respondents to do, saving 75 cents on the dollar for data collection! With n=250, Bandit MaxDiff is enough to do the job for 120 ideas with very high accuracy.
Idea screening on even 500 items is also quite doable with Bandit MaxDiff, with excellent precision on the winning ideas (given about n=2000). Want to test even more items? Just increase the sample size!
Bryan Orme is the president of Sawtooth Software, Inc. Sawtooth Software is a leading provider of advanced tools for interviewing, conjoint analysis, MaxDiff scaling, cluster/ensemble analysis, perceptual mapping, and hierarchical Bayes (HB) estimation. Sawtooth Software is a sponsor of the Principles Express course, Advanced Analytic Techniques.