Predictive analytics is the same as the multivariate analysis we have been doing all along. The only difference is that in traditional analysis we have to explain the relationships we identify and in predictive analytics we don’t. Suppose we find that when we increase the quality of a product, consumers buy more. Then our explanation will go something like “Consumers like higher quality products. Therefore when we improve the quality of a product, they buy more.” We causally connect the two. In predictive analytics, we do the same analysis. Only this time we say “If you increase the quality rating by 1 point, your sales will rise by x units.” We don’t necessarily have to say why.
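To make the "rise by x units" statement concrete, here is a minimal sketch of how such a number might be obtained. The ratings and sales figures are made up for illustration, and `fit_line` is a hypothetical helper that does ordinary least squares with a single predictor.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data (not from any real study): quality rating and units sold.
quality = [1, 2, 3, 4, 5]
sales = [110, 135, 155, 185, 205]

slope, intercept = fit_line(quality, sales)
print(f"Each extra quality point goes with about {slope:.1f} more units sold")
```

The slope is the "x" in the sentence above. Nothing in the calculation says why sales move with quality; the number is reported and used as is.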
This is good in a way. We can use the findings without having to worry as to why they work. So predictive analytics has become the favorite of big data and machine learning algorithms. But there is a price to pay. By not having to explain the relationships, we elevate correlations to the level of causation. The counter argument to this is that these are the days of big data. Big data, because of its size, can identify enduring patterns quickly, so we don’t have to worry about using correlations as causations.
There is some truth to this, but not much. Correlations can and do exist without causation. The most dramatic example of this is, of course, the failure of Google search to predict the spread of flu. As everyone probably knows by now, for several years Google accurately predicted the spread of flu ahead of the CDC. This was very helpful in terms of moving drugs and allocating medical resources to different regions. After years of predicting the spread of flu accurately, one year Google got it spectacularly wrong. It was off by some 49%. Why? Google based its predictions on conversations and searches on Google about the flu. Apparently, factors other than actual illness can influence how much people search for and talk about the flu.
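The mechanism can be sketched with a toy model. Everything here is an assumption made for illustration: the numbers are invented, they are not Google's or the CDC's data, and the idea that news coverage drives the extra searches is a hypothetical stand-in for "something other than illness." The model learns a searches-to-cases ratio in seasons where only sick people search, then meets a season where that is no longer true.

```python
def fit_through_origin(xs, ys):
    """Least-squares slope for y = slope * x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


# Hypothetical training seasons: searches rise only because people are ill.
train_cases = [100, 200, 300, 400]
train_searches = [10 * c for c in train_cases]  # assumed 10 searches per case

# The model learns how many cases to expect per search.
cases_per_search = fit_through_origin(train_searches, train_cases)

# Hypothetical new season: same kind of illness, plus searches that come
# from something else (assumed here to be heavy news coverage).
actual_cases = 300
news_driven_searches = 1200
observed_searches = 10 * actual_cases + news_driven_searches

predicted_cases = cases_per_search * observed_searches
overshoot_pct = 100 * (predicted_cases - actual_cases) / actual_cases
print(f"Predicted {predicted_cases:.0f} cases, actual {actual_cases}, "
      f"overshoot {overshoot_pct:.0f}%")
```

The correlation learned in the training seasons was real, but it held only as long as illness was the sole driver of searches. Once another driver appeared, the same model overshot.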
To give you a personal example, I ran into a client of mine, and she asked me why I had not responded to her request to work on her project. I went back and checked my spam folder. Sure enough, her email was there. Its subject line contained the name of a popular drug for erectile dysfunction, and the machine learning predictive model had decided that it was spam.
Of course, Google can and does improve its algorithms constantly, thereby improving its accuracy. Maybe we will get there some day. Meanwhile, let’s remember that predictive analytics is good from many perspectives, but not perfect. So if you do predictive modeling, you might want to review your findings to make sure they make logical sense. You may still make mistakes, but maybe fewer of them.
Chuck Chakrapani is President, Leger Analytics.