Big data in official statistics

I am at a conference in Gatineau, Quebec (just across the river from Ottawa), which is all about big data and official statistics. It is heavily attended by people from national statistical institutes around the world (e.g. the US Census Bureau, Statistics Netherlands, and Statistics Canada, which organized the event), all of them eager to find ways to harness the potential of big data.

By my general rules of innovation in market, opinion, and social research, this is the last place I would expect to find such a high level of interest in new methods. But as it turns out, these agencies are subject to all of the same forces that are driving the market research (MR) focus on big data. By my count, there are four trends pushing these folks toward big data solutions:

  1. The classic survey-based methods that have been at the heart of what they do are not yielding the results they once did. This is bound up with the problems of respondent cooperation and unreasonably high response rate targets.
  2. The cost of continuing to do what they have always done is not sustainable in current environments.
  3. They have clients, too, and those clients want more data faster.
  4. The appetite among those clients for reams of tables is fading, and the agencies feel the need to move toward storytelling and other alternative ways of publishing their data that make it more useful. They are also under pressure to deliver raw data that can be processed and analyzed outside the agency.

At the same time, they see real barriers they need to overcome. These, too, are familiar:

  1. A dramatic increase in the demand for computing power to accommodate larger datasets.
  2. A shortage of staff with the right skills.
  3. The need to protect individual privacy in a world where classic anonymization techniques no longer work as well as they once did.

To be fair, they have been working with administrative data (i.e. data collected by other government agencies) for decades. For example, the US Census Bureau has been using administrative data since the 1940s to evaluate the accuracy of the survey-type data it collects. While their big data efforts continue to have a strong administrative data focus, they have also begun to turn to non-governmental data (e.g. banking data, traffic data, POS data), potentially as a replacement for surveys. In the Nordic countries, administrative databases are so well developed that they can be used in place of traditional censuses.

The folks working in this sector are doing a lot of interesting things. There is much they can teach us.
