Arguing with data

In this chapter we explore how you can use data in public policy, and what it means to be in control of the methodology and the data used in the debates you are in. Data can be a powerful tool, but it is no silver bullet.

“I think” vs “data show” in tech

One of the management stories about tech is that you lose a discussion if you start by saying “I think” – what you should have said instead is “data show” – the use of the plural form of the verb signaling that you know that data is plural, and the willingness to defer to data showing that you have the right engineering mindset.

Like all management stories, this is exaggerated. A lot of discussions in tech are still decided by what the HIPPO thinks – the view of the Highest Paid Person in the rOom still matters. But there is a grain of truth here – tech and tech policy are focused on data in a way that can be both helpful and detrimental to success.

Data has a self-evident role to play in our toolbox, but it needs to be managed well.

Arguing with data

If we were all rational calculators, we would only need data to make decisions. We would look at the data and then decide on a course of action. This rational ideal hides behind a lot of the discussions you will end up having with internal stakeholders in your organization, and the implication is that politics is irrational.

This is wrong on a couple of counts. First, it is a misunderstanding of what the political question is. We have noted that politics is not concerned with matters of logical truth or falsehood, nor is it concerned with questions of legal or illegal – the fundamental political question is much more complex, and interesting: how do we live together?

No data set answers that question, and to the extent your definition of rationality requires that questions can be answered with data, then, well, politics is arational rather than irrational. Irrationality would be interpreting the data randomly or erroneously, but that is not what is happening here. Politics simply is uninterested, at a macro level, in data sets.

We can lament that, and say that we should be more rational – but that matters little or not at all. Politics is an art unto itself, a life form that has its own cognitive patterns.

That does not mean that economic data is useless, at all. But it means that you have to explore different strategies for arguing with data. There are a number of different use cases for good data, including the following:

  • Using data to undermine a story. Say that you want to argue against making privacy policies mandatory. The story around privacy policies is strong – it is one that suggests that people should have a right to choose, and that transparency around what happens to their data will inform their decisions. How can you address that story with data? One way is to look at what it would really take for someone to read all the privacy policies that they encounter – a study that was actually done, and showed that Americans would have to spend 200 hours a year to just read through and understand privacy policies, and the aggregate cost of that could be as much as 365 billion US dollars.[1] Now, this did not mean that privacy policies went away, but it remains a useful data point for anyone arguing that the problem in privacy is not disclosure ex ante as much as the ability to, say, take your data and leave ex post if you do not agree with the way your data is processed. 
  • Using data to shift a frame. Here there are several different ways of proceeding – but the key is to identify the frame you want to change, and then find a new frame that works better. This is hard – old frames do not die easily – but it is useful in one-on-one conversations. An interesting example is the calculations around the value of search engines that were inspired by Hal Varian’s work. In one of these studies, published in 2013, the researchers gave one group access to a library while another group could use a search engine, and then compared how long it took each group to find the answers to a set of questions. The finding was that on average the search engine saved around 10-15 minutes. Then multiply that by the billions of searches a day, and calculate the value at minimum wage… the value of the new technology here turns out to be significant – and suddenly we are no longer discussing advertising but time savings.[2]
  • Using data in scorecards to demonstrate responsibility and commitment. One of the more common methods of using data is defensively, when asked if a company is addressing a concern in a responsible way. One example of this was when the social media companies were accused of not removing terrorist content fast enough. Progress was then measured in a couple of different ways – including the average turnaround time. This metric was hard for the companies, as it required active monitoring of the entire corpus of content – so alternative measures were introduced, such as the average number of people exposed to content before it was taken down and the amount of content that was never published at all. The result ended up being a robust scorecard that companies have become increasingly comfortable being measured against. Here, the scorecard model was further strengthened by the development of an academic network that could provide more data on the problem – something that showed the complexity and spectrum of challenges here. In a sense this was much less a shifting of a frame than a complicating of a narrative – something that is an important use for data. The simple stories are always the most effective, and damaging, and there is much to be won by just showing that something is not so simple.[3]
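
The time-savings arithmetic in the search engine example can be sketched in a few lines of Python. This is a napkin calculation under stated assumptions, not the method of the cited study: the function name is hypothetical, one billion searches a day is a placeholder for "billions", and 7.25 is the US federal minimum hourly wage used as a stand-in for the value of time.

```python
def annual_value_of_time_saved(minutes_saved_per_search, searches_per_day,
                               hourly_wage, days_per_year=365):
    """Return the yearly value of time saved, in the currency of the wage."""
    # Total hours saved across all searches in one day.
    hours_saved_per_day = minutes_saved_per_search / 60 * searches_per_day
    # Price those hours at the assumed wage and scale to a year.
    return hours_saved_per_day * hourly_wage * days_per_year


# Illustrative inputs only (assumptions, not findings):
SEARCHES_PER_DAY = 1_000_000_000   # placeholder for "billions of searches a day"
MINIMUM_WAGE = 7.25                # US federal minimum wage, USD per hour

low = annual_value_of_time_saved(10, SEARCHES_PER_DAY, MINIMUM_WAGE)
high = annual_value_of_time_saved(15, SEARCHES_PER_DAY, MINIMUM_WAGE)
print(f"Estimated value: {low:,.0f} to {high:,.0f} USD per year")
```

Under these inputs the result lands in the hundreds of billions of dollars a year, which is exactly why the assumptions deserve scrutiny: the sketch treats every search as one that would otherwise have required a trip to the library. The number matters less than the frame it opens up.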

There are many more use cases, but these give you an idea of things to do with data. But you should also note that in none of these cases has the data angle proven to be a game changer. This is important: you do not win with data, but you can set the world up for a new narrative, and the real competition is between stories, not between data sets.

There are no instances of a data point winning a political argument.  

Economic impact data and stories

A special case here is the use of data to tell an economic impact story. The idea of economic impact is old, but tech policy teams started to think about how to deploy such studies in 2009/10 as the optimism around technology started to wane.

Companies started to have discussions about what needed to be true for the technology sector to be seen as a force for good, and one of the key conclusions was that in order to really be taken seriously the Internet companies needed to shift perception from the Internet as a fun place to the Internet as an industry sector with real economic significance.

“We need”, one of our bosses noted, “to show that we are a net benefit. We can agree and admit that there are challenges with hate speech, counterfeit goods and many other issues – but when you look at the whole, when you calculate the net benefit, we still come out positive.”

This was quite the leap for an industry that had enjoyed decades of coddled status – but it was prescient. The Internet industry would increasingly be seen as extractive, and more and more politicians started doubting if the Internet was really turning out to be as great as it was made out to be. One US senator went so far as to say in a meeting with one of our companies that “this Internet thing was a mistake, wasn’t it” in what became a real wake-up call for everyone.

So the decision was made to try to show that the Internet was just like manufacturing or logistics or mining – an industry to be taken seriously. And a number of different strategies were employed to reach that goal.

One approach was to calculate the overall value of the Internet as a sector as a percentage of GDP. Many of the models used here made serious economists cringe, but the story the numbers told was powerful: the Internet would, if a sector, represent anywhere between 5 and 15% of GDP in Western economies. The serious economists were right in judging the models rickety at best and napkin calculations at worst – but they worked because they were part of a story that everyone could agree with: it was time to take the Internet more seriously.

Other studies focused on individual companies and jobs, arguing that tech companies were supporting job creation across a number of sectors and that high growth companies used more tech than others, so indirectly tech was associated with driving high growth and employment.

Technology companies and the Internet evolved into a serious issue, and suddenly became a core issue for many politicians. The core concern of politics is, after all, jobs and growth, and we were arguing that the Internet was creating both.

Today economic impact reports are a standard part of the larger consultancies’ portfolios of services, and evaluations of how many jobs a company creates or how it is driving growth are common – almost table stakes. But at that time the idea that the Internet was an important economic sector was still surprisingly new. Sure, everyone thought that would happen – but that it was already happening was new.

Economic impact reports are helpful, but need to be done carefully and if at all possible with data that a company has unique insights into. LinkedIn, for example, has unique insights into how to think about labour markets. Other companies are great at tracking sales, and advertising companies can look at spending per sector and draw conclusions from these data.

But they also have a downside.

As we were launching these reports one of us had a small working meeting in Brussels, and we gamed out future attacks against the company – how we would be attacked by a smart and versatile opponent. These exercises – red teaming – are very useful ways of understanding weaknesses, because the people working in your policy team will automatically be sensitive to pushback and weaknesses in the arguments that they are making.

In this particular case, one person in comms came up with a very simple scenario.

“So, say that we succeed with showing the economic impact we have”, he suggested, “then the next question is obvious, isn’t it? If you are so valuable, where is your tax?”

This turned out to be exactly right, and the emphasis on economic impact drew regulatory attention to other areas as well. By cementing the view that tech was much more important and core to economies, tech companies also invited a whole new level of scrutiny.

That was inevitable, over the long term, but we are almost certain that it was accelerated by the focus on economic impact. Now, that was not a bad thing, at all, but it was an interesting second order effect of the use of data that we had not fully realized until we saw the data and stories around economic impact really take off.

Mistake! One core mistake in these studies is to overstate your case. When a Facebook study claimed in 2015 that the company had created 4.5 million jobs, the response was lukewarm at best, suggesting that the overall numbers were not just off, but that the company was cynically misusing the data to overstate its importance. The model used was also criticized for using weird “output multipliers” that were essentially plucked out of thin air. If you release studies that overstate your importance, you tend to end up losing credibility overall!
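
To see why multipliers attract this kind of criticism, it can help to run a quick sensitivity check before publishing a headline number. The sketch below is a deliberately simplified stand-in, not the model used in the study mentioned above: the function names, the direct job count and the multiplier values are all hypothetical and chosen only for illustration.

```python
def total_jobs(direct_jobs, multiplier):
    """Headline jobs figure: direct jobs scaled by an assumed multiplier."""
    return direct_jobs * multiplier


def sensitivity(direct_jobs, multipliers):
    """Map each candidate multiplier to the headline figure it produces."""
    return {m: total_jobs(direct_jobs, m) for m in multipliers}


# Hypothetical inputs: 10,000 direct jobs and three candidate multipliers.
table = sensitivity(10_000, [1.5, 3.0, 10.0])
for multiplier, jobs in table.items():
    print(f"multiplier {multiplier:>4}: {jobs:,.0f} jobs")
```

If the headline figure moves by a large factor depending on a multiplier you cannot defend, the multiplier, and not your data, is doing the arguing, and a critical reader will notice.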

The other take-away from these reports is that they can live much longer than you think. A report like this allows you to use numbers that are 2-3 years old, and still make your case. It is not always the case that data need to be fresh to be useful in making your argument.

Data takers and data makers

All organizations are data takers or data makers. This is true internally as well as externally, and we would stress that for a policy team to be really free in the way it addresses its problems, it also needs to produce some of the data that is used to frame the challenges that it faces. Far too often this is left to another team – say, marketing – and the result is that your ability to frame the problem and challenges the company is facing is eroded severely by data produced in a methodological framework that can be really ill-suited for public policy or government affairs.

And this holds true externally as well. If you are a data taker in an area, you are not just on the defensive when it comes to the data presented, but you have lost the methodological high ground. The methods and frameworks used to produce the data are core commanding heights that you need to control, unless you want the yardsticks used to measure your worth to be constructed and owned by others!

As you chart the issues you are working on, it is useful to note if you are a data maker or a data taker – and what it would take to change that. Sometimes it is better to make data together with others in your industry and this can also help spread the often significant costs that you will face as you start to produce data.

Some may react to the fact that we say data “maker” – that seems wrong, doesn’t it? What happens here should be that we collect data and then analyze it – shouldn’t it? Well, the reality is that data is made – there are numerous simplifications and model compromises that mean that the data collected is not a naturally objective piece of a universal truth. Data is made, and that is fine – and you should be transparent about how it is made, but you should not believe that it is fine for anyone to collect it because it “will show the same thing”.

Data makers have a huge strategic advantage. 

The limits of data

We have already said that data only goes so far, and that the real battle is between stories, but there are other limits of data as well. One of the more important limits is that data always comes from a frame of reference, as we just suggested, and that the interpretation is important.

Think about all of the surveys that ask people if they care about privacy. It turns out, whenever these studies are launched, that 80-90% of people care deeply about privacy. But what does that mean? Imagine the situation: you are asked by a stranger to assess if you care about privacy – who answers that in the negative? Doing so would almost invite the judgment of the other person that you are not worthy of privacy at all! The framing, the question, produces data that is close to worthless.

Now, we do not argue with the idea that people may care about their privacy, but the better way to approach this question is to ask people if they think others care about privacy – because that will allow them to see the question as a question about attitudes generally rather than about their own personal worth as an individual.

The distinction between first person / third person in statistical studies is well known, but unless you are attentive to the methodology, you end up with stories that say that 90% of internet users are “worried about privacy”.

And, for the record, we do believe that people care about privacy – but that their concerns are much more complex and layered than is suggested by the fuzzy use of “worried”. There are real issues here, effectively obscured by bad research methodology.

Data are only as good as the methodology they are produced by, and only as effective as the stories they are embedded in.

Conclusion

Data really matters, and getting the right data can help you frame, open and change discourse. But it is no silver bullet, and the way it is made matters. There is a competitive advantage for policy in being a data maker rather than a data taker.

As you approach this, here are a few questions that can be helpful.

  1. What is the story you would like to tell about your issue / company and what would be the single most useful data point in telling it? Explore that data – but don’t make it up. Search for supporting points that can be turned into powerful story points rather than data points. 
  2. Don’t let the perfect be the enemy of the good. It is fine to be rigorous about methodology, but you should be aware that not everyone else is, and if you play at a higher level of rigor you may lose out. This is not science, but it cannot be fiction either. How rigorous are your opponents?
  3. Explore economic impact reports and see if there are unique data sets that your organization can use to make points about jobs and growth! How do you contribute net on net?

Are you a data maker or a data taker in your core issues?


[1] See Anderson, N, “Study: Reading online privacy policies could cost $365 billion a year”, Ars Technica 2008-09-10, available at https://arstechnica.com/tech-policy/2008/10/study-reading-online-privacy-policies-could-cost-365-billion-a-year/ accessed on 2021-07-16.

[2] See Chen, Y, YoungJoo Jeon, G and Kim, Y, “A Day without a Search Engine: An Experimental Study of Online and Offline Searches”, available at http://yanchen.people.si.umich.edu/papers/VOS_2013_03.pdf accessed on 2021-07-16.

[3] See The Global Network on Extremism and Technology at https://gnet-research.org/ accessed on 2021-07-16