Mimsy Were the Borogoves

Editorials: Where I rant to the wall about politics. And sometimes the wall rants back.

False positives, the Internet, and the grievance media

Jerry Stratton, January 5, 2015

There is a rule of statistics that is little known except among statisticians:1 when the rate of false positives exceeds the incidence rate in a population, a positive result from that test is more likely to be wrong than right. It doesn’t matter how accurate the test is, if the false positive rate exceeds the incidence rate.

For example, say you have a cancer test that is correct 98% of the time. This means2 that it has a false positive rate of 2%. Since it is wrong 2% of the time, it will tell 2% of the people who are fine that they have the cancer.

Now, suppose that this particular cancer occurs in one out of a hundred thousand people. Some concerned politician in a city of ten million says: we have this test that is practically always correct, and we have a lot of people with this cancer. We should run this test on everybody.

What happens after the city runs its test on its ten million residents? The test will tell 98 people who have the cancer that they have it.3 And it will tell 200,000 people who don’t have the cancer that they have cancer.
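To make the arithmetic concrete, here is a minimal sketch of the calculation (a hypothetical helper, assuming the 2% error rate applies symmetrically to people with and without the cancer):

    def screening_counts(population, incidence, accuracy):
        """Expected true and false positives when a whole population is tested.

        Assumes the test's error rate is symmetric: it misses (1 - accuracy)
        of the real cases and flags (1 - accuracy) of the healthy people.
        """
        sick = population * incidence
        healthy = population - sick
        true_positives = sick * accuracy             # real cases correctly flagged
        false_positives = healthy * (1 - accuracy)   # healthy people flagged anyway
        return true_positives, false_positives

    # The city-wide screening: ten million residents, one in a hundred thousand
    # has the cancer, and the test is correct 98% of the time.
    tp, fp = screening_counts(10_000_000, 1 / 100_000, 0.98)
    print(f"true positives:  {tp:,.0f}")             # about 98
    print(f"false positives: {fp:,.0f}")             # about 200,000
    print(f"chance a positive is real: {tp / (tp + fp):.2%}")  # well under 1%

In the city-wide screening, a positive result is wrong more than 99% of the time: the rule above in action.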

There is a further rule of thumb: the bigger your population, the lower your incidence rate for any non-trivial occurrence, just because of the way people work. That cancer test might have made sense when used against patients who come in to have something looked at: it might well be that patients who come in for an examination for some problem, and who are, after talking to a doctor, referred to this test, are one in ten likely to have this cancer. That population is a population of people who have something wrong, and that something wrong already resembles this cancer. In that population of, say, a hundred patients referred to the test, the test will tell about nine or ten of them that they have the cancer when in fact they do have it, and will tell one or two of them that they have cancer when instead they are cancer-free.
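Reusing the screening_counts helper sketched above with these hypothetical referred-patient numbers (a hundred patients, one in ten with the cancer) shows why the narrower population behaves so differently:

    # Referred patients: a hundred people, one in ten of whom has the cancer.
    tp, fp = screening_counts(100, 1 / 10, 0.98)
    print(f"true positives:  {tp:.1f}")   # about 9.8, the "nine or ten"
    print(f"false positives: {fp:.1f}")   # about 1.8, the "one or two"

Here most positives are real, because the incidence rate (one in ten) comfortably exceeds the false positive rate (2%).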

But expand the test’s population beyond people who in conjunction with their doctors know they are sick, and the test falls apart.

I think we are seeing the same thing in the explosion of false rape reports and false hate crimes in the news media. Most women don’t lie about rape, and most people don’t enjoy being hated. Limit the population that gets reported on to those who call the police and file a police report and whose cases are then prosecuted, and you’re probably going to have mostly true cases reported in the media.

Start trolling the entire population for juicy stories, as Rolling Stone did, and you will find juicy stories—and chances are, many of them will be lies. The traditional test doesn’t work outside of the traditional population. Sometimes it will be right, but a non-trivial number of times it will be wrong.

I agree with those who argue that the failure of the University of Virginia rape story doesn’t invalidate having a national conversation, but the conversation needs to be about the competence of our news media and the applicability of their reporting methods. The UVA story, for example, makes me wonder whether Justice Thomas was an early victim of the paradox: his opponents searched very hard for something to use against him during his confirmation hearings.

How many other scandals have been the result of using a flawed sieve to sort through populations larger than the sieve was meant for?

The same goes for witnesses to events that certainly happened, such as the Michael Brown shooting. Search deep enough, and you can find a witness who will say what you want to hear. But if the test is merely that they’re saying it, then you’re going to have a whole lot of false positives.

Combined with the media’s tendency to stop looking when they find the answer they want, the false positive paradox virtually guarantees large numbers of “hype positives”.

In What Your Children are Doing on the Information Highway I wrote of the Internet as a word processor for social relations, and noted that however strange or unique your beliefs, someone on the net shares them. If the media choose to use the Internet as a searchable database of preconceptions, they will be able to find stories that match whatever line they want, and they will be able to find people to give them those stories.

In response to Confirmation journalism and the death penalty: Iterative journalism is like the Queen of Hearts in Alice in Wonderland: “Sentence first, verdict afterwards.” The Elements of Journalism praises David Protess’s project that railroaded a mentally disabled man into prison for fourteen years, because it served the authors’ bias.

  1. And seemingly too little known among statisticians, or at least statisticians who talk to the media.

  2. In heavily simplified terms, of course, since rates always vary around the average.

  3. Technically, 98 people on average.
