A NUMBER IS ONLY AS STRONG AS ITS WEAKEST LINK
There’s a time and a place for quoting numbers to several decimal places, but dressage – and other sports in which the marking is subjective – isn’t one of them.
By using this scoring system, the judges were leaving us to assume that we were witnessing scoring of a precision equivalent to measuring a bookshelf to the nearest millimetre. Yet the tool they were using to measure that metaphorical bookshelf was a ruler that measured in 10-centimetre intervals. And it was worse than that, because it’s almost as if the judges each had different rulers and, on another day, that very same performance might have scored anywhere between, say, 89% and 92%. It was a score with potential for a lot of variability – more of which in the next section.
All of this reveals an important principle when looking at statistical measurements of any type. In the same way that a chain is only as strong as its weakest link, a statistic is only as reliable as its most unreliable component. That dinosaur skeleton’s age of 69 million years and 22 days was made up of two components: one was accurate to the nearest million years, the other to the nearest day. Needless to say, the 22 days are irrelevant.
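To see just how irrelevant, here is a little illustrative sketch in Python (the ±500,000-year spread is simply shorthand for ‘to the nearest million years’, not a figure from any palaeontologist):

```python
# Purely illustrative: adding a precisely known 22 days to an age that is
# only good to the nearest million years.
DAYS_PER_YEAR = 365.25

age_years = 69_000_000        # "69 million years", i.e. +/- 500,000 years
uncertainty_years = 500_000   # the fog around that estimate

extra_years = 22 / DAYS_PER_YEAR   # the extra 22 days, known exactly

total = age_years + extra_years
print(f"{total:,.2f} years, give or take {uncertainty_years:,}")
# 69,000,000.06 years, give or take 500,000
# The 22 days (about 0.06 of a year) disappear inside the uncertainty,
# so the honest answer is still just "about 69 million years".
```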
BODY TEMPERATURE A BIT LOW? BLAME IT ON SPURIOUS PRECISION
In 1871, a German physician by the name of Carl Reinhold Wunderlich published a ground-breaking report on his research into human body temperature. The main finding that he wanted to publicise was that the average person’s body temperature is 98.6 degrees Fahrenheit, though this figure will vary quite a bit from person to person.
The figure of 98.6 °F has become gospel, the benchmark body temperature that parents have used ever since when checking if an unwell child has some sort of fever.
Except it turns out that Wunderlich didn’t publish the figure 98.6 °F. He was working in Celsius, and the figure he published was 37 °C, a rounded number, which he qualified by saying that it can vary by up to half a degree, depending on the individual and on where the temperature is taken (armpit or, ahem, orifice).
The figure 98.6 came from the translation of Wunderlich’s report into English. At the time, Fahrenheit was the commonly used scale in Britain. To convert 37 °C to Fahrenheit, you multiply by 9, divide by 5 and add 32; i.e. 37 °C converts to 98.6 °F. So the English translation – which reached a far bigger audience than the German original – gave the figure 98.6 °F as the human norm. Technically, the translators were right to do this, but the decimal place created a misleading impression. If Wunderlich had quoted the temperature as 37.0 °C, it would have been reasonable to quote this as 98.6 °F, but Wunderlich deliberately didn’t quote his rough figure to the decimal place. For a figure that can vary by nearly a whole degree between healthy individuals, 98.6 °F was (and is) spurious precision. And in any case, a study in 2015, using modern, more accurate thermometers, found that we’ve been getting it wrong all these years, and that the average human temperature is 98.2 °F, not 98.6 °F.
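For anyone who wants to check the arithmetic, here is a small illustrative sketch in Python, taking Wunderlich’s half-a-degree spread at face value:

```python
# Illustrative sketch: Wunderlich's 37 degrees C, plus or minus half a
# degree, converted to Fahrenheit.

def c_to_f(celsius):
    """Multiply by 9, divide by 5, add 32."""
    return celsius * 9 / 5 + 32

for c in (36.5, 37, 37.5):
    print(f"{c:.1f} C  ->  {c_to_f(c):.1f} F")
# 36.5 C  ->  97.7 F
# 37.0 C  ->  98.6 F
# 37.5 C  ->  99.5 F
# A half-degree spread in Celsius is a 1.8-degree spread in Fahrenheit,
# so quoting 98.6 to one decimal place implies a precision that the
# original rounded figure of 37 never claimed.
```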
In the General Election of June 2017, there was a shock result in London’s Kensington constituency. The sitting MP was a Conservative with a healthy majority, but in the small hours of the Friday, news came through that the result was too close to call, and there was going to be a recount. Hours later, it was announced that there needed to be a second recount. And then, when even that failed to resolve the result, the staff were given a few hours to get some sleep, and then returned for a third recount the following day.
Finally, the returning officer was able to confirm the result: Labour’s Emma Dent Coad had defeated Victoria Borwick of the Conservatives.
The margin, however, was tiny. Dent Coad won by just 20 votes, with 16,333 to Borwick’s 16,313.
You might expect that if there is one number of which we can be certain, down to the very last digit, it is the number we get when we have counted something.
Yet the truth is that even something as basic as counting the number of votes is prone to error. The person doing the counting might inadvertently pick up two voting slips that are stuck together. Or when they are getting tired, they might make a slip and count 28, 29, 40, 41 … Or they might reject a voting slip that another counter would have accepted, because they reckon that marks have been made against more than one candidate.
As a rule of thumb, some election officials reckon that manual counts can only be relied on within a margin of about 1 in 5,000 (or 0.02%), so with a vote like the one in Kensington, the result of one count might vary by as many as 10 votes when you do a recount.
And while each recount will typically produce a slightly different result, there is no guarantee which of these counts is actually the correct figure – if there is a correct figure at all. (In the famously tight US Election of 2000, the result in Florida came down to a ruling on whether voting cards that hadn’t been fully punched through, and had a hanging ‘chad’, counted as legitimate votes or not.)
Re-counting typically stops when it is becoming clear that the error in the count isn’t big enough to affect the result, so the tighter the result, the more recounts there will be. There have twice been UK General Election votes that have had seven recounts, both of them in the 1960s, when the final result was a majority below 10.
All this shows that when it is announced that a candidate such as Dent Coad has received 16,333 votes, it should really be expressed as something vaguer: ‘Almost certainly in the range 16,328 to 16,338’ (or in shorthand, 16,333 ± 5).
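Here is a rough, purely illustrative sketch of where a margin like that ±5 comes from, applying the 1-in-5,000 rule of thumb to the two Kensington totals (the exact slack depends on which ballots you include, so treat the output as indicative only):

```python
# Rough illustration of the counting rule of thumb: manual counts are
# reliable to roughly 1 part in 5,000 (0.02%).

dent_coad, borwick = 16_333, 16_313
error_rate = 1 / 5_000

slack = round((dent_coad + borwick) * error_rate)
print(slack)   # 7 -- a handful of votes of counting noise

# The 20-vote margin is not much bigger than that counting noise, which is
# why it took three recounts to settle; and "16,333" on its own still
# overstates what a recount would reproduce.
```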
If we can’t even trust something as easy to nail down as the number of votes cast on physical slips of paper, what hope is there for accurately counting other things that are more fluid?
In 2018, the two Carolina states in the USA were hit by Hurricane Florence, a massive storm that deposited as much as 50 inches of rain in some places. Amid the chaos, a vast number of homes lost power for several days. On 18 September, CNN gave this update:
511,000—this was the number of customers without power Monday morning—according to the US Energy Information Administration. Of those, 486,000 were in North Carolina, 15,000 in South Carolina and 15,000 in Virginia. By late Monday, however, the number [of customers without power] in North Carolina had dropped to 342,884.
For most of that short report, numbers were being quoted in thousands. But suddenly, at the end, we were told that the number without power had dropped to 342,884. Even if that number were true, it could only have been true for a period of a few seconds when the figures were collated, because the number of customers without power was changing constantly.
And even the 486,000 figure that was quoted for North Carolina on the Monday morning was a little suspicious – here we had a number being quoted to three significant figures, while the two other states were being quoted as 15,000 – both of which looked suspiciously like they’d been rounded to the nearest 5,000. This is confirmed if you add up the numbers: 15,000 + 15,000 + 486,000 = 516,000, which is 5,000 higher than the total of 511,000 quoted at the start of the story.
So when quoting these figures, there is a choice. They should either be given as a range (‘somewhere between 300,000 and 350,000’), or they should be brutally rounded to just a single significant figure, with the qualifying word ‘roughly’ attached (so, ‘roughly 500,000’). This makes it clear that these are not definitive numbers that could be reproduced if there were a recount.
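For the curious, here is a small illustrative sketch of that ‘brutal rounding’, using a throwaway helper written just for this illustration rather than anything CNN or the statisticians actually use:

```python
# Illustrative helper (invented for this sketch): brutally round a
# positive whole-number count to a single significant figure.

def one_sig_fig(n):
    magnitude = 10 ** (len(str(int(n))) - 1)
    return round(n / magnitude) * magnitude

for reported in (486_000, 342_884, 511_000):
    print(f"{reported:>7,}  ->  roughly {one_sig_fig(reported):,}")
# 486,000  ->  roughly 500,000
# 342,884  ->  roughly 300,000
# 511,000  ->  roughly 500,000
```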