Kibis 'n' bits | News | Coagulopath

An exercise: what word would you use to describe a goldsmith who, to enrich himself, intentionally confuses the avoirdupois and troy weights of gold?

Probably “thief”.

What word would you call a construction supplier who conflates long tons and short tons? Or a financier who mixes up British billions and US billions?

Probably the same.

But in consumer electronics it’s common to see a disk drive advertised as having a certain storage capacity (such as 1TB), only to install it and have your computer inform you that it’s actually about 7% smaller than that (931GB).
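The arithmetic behind that figure is easy to check. Here is a minimal sketch, assuming the drive holds exactly 10^12 bytes and that the operating system divides by 2^30 while still labelling the result “GB” (the constant names are mine, purely for illustration):

```python
# Assumption: "1TB" on the box means exactly 10**12 bytes, and the operating
# system divides by 2**30 but still prints the result as "GB".
ADVERTISED_BYTES = 10**12
BYTES_PER_BINARY_GB = 2**30

reported = ADVERTISED_BYTES / BYTES_PER_BINARY_GB
shortfall_pct = (1 - reported / 1000) * 100

print(f"reported: {reported:.0f} GB")      # reported: 931 GB
print(f"shortfall: {shortfall_pct:.1f}%")  # shortfall: 6.9%
```

No bytes have gone missing. The same number is simply being divided by a bigger unit.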

This isn’t a mistake. It’s a deceptive marketing practice caused by confusion about what a kilobyte or a megabyte actually refers to.

Humans in western society do their sums in base 10. We tend to express large numbers as “a lot of 10s”.

1 000 000 000 000    tera    10^12
1 000 000 000        giga    10^9
1 000 000            mega    10^6
1 000                kilo    10^3

Due to engineering reasons[1], computers work in base 2 (ie, binary). They express large numbers as “a lot of 2s”.

1 099 511 627 776    tebi    2^40
1 073 741 824        gibi    2^30
1 048 576            mebi    2^20
1 024                kibi    2^10
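The two tables drift further apart as the prefixes get bigger. A rough sketch of that drift, assuming nothing beyond the powers listed above (the function name prefix_gaps is hypothetical):

```python
# Decimal prefixes are powers of 1000; binary prefixes are powers of 1024.
PREFIXES = [("kilo", "kibi"), ("mega", "mebi"), ("giga", "gibi"), ("tera", "tebi")]

def prefix_gaps():
    """Return (decimal name, binary name, percent by which binary exceeds decimal)."""
    rows = []
    for power, (dec_name, bin_name) in enumerate(PREFIXES, start=1):
        decimal = 1000**power
        binary = 1024**power
        rows.append((dec_name, bin_name, (binary / decimal - 1) * 100))
    return rows

for dec_name, bin_name, gap in prefix_gaps():
    print(f"{bin_name} is {gap:.1f}% larger than {dec_name}")
```

The gap starts at about 2.4% for kilo versus kibi and reaches roughly 10% for tera versus tebi, so the mismatch becomes more noticeable as drives grow.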

Note that there’s a completely different set of prefixes when you work in base 2 (kibi, mebi, gibi and tebi are binary prefixes standardised by the IEC, and are not part of SI). Due to “convention” (which usually means “some corporate programmer in 1978 decided to do things this way, and we’re still stuck with it”), most computing applications nonetheless label base-2 quantities with the base-10 prefixes.

Note that bytes (or more accurately bits, 8 of which make a byte) are not an SI unit. They’re a dimensionless quantity: a bit indicates an on/off state. In 1997 the IEEE Standards Board recommended that SI prefixes be used for measures of data storage, and also noted that binary prefixes are acceptable usage.

When a computer tells you it has “100GB”, it’s unclear whether this means 100 gibibytes (107.374 gigabytes) or 100 gigabytes (93.1323 gibibytes). Additional levels of confusion appear because neither convention is applied consistently. Apple products such as iPhones and iPads usually report their disk space in base-10, and bandwidth is usually measured in base-10 as well.
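One way software could sidestep the ambiguity is to always print an explicit unit. The following is only a sketch under my own assumptions (format_bytes is a hypothetical helper, not how any particular operating system does it), showing the same byte count under both readings:

```python
def format_bytes(n_bytes, binary=False):
    """Format a byte count with an explicit decimal (kB, MB, ...) or binary (KiB, MiB, ...) unit."""
    base = 1024 if binary else 1000
    units = ["B", "KiB", "MiB", "GiB", "TiB"] if binary else ["B", "kB", "MB", "GB", "TB"]
    value = float(n_bytes)
    for unit in units:
        # Stop at the first unit that keeps the value below the base,
        # or at the largest unit in the list.
        if value < base or unit == units[-1]:
            return f"{value:.2f} {unit}"
        value /= base

size = 100 * 10**9  # "100GB" read as decimal gigabytes
print(format_bytes(size))               # 100.00 GB
print(format_bytes(size, binary=True))  # 93.13 GiB

size = 100 * 2**30  # "100GB" read as gibibytes
print(format_bytes(size))               # 107.37 GB
print(format_bytes(size, binary=True))  # 100.00 GiB
```

Either output is fine on its own. The trouble only starts when a base-2 number is printed with a base-10 label.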

You might say that “TB == a trillion bytes” is correct for some value of TB, so the labelling isn’t misleading. But it seems obvious to me that consumers calculate their needs according to how much space they have left on their computer – ie, their decisions are being guided by base-2.

It would be relatively simple for manufacturers to advertise their products in tebibytes – or to at least explain on the packaging what the difference is. But most of them don’t seem to do that.

 

References
[1] One big reason is reliability of state. On/off switches (ie, binary) are the most error-resistant way of storing information, as they only have to capture two states. The transistor is either on or off. But imagine a switch that somehow has to store ten states. The circuit is now susceptible to noise, drift, and voltage swings. In principle, it’s possible to build a computer that uses base-10, and such machines have been made, such as the Harwell Dekatron, which uses dekatron tubes (gas-filled counting tubes) to store state.