The Future of the Hard Drive
On September 14th 1956 IBM announced the first commercial computer to use a magnetic hard disk for storage. Weighing in at about 1000 kilograms, the 305 RAMAC was the world’s most expensive jukebox. It stored 4.4 megabytes on 50 double-sided disks, each one measuring two feet in diameter and spinning 1200 times a minute.
Companies could lease the machine for US$3200 per month - roughly equivalent to paying US$100 million annually for a gigabyte of storage today.
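The arithmetic behind that comparison can be sketched roughly as follows. The lease price and the 4.4-megabyte capacity come from the figures above; the inflation multiplier of about 11.5 between 1956 and today is an assumption made for illustration, not a number from IBM or from this article.

```python
# Rough sketch: what a gigabyte on a 305 RAMAC would cost per year.
LEASE_USD_PER_MONTH = 3200   # from the article
CAPACITY_MB = 4.4            # from the article
INFLATION_FACTOR = 11.5      # assumption: approximate 1956-to-today US price multiplier


def annual_cost_per_gb(lease_per_month, capacity_mb, inflation_factor=1.0):
    """Annual lease cost scaled up to one gigabyte (1 GB = 1000 MB)."""
    annual_lease = lease_per_month * 12
    capacity_gb = capacity_mb / 1000
    return annual_lease / capacity_gb * inflation_factor


nominal = annual_cost_per_gb(LEASE_USD_PER_MONTH, CAPACITY_MB)
adjusted = annual_cost_per_gb(LEASE_USD_PER_MONTH, CAPACITY_MB, INFLATION_FACTOR)

print(f"1956 dollars per GB per year:  {nominal:,.0f}")
print(f"Today's dollars per GB per year: {adjusted:,.0f}")
```

Under these assumptions the nominal figure is a little under US$9 million a year, and the inflation-adjusted one lands near US$100 million.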
Almost 70 years later, a gigabyte of storage costs pennies. The bytes themselves are stored not in one jukebox but in a great many of them: sliced up, replicated and distributed over a vast collection of computers and storage devices in massive data centres scattered across the world. In a word, the cloud.
The cloud is an abstraction of everything one could do on a 305 RAMAC and more.
It endeavours to separate the actions of storing, retrieving and computing on data from the physical constraints of doing so. To users, the cloud is a big virtual drawer or backpack into which you can put your digital stuff for safe-keeping, and later retrieve it to work on (or play with) anywhere at any time. It does not matter to you where or how your data is kept, or indeed into how many pieces it is divided among various hardware devices.
But to cloud providers the cloud is profoundly physical. They must build and maintain the physical components of the cloud and the illusion that goes with it, keeping up as the world produces more data that needs storing, sorting and crunching. The quantities of data being created are ever growing too.
In 2023 the world generated around 123 zettabytes (that is, 123 trillion gigabytes) of data, according to International Data Corporation, a market-research firm.
Picture a tower of DVDs growing more than 1km higher every second until, after a year, it reaches more than halfway to Mars. This data must be stored in different ways for different purposes, from spreadsheets that need to be available instantly, as on a bookshelf, to archival material that can be put in an attic.
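For readers who want to check the DVD picture, here is a back-of-the-envelope sketch. The 123-zettabyte figure is the one quoted above; the disc capacity (4.7 gigabytes, single layer), the disc thickness (1.2mm) and the distance to Mars at a close approach (about 55m km) are assumptions supplied for the calculation, so the result should be read as an order-of-magnitude check rather than a precise figure.

```python
# Back-of-the-envelope check of the DVD tower.
ZETTABYTE = 10**21                    # bytes
DATA_BYTES = 123 * ZETTABYTE          # from the article (2023 estimate)
DVD_CAPACITY_BYTES = 4.7e9            # assumption: single-layer DVD
DVD_THICKNESS_M = 1.2e-3              # assumption: standard disc thickness
SECONDS_PER_YEAR = 365 * 24 * 60 * 60
MARS_CLOSE_APPROACH_KM = 55e6         # assumption: Mars near its closest to Earth


def dvd_tower(data_bytes, capacity_bytes, thickness_m):
    """Return (tower height in km, growth rate in km per second over one year)."""
    discs = data_bytes / capacity_bytes
    height_km = discs * thickness_m / 1000
    return height_km, height_km / SECONDS_PER_YEAR


height_km, rate_km_s = dvd_tower(DATA_BYTES, DVD_CAPACITY_BYTES, DVD_THICKNESS_M)

print(f"Tower height after a year: {height_km:,.0f} km")
print(f"Growth rate: {rate_km_s:.2f} km per second")
print(f"Fraction of the way to Mars: {height_km / MARS_CLOSE_APPROACH_KM:.2f}")
```

With these inputs the tower grows by about a kilometre a second and ends the year a little more than halfway to Mars at a close approach.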
How is it possible to do all this in an orderly, easily retrievable way?
For a start, it helps to recognise the technical leaps in storage that have made the cloud possible. For each type of data and computational task there are different kinds of physical storage with trade-offs between cost, durability and speed of access. Much like the layers of the internet, the cloud needs these multiple layers of storage to be flexible enough to adapt to any kind of future use.
Inside an unassuming building in Didcot, England, in the Scientific Computing Department at the Rutherford Appleton Laboratory, one of Britain’s national scientific-research labs, sit Asterix and Obelix, two stewards of massive quantities of data. They are robotically managed tape libraries, respectively the largest and second-largest in Europe. Together Asterix and Obelix store and keep organised the deluge of scientific data that comes in from particle-physics experiments at the Large Hadron Collider, along with data from climate and astronomy research.
Asterix and Obelix form a sizeable chunk of the lab’s self-contained cloud (its computing power is conveniently located in the same room).
Together the two can store 440,000 terabytes of data, equivalent to a million copies of the three "Lord of the Rings" films, extended edition, in 4k resolution.
Each is made up of a row of cabinets packed with tape cartridges; if all the cartridges were unspooled, the tape would stretch from Athens to Sydney.
When a scientist requests data from an experiment, one of several robots zooms horizontally on a set of rails to find the right cabinet, and vertically on another set of rails to find the right tape. It then removes the tape and scans through the reel in order to find the requested information. The whole process can take up to a minute.
Magnetic tape, similar to that used in old audio cassette tapes, might seem like an odd choice for storing advanced scientific research. But modern tape is incredibly cheap and dense (its data density has increased by an average of 34% annually for decades). This has been made possible by reducing the size of the magnetic particles—called “grains”—in which information is stored and by packing them more closely together.
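To get a feel for what 34% a year means when compounded, a short sketch helps. The growth rate is the one quoted above; the choice of a ten-year horizon is just an illustrative assumption.

```python
import math

ANNUAL_GROWTH = 0.34   # from the article: average yearly gain in tape data density
YEARS = 10             # assumption: illustrative horizon


def doubling_time(annual_growth):
    """Years needed for density to double at a constant compound growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


def growth_multiple(annual_growth, years):
    """Total density multiple after the given number of years."""
    return (1 + annual_growth) ** years


years_to_double = doubling_time(ANNUAL_GROWTH)
decade_multiple = growth_multiple(ANNUAL_GROWTH, YEARS)

print(f"Density doubles roughly every {years_to_double:.1f} years")
print(f"After {YEARS} years it is about {decade_multiple:.0f} times higher")
```

At that pace density doubles in a little under two and a half years, and rises nearly twentyfold in a decade.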
A single cartridge, roughly the size of two side-by-side audio cassettes, can hold 40 terabytes of data. That equates to about 9m 305 RAMACs. Plus, it is durable and requires little energy to maintain. These qualities make tape the storage medium of choice not only for this scientific data, but also for big chunks of the cloud at Amazon, Google and Microsoft.
Flash memory, in common use on laptops and phones, is best when data needs to be frequently looked up or modified, like recent photos.
Solid-state drives save data by trapping or releasing electrons in a grid of flash-memory cells. Retrieving the data is as simple as checking for the presence of electrons in each cell, and involves no moving mechanical parts; it takes about one-tenth of a millisecond, though if it is in the cloud instead of on your phone, add a few dozen milliseconds for delivery from the data centre. The data remains even when the power is turned off, though memory will eventually degrade as electrons leak out of the cells.
As new photos you take go to a data centre, your older ones get demoted from flash to old-fashioned hard-disk drives spread across multiple data centres, most likely including some in the country or at least the continent you live in.
These read and write data mechanically onto a spinning magnetic disk, not dissimilar to the 305 RAMAC, and are more than five times cheaper per gigabyte of storage than flash (though that gap is closing). Retrieval takes a sloth-like 5-10 milliseconds.
Even on the side of the cloud provider, the exact physical device on which data is stored is abstracted away. One way that this is often done is called RAID (redundant array of independent disks). This takes a bunch of storage hardware devices and treats them as one virtualised storage shed. Some versions of RAID split up a photo into multiple parts so that no single piece of hardware has all of it, but rather several storage devices have slightly overlapping fragments. Even if two pieces of hardware break (and hardware failures happen all the time), the photo is still recoverable.
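The idea of rebuilding a photo from the surviving pieces can be illustrated with a minimal sketch. It uses a single XOR parity fragment, in the spirit of RAID 5, which survives the loss of one device; tolerating two simultaneous failures, as described above, requires a second, independent parity calculation (as in RAID 6) that is not shown here. The function names and the three-disk layout are hypothetical, chosen only for illustration, and real RAID controllers work on fixed-size blocks rather than whole files.

```python
from functools import reduce
from typing import List, Optional


def split_into_fragments(data: bytes, n_data_disks: int) -> List[bytes]:
    """Cut data into n equal-length fragments, zero-padding the tail."""
    frag_len = -(-len(data) // n_data_disks)  # ceiling division
    padded = data.ljust(frag_len * n_data_disks, b"\0")
    return [padded[i * frag_len:(i + 1) * frag_len] for i in range(n_data_disks)]


def xor_fragments(fragments: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length fragments; used for parity and for repair."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*fragments))


def recover(disks: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing fragment (None) from the survivors."""
    missing = [i for i, d in enumerate(disks) if d is None]
    if len(missing) > 1:
        raise ValueError("single parity can only repair one lost fragment")
    repaired = list(disks)
    if missing:
        survivors = [d for d in disks if d is not None]
        repaired[missing[0]] = xor_fragments(survivors)
    return repaired


photo = b"holiday photo bytes"
fragments = split_into_fragments(photo, 3)
disks = fragments + [xor_fragments(fragments)]  # three data disks plus one parity disk

disks[1] = None                                  # simulate a failed drive
restored = recover(disks)
rebuilt_photo = b"".join(restored[:3])[:len(photo)]
print(rebuilt_photo == photo)
```

Because XOR is its own inverse, combining the surviving fragments with the parity fragment reproduces whichever single fragment was lost.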
The cloud is also redundant in another way. Each piece of data will be stored in at least three separate locations. This means that were a hurricane or tornado or wildfire to destroy one of the data centres that had a copy of your photo, it would have two copies left to fall back on. This redundancy helps make cloud storage reliable. It also means that most of the time, millions of hard-disk drives are spinning on standby, just in case.
Still, companies are working on making the infrastructure of the cloud more robust. Tape, in particular, has its disadvantages as a long-term storage medium. It must be kept within a certain range of temperatures and humidities, and away from strong magnetic fields, which could erase the information. And it requires replacing every decade or two. So the hunt is on for something that takes up less room, lasts longer and requires less maintenance.
One promising medium is glass. A fast and precise laser etches tiny dots in multiple layers within platters of glass 75mm square and 2mm thick. Information is stored in the length, width, depth, size and orientation of each dot.
Encoding information in glass in this way is the modern equivalent of etching in stone. If you fry, boil, scratch or even microwave glass slides, you can still read the data.
Researchers at Microsoft are harnessing this technology to build a cloud out of glass. They increased capacity so that each slide can hold just over 75 gigabytes, and used machine learning to improve reading speed. They claim their slides will last for 10,000 years. Microsoft has developed a system (much like the tape robots) that can handle thousands, or even millions, of these slides.
Achieving this kind of scale, without the need to supply power to storage shelves or to replace the storage devices themselves, is necessary to build a truly durable foundation for the cloud.