Behind the mind-boggling growth of digital data storage

Article by: Dr. Lauro Rizzatti

In 2010, digital data storage requirements hit the "Zetta" prefix, with only one prefix, the "Yotta," left available.

Until the 19th century—let's say until the Napoleonic Wars—life on earth proceeded at a slow pace with no significant differences over long periods of time. If you were a farmer in ancient Egypt, your daily life would not have been much different 2,000 years later under Louis XIV, the Sun King of France, save for possibly somewhat less harsh conditions and slightly more food.

The setting abruptly changed in the 19th century, even for humble farmers. Driven by scientific discoveries and a flurry of inventions, the technological revolution introduced a radical inflexion point and gave rise to massive growth that continues today at an ever-increasing pace. Myths were shattered and questions that had remained unanswered for millennia suddenly found answers, which triggered new questions and opened doors into new fields of human knowledge.

Discoveries in the early 1800s led to new findings in the ensuing decades that, in turn, set the path to breakthroughs and inventions on an accelerated scale unseen by humankind since Homo sapiens first walked the earth.

Where better to look for proof of the exponential progress of the sciences than in the mind-boggling escalation of numerical prefixes associated with physical metrics?

The metric system was one of many new ideas conceived during the French Revolution at the close of the 18th century. It was intended to bring control and order to the many confusing and conflicting systems of weights and measures being used in Europe. Back then, units of length, land area, and weight varied not just from one country to another, but from one region to another within the same country.

The metric system replaced the traditional units with one fundamental unit for each physical quantity, now defined precisely by the International System of Units. Multiples and fractions of these fundamental units are created by adding prefixes to the names of the defined units. These prefixes denote powers of 10, so that metric units are always divided into 10s, 100s, 1,000s, etc.

As originally conceived, the range of prefixes covered six orders of magnitude (10^6), from one milli (1/1,000) at the low end to one kilo (1,000) at the high end. Over time, these multipliers have been extended in both directions.

About two decades ago, in 1991 to be precise, the 19th General Conference on Weights and Measures extended the list of metric prefixes to the powers of +24 and -24, as illustrated in Table 1.

[storage metric prefixes table]
__Table 1:__ *Metric prefixes defined at the 19th General Conference on Weights and Measures in 1991 (Source: Lauro Rizzatti)*
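
The prefixes in Table 1 can be written as a mapping from prefix name to power of ten. The Python sketch below is illustrative only: the names `SI_PREFIXES` and `to_base_units` are chosen here for the example and do not come from the article or from any standard library.

```python
from fractions import Fraction

# Metric prefixes as powers of ten, from yotta (+24) down to yocto (-24).
SI_PREFIXES = {
    "yotta": 24, "zetta": 21, "exa": 18, "peta": 15, "tera": 12,
    "giga": 9, "mega": 6, "kilo": 3, "hecto": 2, "deca": 1,
    "deci": -1, "centi": -2, "milli": -3, "micro": -6, "nano": -9,
    "pico": -12, "femto": -15, "atto": -18, "zepto": -21, "yocto": -24,
}

def to_base_units(value, prefix):
    """Convert a prefixed quantity to base units, using exact arithmetic."""
    return Fraction(value) * Fraction(10) ** SI_PREFIXES[prefix]

# Number of orders of magnitude between the smallest and largest prefix.
span = max(SI_PREFIXES.values()) - min(SI_PREFIXES.values())

print(f"Prefix range spans {span} orders of magnitude")
print(f"53 zetta = {to_base_units(53, 'zetta')} base units")
```

Exact fractions are used here instead of floating point so that very large and very small multipliers convert without rounding.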

Are the latest ranges, now covering a space of 48 orders of magnitude (10^48), large enough to ensure that any physical measurement will be covered?

The evolution of digital data

Let's take a look at digital data—an area that has seen exponential growth in the past decade or so—which may be classified as either structured or unstructured.

Structured data is highly organised and made up mostly of tables with rows and columns that define their meaning. Examples are Excel spreadsheets and relational databases.

Unstructured data is everything else. Examples include the following:

  • Email messages, instant messages, text messages…
  • Text files, including Word documents, PDFs, and other files such as books, letters, written documents, audio and video transcripts…
  • PowerPoints and SlideShare presentations
  • Audio files of music, voicemails, customer service recordings…
  • Video files that include movies, personal videos, YouTube uploads…
  • Images of pictures, illustrations, memes…

[storage structureddata]
__Figure 1:__ *Graphical representations illustrate the difference between structured and unstructured data (Source: Sherpa Software)*

The volume of unstructured data has exploded in the past decade and a half. Just compare the size of a text file such as The Divine Comedy, which was translated into English by Henry F. Cary in 1888, at 553kB with the file size of an HD video that stores a movie like The Bourne Identity at 30GB. The difference is a factor of roughly 54,000, or nearly five orders of magnitude (about 5 × 10^4).
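
As a quick sanity check on this comparison, the ratio of the two file sizes can be computed directly. The sketch assumes decimal units (1 kB = 10^3 bytes, 1 GB = 10^9 bytes); the variable names are illustrative.

```python
import math

# Decimal (SI) byte multiples; an assumption, since the article does not say
# whether decimal or binary units are meant.
KB = 10**3
GB = 10**9

text_size = 553 * KB    # The Divine Comedy as a text file
video_size = 30 * GB    # an HD movie file

ratio = video_size / text_size
orders = math.log10(ratio)

print(f"ratio is about {ratio:,.0f}x, or {orders:.1f} orders of magnitude")
```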

Statistics published by firms that track the digital data market are staggering. According to IDC Research, digital data will grow at a compound annual growth rate (CAGR) of 42% through 2020. In the 2010-2020 decade, the world's data will grow by 50X; i.e., from about 1ZB in 2010 to about 50ZB in 2020.
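
The growth rate and the ten-year multiple are related through the standard compound-growth formula, multiple = (1 + CAGR)^years. The sketch below applies that formula to the quoted figures; the function names are hypothetical and the ten-year window is an assumption taken from the 2010-2020 range above.

```python
def growth_multiple(cagr: float, years: int) -> float:
    """Total growth factor after compounding `cagr` annually for `years` years."""
    return (1.0 + cagr) ** years

def implied_cagr(multiple: float, years: int) -> float:
    """Annual growth rate that produces `multiple` over `years` years."""
    return multiple ** (1.0 / years) - 1.0

print(f"42% CAGR over 10 years: {growth_multiple(0.42, 10):.1f}x")
print(f"50x over 10 years implies a CAGR of {implied_cagr(50.0, 10):.1%}")
```

Under these assumptions, a 42% rate compounds to roughly 33X over ten years, while a 50X increase corresponds to a rate closer to 48%, so the two quoted figures are best read as approximate rather than exactly consistent.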

"Between the dawn of civilisation and 2003, we only created five exabytes; now we're creating that amount every two days. By 2020, that figure is predicted to sit at 53 zettabytes (53 trillion gigabytes)—an increase of 50 times," said Hal Varian, chief economist at Google.

And IBM found that humans now create 2.5 quintillion bytes of data daily; that's the equivalent of about half a billion HD movie downloads.

[storage cambrian explosion table]
__Figure 2:__ *The growth of structured versus unstructured data over the past decade shows that unstructured data accounts for more than 90% of all data (Source: Patrick Cheesman)*
