In today’s digital world, data is the most precious commodity. According to Statista, about 181 zettabytes of data will be created in 2025, and through some very crude Microsoft Excel analysis I’m predicting we’ll enter the Yottabyte era sometime between 2030 and 2035. These numbers are almost too large to comprehend, but they clearly signify the value our society puts on all things data. The most successful technology companies in the world have built their thrones on the backs of bits and bytes. This blog post will explore how much data exists in the known universe, discuss why data is so important, and not so boldly go where many have gone before (yes, I am a Trekkie).

We’re going to focus on the B2C retail space and, more specifically, on ecommerce companies. The opportunity to generate and analyze data in an ecommerce-first world is far greater than it is for purely brick-and-mortar shops. At the most basic level, understanding your online customer comes from understanding the data they generate. Data comes in many different flavors from many different places, including click streams, purchases, interactions, searches… the list goes on and on.

The Zettabyte Era

We are currently in what’s known as the Zettabyte Era, which simply means the amount of data created in the known universe (because we do have a data footprint in space) has reached multiple Zettabytes yearly. How do we define data? We’ll look to Statista again, which says this includes all data created, captured, copied, and consumed. But I digress. To truly understand what a Zettabyte is, we need to dissect it a bit, so we will start with a single byte and move on from there. According to Wikipedia, a byte is the space required to store a single character of text in a modern-day computer system.

  • Byte = one character of text
  • Kilobyte = 1000 Bytes
  • Megabyte = 1000 Kilobytes
  • Gigabyte = 1000 Megabytes
  • Terabyte = 1000 Gigabytes
  • Petabyte = 1000 Terabytes
  • Exabyte = 1000 Petabytes
  • Zettabyte = 1000 Exabytes
  • Yottabyte = 1000 Zettabytes
  • Brontobyte (not official) = 1000 Yottabytes

That means that 1 Zettabyte is equal to 1 sextillion bytes or 1,000,000,000,000,000,000,000 (that’s a lot of zeros!). The sheer volume of data generated today demonstrates how important, impactful, and valuable data continues to be in our society. Every second of every day new data is being generated, analyzed, and pondered over – all in the name of gaining insights. Even this blog post and the related podcast have contributed to the increasing global data footprint.
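For anyone who wants to check the zeros, here is a minimal Python sketch of the unit ladder. It assumes decimal (1000-based) prefixes, and the function names are mine, not from any library. It also works out what my crude Excel estimate implies: the constant yearly growth rate needed to get from Statista’s 181 zettabytes in 2025 to a full Yottabyte per year by 2030 or 2035. The constant-growth assumption is a simplification for illustration, not a forecast.

```python
# Decimal (SI) prefixes: each step up the ladder is a factor of 1000.
UNITS = [
    "byte", "kilobyte", "megabyte", "gigabyte", "terabyte",
    "petabyte", "exabyte", "zettabyte", "yottabyte",
]


def bytes_in(unit: str) -> int:
    """Return the number of bytes in one `unit` (1000-based prefixes)."""
    return 1000 ** UNITS.index(unit.lower())


def required_annual_growth(start: float, target: float, years: int) -> float:
    """Constant yearly growth rate needed to go from `start` to `target`."""
    return (target / start) ** (1 / years) - 1


if __name__ == "__main__":
    print(f"1 zettabyte = {bytes_in('zettabyte'):,} bytes")

    zb_2025 = 181  # Statista figure quoted in this post, in zettabytes
    zb_per_yb = bytes_in("yottabyte") // bytes_in("zettabyte")  # 1000 ZB

    # Assumption: data creation grows at one constant rate every year.
    for year in (2030, 2035):
        rate = required_annual_growth(zb_2025, zb_per_yb, year - 2025)
        print(f"1 YB per year by {year} needs roughly {rate:.0%} growth per year")
```

The later the target year, the gentler the growth rate has to be, which is why the estimate is a range rather than a single year.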

Amazon – The Sultan of Data

If you’re like me, an avid Amazon user, then services like the Amazon Marketplace, Amazon Prime, Amazon Alexa, Amazon Kindle, etc. are a mainstay in your daily life. If you submit a “Request my Data” form, prepare to be astonished and maybe even a little terrified at the amount of data you’ll receive back. Jeff Bezos realized early on that data would play an outsized role in customer engagement and analysis in the years and decades to come. As such, I have dubbed Amazon The Sultan of Data.

One point I want to quickly make: I don’t think generating and analyzing data in such a way is inherently bad, especially if you’re an online retailer trying to figure out what your customer base wants. It’s how the data is ultimately used and misused that causes issues. Consumer needs identified through the mining and analysis of consumer data are, without a doubt, the main driver of innovation. In fact, it’s exactly this type of analysis of data at Amazon that led to the birth of AWS in a very indirect way. AWS was created because the Amazon marketplace needed to become a hyperscale solution to keep up with customer demands. According to TechCrunch, Amazon built out a series of decoupled, API-accessible services to create this hyperscale solution. These services were eventually used to build out the first iterations of AWS. In this way, Amazon was able to draw insights in a totally unrelated area from consumer data that pointed to the need for a high-performing website. And as we know, when it comes to ecommerce, “slow” is the new “down.”

For me personally, if a website is simply down I’ll give them another chance because they didn’t waste much of my time. If their website is impossibly slow and I’m forced to slog through what seems like endless minutes only to run out of patience, it will be taken as a personal affront to my honor and out of principle I can never return. But back to Amazon: they were able to determine before anyone else that the tech world needed cloud services and that a public cloud provider would generate huge demand. Other businesses and marketplaces like Amazon would also need high-performing websites. In this way, customer needs drove innovation in a very indirect way that, without a doubt, has changed the world. Amazon’s ability to continuously reinvent itself has kept it sitting atop a mound of gold built on data.

Therefore, it’s imperative for retail and ecommerce companies to understand their consumers. If you’re mining and refining data correctly and gaining real customer insights, you never know where it can lead. But you can bet that wherever it leads will almost certainly be accompanied by success. On the flip side of that coin, if you’re not doing this, you’re not living up to your full potential as a company and as a brand. The most valuable IT departments are those that enable the business to innovate consistently over a long period of time by really partnering with the business. If you’re busy putting out fires and dealing with downtime or, even worse in today’s ecommerce world, a slow website, you’re losing customers and money.

Making Your Data Work for You

Now for the million-dollar question: Where can you invest your IT dollars to provide your business counterparts with the customer insights they crave? Firstly, secondly, and thirdly, if you’re still running most of your infrastructure on-premises, you’re doing it wrong, I’m sorry to say. The cloud is the only place, not the best place, the ONLY place, you can continuously transform and re:Invent (pun intended) your business. Maybe, and it’s a very big maybe, you can keep on managing some form of transformation for the next few years on-premises. But the future belongs to the cloud, so migrate now or forever hold your place in the graveyard of fallen giants who also thought they could resist the winds of change. To bolster my point, consider the following:

  • Time to Implementation: In a capital expense world, going from anticipating an infrastructure need to having it racked, stacked, and ready to use can take months. Given the further exacerbation of supply chain issues because of COVID, this has gotten worse. To be fair, the cloud has also experienced resource constraints. But those wait times pale in comparison to hardware order submissions, fulfillment, and deployment.
  • Failing Fast: If you want to order a new piece of hardware, especially a hyperconverged solution meant to support a new system, you will likely need to conduct a lengthy assessment to determine the best hardware and software to use as well as identify datacenter requirements and operational limitations. If that assessment misses the mark, you’ve wasted a substantial amount of money and might be stuck for years to come with a system that doesn’t support your needs. And if you have a system that doesn’t support your needs, there’s no way you’re going to generate accurate consumer data and gain accurate insights. Worse than that, these types of failures waste a significant amount of team effort, affect team morale, and may result in turnover.

Now consider these same two points when using cloud services. First, your assessment will likely center on a proof of concept (POC), so it won’t be theoretical. You may even be able to generate some data that can help you understand current limitations, improve operations, etc. The POC will also give your team hands-on experience while keeping immediate and future costs to a minimum. Second, if the POC doesn’t work as intended, you can simply walk away having gained some intellectual capital (e.g. data) in the process. Maybe in the future you can utilize that service in a different way. You can try more, do more, fail faster, conduct more experiments, and ultimately generate more data and garner more insights when you operate in the cloud. It is simply easier to handle the consumption, storage, and analysis of data in the cloud, especially when you consider the rapid advancement and development of AI and Machine Learning cloud services. Ultimately, the more agile and nimble you are in your approach, the greater chance you have of successfully executing a strategy that will capture those ever-elusive customer insights.

Transforming Your Business According to Google

Google has a very insightful white paper on cloud adoption outlining the cloud maturity lifecycle as they see it. It’s an important read and I want to quote them directly on what it means to achieve the highest level of cloud adoption:

“Existing data is transparently shared. New data is collected and analyzed. The predictive and prescriptive analytics of machine learning applied. Your people and processes are being transformed, which further supports the technological changes. IT is no longer a cost center, but has become instead a partner to the business.”

The white paper explains in detail how to achieve this “transformational” level of cloud adoption and I recommend checking out both AWS’s and Microsoft’s cloud adoption frameworks as well.

Keeping It Stateless

I want to make one final note before we conclude. Data should be one of the only, if not the only, stateful aspects of your IT footprint. Everything surrounding your core business-critical data should be as stateless, decoupled, and ephemeral as possible. The goal should be to treat infrastructure as code and to build in automation at every layer of your stack. You should be able to nuke anything that doesn’t need to maintain data over long periods of time and rebuild it quickly, easily, and automatically. While on-premises datacenters fail much more often than those of cloud providers, no infrastructure is infallible. Make sure that you’re maintaining multiple copies of your business-critical data across fault domains, availability zones, regions, and even cloud providers. This will ensure fast recovery and minimal fallout. How mobile, replicated, and secure your data is will determine how prepared you are for disaster. Losing infrastructure is recoverable. Losing business-critical data is not.
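To make the “multiple copies” advice a little more concrete, here is a small Python sketch of the kind of automated check you could run against an inventory of where your data lives. Everything in it is hypothetical: the Replica record, the replication_gaps function, the thresholds, and the provider, region, and zone labels are placeholders I made up for illustration, not any cloud provider’s API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    """Where one copy of a dataset lives (all labels are illustrative)."""
    provider: str
    region: str
    zone: str


def replication_gaps(replicas, min_zones=2, min_regions=2, min_providers=1):
    """Return a list of policy gaps; an empty list means the policy is met."""
    # Count distinct locations at each level of the failure hierarchy.
    zones = {(r.provider, r.region, r.zone) for r in replicas}
    regions = {(r.provider, r.region) for r in replicas}
    providers = {r.provider for r in replicas}

    gaps = []
    if len(zones) < min_zones:
        gaps.append(f"only {len(zones)} zone(s), need {min_zones}")
    if len(regions) < min_regions:
        gaps.append(f"only {len(regions)} region(s), need {min_regions}")
    if len(providers) < min_providers:
        gaps.append(f"only {len(providers)} provider(s), need {min_providers}")
    return gaps


if __name__ == "__main__":
    # Hypothetical inventory: two copies in one region, different zones.
    orders_db = [
        Replica("provider-a", "region-1", "zone-a"),
        Replica("provider-a", "region-1", "zone-b"),
    ]
    for gap in replication_gaps(orders_db):
        print("orders_db:", gap)
```

In this example the two copies survive a zone failure but not a regional one, so the check flags the missing second region. Running a check like this on a schedule is one way to keep the stateful part of your footprint honest while everything around it stays disposable.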


Take a good long look at your organization and how it generates, maintains, and analyzes data. And ask yourself: are you a transformative organization, or are you simply keeping the lights on? If you can’t answer this question unequivocally, I assure you it’s the latter.

Thanks for checking out my blog! You can find the accompanying podcast on Spotify. And please follow me on LinkedIn for more insights 🙂