175 Zettabytes - A Data Volume Model for Valuing the Cloud
IDC estimates that by 2025, world data creation will reach 175 Zettabytes. As their report puts it, if you could store all of that data on DVDs, the stack of DVDs would circle the Earth 222 times. It’s an astronomical number.
Additional estimates indicate that the share of data stored in the public cloud could reach 54% by 2025.
Given these numbers, can we construct revenue growth projections for the big cloud providers using data volume creation each year?
Why Does This Matter?
Investment Opportunity for Cloud Computing
In 2021, AWS, Azure, and Google Cloud brought in nearly $125 Billion in combined revenue.
AWS: FY 2021 Revenue: $62 Billion, $18.5 Billion in Operating Income, 37% YoY Growth
Azure: 2021 (calendar year, and reported within Intelligent Cloud segment) Revenue: ~$43 Billion, 46% YoY Growth
GCP: FY 2021 (reported with GSuite) Revenue $19 Billion, 47% YoY Growth
Each company’s cloud segment is growing at 35%+ per year, with no signs of stopping.
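As a quick sanity check on the figures above, the sketch below backs out each segment's implied prior-year revenue from its growth rate and computes a combined growth rate. The revenue and growth inputs are the rounded numbers quoted above; the variable names are just illustrative, and the result is only as precise as those rounded inputs.

```python
# (2021 revenue in $ billions, YoY growth) as quoted above; rounded figures
segments = {
    "AWS": (62.0, 0.37),
    "Azure": (43.0, 0.46),
    "GCP": (19.0, 0.47),
}

# Combined 2021 revenue across the three providers
total_2021 = sum(revenue for revenue, _ in segments.values())

# Implied 2020 revenue: divide each 2021 figure by (1 + growth)
total_2020 = sum(revenue / (1 + growth) for revenue, growth in segments.values())

combined_growth = total_2021 / total_2020 - 1

print(f"Combined 2021 revenue: ${total_2021:.0f}B")
print(f"Implied combined YoY growth: {combined_growth:.1%}")
```

With these inputs, the combined figure lands a little above 40%, consistent with the 35%+ growth for each segment.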
From Amazon 10K (FY 2021): Cash capital expenditures were $35.0 billion, and $55.4 billion in 2020 and 2021, which primarily reflect investments in additional capacity to support our fulfillment operations and in support of continued business growth in technology infrastructure (the majority of which is to support AWS), which investments we expect to continue over time.
Capex continues to grow for the cloud providers - AWS has some $30 Billion allocated on a yearly basis, up roughly 50% from the prior year.
AWS has $80 Billion (as of 12/31/21) in contract backlog that it expects to fulfill over the next ~4 years.
How long can this growth continue, and what are the implications for the respective stock valuations?
Understanding The Direction of Our Digital World
It's not quite apples to apples, but world GDP is ~$95 Trillion. When cloud computing reaches a trillion dollar run rate, it could be ~1% of world GDP.
Everything can now be broken down into 1’s and 0’s at some level. At the compute layer, we invented a way to programmatically control energy and represent any type of information. At the network layer, we have a decentralized way for anyone to ‘plug in’ to a global network and share any of this data instantaneously (the internet). And as Matthew Ball alludes to in his Metaverse Essays, we may soon have new abstraction layers on top of the internet protocols and infrastructure.
The metaverse, AI, self-driving cars, blockchain technology - all of these require compute and networking. Will the public cloud capture this growth in data volume and compute? Understanding just how big our digital world could be gives context into what life could be like in the future.
Opportunities for Blockchain and Decentralization
What role will blockchain technologies and decentralization play in the future?
As more workloads move to the cloud, could legacy enterprise data centers find a new role playing a part in decentralized compute or file storage?
If Cloud Computing is a trillion dollar a year market in five years, how big can a world computer (Ethereum) or decentralized storage networks and protocols (IPFS, Filecoin, Siacoin, Storj, Filebase) become?
(We’ll revisit these ideas in the future)
Forms of Data Storage and Compute
Local Devices
Data can be stored on a local machine. Depending on what type of data this is, that device could be a phone, a laptop, an assembly line robot, a car, a refrigerator, etc. Really anything that has a computer chip in it can store data or process data (and oftentimes transmit the data elsewhere).
Enterprise Data Centers
Enterprises build and manage their own data centers. Depending on the size and industry of the company, these can be massive - Visa and Facebook’s data centers are just two examples of the scale these can reach.
The Public Cloud (Cloud Computing)
AWS, Azure, and GCP lead the Cloud Computing Market, which is effectively an abstraction of running your own data center. This solves problems of capital costs, time to value, scalability, maintenance, and redundancy, and offers a number of additional benefits.
Data Volume Valuation Model
Can we model out expected revenue for each of the major cloud providers given the overall themes of data growth and cloud workloads? The full valuation model and all assumptions are linked below (note these are all estimates):
Data Volume Growth
IDC estimates that 175 Zettabytes of data will be created in 2025, which represents a CAGR of 27%.
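As a rough check on the 27% figure, the sketch below computes a compound annual growth rate. The 33 ZB baseline for 2018 is an assumption taken from IDC's published summary of its Data Age 2025 report, not from this article, and the function name `cagr` is just illustrative.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values over `years` periods."""
    return (end_value / start_value) ** (1 / years) - 1

# Assumption: IDC's 2018 baseline of ~33 ZB (not stated in this article)
START_ZB = 33.0
END_ZB = 175.0          # IDC's 2025 estimate, as cited above
YEARS = 2025 - 2018     # seven annual growth periods

growth = cagr(START_ZB, END_ZB, YEARS)
print(f"Implied CAGR: {growth:.1%}")
```

Under that baseline assumption, the implied rate comes out at roughly 27%.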
Percent of Data Retained
Of all of the data created, how much is actually retained and persisted? In this model we estimate 2% (declining to 1.5%), a figure we back into from past cloud performance and storage estimates.
Percent of Data Stored in Public Clouds
The share of retained data stored in the public cloud is expected to rise from an estimated 42% to 54% over the next five years.
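To make the first three inputs concrete, here is a minimal sketch that chains data created, percent retained, and percent stored in public clouds. It assumes the end-of-period values (1.5% retained, 54% in the cloud) both apply to 2025, which is a simplification; the function name is hypothetical.

```python
ZB_TO_TB = 1e9  # 1 zettabyte = 10^9 terabytes (decimal units)

def cloud_stored_tb(created_zb: float, pct_retained: float, pct_in_cloud: float) -> float:
    """Terabytes of one year's created data that end up persisted in public clouds."""
    return created_zb * ZB_TO_TB * pct_retained * pct_in_cloud

# Assumption: 1.5% retained and 54% in public clouds both apply to 2025
tb_2025 = cloud_stored_tb(created_zb=175, pct_retained=0.015, pct_in_cloud=0.54)
print(f"{tb_2025 / ZB_TO_TB:.2f} ZB of 2025's data stored in public clouds")
```

Even with a retention rate of only 1.5%, that is well over a zettabyte of new cloud storage from a single year of data creation.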
Data Storage Costs
AWS/GCP/Azure have similar pricing on cloud storage. For the model's purposes, we use $240/TB/year for hot storage.
We estimate that 3.1% of stored data will be in ‘hot storage’, meaning more readily available. The remainder will be accessed less frequently, putting costs at $120/TB/year.
We also estimate that storage costs will drop 5% per year.
Lastly, we assume an overall discount of 50% off of list prices to account for the aggregation of enterprise deals and other long term discounts from the major cloud providers.
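The pricing assumptions above combine into a single effective price per terabyte. The sketch below is one reading of how they fit together: a blended hot/cold price, a 5% annual decline, and the 50% discount applied on top. The constant and function names are illustrative, not taken from the linked model.

```python
HOT_PRICE = 240.0           # $/TB/year for hot storage
COLD_PRICE = 120.0          # $/TB/year for less frequently accessed storage
HOT_SHARE = 0.031           # fraction of stored data kept in hot storage
ANNUAL_PRICE_DROP = 0.05    # storage prices fall 5% per year
ENTERPRISE_DISCOUNT = 0.50  # average discount off list prices

def effective_price_per_tb(years_from_now: int) -> float:
    """Blended, discounted $/TB/year after `years_from_now` annual price drops."""
    blended = HOT_SHARE * HOT_PRICE + (1 - HOT_SHARE) * COLD_PRICE
    decayed = blended * (1 - ANNUAL_PRICE_DROP) ** years_from_now
    return decayed * (1 - ENTERPRISE_DISCOUNT)

for year in range(5):
    print(year, round(effective_price_per_tb(year), 2))
```

Because so little data sits in hot storage, the blended list price stays close to the $120 cold-storage rate, and the discount brings the starting effective price to about $62/TB/year.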
Data Storage Revenues
Using all of these variables we can estimate data storage revenue per cloud provider by multiplying:
Data Storage Revenues =
Data Created
x % Data Retained
x % Data Stored in the Cloud
x Data Storage Costs
x % Market Share of Cloud Provider
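The multiplication above can be sketched as a small function. The example inputs are placeholders for illustration only: the 40% market share and the round $60/TB/year price (close to the discounted cold-storage rate) are assumptions, not outputs of the linked model.

```python
ZB_TO_TB = 1e9  # 1 zettabyte = 10^9 terabytes (decimal units)

def storage_revenue_usd(
    created_zb: float,
    pct_retained: float,
    pct_in_cloud: float,
    price_per_tb_year: float,
    market_share: float,
) -> float:
    """Annual storage revenue for one provider, following the formula above."""
    stored_tb = created_zb * ZB_TO_TB * pct_retained * pct_in_cloud
    return stored_tb * price_per_tb_year * market_share

# Hypothetical inputs: 40% market share and $60/TB/year are placeholders
revenue = storage_revenue_usd(
    created_zb=175,
    pct_retained=0.015,
    pct_in_cloud=0.54,
    price_per_tb_year=60.0,
    market_share=0.40,
)
print(f"Storage revenue: ${revenue / 1e9:.1f}B")
```

Keeping each factor as a separate parameter makes it easy to see how sensitive the result is to any single assumption, since the output scales linearly with each one.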
Compute to Storage Revenue Ratio
Using estimates from the Snowflake pricing page, we estimate a ratio of 5.5 for compute to storage - meaning that for every dollar of storage revenue, compute revenue (across any combination of services) will be 5.5x that amount. This could go up slightly as machine learning becomes more prevalent in future years, partly offset by decreases in training costs.
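A minimal sketch of how the ratio turns storage revenue into a total follows. Treating total revenue as storage plus compute is our assumption about how the pieces add up, and the $10 Billion storage figure is purely a placeholder.

```python
COMPUTE_TO_STORAGE_RATIO = 5.5  # compute revenue per dollar of storage revenue

def total_cloud_revenue(storage_revenue: float, ratio: float = COMPUTE_TO_STORAGE_RATIO) -> float:
    """Storage revenue plus the compute revenue implied by the ratio."""
    compute_revenue = storage_revenue * ratio
    return storage_revenue + compute_revenue

# Placeholder: $10B of storage revenue, not a figure from the model
total = total_cloud_revenue(10e9)
print(f"Total revenue: ${total / 1e9:.0f}B")

# Sensitivity to the ratio, using the ~5 to 5.5x range
low = total_cloud_revenue(10e9, ratio=5.0)
print(f"At a 5.0x ratio: ${low / 1e9:.0f}B")
```

The ratio dominates the result: under this assumption, storage is only about 15% of total revenue, so small changes in the multiplier move the projection more than small changes in storage pricing.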
Putting all of these variables together, we can project revenues for each major cloud provider. A quick comparison shows that the DCF projections and this data model's projections are in line with each other, and this model offers one more data point confirming the growth potential.
If we refer back to the AWS DCF Model, we can see that this translates into a $750 Billion valuation for AWS specifically.
Where Does this Model Fall Short?
This is just one data model for predicting cloud valuations with broad assumptions and high level estimates - any more detail throughout would lend further credibility to the valuations.
It would be incredible to see some of the real data and usage patterns aggregated by AWS/Azure/GCP across services, customers, and regions. This presentation from Peter DeSantis at re:Invent 2021 offers just a glimpse into what an aggregated form of this data looks like - and the cloud companies have an incredible pool of data and insight into our digital world.
International Growth Specifics
Data creation and compute requirements are clustered by region and will see different growth rates. IDC offers insight into China’s growing market share of the Cloud - could this skew future growth projections and pose challenges for US-based companies?
China’s Datasphere is expected to grow 30% on average over the next 7 years and will be the largest Datasphere of all regions by 2025.
Application Specific Data?
The ~5 to 5.5x multiplier of compute to storage is based on just one example and some historical numbers. This could vary by workload and application, and the aggregate could change.
Specific areas of high data volume growth, like self-driving cars, could shift growth away from the public cloud and toward the edge or enterprise data centers.
New Cloud Competition?
Competition is fierce in cloud. WSJ reported that Google was offering investments in companies to help win deals from Azure and AWS. Can HPE, Oracle, or Alibaba gain market share? There are only a handful of companies that can keep pace with the Capex needed to grow a $100 Billion business at a 30% rate.
Can Decentralized Technologies Take Meaningful Market Share?
It is still very early for blockchain-based technologies. Can they continue their growth to become a dominant form of storage or compute?
The Filecoin network stored 25 Petabytes of data in 2021, up from just 1.5 Petabytes a year earlier. This is not yet a comparison point for the major cloud providers - but if the Filecoin network can grow 10x each year for the next 5 years, it could store 2.5 Zettabytes on its network by 2027, which would be meaningful market share.
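The compounding behind that 2.5 Zettabyte figure can be checked in a few lines. This is only the arithmetic, using the petabyte figures quoted above; the function name is illustrative.

```python
PB_PER_ZB = 1e6  # 1 zettabyte = 10^6 petabytes (decimal units)

def project_storage_pb(start_pb: float, yearly_multiple: float, years: int) -> float:
    """Stored petabytes after growing by `yearly_multiple` every year for `years` years."""
    return start_pb * yearly_multiple ** years

# 25 PB growing 10x per year for five years
projected_pb = project_storage_pb(start_pb=25, yearly_multiple=10, years=5)
print(f"{projected_pb / PB_PER_ZB} ZB")

# For comparison, the multiple actually observed from 1.5 PB to 25 PB
observed_multiple = 25 / 1.5
print(f"Observed one-year multiple: {observed_multiple:.1f}x")
```

The observed one-year multiple was closer to 17x, so a sustained 10x is below the most recent pace, though holding any such multiple for five straight years is a very strong assumption.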
Conclusion
This model offers one more point of validation for the anticipated 20-40% YoY growth the major cloud providers could see in the next five years. Some additional macro factors are outlined here.
For Amazon and AWS specifically this article (and video) goes into further detail and has a full DCF model with revenue estimates and additional drivers.
An EBITDA multiple of 18x, a 5 year revenue CAGR of 24%, and a 5 year FCF CAGR of 19% put AWS’s valuation alone at ~$750 Billion.
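To give a feel for what a 24% revenue CAGR implies, the sketch below projects AWS revenue forward from the $62 Billion FY 2021 figure. It covers only the revenue path, assumes FY 2021 is the base year, and does not attempt to reproduce the DCF model or the $750 Billion valuation.

```python
def project_revenue(base_revenue_bn: float, cagr: float, years: int) -> list[float]:
    """Revenue in $ billions for the base year and each of the following `years` years."""
    return [base_revenue_bn * (1 + cagr) ** n for n in range(years + 1)]

# Assumption: FY 2021 revenue of $62B is the base year for the 5 year CAGR
path = project_revenue(base_revenue_bn=62.0, cagr=0.24, years=5)

for offset, revenue in enumerate(path):
    print(f"FY {2021 + offset}: ${revenue:.1f}B")
```

Under these assumptions, revenue roughly triples over five years, ending a little above $180 Billion.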
It will be interesting to see how all of this plays out in the next few years, and I welcome any thoughts, ideas, or feedback on the topic.
If you enjoy this kind of content, please subscribe to the Exponential Layers newsletter and YouTube channel to get all updates on new videos and articles.