Living Up to the Hype
We are witnessing the largest and fastest buildout of digital infrastructure in history. According to the updated Infrastructure Masons State of the Industry report, 41GW was added to the pipeline over the last 12 months, and our industry is facing significant challenges to keep up. Many industry insiders expect that number to double again in 2025. Based on the last year's activity, our industry is on track to double global data center capacity in two years and triple it in five. This is not a trend. It is a tsunami, and it continues to build.
My first question is why? What changed? What is driving this unprecedented growth?
My second question is what does it mean to our industry? What should infrastructure players and providers plan for? What’s real? What’s hype? What do we bet on? What do we avoid? How do we pull this off?

The AI Playing Field
To answer these questions we first need to understand the playing field.
- There are two key AI Infrastructure Players – Hyperscalers and everyone else.
- There are two key AI Users – Individuals and Enterprises.
- There are two key AI Use Cases – Training and Inference.
- There are two key AI Risks – Private Data and Sustainable Growth.
The difference between the trends of the past and the AI boom is adoption.
Individuals found the easy button and are pressing it every second of every day. ChatGPT is the fastest growing application in history. Why? AI tools are making people more productive. Like the advent of Internet search, we have the world at our fingertips. Individuals can become experts on almost anything at any time.
Hyperscalers quickly spotted the market differentiation and doubled down on investments and integration of AI into their platforms. The AI arms race is in full swing and these hyperscalers understand that their future depends on how their AI capabilities keep individuals and enterprises on their platforms.
Conversely, Enterprise companies have been slow to adopt AI. There is a very important reason for this trend that will become apparent once we understand the implications of hyperscaler platforms and large language models (LLMs).

The AI Arms Race
The AI arms race is a battle between the biggest companies in the world with the deepest pockets. They are spending hundreds of billions of dollars in capital on data centers and hardware, and consuming tens of thousands of megawatt-hours, resulting in billions of dollars of expense to train LLMs. Apple, AWS, Google, Microsoft, Meta, and Oracle represent over $11T in combined market cap and the majority of the 41GW of data center expansion in the last year. They are also the largest purchasers of NVIDIA GPU servers, with Microsoft amassing 1.8 million GPUs by the end of 2024, representing 15% of NVIDIA's revenue. These numbers are mind blowing.

Some shareholders see these massive investments and question the returns. The hyperscalers see it much differently. AI enhancements to their platforms are significant to their bottom line. Adding AI capabilities correlates directly with user counts and revenue: $150M to train an LLM can lead to billions in sustained revenue. AI capabilities integrated into these platforms will not only retain their user base, they will help it grow. More importantly, these platforms have no choice. Not only does AI have the fastest adoption rate of any technology, it also has the fastest iteration rate. That makes it an all-or-nothing battle. To put it bluntly, if these platforms don't adapt and lead in AI, they lose. The competition is fierce and the stakes have never been higher.
For the last two years we have been enamored with massive AI cluster deployments driving GW-scale data center campuses to train ever bigger LLMs. The largest LLM is OpenAI's GPT-4, reportedly at 1.7 trillion parameters. Llama 3.1 has 405 billion, PaLM 2 has 340 billion, Falcon has 180 billion, and Mixtral 8x22B has 141 billion. While larger parameter counts mean a model can absorb more data, they don't directly correlate with better performance. Some smaller models can outperform bigger models on specific tasks. For example, GPT-4 essentially has immediate access to all information in digital format up to December 2023, but Mistral 7B is faster at code generation, text processing, and mathematical reasoning. Bigger is not always better.

Once models are trained, they are used for inference. We essentially "infer" knowledge from these trained models. While training consumes massive amounts of power across huge clusters of GPUs for thousands of hours to produce a usable model, inference consists of small transactions that answer a prompt from a user or a machine.
Based on input from Jeff Clarke, COO of Dell, who spoke at the Goldman Sachs investor conference in September 2024, AI LLM training represents 90% of global AI consumption, while 10% is used for AI inference. Jeff predicted that these numbers will flip in the next five years: by 2030, 90% of AI consumption will be inference and 10% will be training, with the largest consumers being Enterprise companies. The reason? Enterprises have the largest and most comprehensive private data sets in the world, and that data will drive heavy inference consumption.
The Enterprise AI Dilemma
Initially, many Enterprises jumped on the AI bandwagon, leveraging ChatGPT and other GPT-like services through cloud platforms. Developers uploaded code to increase their output; paralegals uploaded legal documents to decrease turnaround times, build their arguments, and influence rulings; business execs uploaded strategy documents, sales models, and market research to predict business challenges and opportunities.
That quickly changed when executives realized the risks. Enterprise companies were freely providing these LLMs with their IP and trade secrets, which meant the next LLM release could give their competitors access to what differentiated their business. This led to strict policies limiting the use of AI, including outright bans of third-party solutions such as ChatGPT.

Cloud Repatriation & Inference
These policies significantly slowed Enterprise adoption of AI on traditional hyperscale platforms. They also fueled a trend that raised alarm bells at the hyperscalers: Enterprise companies started taking more deliberate steps to ensure that their private data stays private. Cloud repatriation is the process of moving data, applications, or workloads from a public cloud environment back to on-premises infrastructure or private clouds. This opened the door to a new wave of private cloud platforms that offer dedicated GPU clusters and ML Ops tools in dedicated environments. Lambda Labs, Coreweave, Paperspace, Crusoe, Runpod, Ori, Vast, Tensordock, Cudo Compute, Apolo, Onue, and my own startup Cato Digital stepped up. These walled gardens allow enterprise customers to safely use their data to drive their own AI differentiation without making massive capital investments in data centers and hardware.

Another important trend emerged from this shift: Small Language Models (SLMs) and fine-tuning. There is no need to spend hundreds of millions training an LLM when you can take a model that someone else trained and customize it with your private data. SLMs are designed to understand and generate human language but are more focused and efficient than LLMs. They are smaller, usually 500 million to 20 billion parameters; domain specific, with specialized datasets tailored to specific tasks; far more efficient, making them ideal for edge environments; faster at inference due to their compact size; and generally less expensive than LLMs.
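To put those parameter counts in perspective, here is a back-of-envelope sketch of the raw weight memory a model needs. The 2-bytes-per-parameter figure assumes FP16/BF16 weights and deliberately ignores activations, KV cache, and framework overhead, so treat the results as floors, not sizing guidance.

```python
def model_memory_gb(params_billion: float, bytes_per_param: float = 2) -> float:
    """Raw weight memory: parameter count times bytes per parameter.

    bytes_per_param: 2 for FP16/BF16, 4 for FP32, 1 for INT8 quantization.
    Ignores activations, KV cache, and framework overhead.
    """
    return params_billion * 1e9 * bytes_per_param / 1024**3

small = model_memory_gb(0.5)   # low end of the SLM range: under 1 GB
large = model_memory_gb(20)    # high end of the SLM range: roughly 37 GB
```

This simple arithmetic is why SLMs fit comfortably on a single GPU while frontier-scale LLMs require clusters: weights alone for a trillion-parameter model at FP16 would run to terabytes.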
Fine-tuning is the process of adapting a pre-trained model to perform specific tasks by further training it on a smaller, targeted data set. An enterprise company can select a pre-trained model, prepare a smaller, task-specific data set, such as product, customer support, or other company-private data, and retrain the model with that new data. The result is a smaller, more efficient model customized to that business's needs. Enterprises can now offer tailored experiences for their customer base that are 100% isolated from the cloud platforms. The risk of exposing private data is removed. Moreover, the smaller model size provides a low-cost way to continuously fine-tune those models. This puts enterprise companies in control of their destiny.
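The two-stage flow above can be illustrated with a deliberately tiny stand-in: a linear "model" pre-trained on broad data, then fine-tuned on a small, shifted domain dataset. This is a conceptual sketch only, with made-up data — real fine-tuning adapts billions of transformer weights using frameworks like PyTorch — but the pattern of reusing pre-trained weights and adapting them cheaply is the same.

```python
def train(weights, data, lr=0.01, epochs=200):
    """Per-sample gradient descent on squared error for y = w*x + b."""
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x   # gradient of squared error w.r.t. w
            b -= lr * err       # gradient of squared error w.r.t. b
    return (w, b)

def mse(weights, data):
    w, b = weights
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

# 1) "Pre-training": learn a generic relationship (y = 2x) from broad data.
general = [(x / 10, 2 * x / 10) for x in range(-50, 50)]
base = train((0.0, 0.0), general)

# 2) "Fine-tuning": a small domain dataset with a shifted relationship
#    (y = 2x + 1). A few extra epochs adapt the pre-trained weights.
domain = [(x / 10, 2 * x / 10 + 1) for x in range(-10, 10)]
tuned = train(base, domain, epochs=50)

# The fine-tuned model fits the domain data far better than the base model.
assert mse(tuned, domain) < mse(base, domain)
```

The economics in the article follow the same shape: the expensive pre-training happens once (by someone else), and the cheap adaptation loop runs as often as the business's private data changes.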

The Right Tool For The Job
LLM training will continue for the foreseeable future, but it will be driven by the largest companies in the world. They need the latest and greatest GPUs from NVIDIA, configured in massive clusters that operate like one big server. These are the right tools for the job as these titans continue to battle for AI dominance.
Enterprises need something different to compete in the AI era: cloud repatriation to keep enterprise private data private, SLMs, fine-tuning, and inference. These workloads do not require the latest GPUs. In fact, the latest GPUs are overkill for the majority of them. For example, at Cato we offer NVIDIA DGX V100 GPU servers in 8-way and 16-way configurations with 128GB to 512GB of combined VRAM on an SXM motherboard, with direct communication between the GPUs via NVIDIA's NVLink. These are powerhouse machines ideal for SLM training, fine-tuning, and inference. They are also 1/5th the cost.
Consider this scenario. Fine-tuning an SLM to power an industry-specific AI chatbot may result in a 10GB model file. That file can be loaded into a single V100 32GB GPU, which can then serve incredibly fast inference from that model 24x7x365. The same GPU can also ingest new data and continuously fine-tune the model, enhancing inference in near real time. In fact, you could run 16 parallel models doing both fine-tuning and inference continuously. Cato's DGX-2 16-way V100 GPU servers are ideal for inference workloads like this. With that many resources in the same machine, they operate like their own cluster. They are the right tool for the job at the right price.
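The arithmetic behind that scenario can be sketched as follows. The per-GPU headroom figure is just VRAM minus model size; what the remaining memory is actually good for (batch size, fine-tuning state) depends on the workload.

```python
V100_VRAM_GB = 32.0   # per-GPU memory on a V100 32GB
DGX2_GPUS = 16        # GPU count in a 16-way DGX-2

def headroom_gb(model_gb: float, vram_gb: float = V100_VRAM_GB) -> float:
    """VRAM left on one GPU after loading a single model replica,
    available for activations, batching, and fine-tuning state."""
    return vram_gb - model_gb

# The 10 GB fine-tuned SLM from the scenario leaves 22 GB of working room
# per V100, so one replica per GPU gives 16 independent models on a DGX-2,
# each fine-tuning and serving inference in parallel.
per_gpu_headroom = headroom_gb(10.0)
parallel_models = DGX2_GPUS
```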
We are in the early innings of AI. Inference will become 90% of the AI workload by 2030. Enterprises will become the largest consumers of AI, and they need to keep their private data private. The playing field is changing, opening opportunities for many new AI platforms, but the hyperscalers will also be innovating to address these challenges and maintain market share. Don't get distracted by the hype. The key is to find the right tool for the job. Feel free to reach out directly if you have any questions or thoughts, or if Cato Digital can help you on your AI journey. dean@cato.digital.
ABOUT THE AUTHOR
Dean Nelson is CEO of Cato Digital and the Founder & Chairman of Infrastructure Masons. His 35-year career includes leadership positions at Sun Microsystems, Allegro Networks, eBay, PayPal, and Uber. Dean has deployed 10 billion USD in digital infrastructure across four continents. Since its founding in 2016, iMasons has amassed a global membership representing over 200 billion USD in infrastructure projects spanning 130 countries.