Ask a Caltech Expert: Adam Wierman on the Pros and Cons of Data Centers
With the recent growth of AI, particularly with widespread adoption of large language models (LLMs) such as ChatGPT, there has been a dramatic increase in the need for data centers—facilities that house the computers and data storage systems that make the technology possible. And while much has been said about the large-scale and even global environmental impact of these computing centers in terms of energy and water usage, the local impact on small communities, where large data centers are increasingly being built, is emerging as a point of particular concern.
Adam Wierman, the Carl F Braun Professor of Computing and Mathematical Sciences and director of Information Science and Technology at Caltech, says it is crucial to fully understand and manage the impact of data centers on these smaller communities. His group uses mathematics, machine learning, and ideas from optimization, control, and economics to make the networked systems that govern our world more sustainable and resilient. Over the last several years, Wierman and his colleague from UC Riverside, Shaolei Ren, have focused on the public health costs associated with data center–generated air pollution and the water demands of hyperscale data centers.
Wierman recently discussed these issues and his increasing focus on local communities. He underscores that data centers are a necessary part of the modern world, and if they are properly planned for and managed, they can bring resources, jobs, and a flexible energy load to the community.
First off, how do we define data centers?
Data centers are the physical infrastructure needed to run the computing that happens below the surface when you give ChatGPT or another LLM a task. They can be what we call "hyperscale," these massive campuses consisting of multiple buildings, each of which can be the size of multiple football fields. They house a huge number of servers that demand power and the water to cool them on a scale that sometimes rivals entire cities. At the same time, there are data centers that meet more local, so-called "edge computing" needs. These are far less conspicuous and might take up multiple floors of a skyscraper or an abandoned warehouse.
How have data centers grown over the years?
There has been a ton of growth in data centers over the years. When I moved into this research space back in the early 2010s, there was a lot of growth around cloud services, resulting in exponential growth in data centers. The electricity needs at that time were a big concern, and research focused intensely on questions like: How do we manage this growth of data centers? How do we make sure that the grid can handle them? How do we make sure that the environmental impact is constrained? And, in some sense, it was a huge success story back then in terms of the impact of academic work. Instead of exponential growth throughout the 2010s, it kind of leveled out for about a decade: the numbers were still growing, but their electricity usage and their environmental impact were fairly level.
But the recent surge of AI has led to a massive growth rate. In the early 2010s, data centers accounted for about 1 to 1.5 percent of US electricity usage. There were fears that it was going to get up to 3 to 5 percent. We are again in an exponential phase of growth, and people are projecting data centers will hit 15 to 20 percent of US electricity usage in a few years. The question is: Can research constrain that? Or can we at least limit the impact of that on the communities, plan for it, and grow sustainably?
Why is it important to focus heavily now on the communities where these data centers are being built?
A big shift between the 2010s version of this and the 2020s version of this is that in the 2010s, we were focusing on power, carbon, and kind of the global large-scale impact of data centers. Now, the local impacts are really becoming much more binding because this isn't just a large load coming in. This is a massive community-shifting load coming into areas and stressing infrastructure in ways that are hard to imagine.
What are some of the ways in which these local communities are impacted?
First, there's the noise, especially when multiple data centers come in. These data centers rely on backup mobile generators, and they are super loud. They also are about the dirtiest source you can have in terms of pollution for the local community.
I think when you hear the phrase "backup generators," you think, well, they don't use those very often, right? They don't matter very much. But actually, these backup generators run a lot more frequently than you think just because you have to turn them on and off for testing and maintenance. Also, if there is any sort of power imbalance, they are put to use. And they run on diesel or, these days, gas. So they have a pollution impact that is significant and that people don't often associate with data centers. In a 2024 paper, we looked at the health costs of this added pollution, such as increases in asthma and heart problems, using well-established models of air dispersion and of the health impacts of air pollution.
I will add that we are starting to see that, when there is a lack of power generation, people are suggesting that data centers bring their own generators. Here you're not even running it just as backup generation anymore. You're running it all the time. This goes against the push by many utilities to have renewable standards and renewable integration, and to reduce the air pollution impact of their fleet of generation for a community. We are seeing carbon-zero pledges going out the window as hyperscalers start to bring in their own generators to their data center projects.
Then there are the power impacts on the community in terms of potential rate increases if the utility isn't compensated appropriately for the generation and other needs of the data center, something that is very difficult for a utility to get right when figuring out these contracts. There are also power-quality issues. The data centers are often very risk averse. If there's some sort of issue in the grid that they're connected to, they'll disconnect, taking a massive load off the grid. That then makes it even more difficult for the grid to stabilize. So, we've seen communities where there are more frequent brownouts or power outages as a result of the data center moving in.
And then there's water. There are examples where data centers come into a community and use more water daily than the entire community used before. And this can use up the local watershed. This is unheard of, historically, in terms of an industry moving into a community. When water first got onto people's radar related to data centers, the focus was on the aggregate total use. That total is not a huge percentage of national water use. But if you look more locally, there are projects where a data center is moving into a drought-sensitive area and using an absolutely massive amount of the community watershed. That makes a huge impact on long-term water availability and on local water infrastructure. Also, peak water usage can be six to 10 or more times the typical rate. If you've done your capacity analysis as a community based on the average, now you're in big trouble on the hot summer days in terms of the strain you're placing on your infrastructure.
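The gap between average and peak water demand can be made concrete with a little arithmetic. The sketch below is only an illustration: the function name, the demand figure, and the 50 percent headroom are hypothetical assumptions, not numbers from Wierman's research. Only the six-to-10-times peak range comes from the discussion above.

```python
def peak_shortfall(average_demand, peak_multiplier, capacity):
    """Return how much peak demand exceeds capacity (0.0 if it fits)."""
    peak_demand = average_demand * peak_multiplier
    return max(0.0, peak_demand - capacity)


# Hypothetical figures, in arbitrary units of water per day.
AVERAGE_DEMAND = 1.0             # assumed typical data center demand
CAPACITY = 1.5 * AVERAGE_DEMAND  # assumed: sized with 50% headroom over the average

# Peak multipliers of 6 and 10 reflect the range mentioned in the interview.
for multiplier in (1, 6, 10):
    shortfall = peak_shortfall(AVERAGE_DEMAND, multiplier, CAPACITY)
    print(f"peak = {multiplier}x average -> shortfall of {shortfall:.1f} units")
```

Under these assumed numbers, infrastructure that comfortably covers the average demand falls several multiples short on a peak day, which is the planning risk described above.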
Getting back to air pollution, in Loudoun County in Northern Virginia, there is an area known as Data Center Alley. That community is making decisions about the regulations of data centers that come into it, but the air pollution from those data centers is spreading all the way up the East Coast, impacting Maryland, Delaware, Rhode Island, Pennsylvania, and New Jersey. And those communities are feeling the health impacts of the pollution without having any say in what those regulations are and without getting the tax or job benefits from the data centers that move in.
That's quite a list. Are there things that can be done to mitigate the impact of data centers on these local communities?
Yes. Data centers are sometimes viewed primarily as the problem, but there are things we can do to minimize the community impact. It's also important to note that they are a symptom of a larger problem, which is that we have not modernized infrastructure in the US.
One thing that is often overlooked is that unlike other large electric loads that are coming online now (such as the electrification of industry and of transportation through electric vehicles), data centers are actually well suited to help. They have flexibility in terms of when and where they use energy and when and where they use water. They have lots of opportunities for flexibility if the infrastructure can partner with them. They're also hugely instrumented, which means they're highly controlled and smart in the way they do resource allocation. And they have lots of capital to invest in infrastructure improvements to enable them to move into a community if the infrastructure is able and willing to take that investment and build on it.
That all means that you can work with a data center and really make a difference on the grid. So, yes, they're a big part of the problem, but they are also potentially a big part of the solution.
We are also at a point where there are just-demonstrated technologies that can work at scale, that can be built quickly, and that can lower the community impact. If you're using adiabatic cooling, which has this sort of closed-loop system, the noise drops. If you avoid having an on-site generator because you have a large backup battery storage on site, then you avoid the noise and pollution impacts from the generators.
If you want to limit water usage, you can actually trade off between evaporative cooling techniques and these adiabatic techniques, and have a water cooler on-site that lets you smooth your water usage and avoid the times when the infrastructure is really strained or when it might be very energy intensive to use the closed-loop system.
Are there places where the right things are being done?
Everybody is struggling. But there are some great success stories. Finland has a data center that I love to point out, where they use the waste heat from the computer to heat the surrounding community. That's an amazing dual use of what you're otherwise spending a huge amount of compute and power and water to eliminate.
There are also differences in how communities are regulating. Canada is much better at regulating noise than the US. We can learn from that.
Even in the US, it's very different depending on the state and the community that you're in. Some communities are starting to really incentivize data centers to participate in demand response programs and rewarding them for that in various ways, which then basically lets the data center participate like a battery would in making it easier for the community to have renewable energy and manage the peaks that come at different times of day.
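One way to picture the battery-like role described above is as load shifting: a data center moves deferrable work into the hours when the community's own demand is low. The following is a minimal sketch under stated assumptions, not a description of any real demand response program. The load profile, the amount of flexible energy, and the function names are all hypothetical, and the greedy "valley filling" rule is just one simple scheduling choice.

```python
def flat_schedule(flexible_energy, periods):
    """Spread the flexible load evenly, ignoring community demand."""
    return [flexible_energy / periods] * periods


def valley_fill(base_load, flexible_energy, step=1.0):
    """Greedily place flexible load in whichever period is currently least loaded."""
    schedule = [0.0] * len(base_load)
    remaining = flexible_energy
    while remaining > 1e-9:
        chunk = min(step, remaining)
        # Pick the period with the lowest combined (community + data center) load.
        lowest = min(range(len(base_load)), key=lambda i: base_load[i] + schedule[i])
        schedule[lowest] += chunk
        remaining -= chunk
    return schedule


def peak(base_load, schedule):
    """Highest combined load across all periods."""
    return max(b + s for b, s in zip(base_load, schedule))


# Hypothetical community load over six periods of a day, in arbitrary units.
base_load = [40.0, 35.0, 50.0, 80.0, 100.0, 70.0]
flexible_energy = 60.0  # assumed deferrable data center work for the day

flat = flat_schedule(flexible_energy, len(base_load))
shifted = valley_fill(base_load, flexible_energy)

print("peak with flat schedule:   ", peak(base_load, flat))
print("peak with shifted schedule:", peak(base_load, shifted))
```

With these made-up numbers, the evenly spread load raises the community's peak, while the shifted load fits entirely into the low-demand periods and leaves the peak unchanged.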
Part of the goal is to take these best practices that are in lots of different communities and countries and point out that they can all be done together, and that when you put them together, a data center can offer a positive community impact.
What is most concerning about the rising demand for data centers?
The push right now to emphasize the speed at which data centers can deploy is potentially magnifying some of these issues we've been talking about. Right now, there's a really strong argument being made that, to compete with China and to compete in AI, we need to build more and larger data centers as fast as possible. As a result, people are looking for ways to allow data centers to move more quickly to construction with less regulation.
In this kind of move-quick-and-break-things approach, the thing that breaks is the local communities. So finding the balance where we can build data centers quickly while we're creating incentives and we're creating the technology and we're really ensuring that the community impact is managed and understood as we build is crucial.
We're working with the Linde Center [Ronald and Maxine Linde Center for Global Environmental Science] now to try to provide a model framework to help with making these data-center deployment decisions, whether you're writing state-level legislation or evaluating a particular project in your community. What factors do you need to consider? What metrics do you need to have transparency around in health and water and power? We're hoping that this summer we will have a model framework for thinking about those decisions.