How Is AI Revolutionizing the Biological Sciences?
I like to break down this intersection between AI and the life sciences by the types of data that are commonplace in biological research:
- Imaging data, which tell you how living systems vary in space and time
- Sequence data, which tell you how the "parts lists" vary across different organisms and within organisms
- Structural data, which tell you how biological molecules interact with and change the physical world
For each one of these different data domains, there are new AI methods that can do some remarkable things.
Before AI, solving problems in biology involving these types of data was not necessarily impossible, but it was very hard, and it was especially hard to solve them in a general way. Most solutions were tailored to a particular data set, a particular system, or a particular setup, so a solution developed in one lab could not be shared effectively with others.
Broadly speaking, what my lab has been trying to do is to build the software systems necessary to solve problems for all biological images of all varieties and all systems. It's an ambitious goal, but we see uniform software challenges across all these different systems, from imaging cells dynamically in dishes to imaging them in their native environment in fixed tissues.
In some of our collaborations, we are exploring the architecture of solid tumors, where tumor cells live, where the immune cells live, and how they talk to each other. Our software has been successful at automating this exploration. In our own lab, we're interested in how cells encode information. For these experiments, we need to find cells in static images of tissues and link them over time so that we have consistent records of a cell's behavior. With AI, we hope to be able to increase the resolution of what we can detect to the level of a single cell. The infrastructure that you need to do something like this at scale hasn't really existed until recently.
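To make the idea of linking cells over time concrete, here is a minimal sketch in Python. It is not the lab's software. It assumes the cells have already been detected and reduced to (x, y) centroids, one list per frame, and it uses a simple greedy nearest-neighbor rule in place of the learned models that production tools rely on. The function names `link_cells` and `build_tracks` and the `max_distance` cutoff are hypothetical choices made for illustration.

```python
import math


def link_cells(prev_centroids, next_centroids, max_distance=10.0):
    """Greedily link cells between two frames by nearest centroid.

    Returns a dict mapping an index in prev_centroids to an index in
    next_centroids. max_distance is an assumed cutoff (in pixels) beyond
    which two detections are not considered the same cell.
    """
    # Collect every candidate pair that is close enough, shortest first.
    candidates = []
    for i, p in enumerate(prev_centroids):
        for j, q in enumerate(next_centroids):
            d = math.dist(p, q)
            if d <= max_distance:
                candidates.append((d, i, j))
    candidates.sort()

    links = {}
    used_next = set()
    for _, i, j in candidates:
        # Each cell may take part in at most one link per frame pair.
        if i not in links and j not in used_next:
            links[i] = j
            used_next.add(j)
    return links


def build_tracks(frames, max_distance=10.0):
    """Chain frame-to-frame links into tracks.

    Each track is a list of (frame_index, cell_index) pairs, which gives a
    consistent record of one cell across the frames where it was found.
    """
    if not frames:
        return []
    tracks = [[(0, i)] for i in range(len(frames[0]))]
    # Maps a cell index in the current frame to the track it belongs to.
    active = {i: tracks[i] for i in range(len(frames[0]))}
    for t in range(1, len(frames)):
        links = link_cells(frames[t - 1], frames[t], max_distance)
        new_active = {}
        for i, j in links.items():
            track = active[i]
            track.append((t, j))
            new_active[j] = track
        # Cells with no match in the previous frame start a new track.
        for j in range(len(frames[t])):
            if j not in new_active:
                track = [(t, j)]
                tracks.append(track)
                new_active[j] = track
        active = new_active
    return tracks


# Made-up centroids for three frames; a third cell appears in the last frame.
frames = [
    [(10.0, 10.0), (50.0, 50.0)],
    [(12.0, 11.0), (49.0, 52.0)],
    [(13.0, 13.0), (48.0, 53.0), (90.0, 90.0)],
]
tracks = build_tracks(frames)
for track in tracks:
    print(track)
```

A greedy rule like this breaks down when cells divide, crowd together, or move farther than the cutoff between frames, which is part of why the general problem calls for more capable methods.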
The intersection of biology and AI has moved very quickly in the past five or six years. Before then, scientists would manually collect data by looking at images and movies, or we would build a computer-vision program and humans would manually fix the errors the program made for each data set. That manual intervention takes an inordinate amount of time; for a single research paper's worth of data, it could easily take between three and six months.
Now we're at the point where some of these AI systems perform as well as humans at these tasks. You might do some quality control to make sure the outputs look reasonable, but usually they do, and you can move forward with your studies.
When I started working in this space around 2014, it was clear that AI algorithms were going to be able to replace manual work, but it was a matter of figuring out the right architectures, giving the AI the labeled data that it needed to learn how to perform these tasks, and then creating software systems for making these algorithms accessible. Now, we're starting to see progress on all three fronts. There have been advances across the whole computer-vision space that have improved the accuracy and data efficiency of the AI algorithms, and labs like ours have made substantial efforts to build software tools that make these methods available for everybody to use.
Over the next five or so years, I'm looking forward to advances in two areas of AI for biology. One is automating analysis. Today we use AI to process the raw data that's being produced by different imaging platforms. What does science look like if you don't have to wait six months or so to get the answers from your imaging measurements? What if you got them the next day or the same day, or on the fly? How would that change how you do science? The other trend that I see is using AI to design experiments. How would you change your experimental designs to enable more accurate and more scalable measurements?
—David Van Valen, assistant professor of biology and biological engineering; Heritage Medical Research Institute Investigator