Artificial intelligence (AI) learning machines can be trained to solve problems and puzzles on their own instead of following rules that we made for them. But often, researchers do not know what rules the machines make for themselves. Cold Spring Harbor Laboratory (CSHL) Assistant Professor Peter Koo developed a new method that quizzes a machine-learning program to figure out what rules it learned on its own and whether they are the right ones.
Computer scientists “train” an AI machine to make predictions by presenting it with a set of data. The machine extracts a series of rules and operations—a model—based on information it encountered during its training. Koo says:
“If you learn general rules about the math instead of memorizing the equations, you know how to solve those equations. So rather than just memorizing those equations, we hope that these models are learning to solve it and now we can give it any equation and it will solve it.”
Koo developed a type of AI called a deep neural network (DNN) to look for patterns in RNA strands that increase the ability of a protein to bind to them. Koo trained his DNN, called ResidualBind (RB), with thousands of RNA sequences matched to protein binding scores, and RB became good at predicting scores for new RNA sequences. But Koo did not know whether the machine was focusing on a short sequence of RNA letters (a motif) that humans might expect, or on some other secondary characteristic of the RNA strands that humans might not expect.
Koo and his team developed a new method, called Global Importance Analysis, to test what rules RB generated to make its predictions. He presented the trained network with a carefully designed set of synthetic RNA sequences containing different combinations of motifs and features that the scientists thought might influence RB’s assessments.
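The idea of quizzing a trained network with designed sequences can be illustrated with a short Python sketch. It is not the team's implementation: `toy_model` is a hypothetical stand-in for a trained network such as RB, the motif `UGCAUG` is only an illustrative choice, and drawing background sequences uniformly at random is an assumption made to keep the example small. The sketch embeds a pattern at a fixed position in many background sequences and measures how much the model's average prediction changes.

```python
import random

ALPHABET = "ACGU"  # RNA letters


def random_sequences(n, length, rng):
    """Draw n background sequences uniformly at random (an assumption)."""
    return ["".join(rng.choice(ALPHABET) for _ in range(length)) for _ in range(n)]


def embed(seq, pattern, pos):
    """Overwrite part of seq with pattern, starting at index pos."""
    return seq[:pos] + pattern + seq[pos + len(pattern):]


def toy_model(seq):
    """Hypothetical stand-in for a trained network.

    It scores a sequence by counting copies of one motif, so the
    'rule' we hope to recover by quizzing it is known in advance.
    """
    return float(seq.count("UGCAUG"))


def global_importance(model, pattern, pos, n=1000, length=41, seed=0):
    """Average change in the model's prediction when pattern is embedded."""
    rng = random.Random(seed)
    backgrounds = random_sequences(n, length, rng)
    with_pattern = sum(model(embed(s, pattern, pos)) for s in backgrounds) / n
    without_pattern = sum(model(s) for s in backgrounds) / n
    return with_pattern - without_pattern
```

Calling `global_importance(toy_model, "UGCAUG", pos=10)` and comparing it with a control such as `global_importance(toy_model, "AAAAAA", pos=10)` shows a large effect only for the pattern the toy model actually uses. The same quiz could in principle be repeated with two patterns at varying distances, or with other designed features, to probe more complex rules.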
They discovered the network considered more than just the spelling of a short motif. It factored in how the RNA strand might fold over and bind to itself, how close one motif is to another, and other features.
Koo hopes to test some key results in a laboratory. But researchers do not need to test every prediction there, because Koo’s new method acts like a virtual lab. They can design and test millions of different variables computationally, far more than humans could test in a real-world lab. Koo says:
“Biology is super anecdotal. You can find a sequence, you can find a pattern but you don’t know ‘Is that pattern really important?’ You have to do these interventional experiments. In this case, all my experiments are all done by just asking the neural network.”
The team published their new methods and tools in PLOS Computational Biology. Their tools are now available to everyone online.
Written by: Luis Sandoval, Communications Specialist | sandova@cshl.edu | 516-367-6826
Funding
Cancer Center Support Grant, the Simons Center for Quantitative Biology at Cold Spring Harbor Laboratory, National Cancer Institute
Citation
Koo, P., et al., “Global Importance Analysis: An Interpretability Method to Quantify Importance of Genomic Features in Deep Neural Networks”, PLOS Computational Biology, May 13, 2021. DOI: 10.1371/journal.pcbi.1008925
Principal Investigator
Peter Koo
Assistant Professor
Cancer Center Member
Ph.D., Yale University, 2015