Commercially viable biofuel crops are vital to reducing greenhouse gas emissions, and a new tool developed by the Center for Advanced Bioenergy and Bioproducts Innovation (CABBI) should accelerate their development, as well as advances in gene editing overall.
The genomes of crops are tailored by generations of breeding to optimize specific traits, and until recently breeders were limited to selection on naturally occurring diversity. CRISPR/Cas9 gene-editing technology can change this, but the software tools necessary for designing and evaluating CRISPR experiments have so far been based on the needs of editing in mammalian genomes, which don’t share the same characteristics as complex crop genomes.
Enter CROPSR, the first open-source software tool for genome-wide design and evaluation of guide RNA (gRNA) sequences for CRISPR experiments, created by scientists at CABBI, a Department of Energy-funded Bioenergy Research Center (BRC). The genome-wide approach significantly shortens the time required to design a CRISPR experiment, reducing the challenge of working with crops and accelerating gRNA sequence design, evaluation, and validation, according to the study published in BMC Bioinformatics.
“CROPSR provides the scientific community with new methods and a new workflow for performing CRISPR/Cas9 knockout experiments,” said CROPSR developer Hans Müller Paul, a molecular biologist and Ph.D. student with co-author Matthew Hudson, Professor of Crop Sciences at the University of Illinois Urbana-Champaign. “We hope that the new software will accelerate discovery and reduce the number of failed experiments.”
To better meet the needs of crop geneticists, the team built software that lifts restrictions imposed by other packages on the design and evaluation of gRNA sequences, the guides used to locate targeted genetic material. Team members also developed a new machine learning model that does not exclude guides targeting the repetitive genomic regions often found in plants, a shortcoming of existing tools. The CROPSR scoring model provided much more accurate predictions, even in non-crop genomes, the authors said.
“The goal was to incorporate features to make life easier for the scientist,” Müller Paul said.
Many crops, particularly bioenergy feedstocks, have highly complex polyploid genomes, with multiple sets of chromosomes. And some gene-editing software tools based on diploid genomes (like those from humans) have trouble with the peculiarities of crop genomes.
“It can sometimes take weeks or months to realize that you don’t have the outcome that you expected,” Müller Paul said.
For example, a trait may be regulated by a collection of genes, particularly a trait involving plant stress, where backup systems are useful. A scientist might design an experiment to knock out one gene and be unaware of another that performs the same function. The problem may not be discovered until the plant matures with the trait completely unchanged. It’s a particular issue with crops that require specific weather conditions to grow, where missing a season could mean a year-long delay.
Using a genome-wide approach allowed the scientists to tailor CROPSR for plant use by removing built-in biases found in existing software tools. Because they are based on human or mouse genomes, where multiple copies of genes are less common, those tools penalize gRNA sequences that hit the genome in more than one position, to avoid causing mutations in places where they’re not intended. But with crops, the goal is often to mutate more than one position to knock out all copies of a gene. Previously, scientists sometimes had to design four or five mutation experiments to knock out each gene individually, requiring extra time and effort.
CROPSR can generate a database of usable CRISPR guide RNAs for an entire crop genome. That process is computationally intensive and time-consuming — usually requiring several days — but researchers only have to do it once to build a database that can then be used for ongoing experiments.
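As a rough illustration of what building such a database involves, the sketch below scans toy sequences for 20-nucleotide candidates that sit directly upstream of an NGG motif (the PAM required by the commonly used Streptococcus pyogenes Cas9) and records every position where each candidate occurs. This is not CROPSR's code: the function name, the toy sequences, and the forward-strand-only scan are simplifying assumptions made for this example.

```python
from collections import defaultdict

GUIDE_LEN = 20  # typical Cas9 guide length, in nucleotides


def build_guide_index(sequences):
    """Map each candidate guide to the (sequence name, position) pairs where it occurs.

    Hypothetical helper for illustration only; it scans the forward strand
    and does no scoring or filtering beyond the PAM check.
    """
    index = defaultdict(list)
    for name, seq in sequences.items():
        seq = seq.upper()
        # A candidate is 20 nt immediately followed by a 3-nt PAM of the form NGG.
        for i in range(len(seq) - GUIDE_LEN - 2):
            guide = seq[i:i + GUIDE_LEN]
            pam = seq[i + GUIDE_LEN:i + GUIDE_LEN + 3]
            if pam[1:] == "GG" and set(guide) <= set("ACGT"):
                index[guide].append((name, i))
    return dict(index)


# Toy input: two invented "copies" of a gene that share one target site.
target = "ACGTACGTACGTACGTACGT"
toy_genome = {
    "copy_a": target + "TGG" + "AAAA",
    "copy_b": "TT" + target + "AGG",
}

index = build_guide_index(toy_genome)
for guide, hits in index.items():
    print(guide, hits)
```

A real genome-wide run repeats this kind of scan over both strands of every chromosome and scores each candidate, which is part of why the one-time build can take days.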
So, rather than searching for a targeted gene through an online database, then using current tools to design separate guides for five different locations and doing multiple rounds of experiments, scientists could search for the gene in their own database and see all the guides available. CROPSR would indicate other locations to target in the genome as well. Researchers could select a guide that hits all of the genes, making it much easier and quicker to design the experiment.
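The lookup described here can be pictured as a simple filter over a prebuilt guide table. The sketch below assumes a hypothetical in-memory table in which each guide lists the gene copies it matches and carries a placeholder score; the gene names, guide sequences, and scores are invented, and CROPSR's actual database schema and scoring model are not represented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Guide:
    sequence: str
    hits: frozenset  # identifiers of the gene copies this guide matches
    score: float     # placeholder on-target score; higher is better


def guides_covering_all(guides, gene_copies):
    """Return guides that match every requested copy and nothing else, best score first."""
    wanted = frozenset(gene_copies)
    matches = [g for g in guides if g.hits == wanted]
    return sorted(matches, key=lambda g: g.score, reverse=True)


# Invented example data: a gene with three copies, plus one unrelated gene.
copies = ["geneX_A", "geneX_B", "geneX_C"]
guide_table = [
    Guide("GATTACAGATTACAGATTAC", frozenset(copies), 0.71),
    Guide("CCGGTTAACCGGTTAACCGG", frozenset({"geneX_A"}), 0.90),
    Guide("TTGACCATGGTTGACCATGG", frozenset(copies), 0.64),
    Guide("AGCTAGCTAGCTAGCTAGCT", frozenset(copies + ["geneY_A"]), 0.88),
]

best = guides_covering_all(guide_table, copies)
for g in best:
    print(g.sequence, g.score)
```

In this toy version, a guide that matches several positions is kept as long as all of them are intended copies of the target gene, while a guide that also matches an unrelated gene is dropped. A tool that penalizes every multi-position match would instead favor the single-copy guide and leave the other two copies intact.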
“You can just hop into the database, fetch all the information you need, ready to go, and start working,” Müller Paul said. “The less time you spend planning for your experiments, the more time you can spend doing your experiments.”
For CABBI scientists, who often work with repetitive plant genomes, having a gRNA tool that allows them to design functioning guides with confidence “should be a step forward,” he said.
As the name implies, CROPSR was designed with crop genomes in mind, but it’s applicable to any type of genome.
“CROPSR is also based on human genes, as the data availability for crop genes just isn’t there yet,” Müller Paul said, “but we’re looking into some collaborations with other BRCs to provide a more capable prediction based on biophysics to help mitigate some of the issues caused by the lack of data.”
Going forward, he hopes researchers will record their failed results along with successes to help generate the data to train a crop-specific model. If the collaborations pan out, “we could be looking at some very interesting advancements in training machine learning models for CRISPR applications, and potentially to other models as well.”
The study’s other co-authors are Dave Istanto, former CABBI graduate student with Hudson in the U of I Department of Crop Sciences; and Jacob Heldenbrand, former CABBI research programmer with the National Center for Supercomputing Applications at Illinois. Hudson and Müller Paul are also affiliated with the Illinois Informatics Institute and the Carle R. Woese Institute for Genomic Biology.