For scientists, the adage “seeing is believing” may now become “seeing is discovering,” thanks to new open-source software.
Non-bioinformaticians can now quickly and efficiently create knowledge networks, a powerful way for biologists to visualize deep connections between genes and phenotypes. This follows the integration of Rothamsted Research’s open-source KnetMiner software into the Genestack platform.
These new software tools make it easier for plant breeders and others to mine genomics data to find novel ways to improve the performance of all kinds of crops.
Keywan Hassani-Pak, head of bioinformatics at Rothamsted Research and leader of the KnetMiner project, explains.
“Genotype to phenotype analysis is at the core of what biologists do,” he says. “With KnetMiner we have created software that enables biologists to take their own high-throughput experimental data and to see them in the context of all the public knowledge that is out there. This can help them interpret their own data faster and more effectively.
“For a particular target species … KnetMiner integrates all the relevant genomics and omics information that is present in more than 25 sources under a multitude of formats. KnetMiner brings it together in the form of a heterogeneous knowledge network. We don’t only integrate the data; we also create new relationships based, for example, on co-occurrences of genes and phenotypes in the scientific literature. We are the first in the UK to develop such detailed networks and make them mineable. We are talking about up to a million nodes here.”
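To make the idea of literature co-occurrence concrete, here is a minimal Python sketch. It is not KnetMiner's method: the abstracts, gene names, and phenotype terms are invented for illustration, and plain substring matching stands in for whatever text mining the real system uses.

```python
from collections import Counter
from itertools import product

# Hypothetical abstracts and term lists, invented for illustration only.
abstracts = [
    "GENE_A is associated with drought tolerance in wheat.",
    "Expression of GENE_B and GENE_A changes under drought tolerance screening.",
    "GENE_B influences grain size.",
]
genes = ["GENE_A", "GENE_B"]
phenotypes = ["drought tolerance", "grain size"]


def cooccurrence_edges(texts, genes, phenotypes):
    """Count how many texts mention each (gene, phenotype) pair together."""
    counts = Counter()
    for text in texts:
        lowered = text.lower()
        found_genes = [g for g in genes if g.lower() in lowered]
        found_phenos = [p for p in phenotypes if p.lower() in lowered]
        # Every gene and phenotype found in the same text forms a candidate link.
        counts.update(product(found_genes, found_phenos))
    return counts


edges = cooccurrence_edges(abstracts, genes, phenotypes)
for (gene, phenotype), n in sorted(edges.items()):
    print(f"{gene} -- co-occurs with --> {phenotype} ({n} abstracts)")
```

Each counted pair could become a new relationship in the network, with the count acting as a rough measure of how much literature supports it.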
Plant scientists and others saw the potential of KnetMiner and approached Rothamsted to help them create a secure system they could use with their own data. KnetMiner was an exciting visualization tool, but building a network for each new species could take months, and the software was complex to use. With the benefit of Innovate UK funding, Rothamsted worked with Cambridge-based Genestack to migrate KnetMiner onto the Genestack platform.
“The Rothamsted researchers could spend months collecting all the data that was available for a particular organism, cleaning the data and writing scripts to transfer it into a format that was usable in KnetMiner, and then present it so that other scientists could use the information,” says Misha Kapushesky of Genestack. “We migrated the visualization software and automated the collection process by making it part of our Genestack ecosystem.
“It is now possible to simply ‘point and click’ on data that is in the public domain to create a network and then overlay your own data, using KnetMiner to visualize it. You can build your own network with collaborators in a secure environment. It is no longer a fixed set of data on the Rothamsted website but a dynamic tool that can be made commercially available.”
Genestack now hosts more than 40 plant and crop networks, as well as a prototype human disease network. Although the approach originated in agri-research, network mining for gene discovery is not specific to any one field, and Genestack provides an environment for building and distributing these large-scale knowledge networks.
Knowledge networks are a way of showing visually the connections between the phenotypes and the genotype of a given species. Nodes of different shapes represent various biological entities (such as genes, publications, or pathways), and they are connected by relevant relationships (such as encodes, published in, interacts with). Such networks are a very good way to show complex and highly interconnected biological data.
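As a rough illustration of that structure, the sketch below models a tiny knowledge network in Python, with typed nodes and labeled relationships. The class, its methods, and all identifiers such as GENE_A and PAPER_1 are hypothetical and do not reflect how KnetMiner or Genestack store their networks.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeNetwork:
    """A toy heterogeneous network: typed nodes joined by labeled edges."""
    nodes: dict = field(default_factory=dict)  # node id -> node type
    edges: list = field(default_factory=list)  # (source, relation, target)

    def add_node(self, node_id, node_type):
        self.nodes[node_id] = node_type

    def add_edge(self, source, relation, target):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges.append((source, relation, target))

    def neighbours(self, node_id):
        """Return (relation, other node) pairs for every edge touching node_id."""
        found = []
        for source, relation, target in self.edges:
            if source == node_id:
                found.append((relation, target))
            elif target == node_id:
                found.append((relation, source))
        return found


# Hypothetical entities, invented for illustration only.
net = KnowledgeNetwork()
net.add_node("GENE_A", "gene")
net.add_node("PROTEIN_A", "protein")
net.add_node("PROTEIN_B", "protein")
net.add_node("PAPER_1", "publication")
net.add_edge("GENE_A", "encodes", "PROTEIN_A")
net.add_edge("GENE_A", "published_in", "PAPER_1")
net.add_edge("PROTEIN_A", "interacts_with", "PROTEIN_B")

print(net.neighbours("GENE_A"))
```

Keeping the node type and the relationship label as plain data is what lets a viewer draw genes, publications, and pathways as different shapes and label the links between them.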
“There are a lot of tools out there that will return a list of ranked genes when you are conducting a gene candidate analysis, and of course KnetMiner also does that with its evidence-based gene rank algorithm. But most of them also stop there,” says Hassani-Pak. “KnetMiner is unique as it allows users to see how and why the prediction was made.
“They can fully understand the results because the process is completely transparent and the provenance is visualized. There is no black-box approach here.”
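One way to picture visualized provenance is as a chain of relationships linking a candidate gene to a phenotype. The Python sketch below finds the shortest such chain with a breadth-first search. It is an illustrative stand-in, not KnetMiner's evidence-based ranking algorithm, and the edges and relation names are invented.

```python
from collections import deque

# Hypothetical edges (source, relation, target), invented for illustration.
EDGES = [
    ("GENE_A", "encodes", "PROTEIN_A"),
    ("PROTEIN_A", "interacts_with", "PROTEIN_B"),
    ("PROTEIN_B", "participates_in", "PATHWAY_X"),
    ("PATHWAY_X", "associated_with", "drought tolerance"),
    ("GENE_A", "published_in", "PAPER_1"),
]


def evidence_path(edges, start, goal):
    """Breadth-first search for the shortest chain of edges, or None if absent."""
    adjacency = {}
    for source, relation, target in edges:
        # Treat relationships as traversable in both directions.
        adjacency.setdefault(source, []).append((relation, target))
        adjacency.setdefault(target, []).append((relation, source))

    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, relation, nxt)]))
    return None


path = evidence_path(EDGES, "GENE_A", "drought tolerance")
for source, relation, target in path:
    print(f"{source} --{relation}--> {target}")
```

Printing each step of the chain, rather than only a score, is the kind of transparency the quote describes: a reader can inspect every link that connects the gene to the trait.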
With “Network view,” users are able to leverage information present in the network for new discoveries and hypotheses; this in turn can spur ideas for further analysis.
Hassani-Pak and Kapushesky believe that this approach supports human-augmented knowledge discovery, which puts human experts, rather than machines, at the core of the decision-making process.
“The human brain is so powerful; we need to free it from tedious tasks,” says Kapushesky. “By reducing the complexity, it makes it easier for researchers to see the patterns and links that push the frontiers of science further, and the tools also make it possible for others to apply the findings in a commercial environment.”