Broad Institute Puts Genedata’s Screener to Work for High-Throughput Screening Data Analysis
Genedata announced today that the Broad Institute is using its Screener software platform to manage and analyze high-throughput screening data as part of its participation in the National Institutes of Health’s Molecular Libraries Roadmap Initiative.
“We’re screening upwards of 50 assays and analyzing more than 20 million wells of screening data” per year, Dave DeCaprio, associate director of the Chemical Biology Platform at the Broad Institute, said. He told BioInform that the Broad has been using the software since May, and that it has reduced the time for data analysis from a “few weeks” to hours.
DeCaprio said that he and his colleagues chose Screener because it “is able to analyze a lot of data,” but only needs a “nominally powered server.” The software sits on the Broad’s local server farm and “the client component gets served out through a web browser.”
Since follow-up biology and chemistry is “extremely expensive” after high-throughput screening, it’s critical to be able to query data from the screens as quickly as possible in order to identify potential problems, he said.
“One of the things we wanted was the ability to have strong analytics on the data, so we could automatically detect problems,” while at the same time offering an interactive feature so he and his team can intervene while looking at the data, he said. “They could quickly jump into the data and see what it really looked like.”
DeCaprio and his team validated the software against “some existing systems we had and existing approaches” from the “top five vendors,” but he declined to offer vendor or software names.
Open-source tools he did not wish to name were also part of the evaluation, but he said the team didn’t see anything that would support “what we wanted to do in terms of the visualization capabilities.”
“There are great algorithms you can get,” DeCaprio said. “The combination of algorithms, visualization, and manual curation is, I think, incredibly essential to getting high quality data out. That’s not something we saw anywhere else.”
“The key thing for us was the ability to integrate manual curation of the data with the algorithms,” DeCaprio said.
A user can look at all the results from a six-week screen, “visually spot some problems and make corrections, and the algorithms would adjust the data analysis based on that,” he said.
Small Molecule Test Drive
DeCaprio is responsible for the Broad’s infrastructure for small-molecule screening and further development, which includes informatics, compound management, data analysis, and analytical chemistry, as well as the procedural side of the work, he said.
The platform is a “public screening center,” he said. The Broad Institute’s Probe Development Center is one of nine centers funded under the NIH’s Molecular Libraries Probe Production Centers Network, which kicked off last year with $70 million in funding over four years to accelerate the pace at which small molecule probes are developed.
Typical customers for the Broad’s chemical biology platform are scientists who have identified a potential molecule and would like to put it “in front of 350,000 compounds,” he told BioInform.
These scientists can first apply to the NIH, and if accepted, the Broad Institute takes in their assay, runs the screen, and does follow-up chemistry “to basically get them to a chemical probe,” DeCaprio said.
In addition to the NIH application process, scientists can apply directly to the Broad and work on a fee-for-service basis.
Some of the Broad’s clients include Princeton University’s Bonnie Bassler and Stanford University’s Jerry Crabtree, several research teams at the Dana-Farber Cancer Institute, and Massachusetts General Hospital.
Customers for the service are generally academics, DeCaprio said.
“We’re happy to work with anybody, but one of our goals is to make all the data publicly available,” he said. “Everything we do goes into the PubChem database,” which most pharmaceutical firms and biotechs would rather avoid, he said.
Capture It
The intake process at the Broad involves first replicating an outside researcher’s work and then running the screen. Unlike traditional labs, he said, the Broad puts robotics to work on a protocol, and all steps and changes are captured in a CambridgeSoft electronic lab notebook system that is made available to the researchers who requested the screen.
“We use that to track all the interactions so that all of the metadata about what’s going on [in] experiments” is captured, DeCaprio said.
Primary data for 325,000 compounds comes off a detection instrument and is processed using Screener to “do QC, correct for controls, [and] normalize the data,” he said. Part of the process is automated but some decisions are manual since it is a “complicated process with a lot of points of failure.”
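The article does not describe how Screener carries out these steps. As a rough illustration of what QC and control-based normalization commonly mean in plate-based screening, the sketch below computes a Z'-factor from control wells and rescales sample wells to percent inhibition. The function names and the well values are invented for the example, and nothing here is taken from Screener or from the Broad's pipeline.

```python
from statistics import mean, stdev


def z_prime(pos_controls, neg_controls):
    """Z'-factor, a common plate-quality metric.

    Z' = 1 - 3 * (sd_pos + sd_neg) / |mean_pos - mean_neg|
    Values near 1 indicate well-separated controls.
    """
    spread = stdev(pos_controls) + stdev(neg_controls)
    window = abs(mean(pos_controls) - mean(neg_controls))
    return 1.0 - 3.0 * spread / window


def percent_inhibition(raw_signals, pos_controls, neg_controls):
    """Rescale raw signals so the negative-control mean maps to 0%
    and the positive-control mean maps to 100%."""
    neg_mean = mean(neg_controls)
    pos_mean = mean(pos_controls)
    return [100.0 * (neg_mean - x) / (neg_mean - pos_mean) for x in raw_signals]


# Hypothetical well readings for one plate (illustrative values only).
neg_controls = [1000.0, 1020.0, 980.0, 1000.0]  # untreated wells
pos_controls = [100.0, 110.0, 90.0, 100.0]      # fully inhibited wells
samples = [1000.0, 550.0, 100.0]                # compound wells

quality = z_prime(pos_controls, neg_controls)
normalized = percent_inhibition(samples, pos_controls, neg_controls)
```

A plate whose Z'-factor falls below a chosen cutoff (0.5 is a widely used rule of thumb) would typically be flagged for the kind of manual review DeCaprio describes.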
Some data management steps include a scientist manually marking a screen result as “invalid” or “what I expected to see,” he said. Screener enables this level of interactivity, which is a feature that DeCaprio appreciates.
After a review of the first set of results with the collaborator, the team can decide which follow-on steps are necessary.
The Broad team is using Screener’s Assay Analyzer module, which visualizes the “raw well-level data.” It captures data from plate readers and processes them according to “predefined business rules.”
A separate module, Condoseo, plots the data on dose-response curves to give scientists information such as IC50 numbers, which shed light on the potential effectiveness of a compound.
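For readers unfamiliar with the term, an IC50 is the concentration at which a compound produces 50 percent inhibition. Dose-response software generally obtains it by fitting a sigmoidal curve to the data; the sketch below substitutes a much simpler technique, linear interpolation on log concentration between the two measurements that bracket 50 percent. It is not Condoseo's method, and the function name and data points are made up for illustration.

```python
import math


def ic50_by_interpolation(concentrations, inhibitions):
    """Estimate IC50 from dose-response points.

    Expects concentrations in ascending order with matching percent
    inhibition values. Interpolates linearly in log10(concentration)
    between the first adjacent pair that brackets 50% inhibition.
    Returns None if the response never crosses 50%.
    """
    points = list(zip(concentrations, inhibitions))
    for (c_lo, r_lo), (c_hi, r_hi) in zip(points, points[1:]):
        if r_lo <= 50.0 <= r_hi and r_hi != r_lo:
            frac = (50.0 - r_lo) / (r_hi - r_lo)
            log_ic50 = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ic50
    return None


# Hypothetical five-point dilution series (micromolar) and responses.
doses_um = [0.01, 0.1, 1.0, 10.0, 100.0]
inhibition_pct = [2.0, 10.0, 40.0, 60.0, 95.0]

ic50_um = ic50_by_interpolation(doses_um, inhibition_pct)
inactive = ic50_by_interpolation(doses_um, [1.0, 2.0, 3.0, 4.0, 5.0])
```

Interpolation is only a quick estimate; a fitted curve uses every point and also yields slope and plateau values, which is why curve fitting is the usual choice in practice.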
All of the data stays at the Broad Institute in an “open data-sharing environment.” Participating scientists contribute their biology and findings and sign a data-sharing agreement. “They get access to everybody else’s data,” he said.
“Eventually it all goes public, after a year,” he said. In the first year, it is an environment “where people can access it privately.”
While high-throughput screening data sets are “not huge,” especially compared to “the next-generation sequencing problems we have,” DeCaprio said that HTS data has its own challenges.
For example, results are “heavily dependent” on the conditions and the context of an assay, DeCaprio said, so he and his colleagues focus on the metadata, the “richness of the data,” associated with the results.
“In the small-molecule space you absolutely have to have” metadata, or the results can end up being “useless.” In a separate project with CambridgeSoft, he is working on metadata management “to really try and open up that metadata and make it far more searchable,” he said.
The metadata is captured in the ELN, which is integrated with Screener, he explained, adding that the Broad team used Screener’s APIs to build the integration.
Source: GenomeWeb
