Cloud Computing in Cheminformatics

This symposium was ably organized by Rudy Potenzone, who put together an excellent roster of speakers covering many aspects of cloud computing. Rudy was unable to be at the event because of family business, so I stepped in to chair the session.

Six papers were presented (and one was withdrawn) by a range of speakers: one was about to celebrate ten years of operating in the cloud; some were already in the cloud when it used to be called “online”; and others were just beginning to provide tools and solutions for delocalized organizations that want to take advantage of the speed of implementation and scalability of cloud-based solutions.

Barry Bunin of Collaborative Drug Discovery (CDD) opened the session talking about “Ten Years of Collaborative Drug Discovery in the Cloud.” CDD provides a fully-fledged solution for drug discovery, providing all the capabilities expected in an in-house system (chemical registration, assay data management, SAR analysis, collaboration), yet delivered in a secure, auditable and hosted cloud-based system. Barry described several collaborative drug discovery programs hosted at CDD, including various permutations of academia, government agencies, CROs and big and small pharma companies. One of CDD’s success factors has been its ability to integrate private with external data in a secure yet collaborative environment which is scalable and which fosters synergies between complementary techniques.

Alex Clark of Molecular Materials Informatics discussed “Cloud-hosted APIs for Cheminformatics Designed for Real Time User Interfaces.” The growth in the use of the cloud has been paralleled by the increasing ubiquity of chemically intelligent, yet underpowered, mobile devices. While these can provide a pleasing user experience, the only way they can interact with large volumes of data or kick off compute-intensive calculations is to outsource the data storage and calculations to the cloud and to access them via some type of web API. The challenge for the developer is to select the best partitioning between the tasks that should be accomplished locally on the mobile device and those that need to be sent to the more powerful external server. Alex illustrated this with a very nice SAR table app for groups of compounds and data that provides clustering, scaffold analysis and assignment, and allows plotting of R-groups against each other with properties of the compounds color-coded for quick visual analysis.

There was lively discussion during the lengthened intermission, and I used the time to practice saying the next speaker’s name. Valery Tkachenko of the Royal Society of Chemistry (RSC) then described “Application of Cloud Computing to Royal Society of Chemistry Data Platforms.” The focus of the talk was ChemSpider, and how the RSC has moved it to the cloud. The ChemSpider database now contains over 30 million compounds and provides data to 50,000 visitors (from 40,000 unique connections) each day, with 100 to 400 concurrent users at any time, so the compute power and scalability of the cloud are essential to an operation of this scale. As more properties are added to the database or calculated from structures, big data challenges arise in areas such as indexing, navigation, and visualization, and Valery described techniques for addressing these. The eventual aim is for ChemSpider to become a chemistry validation and standardization platform.

Evan Bolton of the National Center for Biotechnology Information (informally known as Mr. PubChem) spoke next on “PubChem in the Cloud.” PubChem, a data repository for chemical structures and their associated properties, is a self-confessed online database, so it effectively pre-dates the cloud, and yet it continues to evolve to take advantage of new technologies and methods of access. Serving 140,000 users every day, PubChem has added a JSON-based API for uploading data, a REST-style version of its Power User Gateway, and JavaScript-based PubChem widgets that provide a rapid way to display some commonly requested PubChem data views. There is also a new PubChemRDF, which can help researchers work with PubChem data on local computing resources using semantic web technologies.

Next up was Sharang Phatak of Dotmatics, who discussed “Your Data in the Cloud: Facts and Fears.” The talk started with a high-level overview of the increasingly delocalized and dispersed nature of current R&D, and highlighted the major concerns that are often expressed by researchers, CIOs and intellectual property lawyers when a move to the cloud is raised. These are: is the data comprehensive; are the system and data structure flexible and scalable; is there control; can data be shared collaboratively; and is there secure access via preferred devices, including mobile? Sharang then illustrated how these fears can be dispelled by using a number of Dotmatics’ web-based tools included in the Dotmatics Platform on the Cloud to address common R&D data capture and analysis tasks.

The final speaker in the session was Nic Encina of PerkinElmer, who talked about “Moving Mainstream Chemical Research to the Cloud.” In-house installed electronic laboratory notebooks have become well accepted and widely deployed across much of the biopharma industry, and to a lesser extent in academia. However, the increasing acceptance of the cloud as a viable platform for collaborative research has led to demand for easier-to-deploy yet powerful systems that facilitate user-driven data capture and organization, coupled with social aspects such as annotation and team-based collaboration. Nic described a new cloud-based collaborative scientific platform called Elements, which allows researchers to assemble just the tools they need and to organize them how they want in an open, collaborative environment. Individuals can work in the way they prefer, while sharing project and related data through a common infrastructure.

All the speakers are to be thanked for presenting a fascinating series of talks that highlighted both the challenges and the promise of the cloud for cheminformatics, and the audience is to be commended for staying until 5:30 pm.

Phil McHale, Symposium Presider 



As well as being a world-renowned scientific publisher, the Royal Society of Chemistry (RSC) has an established presence in the field of cheminformatics, hosting various resources of value to the chemistry community. Our multi-award-winning ChemSpider database now contains over 30 million chemicals and provides data to many tens of thousands of scientists every day. Our micropublishing platform, ChemSpider SyntheticPages, provides the most up-to-date method for chemists to deposit their synthetic procedures and share them with the community, thereby building reputation and exposure for their work. We encourage the community to take advantage of these resources.

RSC is happy to support the CINF Division with our sponsorship and to encourage further exposure to the riches that chemical information and cheminformatics can deliver.

Antony Williams, CINF Immediate Past Chair 2014, Royal Society of Chemistry