Technical Program

Meet Your New Program Chair: An Interview with Erin Bolstad

Erin Bolstad

Erin Bolstad heads up the US-based consulting wing of ChemAxon, with a focus on services and project management for life sciences and the drug discovery process. She has been with ChemAxon since October 2011. Prior to that, Erin spent several years in various adventures: as a cheminformatics research associate focused on connecting several CNS-based research units and extracting relevant chemistry/biology information, as a senior scientist working on molecular biology-based antibiotic design, and as a postdoc working on structure-based drug design. Erin received a PhD in computational organic chemistry from the University of Montana in 2006. She has published in several journals from her various collaborative and academic works, and holds a patent from her postdoc work on designing inhibitors for dihydrofolate reductase.

Svetlana Korolev:

Erin, congratulations on being the new CINF Program Chair! Please tell us a little bit about your career path and research interests. What brought you to the field of chemical information? How does CINF overall and its technical program specifically complement your professional ambitions? How does your organization, ChemAxon, perceive your volunteering for CINF?

Erin Bolstad:

Thanks, Svetlana! My career path has always kind of circled around the “hand in many pots” syndrome, and finding a way to integrate those interests. As a college junior I realized I could not work on a studio art degree and a research-based computational chemistry degree in parallel, so I decided to major in computational chemistry with art as a recreational “spare time” interest (it seemed like a more reliable career path than an art major with computational chemistry as a recreational pastime). Computational chemistry/drug design was an obvious segue from my interests in art, biology, chemistry and computers. From there it rapidly led to cheminformatics, to make sense of the incredible amounts of data available on both the biology and chemistry fronts. I’ve since fallen in love with the pursuit of solving complex questions via handling of large-scale sets of data from the A to Z areas of drug design.

The consulting work at ChemAxon is perfectly in line with this from both a scientific and human perspective, as the large-scale project management and “cat herding” also appeal to my interests. This tied in perfectly with the CINF Division and Program Chair position, as CINF is heavily geared towards new mechanisms of large data management and research, as well as looking at how these techniques lead to novel drug design. ChemAxon is a global company in cheminformatics services and toolkits, so volunteering for this position falls right in line with our own interests and research. ChemAxon was very supportive of this.


How did it happen that you became the CINF Program Chair? What major activities or goals do you have for your tenure in this position? Over the past decade the term of the Program Chair has been reducing gradually from as long as three years to one year in 2013. Are you planning on reversing this trend?


At the ACS Spring Meeting of 2013, I was asked by a colleague who had served on the Program Committee for some time if I would be interested in the Program Chair position. He knew of my interests and felt it would be a good fit. Despite having zero ACS committee experience at the time, I was excited by the opportunity! Due to the turn-over speed, the 2014 Fall Meeting is the first one I’ve actually organized, and it’s been quite the learning curve.

With current research in the life sciences field relying on big data, cheminformatics becomes more and more of a critical component for research. One of my goals for CINF Program Chair is to reach out to other divisions and look at this interdependency and how to bring it to the forefront of both awareness and research fronts. CINF is not just for “chemical libraries” and “data repositories” specialists as I once naively believed. I’d also like to look at novel ways for approaching the new generations of researchers who rely on several divisions (not just one) and how they can contribute across the board.

Yes, my tentative plan for the Program Chair term is for two years. With only two programs a year, it seems like that would be the most effective way to make an impact. Otherwise, I learn the ropes just in time to train someone else and pass them on. With a two-year term there’s actually time to let experience guide some progress.  


Erin, how do you evaluate the spring meeting in Dallas? What was the most exciting part of it? One noticeable hallmark was a high level of collaboration: all CINF symposia except for two were listed with cosponsorships, including a new partnership with RSC CICAG (Chemical Information and Computer Applications Group), an established relationship with the CSA Trust, and a blend of connections with many other ACS Divisions (CHAL, CHED, COMP, MEDI, ORGN, PHYS) and Committees (ETHX, PROF, YCC). What are the involvements and benefits of such cosponsorships? How do you establish alliances with other organizations?



The spring meeting in Dallas was really a transitional time. While it was organized by the previous Program Chair, it was when I really stepped in as leading the programming for future events. I have to tip my hat to Jeremy Garritano for the excellent Dallas program, and I am looking forward to seeing how these future plans play out!

The questions regarding cosponsorships should be referred to Jeremy, who kindly shared his experience as follows: collaborations can be as simple as listing another ACS Division or Committee as a cosponsor. This gets a symposium cross-listed in the cosponsor’s program as well. Other collaborations can occur where one of CINF’s members helps to organize a session in another Division. In Dallas we had a great example of this where Tony Williams co-organized a symposium with Harry Pence on “Mobile Devices, Augmented Reality, and The Mobile Chemistry Classroom.” This session took place in the CHED program, but CINF was a cosponsor. Most of the time cosponsorship does not involve any financial commitment, so it is a way to cross-promote our program to other ACS Divisions or Committees. Another example is how the ACS Committee on Ethics and the Younger Chemists Committee (among others) agreed to be cosponsors for our session on “Ethical Considerations in Digital Scientific Communication and Publishing.” Often it is as simple as asking the corresponding Program Chair if we can list the other Division or Committee as a cosponsor. Sometimes we will even ask for suggestions of potential speakers in order to draw interest from cosponsor members.


The thematic programming at ACS meetings has gained a strong upwards trend recently. The themes are being proposed for future meetings up to Fall 2017 (/node/539). How far in advance do you plan the Division’s technical program and how does the ACS thematic programming impact it overall? What kind of support does CINF have from the Multidisciplinary Program Planning Group (MPPG)? Which of the upcoming themes are easier (or harder) for our Division to tackle?


During our meetings we put heavy focus on actual symposia topics for one to two years ahead and keep our eye on thematic programming for as far in advance as four years. We had a few sessions that we were all atwitter about organizing promptly, but they were shifted to a later year in order to match with the program overall. At the same time we keep an eye on the geographical location. For the upcoming Spring 2015 Denver meeting, getting involved in some of the environmental-based symposia is a key objective we’ve been excited about.

We have been lucky to have Guenter Grethe on board as a proactive contributor to the CINF Program Committee meetings. Guenter is a well-experienced member of the MPPG Executive Committee, who has been involved in thematic planning over many years, including his being MPPG Chair in 2009.

Some of the upcoming themes are a little more challenging for CINF simply due to their vague nature such as “Innovation from Discovery to Application,” while others have us falling out of our chairs with excitement like “Computers in Chemistry”: hello! CINF has both the blessing and the curse of being applicable to almost any theme, as just about everything relies on chemical information. Trying to decide when to go all out on thematic programming is a delightful “burden of choice” type exercise for CINF.


Erin, we have seen questions at CHMINF-L about the availability of CINF presentation slides and/or recordings (audio synchronized with slides). ACS has been offering “Presentations on Demand” in the past few years and, moreover, as a member benefit since last year. Unfortunately, only a very small portion of the presentations (from two or three CINF symposia) become available online after the meeting. As CINF Program Chair do you have an influence on the selection of the symposia for “Presentations on Demand”?


“Presentations on Demand” is a pretty complex issue. There are a lot of technical intricacies involved that limit the number of recordings ACS can make. Beyond that, once a symposium has been selected, there is the matter of the speaker giving ACS permission to record the oral session and then post the audio and slides on the Internet. Thus, this makes for several steps in the process. Some members may recall that CINF was at the frontline for symposium recordings and did some experimental audio recordings (MP3) of technical sessions in 2006-2008 as a pilot funded by an ACS Innovative Projects Grant; this turned out to be too labor-intensive and time-consuming to be continued by a division volunteer. We would certainly like to see more CINF talks available online. In the meantime, the Division can suggest which presentations we’d like to have recorded (hot topics, thematic relevance, awards symposia, etc.), but ACS has the final say. We’ve also had a renewed push to make presentation slides (with the speaker’s permission) available immediately after ACS Meetings. Dallas was a renewal of this initiative and we ended up having most of the slides available within a week of the Spring 2014 ACS Meeting closing. We’ll be trying to make this happen even faster with the upcoming meeting in San Francisco.


Erin, let me ask you a personal question about your "dream" conference program. What is the ideal conference for Erin Bolstad? Which venue (city) would you pick for such an occasion? Would it be a long or a short program in terms of the number of days? What theme would you organize it around?


There’s a lot of human dynamics to work around for a “dream” program. A geographical destination that everyone wants to go to, but attendees can afford to attend (Hawaii is thus right out, alas). San Francisco is pretty close to an ideal geographical location, with San Diego trailing a close second. I’m biased being a local, but Seattle would be also super fun.

I think a lot of people burn out over the course of four long, intense days, so a high-quality shorter program might keep people’s interest till the very end. Put in an open forum in the middle, followed by a lightning round of up-and-coming software and methods from researchers and commercial developers as a general “state of the field” update, then settle down for an afternoon of deeper analysis with more talks. Also, there should be an open coffee bar and an infinite number of freshly baked cookies, because it helps with scientific inquiry.

I like diversity and part of an ongoing personal initiative for me is CINF outreach, so I’d want to attract people from other realms of research to present their work from an informatics perspective: both academic and industrial. Informatics is everywhere! Come and see!


We’ve seen many interesting calls for papers for the upcoming ACS National Meeting in San Francisco, August 10-14, 2014. Please share with us some highlights of the CINF technical program planned for the Fall Meeting.


The next meeting is where we’re starting to ramp up some broad perspective and then with spring of 2015 looking at some experimental programming (stay tuned!). For the Fall 2014 ACS National Meeting we have several nifty symposia: an entire day around the theme of global challenges and communication in scientific research, several symposia with a biological tilt (biosimilars, natural products, epigenetic drug discovery), and sessions on how new technologies are playing into cheminformatic research (like Google Glass, the Maker movement, and 3D printing). Start planning now for San Francisco!


Erin, thank you so much for your time and the privilege of introducing you to the readers of this Bulletin. Best wishes for you as Program Chair leading the CINF Division into the future.     


CINF slides or links to the speaker presentations given at
the Spring 2014 ACS National Meeting are at:


Translational Cancer Bioinformatics

The rise in use of "–omics" techniques to develop effective cancer therapies, in particular the Cancer Genome Atlas project (TCGA), has demonstrated the significant role of computational and informational science in the study and treatment of cancer. It is with the tremendous significance of such topics in mind that this symposium was organized. Four speakers, Drs. Carlos J. Camacho (Computational and Systems Biology, University of Pittsburgh), Wenyuan Li (Molecular and Computational Biology, University of Southern California), Iwona Weidlich (CODDES LLC, Rockville, MD), and Shuxing Zhang (Department of Experimental Therapeutics, MD Anderson Cancer Center), delivered excellent presentations covering diverse aspects of ongoing research topics in this area.

Dr. Carlos Camacho (University of Pittsburgh) discussed “New chemistry and powerful interactive technologies to discover PPI antagonists,” focusing on his group’s development of computational tools for rational design of protein-protein interaction inhibitors for critical cancer targets. He also discussed implementation and use of Pocket Query, a web-based tool for identification of hot spots and binding pockets defined by clusters of residues at the interface of protein-protein interactions. This method can be used for virtual screening and computational design of protein-protein interaction inhibitors. The methodology has been extended to the development of Anchor Query, an interactive tool for rational design of protein-protein interaction inhibitors through the use of defined pharmacophores and conformational searching. This methodology has been applied to the MDM2/p53 system and other cancer targets.

Dr. Wenyuan Li (University of Southern California, Los Angeles) presented his work in Dr. Jasmin Zhou’s group, “Integrative analysis of multidimensional cancer genomics data,” focusing on the development of software tools and analytical methods to analyze the multi-dimensional ovarian cancer data from the TCGA project, including the copy number variation, DNA methylation, gene expression, and microRNA expression data. Using their methodology, termed Sparse Multi-Block PLS regression, they have successfully identified pathways and associations that would have been overlooked with only a single type of data. Their software is useful for recognizing hidden patterns and biological implications in multi-dimensional “-omics” data.

Dr. Iwona Weidlich (CODDES), with a title of “New application to estimate the diversity of molecular databases,” discussed Diversity Genie, a set of computational tools useful for analysis of small organic molecule datasets to understand and characterize the diversity in the chemical space. The package can also be employed to sort, merge, and handle large sets of small organic molecules, including conversion between different data formats (e.g., SMILES, InChI, SDF) and filtering based on chemical and structural properties along with visualization.
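For readers unfamiliar with how the diversity of a compound set can be quantified, the following is a minimal sketch of one common approach: the mean pairwise Tanimoto distance between molecular fingerprints. It is not taken from the talk and is not a description of how Diversity Genie works; the function names and the toy fingerprints (sets of "on" bit positions) are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 1.0  # two empty fingerprints are treated as identical
    return len(a & b) / len(a | b)


def mean_pairwise_diversity(fingerprints) -> float:
    """Average of (1 - Tanimoto) over all unordered pairs.

    0.0 means every molecule has the same fingerprint;
    1.0 means no two molecules share any bit.
    """
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        return 0.0  # fewer than two molecules: no pairwise diversity to measure
    return sum(1.0 - tanimoto(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy data: each molecule is represented by its set bit positions.
toy_set = [
    frozenset({1, 2, 3, 4}),
    frozenset({1, 2, 3, 5}),
    frozenset({7, 8, 9}),
]
diversity = mean_pairwise_diversity(toy_set)
print(round(diversity, 3))
```

In practice the fingerprints would come from a cheminformatics toolkit rather than being typed by hand, and for large databases the all-pairs loop is usually replaced by sampling or clustering.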

Dr. Shuxing Zhang (MD Anderson Cancer Center) completed the symposium with “Computational analysis of pleckstrin homology (PH) domains for cancer drug development,” a very specific example of rationally designing inhibitors for targeted cancer therapies using integrated cheminformatics, bioinformatics, and systems biology approaches.

The pleckstrin homology (PH) domain is critical in more than 250 families of proteins involved in intracellular signaling and molecular recognition. For instance, the PH domain plays a critical role in recruiting oncogene proteins (e.g., Akt) to the membranes for their activation, contributing to cancer cell growth. As the 3D folds of PH domains are highly conserved and individual PH domains possess different affinities and specificities for a variety of phosphoinositides, genomics and bioinformatics analyses, along with structure-based methods, can play a significant role in the rational design of selective inhibitors of these crucial cancer signaling proteins. The integrated approaches developed by this group have been rigorously cross-validated with a large set of PH domain structures and were successfully applied to the prediction of several PH domain proteins, followed by discovery of potent PH domain inhibitors.

In summary, the symposium covered diverse topics from development of useful software for identifying genomic mutations in cancer pathways, to analysis and manipulation of small organic molecules, and to the rational design of promising therapeutics for targeted cancer therapies.

Rachelle Bienstock and Shuxing Zhang, Symposium Organizers



Oral presentations captured at the ACS Spring National Meeting in Dallas will be available to ACS Members after April 28, 2014 at

The following presentation was recorded at the “Translational Cancer Bioinformatics Data, Methods and Applications” symposium:

CINF23 New application to estimate the diversity of molecular databases. Iwona Weidlich


CINF series of free webinars is archived at:

Please stay tuned for announcements of the upcoming CINF webinars at CHMINF-L


Neglected and Rare Disease Drug Discovery Needs Open Data

This short symposium consisting of four presentations was organized by Sean Ekins, Antony Williams and Joel Freundlich. The following three presentations (and one no show) were brought to the spring meeting attendees on Sunday afternoon, March 16, 2014 in Dallas.

The first talk, given by Sean Ekins (Collaborative Drug Discovery), was entitled “Looking back at Mycobacterium tuberculosis mouse efficacy testing to move new drugs forward.” Sean Ekins described the collaborative efforts with co-authors, including co-chairs of this session, Antony Williams (Royal Society of Chemistry) and Joel Freundlich (Rutgers University), to curate and analyze a dataset of mouse in vivo information for tuberculosis research. He described various machine learning models based on 773 molecules, and presented external testing and molecular descriptor analyses. In addition, Sean highlighted the development of open source fingerprints used in the new version of TB Mobile and to build models that could be shared openly.

The second talk by Antony Williams described “Royal Society of Chemistry developments to support open drug discovery.” In particular, Tony discussed their cheminformatics support of the Indian Open Source Drug Discovery effort working on tuberculosis. In addition, he highlighted their involvement with PharmaSea to help identify classes of antibiotics by searching the oceans. Finally, Tony reviewed the acquisition by RSC of MarinLit, a database of marine natural products research. This complements their natural product portfolio, which includes Natural Product Reports and Natural Product Updates, and represents over 27,000 molecules to be added to ChemSpider.

The final talk by Evan Bolton (NIH) explored “How can PubChem be leveraged for neglected and rare disease drug discovery?” Evan listed NIH resources for rare diseases like the National Center for Advancing Translational Sciences (NCATS), the Therapeutics for Rare and Neglected Diseases (TRND), Genetic and Rare Diseases Information Center (GARD) and other programs.  He pointed out that it is not easy to get disease information in PubChem and that they are considering how to improve it. Evan proposed that scientists working on open source rare and neglected disease research could upload their data in PubChem.

This session may be one that could be expanded in future to track the developments in data and tools for rare and neglected diseases.

Sean Ekins, Symposium Co-Organizer


Collaborative Computational Technologies for Biomedical Research; Technologies for the Pharmaceutical Industry Series.

Editors: Sean Ekins, Maggie A. Z. Hupcey, Antony J. Williams.

Hardcover ISBN: 9780470638033 July 2011, 576 pages, $146, Wiley.









Ethical Considerations in Digital Scientific Communication and Publishing

Poor editing, sloppy bookkeeping, fudgy analysis, falsification? Misunderstanding, variable technical practice, outright fraud? Sightings in the published scientific literature of apparent data manipulation raise eyebrows and many such questions. A community blog discussion on a controversy around inconsistently-reported elemental analyses last summer suggests that several layers of action by multiple parties might be involved in such issues, some intentional, some perhaps not, and very few publicly disclosed. There is very little in the copies of record of this research to indicate to the reading public what the process might have been regarding the review that the articles underwent and how both the editors and authors approached any confusion over data representation and adjustment.

One way or another, it appears that the community is becoming more aware of potential concerns with the responsible reporting of data and other ethical issues with the scientific publication process. How much of this awareness might be arising from more transparent community discussion via blog and twitter-spheres, less transparency of handling data from measurement through analysis and eventually as publication quality figures, and/or greater pressures on the research and publication systems globally is complicated to sort out. Engaging the chemistry research community in conversations is certainly an important part of the process. To members of the American Chemical Society (ACS) Ethics Committee (ETHX) and Division of Chemical Information (CINF) it seemed timely to organize a symposium on the ethical challenges that arise in the course of preparing for presentation. 

ETHX and CINF teamed up with the ACS Divisions of Chemistry and the Law (CHAL), Professional Affairs (PROF), and Publications to bring together a diverse group of editorial professionals at the recent ACS National Meeting in Dallas, TX to discuss a range of challenges, old and new, and strategies to reinforce responsible conduct in the publication process. The international speaker set included senior staff and science editors from several core chemistry publishers including the ACS, the Royal Society of Chemistry (RSC) based in the UK, and Wiley-VCH based in Germany and publishing the journals of the Gesellschaft Deutscher Chemiker (the German Chemical Society, GDCh). Also represented were several other publishing and supporting organizations concerned with ethical issues, some with particular focus on data representation, including the Cambridge Crystallographic Data Centre (CCDC), the American Physiological Society (APS), the Committee on Publication Ethics (COPE), and CrossRef. Abstracts and some slides are available on the CINF site at: /node/557. Following are my own reflections from my notes; any omissions or misrepresentations are my error.

Considering the above case, curation of data is an important concern in the publication process, including best practices for representation, processes for checking and validation, and communicating with authors on subsequent corrective action. Some data types such as crystallographic data are regularly deposited with articles as part of the review process and subsequently curated in databases. The CCDC receives and curates standardized files of coordinates from a variety of publishers, validates the structures in-house and sends back identified errors to authors for correction. The accumulated database is searchable with software that enables further analysis, visualization, curation, and system development that are sustained through distinct subscription streams. Another example is the representation of visual types of data in figures, such as protein separation gels used in physiological research. The APS has found that figure manipulation accounts for a large majority of problems that raise ethical flags in the course of article submission, including general presentation issues: splicing images, adjusting contrast or dropping backgrounds, and poor resolution. An in-house discipline-trained scientist has been assigned to analyze the types of problems and detection scenarios, develop a communication process with authors, and publication guidelines including a policy of transparency concerning image rearrangement as well as a list of “don’ts” to curb misplaced efforts up front. These organizations are focused on engaging authors and the research community through community curation scenarios and graduate education outreach.

Data manipulation and similar ethical concerns have accompanied the exchange of scientific information for centuries. An article on fraud in science in a Wiley journal quotes 19th century British scientist Charles Babbage of the Royal Statistical Society classifying misconduct: “hoaxing, forging, trimming, and cooking,” (DOI: 10.1111/j.1740-9713.2007.00215.x pg 24). The most common types of ethical concerns brought forward by the journal editors presenting in this session included questions of authorship, prior publication and self-plagiarism. Apparently, not all authors are always aware that their names have been included on submitted manuscripts, and the ACS Publications Division now issues letters to all authors listed on manuscripts to verify the submission and confirm that all are aware of it. Prior publication can be particularly confusing as it is acceptable in some situations such as theses, earlier communication articles and, in some fields, preprint repositories. Specific policies concerning prior publication often lie with the editorial offices of specific journals to contend with the shifting research priorities appropriate to the subject coverage, and it is important for both the editors and authors to be in communication and up front about handling prior publication of research results. Other concerns include duplicate submission, almost impossible to detect before publication; self-plagiarism, particularly problematic in review articles intended to build across the art of a research area; and “dry-labbing,” reporting of procedures not actually performed that becomes evident with reporting of unrealistic reaction conditions or inappropriate results.

Many organizations are interested in the questions of ethics arising in the course of scholarly publication. A memo from the Office of Science and Technology Policy (OSTP) in 2010 directed US funding agencies to “establish principles for conveying scientific and technological information to the public.” In Europe, promotion of responsible conduct is discussed by the European Association for Chemical and Molecular Sciences (EuCheMS). The ACS Committee on Ethics serves as an educational resource and clearinghouse on ethical considerations and coordinates communication and programming activities (see the committee overview in this issue of CIB). Publishers take their editorial role in these concerns very seriously. They associate through various professional organizations to streamline and network best practices and collectively support the process of addressing challenges. Various tools are available to support the work of editors, reviewers and the overall publication process, including the CrossCheck screening tool developed by CrossRef to flag overlapping text against a growing corpus of full text scholarly literature for further review and implemented by publishers as part of the editorial workflow. It is not a plagiarism detector or a comprehensive sweep of all published literature, missing most supplemental information and interpretation of image data, including equations. Collaboratively-sourced tools for developing clear and consistent guidelines and process flowcharts are also available from the publisher member-based COPE. Community-based discussions of specific example cases are collated and anonymized into a knowledge bank of lessons learned available for the broader public. A recent analysis of the cases suggests a shift in focus towards more discussion of conflict of interest, correction of the literature, data, misconduct or questionable behavior, and peer review.
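To give a feel for what flagging overlapping text against a corpus can involve, here is a minimal sketch based on shared word n-grams ("shingles"). It is an illustration only and makes no claim about how CrossCheck is actually implemented; the function names, the 5-word window, the 0.3 threshold, and the sample texts are all assumptions made for this example. Consistent with the caveat above, a score like this only marks passages for a human editor to review; it does not decide whether plagiarism occurred.

```python
import re


def shingles(text: str, n: int = 5) -> set:
    """Return the set of overlapping n-word sequences in a text (case-insensitive)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_fraction(submission: str, source: str, n: int = 5) -> float:
    """Fraction of the submission's shingles that also occur in the source."""
    sub = shingles(submission, n)
    if not sub:
        return 0.0  # submission shorter than n words: nothing to compare
    return len(sub & shingles(source, n)) / len(sub)


def flag_for_review(submission: str, corpus: dict, threshold: float = 0.3, n: int = 5):
    """Return (doc_id, overlap) pairs at or above the threshold, highest first."""
    hits = [(doc_id, overlap_fraction(submission, text, n))
            for doc_id, text in corpus.items()]
    return sorted((h for h in hits if h[1] >= threshold),
                  key=lambda h: h[1], reverse=True)


# Hypothetical corpus and submission, invented for illustration.
corpus = {
    "doc-A": "The reaction mixture was stirred at room temperature "
             "for two hours and then filtered",
    "doc-B": "Crystals were grown by slow evaporation of an ethanol solution",
}
submission = ("The reaction mixture was stirred at room temperature "
              "for two hours before workup")

flagged = flag_for_review(submission, corpus)
for doc_id, score in flagged:
    print(doc_id, round(score, 2))
```

Because the comparison works on plain word sequences, it sees nothing in figures, images or equations, which is one reason an editor's judgment remains the final step.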

It is increasingly important to support clear and consistent processes for all parties involved in review, handling identified concerns, reaching resolution and developing understanding of the broader issues and responsibilities. RSC has assigned a dedicated staff position to follow the overview of cases and address consistency, with a goal of striving for agreement among all parties, including in cases of retraction, where authors sign retraction notices before they are posted. The ACS also supports authors through a suite of educational materials related to issues of ethics, including episodes of the Publishing Your Research 101 video series on ethical considerations, copyright and the review process.

Representation issues are all around us in the irrepressibly malleable digital environment. How much manipulation is inadvertent? How much of this activity is concerned with attempting to look professional, how much is involved in striving for the right story, how much wrapped up in building reputation, how much is simply confusion over proper procedure? Creativity in research that builds usefully on the scientific corpus is inherently a juggling act between consistency and aberration.  Researchers are entrusted with due diligence in their experimental design, analysis and documentation. Feeding back into the corpus involves additional juggling of representation and expression of data and rationale. The importance of the moment of publication in defining a line of inquiry and the critical role of trust in upholding the integrity of the scientific record speaks to the long-standing oversight of the editorial process in scientific publication. As the acceleration of research output outpaces traditional publication outlets and the digital environment opens new opportunities for data sharing and communication, both curated and wild, the supportive processes of ensuring responsible conduct in research and publication are put to the test. The questions of what is involved in the editorial process in the era of digital scientific publishing and what are the ethical considerations that arise continue to evolve with the practices of both science and publishing.

The author would like to thank the presenters and the co-sponsors of the symposium, with particular acknowledgement of the organizing efforts of Heather Tierney, for bringing this endeavor to successful fruition. Supporting sources are referenced in-line.

Leah McEwen, Symposium Co-Organizer


Oral presentations captured at the ACS Spring National Meeting in Dallas will be available to ACS Members after April 28, 2014 at

The following four presentations were recorded at the “Ethical Considerations in Digital Scientific Communication and Publishing” symposium:


CINF52 Tools for identifying potential misconduct: The CrossCheck service from CrossRef. Rachael Lammey

CINF53 Mapping the terrain of publication ethics. Charon Pierson

CINF55 Ethics in scientific publication: Observations of an editor and recommended best practices for authors. Kirk Schanze

CINF57 Role of the journal editor in maintaining ethical standards in the changing publishing environment. Jamie Humphrey


Cloud Computing in Cheminformatics

This symposium was ably organized by Rudy Potenzone, who put together an excellent roster of speakers covering many aspects of cloud computing. Rudy was unable to attend the event as he had to attend to family business, so I stepped in to chair the session.

Six papers were presented (and one was withdrawn) by a range of speakers: one was about to celebrate ten years of operating in the cloud; some were already in the cloud when it used to be called “online,” and others were just beginning to provide tools and solutions for delocalized organizations that want to take advantage of the speed of implementation and scalability of cloud-based solutions.

Barry Bunin of Collaborative Drug Discovery (CDD) opened the session talking about “Ten Years of Collaborative Drug Discovery in the Cloud.” CDD provides a fully-fledged solution for drug discovery, providing all the capabilities expected in an in-house system (chemical registration, assay data management, SAR analysis, collaboration), yet delivered in a secure, auditable and hosted cloud-based system. Barry described several collaborative drug discovery programs hosted at CDD, including various permutations of academia, government agencies, CROs and big and small pharma companies. One of CDD’s success factors has been its ability to integrate private with external data in a secure yet collaborative environment which is scalable and which fosters synergies between complementary techniques.

Alex Clark of Molecular Materials Informatics discussed “Cloud-hosted APIs for Cheminformatics Designed for Real Time User Interfaces.” The growth in the use of the cloud has been paralleled by the increasing ubiquity of chemically intelligent, yet underpowered, mobile devices. While these can provide a pleasing user experience, the only way they can interact with large volumes of data, or kick off compute-intensive calculations, is to outsource the data storage and calculations to the cloud and to access them via some type of web API. The challenge for the developer is to select the best partitioning between what should be accomplished locally on the mobile device and what needs to be sent to the powerful external server. Alex illustrated this with a very nice SAR table app for groups of compounds and data that provides clustering, scaffold analysis and assignment, and allows plotting of R-groups against each other with properties of the compounds color-coded for quick visual analysis.

There was lively discussion during the lengthened intermission, and I used the time to practice saying the next speaker’s name. Valery Tkachenko of the Royal Society of Chemistry (RSC) then described “Application of Cloud Computing to Royal Society of Chemistry Data Platforms.” The focus of the talk was ChemSpider, and how the RSC has moved it to the cloud. The ChemSpider database now contains over 30 million compounds and serves 50 thousand visitors (from 40 thousand unique connections) each day, with 100 to 400 concurrent users at any time, so the compute power and scalability of the cloud are essential to an operation of this scale. As more properties are added to the database or calculated from structures, big data challenges arise in areas such as indexing, navigation, and visualization, and Valery described techniques for addressing these. The eventual aim is for ChemSpider to become a chemistry validation and standardization platform.

Evan Bolton of the National Center for Biotechnology Information (informally known as Mr. PubChem) spoke next on “PubChem in the Cloud.” PubChem, as a data repository for chemical structures and their associated properties, is a self-confessed online database, so it effectively pre-dates the cloud, and yet it continues to evolve to take advantage of new technologies and methods of access. With 140,000 users every day, PubChem has added a JSON-based API for uploading data, a REST-style version of its Power User Gateway, and JavaScript-based PubChem widgets that provide a rapid way to display some commonly requested PubChem data views. There is also a new PubChemRDF, which can help researchers work with PubChem data on local computing resources using semantic web technologies.
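For readers who have not tried the REST-style interface, the sketch below shows roughly what a request looks like. It is an illustration rather than anything shown in the talk: the URL pattern follows PubChem's public PUG REST documentation as we understand it, and the function names pug_rest_property_url and fetch_properties are hypothetical helpers invented for this example.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base address of the PUG REST service (taken from PubChem's public
# documentation, not from the symposium talk).
PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def pug_rest_property_url(name, properties, output="JSON"):
    """Build a PUG REST URL asking for computed properties of a compound by name.

    Hypothetical helper: the path layout assumed here is
    <base>/compound/name/<name>/property/<comma-separated list>/<format>.
    """
    encoded_name = quote(name, safe="")  # percent-encode spaces, slashes, etc.
    property_list = ",".join(properties)
    return f"{PUG_REST_BASE}/compound/name/{encoded_name}/property/{property_list}/{output}"


def fetch_properties(name, properties):
    """Send the request and return the decoded JSON document (needs network access)."""
    url = pug_rest_property_url(name, properties)
    with urlopen(url, timeout=30) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only builds and prints the URL; call fetch_properties() to query the service.
    print(pug_rest_property_url("aspirin", ["MolecularFormula", "MolecularWeight"]))
```

Because the whole request is encoded in the URL, the same query can be pasted into a browser or issued from a mobile app, which is part of what makes this style of access attractive for lightweight clients.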

Next up was Sharang Phatak of Dotmatics, who discussed “Your Data in the Cloud: Facts and Fears.” The talk started with a high-level overview of the increasingly delocalized and dispersed nature of current R&D, and highlighted the major concerns that are often expressed by researchers, CIOs and intellectual property lawyers when a move to the cloud is raised. These are: is the data comprehensive; are the system and data structure flexible and scalable; is there control; can data be shared collaboratively; and is there secure access via preferred devices, including mobile? Sharang then illustrated how these fears can be dispelled by using a number of Dotmatics’ web-based tools included in the Dotmatics Platform on the Cloud to address common R&D data capture and analysis tasks.

The final speaker in the session was Nic Encina of PerkinElmer, who talked about “Moving Mainstream Chemical Research to the Cloud.” While in-house installed electronic laboratory notebooks have become well accepted and widely deployed across much of the biopharma industry, and to a lesser extent in academia, the increasing acceptance of the cloud as a viable platform for collaborative research has led to demand for easier-to-deploy yet powerful systems that facilitate user-driven data capture and organization, coupled with social aspects such as annotation and team-based collaboration. Nic described a new cloud-based collaborative scientific platform called Elements, which allows researchers to assemble just the tools they need and to organize them how they want in an open, collaborative environment. Individuals can work in the way they prefer, while sharing project and related data through a common infrastructure.

All the speakers are to be thanked for presenting a fascinating series of talks that highlighted both the challenges and the promise of the cloud for cheminformatics, and the audience is to be commended for staying until 5:30 pm.

Phil McHale, Symposium Presider 



As well as being a world-renowned scientific publisher, the Royal Society of Chemistry (RSC) has an established presence in the field of cheminformatics, hosting various resources of value to the chemistry community. Our multi-award winning ChemSpider database now contains over 30 million chemicals and provides data to many tens of thousands of scientists every day. Our micropublishing platform, ChemSpider SyntheticPages, provides the most up-to-date method for chemists to deposit their synthetic procedures and share them with the community, thereby building reputation and exposure for their work. We encourage the community to take advantage of these resources.

RSC is happy to support the CINF Division with our sponsorship and to encourage further exposure to the riches that chemical information and cheminformatics can deliver.

Antony Williams, CINF Immediate Past Chair 2014, Royal Society of Chemistry