Informatics and Chemical Biology: Identifying Targets and Biological Pathways

CINF symposium at the Fall 2017 ACS Meeting in Washington, DC

Rachelle J Bienstock

With the increasing availability of genomic information and biological expression data, one of the current challenges in drug discovery is linking biological pathway data with small-molecule drug data. How can drug target and pathway information, and metabolic pathway information, be linked to small-molecule ligand information? These are some of the questions addressed by the CINF symposium “Informatics and Chemical Biology: Identifying Targets and Biological Pathways” at the Fall 2017 ACS Meeting in Washington, DC.

David Sheen of NIST (National Institute of Standards and Technology) opened the symposium by discussing incompatibilities among metabolic data reported by different groups, and the need for improved databases and data harmonization methods that account for the varying uncertainties in reported experimental metabolic data. Such methods will enable comparison of data from different laboratories and sources. NIST maintains a quality assurance program in metabolomics to encourage the exchange of spectral data and the comparison of measurement uncertainties, and is conducting literature and interlaboratory comparison studies. Reproducibility of spectral data is also an issue.
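
One common, generic way to compare and combine results reported with different uncertainties is an inverse-variance weighted mean, in which more precise measurements carry more weight. The sketch below only illustrates that general idea; it is not a description of NIST's harmonization methods. The three reported values and their uncertainties are hypothetical and are assumed to be independent and in the same units.

```python
import math

# Hypothetical reported concentrations of one metabolite from three labs,
# each given as (value, standard uncertainty) in the same units.
reports = [(5.2, 0.3), (4.8, 0.2), (5.6, 0.6)]


def weighted_mean(values):
    """Inverse-variance weighted mean and its standard uncertainty.

    Assumes the measurements are independent; each weight is 1 / u**2.
    """
    weights = [1.0 / u**2 for _, u in values]
    total = sum(weights)
    mean = sum(w * x for w, (x, _) in zip(weights, values)) / total
    return mean, math.sqrt(1.0 / total)


mean, u = weighted_mean(reports)
print(f"{mean:.3f} +/- {u:.3f}")  # 4.971 +/- 0.160
```

A lab whose value lies several of its own uncertainties away from the weighted mean would be a natural candidate for closer inspection in an interlaboratory comparison.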

Dr. Karina Martinez Mayorga, Instituto de Química, UNAM, reported using Chemical Computing Group's Protein Ligand Interaction Fingerprints (PLIF) method to screen for biased ligands of opioid receptors. Opioid receptors belong to the G-protein coupled receptor (GPCR) family, which has approximately 800 members, and they are significant targets for pain management. Databases of these interaction fingerprints, combined with methods for identifying the structural traits of selective agonists, will help lead to the development of drugs with fewer side effects.
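
Interaction fingerprints of this kind record which protein residues a ligand contacts and by which type of interaction. The minimal sketch below assumes each docked pose has already been reduced to a set of (residue, interaction type) bits, and shows one simple way such fingerprints can point to structural traits that distinguish one ligand class from another: count how often each bit occurs in each class and keep the bits that are much more frequent in one of them. It is not Chemical Computing Group's implementation; the residue labels, the poses, and the `min_diff` threshold are all hypothetical.

```python
from collections import Counter

# Each pose's interaction fingerprint is modeled as a set of
# (residue, interaction type) bits. All residue labels and data are hypothetical.
Bit = tuple[str, str]


def bit_frequencies(poses: list[set[Bit]]) -> dict[Bit, float]:
    """Fraction of poses in which each interaction bit is present."""
    counts = Counter(bit for pose in poses for bit in pose)
    return {bit: n / len(poses) for bit, n in counts.items()}


def enriched_bits(active: list[set[Bit]], inactive: list[set[Bit]],
                  min_diff: float = 0.5) -> list[Bit]:
    """Bits whose frequency in `active` exceeds that in `inactive` by at least min_diff."""
    f_act = bit_frequencies(active)
    f_inact = bit_frequencies(inactive)
    return sorted(bit for bit, f in f_act.items()
                  if f - f_inact.get(bit, 0.0) >= min_diff)


# Hypothetical poses for ligands labeled as biased and as unbiased agonists.
biased = [
    {("Res101", "ionic"), ("Res205", "hbond"), ("Res310", "hydrophobic")},
    {("Res101", "ionic"), ("Res205", "hbond")},
]
unbiased = [
    {("Res101", "ionic"), ("Res150", "hydrophobic")},
    {("Res101", "ionic"), ("Res310", "hydrophobic")},
]

print(enriched_bits(biased, unbiased))  # [('Res205', 'hbond')]
```

Here the shared ionic contact at the hypothetical residue Res101 is common to both classes and is filtered out, while the hydrogen bond at Res205 appears only in the biased set and is reported as a candidate distinguishing trait.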

Dr. Doug Selinger, Plex, discussed the development of a search engine for chemical biology and drug discovery. A search begins with a query molecule and expands to compounds with similar chemical structures and biological transcriptional profiles. Compound-compound and compound-target relationships are used by the search algorithms to rank compounds and targets. Data sources include Open Targets, PubChem, Entrez Gene, chemical similarity, and ChEMBL bioactivities. Plex searches data (1.7 billion rows), not Web pages. Users can search for compounds, targets, or pathways; InChIs and SMILES can be entered, and structures drawn, directly in the search bar. The more datasets the search engine includes, the better its answers become.
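
The query-expansion and ranking idea can be illustrated with a minimal sketch. It assumes each compound is reduced to a set of structural-feature bits compared by Tanimoto similarity, a common choice for chemical similarity, and that targets are scored by summing the similarities of the hit compounds annotated to them. This is not Plex's algorithm, which also draws on transcriptional profiles and many more data sources; the compound IDs, features, target annotations, and the 0.5 threshold are hypothetical.

```python
# Compounds are modeled as sets of structural-feature bits; the feature names,
# compound IDs, and target annotations below are all hypothetical.


def tanimoto(a: set[str], b: set[str]) -> float:
    """Shared bits divided by bits set in either fingerprint."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def expand_query(query: set[str], library: dict[str, set[str]],
                 threshold: float = 0.5) -> list[tuple[str, float]]:
    """Library compounds at or above `threshold`, most similar first."""
    hits = [(cid, tanimoto(query, fp)) for cid, fp in library.items()]
    hits = [h for h in hits if h[1] >= threshold]
    return sorted(hits, key=lambda h: (-h[1], h[0]))


def rank_targets(hits: list[tuple[str, float]],
                 annotations: dict[str, list[str]]) -> list[tuple[str, float]]:
    """Score each target by summing the similarity of hit compounds annotated to it."""
    scores: dict[str, float] = {}
    for cid, sim in hits:
        for target in annotations.get(cid, []):
            scores[target] = scores.get(target, 0.0) + sim
    return sorted(scores.items(), key=lambda t: (-t[1], t[0]))


library = {
    "cpd_A": {"f1", "f2", "f3", "f4"},
    "cpd_B": {"f1", "f2", "f5"},
    "cpd_C": {"f6", "f7"},
}
annotations = {"cpd_A": ["target_X"], "cpd_B": ["target_X", "target_Y"]}

query = {"f1", "f2", "f3"}
hits = expand_query(query, library)
print(hits)                             # [('cpd_A', 0.75), ('cpd_B', 0.5)]
print(rank_targets(hits, annotations))  # [('target_X', 1.25), ('target_Y', 0.5)]
```

Summing similarities rewards targets supported by several close neighbors of the query; a real system would also weight the evidence by data source and bioactivity strength.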

Dr. Anne Wassermann, Merck Informatics, discussed chemical probe databases: libraries of small molecules with known targets, which allow correlations between chemical and mechanistic properties to be developed. She described generating target hypotheses for molecules using biologically annotated libraries. The Chemical Probes Portal is one example of a publicly available probe database. Merck is developing Web applications that relate phenotypes to protein targets and biological pathways.

Vladimir Poroikov, Institute of Biomedical Chemistry, Moscow, discussed Way2drug, a cheminformatics platform for drug repurposing. The platform provides predictions of drug-target interactions, toxicity, and the effects of drugs on gene expression. The PASS (Prediction of Activity Spectra for Substances) dataset includes predicted biological activity spectra, MPDS provides molecular property predictors, and MetaTox provides metabolic predictions. Way2drug links to the Kyoto Encyclopedia of Genes and Genomes (KEGG), the PDB, and the Thomson Reuters Integrity database.

Safety and toxicity are among the most significant issues in drug development. Matthew Clark, Elsevier, discussed developing bioassays as predictors of adverse events in clinical trials. Data from FDA submissions, a large number of journals, and Open PHACTS were searched for relationships between bioactivity and toxicity, with the goal of developing methods that use corroborating evidence from pathway analysis to predict important targets.

The session concluded with presentations by two groups on deep neural network (DNN) applications in small-molecule drug discovery. Dr. Abraham Heifets, Atomwise, presented work on predictive models of drug mechanisms of action built with deep convolutional neural networks, which are constrained neural networks. AtomNet is a structure-based DNN for predicting molecular bioactivity, which uses a nearest-neighbor structure-based binding algorithm. Heifets presented results and benchmarks from Atomwise's ongoing development of these methods. Antonio de la Vega de Leon, The University of Sheffield, presented deep neural networks that predict activity in a specific screen and suggest which target a compound hits. The models were built from assay descriptions, biological pathway data, and ChEMBL bioactivity data.

Neural network methods show significant promise in their ability to generalize from training sets and data to make new predictions. With more available data, improved training sets, and better algorithms for developing correlations, predictions of phenotypes, pathways, and targets from small-molecule structure and chemical property data will greatly improve.