Technical Program with Abstracts

ACS Chemical Information Division (CINF)
245th ACS National Meeting, Spring 2013
New Orleans, LA (April 7 - 11)

CINF Symposia

J. Garritano, Program Chair

[Created Mon Mar 18 2013, Subject to Change]

Sunday, April 7, 2013 8:30 am - 11:50 am

Advances in Visualizing and Analyzing Biomolecular Screening Data - AM Session Data-Mining Public Bioactivity Data
Morial Convention Center
Room: 349
Cosponsored by COMP
Deepak Bandyopadhyay, Jun Huan, Organizers
Deepak Bandyopadhyay, Jun Huan, Presiding
8:30   Introductory Remarks
8:35 1 Characterizing the diversity and biological relevance of the MLPCN assay manifold and screening set

Jun Huan, jhuan@ittc.ku.edu, EECS, University of Kansas, Lawrence, KS 66049, United States

The NIH Molecular Libraries Probe Production Centers Network (MLPCN) aims to remediate key deficiencies in drug discovery and chemical biology through pursuit of therapeutically feasible but unprofitable drug targets and undruggable genes of biochemical interest, and through development of chemically diverse, biologically relevant screening sets. This paper evaluates the novelty of MLPCN targets, their propensity for undergoing modulations of biochemical or therapeutic relevance, the degree of chemical diversity inherent in the MLPCN screening set, and biogenic bias of the set. Our analyses suggest that MLPCN targets cover biologically interesting pathway space that is distinct from established drug targets, but may include genes whose overly complex protein interactions may obfuscate pathway effects and pose therapeutically undesirable side-effect risks. We find the MLPCN screening set to be chemically diverse, with greater biogenic bias than comparable collections of commercially available compounds. Biogenic enhancements, such as incorporation of more metabolite-like chemotypes, are suggested.

9:00 2 New ways to mine disparate screening data in PubChem

Evan Bolton, bolton@ncbi.nlm.nih.gov, PubChem, NCBI / NLM / NIH, United States

PubChem is an open repository for chemical biology information. PubChem contains ~2.5 million biologically tested substances (representing 1.8 million unique small molecules) and ~200 million biological experiment result outcomes. This large corpus of information requires innovative approaches to swiftly find and summarize desired information. While PubChem has a number of pre-existing capabilities to mine biological screening data, such as summary counts and heat-map style displays, this talk will detail new innovations that provide dramatically expanded capabilities to rapidly navigate and relate chemical and biological data within the resource.

9:25 3 PubChem DataDicer: A data warehouse for rapid querying of bioassay data

Lewis Y Geer, lewisg@ncbi.nlm.nih.gov, Lianyi Han, Siqian He, Yanli Wang, Evan E Bolton, Stephen H Bryant. NLM/NCBI, NIH, Bethesda, Maryland 20894, United States

The amount of publicly available bioassay data has increased to the range of 200M endpoints. At the same time, this data can be linked to a large amount of information in a variety of databases, such as NCBI Gene and PubChem Compound. The breadth and depth of this data presents challenges for researchers attempting to extract useful data for their research. The PubChem DataDicer centralizes this information in a single data warehouse, allowing the researcher to rapidly locate assay endpoints with similar characteristics, such as shared pathways, targets, and chemical properties, for further analysis. We have also investigated the creation of a RESTful web API for programmatic access to this data warehouse.

9:50 4 PubChem widgets

Lianyi Han, hanl@ncbi.nlm.nih.gov, National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894, United States

Modern interactive web and mobile applications for chemistry and biology often need to integrate information from multiple resources, such as biochemical analysis, patents, and publications. This typically requires an underlying data warehouse containing billions of chemical and bioactivity records coupled with web services that deliver "Asynchronous JavaScript and XML" (AJAX) and JSONP (or "JSON with padding") content to applications. PubChem Widgets provide a rapid development tool to create content-rich and interactive UIs without requiring the development of such a data warehouse. These widgets show commonly requested PubChem data views, such as 1) patents associated with a PubChem compound or substance; 2) bioactivity outcomes for a PubChem compound, substance, or bioassay; 3) literature available for a compound, substance, or bioassay. These widgets are easily embedded into your own web application or HTML pages, and can also be used to access annotation data from native desktop and mobile applications. Beta release available: http://pubchem.ncbi.nlm.nih.gov/widget/docs/widget_help.html.
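
The JSONP delivery mechanism mentioned in this abstract can be illustrated generically: a server wraps its JSON payload in a caller-supplied callback name so the response can be loaded cross-origin via a script tag. The sketch below is a minimal server-side illustration of that pattern; the callback name and payload fields are invented and are not part of the PubChem Widgets API.

```python
import json

def jsonp_wrap(callback, payload):
    """Wrap a JSON-serializable payload in a JSONP callback invocation.

    Loading the result via a <script> tag lets a browser page receive
    cross-origin data: evaluating the script calls `callback` with the
    decoded payload as its argument.
    """
    return "{}({});".format(callback, json.dumps(payload))

# Hypothetical widget-style response carrying bioactivity counts
response = jsonp_wrap("renderWidget", {"cid": 2244, "active_assays": 12})
print(response)  # renderWidget({"cid": 2244, "active_assays": 12});
```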

10:15   Intermission
10:35 5 Automated structure-activity relationship mining: Connecting chemical structure to biological profiles

Mathias Wawer1, mwawer@broadinstitute.org, David Jaramillo1, Kejie Li1, Sigrun Gustafsdottir1, Vebjorn Ljosa4, Nicole Bodycombe1, Melissa Parkin3, Katherine Sokolnicki4, Mark-Anthony Bray4, Ellen Winchester3, George Grant3, Cindy Hon1, Jeremy Duvall2, Joshua Bittker2, Vlado Dancik1, Rajiv Narayan5, Aravind Subramanian5, Wendy Winckler3, Todd Golub5, Anne Carpenter4, Stuart Schreiber1, Alykhan Shamji1, Jürgen Bajorath6, Paul Clemons1. (1) Chemical Biology Program, Broad Institute, Cambridge, MA 02142, United States, (2) Chemical Biology Platform, Broad Institute, United States, (3) Genomics Platform, Broad Institute, United States, (4) Imaging Platform, Broad Institute, United States, (5) Cancer Program, Broad Institute, United States, (6) Department of Life Science Informatics, B-IT, LIMES, University of Bonn, Bonn, Germany

Understanding structure-activity relationships (SARs) of small molecules is important for the development of probes and novel therapeutic agents in chemical biology and drug discovery. We developed computational methods to automatically mine and visualize SARs for small-molecule screening and profiling data. We applied these methods to data from novel gene-expression and imaging assays collected for more than 22,000 small molecules. The collection contains novel compounds originating from diversity-oriented synthesis (DOS) as well as known bioactive molecules. The DOS compound collection covers a diverse chemical space while including structural analogs and stereoisomers. We automated the discovery of rules that connect chemical features of these compounds to their biological profiles, allowing us to prioritize groups of compounds for further study.
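
As a rough illustration of the rule-mining idea described above (not the authors' method), one can score how over-represented each chemical feature is among compounds sharing a biological profile relative to the full collection. The feature names and data below are invented for illustration.

```python
from collections import Counter

def feature_enrichment(compounds, active_ids):
    """Fold-enrichment of each feature among 'active' compounds.

    compounds: dict mapping compound id -> set of chemical features
    active_ids: ids of compounds sharing the biological profile
    Returns {feature: active frequency / background frequency}.
    """
    all_feats = Counter(f for feats in compounds.values() for f in feats)
    act_feats = Counter(f for cid in active_ids for f in compounds[cid])
    n_all, n_act = len(compounds), len(active_ids)
    return {
        f: (act_feats[f] / n_act) / (all_feats[f] / n_all)
        for f in act_feats
    }

# Invented data: a scaffold found only among profile-sharing compounds
compounds = {
    "c1": {"scaffoldA", "amide"},
    "c2": {"scaffoldA", "ester"},
    "c3": {"scaffoldB", "amide"},
    "c4": {"scaffoldB", "ester"},
}
scores = feature_enrichment(compounds, {"c1", "c2"})
print(scores["scaffoldA"])  # 2.0: scaffoldA occurs only in actives
```

In practice such raw ratios would be backed by a statistical test (e.g., a hypergeometric p-value) before a feature-activity rule is reported.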

11:00 6 Using the bi-clustering SPE for the visualization and analysis of massive amounts of compound-target activity data

Dmitrii Rassokhin1, drassokh@its.jnj.com, Dimitris Agrafiotis2, Eric Yang2. (1) Department of Translational Informatics, Janssen Pharmaceutical Companies of Johnson & Johnson, Spring House, PA 19477-0776, United States, (2) Department of Neuroscience Informatics, Janssen Pharmaceutical Companies of Johnson & Johnson, Spring House, PA 19477-0776, United States

We have developed an algorithm termed Bi-Clustered Stochastic Proximity Embedding (Bi-SPE), which is an extension of the SPE mapping algorithm originally proposed by Agrafiotis et al., and successfully used it for the visualization of very large compound bioactivity data sets. We have shown that using the compound/target distance metric derived directly from certain types of bioactivity measurements, such as enzyme inhibition constants and the half maximal inhibitory concentrations, it is possible to simultaneously cluster together both compounds with similar bioactivities, and targets that are modulated by common compounds. We assert that the result of this bi-clustering provides an interesting visual representation of the space of molecule/target interactions. We have also shown that Bi-SPE can be used as a collaborative filtering machine learning algorithm to accurately predict unknown compound/target interactions from the ones present in the training data set.
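
The underlying SPE idea referenced in this abstract can be sketched in a few lines: starting from random coordinates, repeatedly pick a random pair of objects and nudge them so their embedded distance drifts toward the target distance. This is a minimal generic SPE sketch, not the Bi-SPE extension itself; the 3-4-5 triangle data is invented.

```python
import math
import random

def spe_embed(dist, dim=2, cycles=20000, lam=1.0, eps=1e-9):
    """Minimal stochastic proximity embedding from a distance matrix.

    dist: symmetric n x n target distances. Returns n points in `dim`
    dimensions whose pairwise distances approximate `dist`. The
    learning rate `lam` decays linearly over the run.
    """
    n = len(dist)
    coords = [[random.random() for _ in range(dim)] for _ in range(n)]
    for c in range(cycles):
        rate = lam * (1.0 - c / cycles)
        i, j = random.sample(range(n), 2)
        d_emb = math.dist(coords[i], coords[j])
        # Scale the pair's separation so the embedded distance moves
        # toward the target distance for this pair.
        scale = rate * 0.5 * (dist[i][j] - d_emb) / (d_emb + eps)
        for k in range(dim):
            delta = scale * (coords[i][k] - coords[j][k])
            coords[i][k] += delta
            coords[j][k] -= delta
    return coords

random.seed(0)  # reproducible layout for this sketch
# Three objects forming a 3-4-5 right triangle in distance space
d = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]
pts = spe_embed(d)
print(round(math.dist(pts[0], pts[1]), 1))  # close to 3.0
```

Bi-clustering adds a joint compound/target distance metric on top of this embedding step, so that compounds and targets are placed in the same space.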

11:25 7 BioAssay Research Database: A platform to support the collection, management, and analysis of chemical biology data

Rajarshi Guha1, guhar@mail.nih.gov, David Lahr2, Joshua Bittker2, Thomas D.Y. Chung3, Mark Southern7, Simon Chatwin2, Jeremy J Yang4, Oleg Ursu4, Christian G Bologa4, Tudor I Oprea4, Eric Dawson5, Shaun R Stauffer5, Craig W Lindsley5, Uma Vempati6, Hande Kucuk6, Stephan C Schurer6, Stephen Brudz2, Paul A Clemons2, Andrea de Souza2, Noel Southall1, Dac-Trung Nguyen1, John Braisted1, Tyler Peryea1. (1) NIH Center for Advancing Translational Science, Rockville, MD 20850, United States, (2) Broad Institute, Cambridge, MA 02143, United States, (3) Conrad Prebys Center for Chemical Genomics, Sanford-Burnham Medical Research Institute, La Jolla, CA 92037, United States, (4) Department of Internal Medicine, University of New Mexico, Albuquerque, NM 87131, United States, (5) Vanderbilt University, Nashville, TN 37232, United States, (6) University of Miami, Miami, FL 33101, United States, (7) Scripps Research Institute, Jupiter, FL 33458, United States

The BioAssay Research Database (BARD) was conceived to enable scientists to effectively use the National Institutes of Health (NIH) Molecular Libraries Program (MLP) data. The project is a collaboration between several institutions across the US, and has recently released an infrastructure that supports collection and annotation of bioassay screening data using a well-defined vocabulary, dissemination of data via web and desktop clients, and a mechanism to help develop novel views and analyses of data in the associated databases. In this presentation, we describe the design and implementation of the technical infrastructure that underlies BARD. We will highlight contextualization of assay results (via links to external resources coupled with full-text indexing) and describe how BARD functionality can be extended by the community using plug-ins. As an exemplar of such community-driven extensions, we will describe how the BadApple promiscuity method was integrated into BARD via a plugin developed at the University of New Mexico.

Sunday, April 7, 2013 8:20 am - 11:45 am

Library Cafes, Intellectual Commons and Virtual Services, Oh My! Charting New Routes for Users into Research Libraries - AM Session Transforming Libraries
Morial Convention Center
Room: 350
Cosponsored by CHED
Leah Solla, Olivia Bautista Sparks, Teri Vogel, Organizers
Leah Solla, Teri Vogel, Presiding
8:20   Introductory Remarks
8:25 8 Transformation of academic branch libraries

Nevenka Zdravkovska, nevenka@umd.edu, Engineering and Physical Sciences Library, University of Maryland, College Park, MD 20742, United States

Nevenka Zdravkovska, author of the book "Academic Branch Libraries in Changing Times" (Chandos Publishing, 2011) will present a short overview of the transformation of the branch academic libraries over the years, with a special emphasis on branch libraries in the science disciplines.

8:50 9 Library spaces for scientific computing discovery and learning

Andrea Twiss-Brooks, atbrooks@uchicago.edu, Division of Science Libraries, University of Chicago, Chicago, IL 60637, United States

The University of Chicago Library is committed to creating hospitable physical and virtual environments for study, teaching, and research and to collaborating with other members of the University to enrich research and learning. The Library has engaged with various stakeholders to create spaces for furthering these goals. Collaborations include a partnership with the Research Computing Center to create the Data Visualization Laboratory in The Kathleen A. Zar Room http://rcc.uchicago.edu/resources/data_visualization.html, the creation of a computer equipped classroom for all University of Chicago library staff in a former computer lab area, and planning with various campus units to create additional technology equipped classrooms in the John Crerar Library. Currently existing spaces have already been used to host a number of workshops, research seminars, training sessions, and other events. Descriptions of the collaborations, facilities and plans will be provided.

9:15 10 Heart of the university or how to stay stuck in the middle with you

Susanne J Redalje, curie@uw.edu, Lauren Ray. Libraries, University of Washington, Seattle, Washington 98195-2900, United States

Only rarely do librarians consider their users clowns or jokers, as suggested by Stealers Wheel. We do, however, want to stay stuck in the middle with them and continue to be as relevant to their research and educational needs as we were when Harvard University President Charles William Eliot called the library the heart of the university. Budgetary issues, technological advancements, and new ways of teaching and communicating make this a challenge. Like many universities, the University of Washington is involved in several initiatives to address this problem. In the fall of 2010, the Research Commons opened in the space formerly housing the Natural Sciences branch library. This is the first of several such spaces designed to connect with users individually or in groups. It provides flexible spacing and technology to help meet users' research, teaching, and learning needs. The Libraries have worked closely with the Graduate College to provide relevant programming aimed at graduate students. One particularly successful program features lightning talks on an interdisciplinary topic, giving graduate students experience in presenting research and feedback on their presentations. Assessment of the space shows users are very happy with it and also suggests directions for the future.

9:40 11 From traditional library organization to functional structure: How does it benefit library users?

Erja Kajosalo, kajosalo@mit.edu, Libraries, Massachusetts Institute of Technology, Cambridge, MA 02139, United States

In 2010, MIT Libraries underwent a reorganization that changed the way subject librarians cooperated to meet information needs across the Institute. Interdisciplinary "communities of practice" replaced a more traditional reporting structure that was previously organized around (and confined by) the geographic layout of our campus libraries. With the benefit of two years' worth of hindsight, this presentation highlights a few examples of how a more agile organizational model has resulted in new opportunities, and new challenges, for serving our communities and for developing liaison librarian competencies.

10:05   Intermission
10:20 12 Holistic approaches to service: Connecting researchers to libraries through relationship building

Kiyomi D. Deards, Kdeards2@unl.edu, University of Nebraska-Lincoln, Lincoln, NE 68588-4100, United States

Flexibility and a focus on the needs of library users are frequent refrains in today's rapidly changing society. How are those needs determined? How are users approached for input? Do you dislike knocking on doors when you're not expected? Come discuss opportunities for informal interactions that lead to collaborations, acquisitions of resources, instructional invitations, and more. The importance of relationship building and quality of contacts versus quantity of contacts will also be explored.

10:45 13 Ask for research alterations: Emerge with a custom fit

Jill E Wilson, jew248@cornell.edu, Leah R McEwen. Engineering, Mathematics, and Physical Sciences Libraries, Cornell University, Ithaca, NY 14853, United States

Libraries offer suites of collections, resources, and services which can appear to be one-size-fits-all, ready-to-wear approaches to doing research. Users “shop” library websites and services for information to discover the best “fit” for their research needs. Often, after many fittings, our researchers push back with sentiments such as “Why can't I just get what I need from one place?” or “I just can't find what I am looking for right away.” Librarians at Cornell's Physical Sciences Library demonstrate the impact their libraries have on users' experience with the notion that we custom-tailor for each unique need and provide alterations to the cycle of scholarly communication for best results. We respond with tailored outreach and services, including specialized finding tools, an interactive virtual presence, and custom workshops on professional development through working partnerships with our graduate students. Research snags may appear at the first fitting; however, our strength is not just the collections we offer, but how we fit them into our patrons' lives.

11:10 14 Let's work together: ACS Publications author outreach initiatives and opportunities for libraries

Sara Rouhi, S_Rouhi@acs.org, Publications Division, American Chemical Society, Washington, DC 20036, United States

The changing digital landscape offers new opportunities for libraries and publishers to reach out to users in new ways. ACS Publications will share a number of ongoing initiatives that libraries can use to help educate their patrons on topics ranging from ethics and copyright in scholarly publishing to manuscript composition and the process of peer review. We'll also share our ongoing engagement efforts with young scientists through the ACS Summer Institute. Also look for an update on the ACS Style Guide Online and the debut of a new scholarly research tool from the ACS.

Sunday, April 7, 2013 2:00 pm - 5:15 pm

Advances in Visualizing and Analyzing Biomolecular Screening Data - PM Session Tools, Techniques, Platforms and Software
Morial Convention Center
Room: 349
Cosponsored by COMP
Deepak Bandyopadhyay, Jun Huan, Organizers
Deepak Bandyopadhyay, Jun Huan, Presiding
2:00 15 3D phylogenetic trees for visualization and analysis of complex datasets

Ruben Abagyan1,2, k6wright@ucsd.edu, Eugene Raush2, Maxim Totrov2. (1) Skaggs School of Pharmacy &Pharmaceutical Sciences, University of California San Diego, La Jolla, CA 92093, United States, (2) MolSoft, San Diego, CA 92121, United States

Efficiently visualizing a large number of objects defined by a distance matrix or as points in a multidimensional space has always been challenging. Three main methods have been developed: showing the objects as phylogenetic trees in two dimensions, showing the objects as network diagrams, and showing three principal coordinates (or just three arbitrarily chosen parameters from multidimensional coordinates). Here we present a new method which combines phylogenetic trees with the placement of the objects in three-dimensional space. The new method has been implemented in the ICM suite of programs from MolSoft as an interactive object which is dynamically linked with the underlying data. The new three-dimensional trees are efficient in visualizing complex relationships between large collections of objects of any nature once the distance matrix can be established. We illustrate this representation on a large collection of chemicals, screening data, and biological sequences.

2:25 16 On-line graph mining and visualization of protein-ligand interactome

Clara Ng1, lxi0003@hunter.cuny.edu, Lei Xie1,2, lxi0003@hunter.cuny.edu. (1) Department of Computer Science, City University of New York, New York, NY, United States, (2) Graduate Center, City University of New York, New York, NY, United States

Recent high-throughput screens have generated a lot of protein-ligand interaction data; for example, over one million compounds are associated with the 4422 proteins in ChEMBL. Recent attempts to mine and visualize this large protein-ligand interaction dataset have mapped chemicals into a high-dimensional feature space and visualized it using dimensionality reduction techniques. We propose a different approach to exploring the protein-ligand interactome efficiently, effectively, and intuitively. We link all chemicals and targets into an all-against-all chemical similarity network and target similarity network, respectively. The networks are connected as a bipartite graph through protein-ligand interactions. Efficient graph clustering and mining algorithms are applied to identify chemical and protein patterns underlying binding promiscuity and specificity. Although the chemical/protein similarity network is computationally intensive, it need only be built once and updated regularly. As demonstrated in case studies for anti-infectious drug discovery, our method may facilitate drug repurposing, side-effect prediction, and polypharmacology drug design.
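
The network construction described above can be sketched generically: threshold an all-against-all similarity matrix into edges, then bridge the chemical and target layers with measured interactions. The compounds, targets, and similarity scores below are invented for illustration and are not drawn from ChEMBL.

```python
def similarity_network(sims, threshold):
    """Connect items whose pairwise similarity meets a threshold."""
    edges = set()
    items = list(sims)
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            if sims[a][b] >= threshold:
                edges.add((a, b))
    return edges

# Invented chemical similarity scores (e.g., Tanimoto on fingerprints)
chem_sims = {
    "aspirin":    {"aspirin": 1.0, "salicylate": 0.8, "caffeine": 0.2},
    "salicylate": {"aspirin": 0.8, "salicylate": 1.0, "caffeine": 0.1},
    "caffeine":   {"aspirin": 0.2, "salicylate": 0.1, "caffeine": 1.0},
}
chem_edges = similarity_network(chem_sims, 0.7)
print(chem_edges)  # {('aspirin', 'salicylate')}

# Bipartite layer: measured protein-ligand interactions (invented)
interactions = {("aspirin", "COX1"), ("salicylate", "COX1"),
                ("caffeine", "ADORA2A")}
# Targets reached by a chemically connected pair hint at shared binding
shared = {t for c, t in interactions if c in {"aspirin", "salicylate"}}
print(shared)  # {'COX1'}
```

At ChEMBL scale the pairwise similarity computation is the expensive step, which is why the abstract notes the network need only be built once and updated incrementally.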

2:50 17 Encoded Library Technology data analysis: Finding the grain of sand you want without getting a sunburn

Kenneth E Lind, kenneth.e.lind@gsk.com, Neil R Carlson, Ninad V Prabhu, Jeff A Messer. MDR Boston, GlaxoSmithKline, Waltham, MA 02451, United States

Encoded Library Technology (ELT) is a part of GSK's integrated Hit ID strategy. ELT involves creation of large combinatorial libraries whose members (sometimes over a billion!) are encoded by a unique combination of DNA tags. Binders to a molecular target are selected from these libraries and identified using next-generation DNA sequencing. We have developed a platform for translating sequence data back to the encoded chemical warhead, detecting features that are enriched in the selection, and summarizing and annotating the selection experiment. Each week our platform processes over 100 million DNA sequences - larger than the entire human genome. Data visualization is integrated into the TIBCO Spotfire platform, allowing scientists to view summaries of the large data sets, determine the most important chemical space, and then drill down to specific results to prioritize compounds for synthesis and assays. We will describe method details and present examples to highlight our analysis and visualization tools.
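
The sequence-to-warhead translation step described above can be sketched as a table lookup: each synthesis cycle contributes a fixed-length DNA tag, and per-cycle codebooks map tags back to building blocks. The codebooks and tags below are invented for illustration and do not reflect GSK's actual encoding scheme.

```python
def decode_tag(sequence, codebooks, tag_len=3):
    """Translate a concatenated DNA tag back to its building blocks.

    sequence: DNA string with one fixed-length tag per synthesis cycle
    codebooks: one {tag: building_block} dict per cycle (hypothetical)
    """
    blocks = []
    for cycle, book in enumerate(codebooks):
        tag = sequence[cycle * tag_len:(cycle + 1) * tag_len]
        blocks.append(book.get(tag, "unknown"))
    return blocks

# Invented two-cycle codebooks
cycle1 = {"ACT": "benzamide", "GGA": "piperidine"}
cycle2 = {"TTC": "chloro", "CAG": "methyl"}
print(decode_tag("GGACAG", [cycle1, cycle2]))  # ['piperidine', 'methyl']
```

Counting how often each decoded block (or block combination) appears across the selection's sequence reads then gives the enrichment signal used to prioritize features.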

3:15   Intermission
3:35 18 Exploring the chemical space of screening results

Ed Champness, ed@optibrium.com, Matt Segall, Chris Leeding, James Chisholm, Iskander Yusof, Hector Martinez, Nick Foster. Optibrium Ltd., Cambridge, United Kingdom

When faced with the results from a screening campaign, it is essential to use this data to quickly focus on the best chemistries for progression. In this presentation we will describe two techniques for visualising a 'chemical space' to guide this exploration. We will demonstrate how these can be used to identify activity 'hotspots' and focus on these for detailed analysis of structure-activity relationships. This approach can also help to spot singletons and outliers that may represent false positives or negatives for further investigation. Furthermore, it is well understood that high quality chemistry will have not only good activity, but also appropriate absorption, distribution, metabolism, elimination and toxicity (ADMET) properties. We will show how data from multiple sources can be combined to select compounds for further study with an appropriate balance of activity, ADMET properties and structural diversity to mitigate downstream risk.

4:00 19 How to highlight hits: Advances in visual data analytics tools for HTS data

Jesse A. Gordon, jesse.gordon@dotmatics.com, Jess Sager. Application Science, Dotmatics, Ltd., Woburn, MA 01801, United States

We face a huge dataset from a screening run and we want to analyze the results to pick compounds for the next screening run. How do we sift through the millions of data points to figure out which are meaningful hits, and then organize those hits into a database from which we can intelligently predict good prospects for the next screening run? We face a series of challenges in HTS data analysis which will be outlined in this presentation followed by solutions offered through modern chemoinformatics and visual data analytics tools. We look at the difference between the "Old Way" -- grid after grid in Excel with manual calculations -- and the "New Way" -- clicking on visually distinctive points highlighted in red on automatically-generated curves.

4:25 20 Integrated cheminformatics software for visualizing and analyzing high-throughput screening data

Denis Fourches, fourches@email.unc.edu, Alexander Tropsha. Laboratory for Molecular Modeling, Division of Medicinal Chemistry and Natural Products, University of North Carolina, Chapel Hill, NC 27599, United States

With the growing number of academic laboratories conducting high-throughput screening (HTS) comes the need for accessible and easily customizable software capable of visualizing and analyzing HTS results. Herein, we report on the development of the HTS Navigator software (freely available for academia). It allows loading and processing of output files for both individual and batches of plates from different reader formats, visualization of the overall heat map colored by activity, automatic detection of hits as well as compounds with mono- and dual-selectivity for screened targets, and different types of baseline corrections. HTS Navigator includes basic cheminformatics capabilities such as chemical structure storage and visualization, fast similarity search and neighborhood analysis for retrieved hits, hierarchical clustering in both chemistry and activity spaces, and the detection of activity cliffs. The Navigator is coupled to the ADDAGRA software for visualizing compound clusters and outliers in both multidimensional chemistry and HTS spaces.
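
The automatic hit-detection step mentioned in this abstract can be illustrated with the simplest common approach: flag wells whose plate-normalized z-score falls below a cutoff. This is a generic sketch, not the HTS Navigator algorithm; the plate data and cutoff are invented.

```python
import statistics

def zscore_hits(plate, cutoff=-2.0):
    """Flag wells whose signal is a low outlier relative to the plate.

    plate: dict mapping well -> raw signal. In an inhibition-style
    assay, a z-score at or below `cutoff` marks a candidate hit.
    """
    mu = statistics.mean(plate.values())
    sd = statistics.stdev(plate.values())
    z = {w: (v - mu) / sd for w, v in plate.items()}
    return {w: s for w, s in z.items() if s <= cutoff}

# Invented 6-well excerpt: A3 shows strong apparent inhibition
plate = {"A1": 100, "A2": 98, "A3": 20, "A4": 101, "A5": 99, "A6": 102}
print(sorted(zscore_hits(plate)))  # ['A3']
```

Production pipelines typically prefer robust statistics (median/MAD, B-score) over mean/stdev, since strong hits inflate the plate standard deviation, and apply the baseline corrections the abstract mentions before scoring.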

4:50 21 Integrating design, analysis, and visualization into the drug discovery workflow

W. Patrick P. Walters1, pat_walters@vrtx.com, Carlos Faerman1, Jonathan Weiss1, Xiaodan Zhang1, Roslyn Potter1, Jun Feng1, Guy Bemis1, Susan Roberts2, Jason Yuen2, Trevor Kramer2, Jonathan Christopher3, Jeff Orr3, Brian Goldman1. (1) Computational Sciences, Vertex Pharmaceuticals, Cambridge, MA 02139, United States, (2) Global Information Services, Vertex Pharmaceuticals, Cambridge, MA 02139, United States, (3) Global Information Services, Vertex Pharmaceuticals, San Diego, CA 92121, United States

Drug discovery is a complex process that involves the simultaneous optimization of multiple parameters. Effective discovery teams must be able to analyze data, identify trends, and decide on a direction for compound optimization. This analysis typically requires the synthesis of not only internal data, but also information culled from external sources such as patents and papers. Many informatics systems provide the ability to query internal data, but few can integrate design tools with a combined analysis of internal and external data. Rather than simply combining all of the data into a single data warehouse, we have used facilities such as application programming interfaces (APIs) to create links to a wide array of relevant external sources. This presentation will use a few case studies to present some of our recent work on a unified informatics infrastructure that allows facile access to internal data as well as a variety of literature sources.

Sunday, April 7, 2013 2:30 pm - 5:00 pm

Library Cafes, Intellectual Commons and Virtual Services, Oh My! Charting New Routes for Users into Research Libraries - PM Session Online Tools
Morial Convention Center
Room: 350
Cosponsored by CHED, COMP
Leah Solla, Teri Vogel, Olivia Bautista Sparks, Organizers
Leah Solla, Teri Vogel, Presiding
2:30   Introductory Remarks
2:35 22 Cambridge Structural Database: Moving with the times

Susan Henderson, henderson@ccdc.cam.ac.uk, Ian J Bruno. CCDC, Cambridge, Cambridgeshire CB2 1EZ, United Kingdom

The Cambridge Crystallographic Data Centre (CCDC) compiles the world's repository of small molecule organic and organometallic crystal structures, known as the Cambridge Structural Database (CSD). Containing over 600,000 X-ray diffraction analyses, the CSD is a unique resource of invaluable structural information. We have developed and license a comprehensive range of locally installed desktop tools that enable the database to be interrogated and the results visualised and analysed, allowing our users to derive maximum benefit from the data. This talk will summarise the changes we are making in response to changing user requirements and technologies, both in terms of how the data are accessed and in terms of the underlying structure of the database.

3:00 23 ChemEd DL WikiHyperGlossary: A social semantic information literacy service for digital documents

Robert E. Belford1, rebelford@ualr.edu, Dan Berleant2, Michael A. Bauer2, Jon L. Holmes3, John W. Moore3. (1) Department of Chemistry, University of Arkansas at Little Rock, Little Rock, AR 72204, United States, (2) Department of Information Science, University of Arkansas at Little Rock, Little Rock, AR 72204, United States, (3) Department of Chemistry, University of Wisconsin-Madison, Madison, WI, United States

ChemEd DL WikiHyperGlossary (WHG) automates the markup of digital text documents and web pages, inserting hyperlinks pointing to an associated glossary and returning the content in a JavaScript overlay. Both editable and non-editable definitions can be returned, and a glossary architecture designed to enhance reading comprehension by coupling social to canonical definitions will be presented. The overlay also connects documents to databases (UniProtKB, Models 360), with tabs going to search services (ChemSpider) and software agents like Jmol and JChemPaint. The latter enables researchers to create new search tabs based on their edits, which leads to a molecular-editor enabled knowledge framework targeting the effects of the researcher's edits. Digital archives like Project Gutenberg enable the WHG to introduce social-semantic features into historic documents, thereby directly connecting past works of science like Antoine Lavoisier's “Elements of Chemistry” to modern informatics resources. The WHG development site is http://hyperglossary.org/.
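
The automated markup step described above can be sketched generically: scan a document for glossary terms and wrap each match in a hyperlink to its definition. This minimal sketch is not the WHG implementation; the glossary entries and the `url_base` are invented placeholders, not the actual WHG service address.

```python
import re

def markup(text, glossary, url_base="https://example.org/glossary/"):
    """Hyperlink glossary terms occurring in a text document.

    glossary: {term: definition}. Longer terms are matched first so a
    multi-word term is not split by one of its sub-terms.
    """
    terms = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE)

    def link(match):
        term = match.group(1)
        return '<a href="{}{}">{}</a>'.format(
            url_base, term.lower().replace(" ", "_"), term)

    return pattern.sub(link, text)

gloss = {"oxygen": "Element 8", "caloric": "Hypothesized heat fluid"}
print(markup("Lavoisier named oxygen.", gloss))
# Lavoisier named <a href="https://example.org/glossary/oxygen">oxygen</a>.
```

The real service additionally returns the linked definitions in a JavaScript overlay rather than plain anchors, but the term-scanning core follows this shape.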

3:25 24 Navigating scientific resources using wiki-based resources

Antony J Williams1, williamsa@rsc.org, Valery Tkachenko1, Alexey Pshenichnov1, Sean Ekins3, Aileen Day2, Martin Walker4. (1) Cheminformatics, Royal Society of Chemistry, Wake Forest, NC 27587, United States, (2) Cheminformatics, Royal Society of Chemistry, Cambridge, United Kingdom, (3) Collaborations in Chemistry, Fuquay Varina, NC, United States, (4) Potsdam University, Potsdam, NY, United States

There is an overwhelming number of new resources for chemistry that would likely benefit both librarians and students in terms of improving access to data and information. While commercial solutions provided by an institution may be the primary resources, there is now an enormous range of online tools, databases, resources, apps for mobile devices and, increasingly, wikis. This presentation will provide an overview of how wiki-based resources for scientists are developing and will introduce a number of developing wikis. These include wikis that are being used to teach chemistry to students as well as to source information about scientists, scientific databases, and mobile apps.

3:50 25 CuLLR me collaboration: Models and tools for user-driven eLibraries

Dianne Dietrich, dd388@cornell.edu, Leah R McEwen. Engineering, Mathematics, and Physical Sciences Libraries, Cornell University, Ithaca, NY 14853, United States

Keeping up with physical scientists demands that more online information be discoverable. The Physical Sciences eLibrary at Cornell blends the most crucial components of the brick-and-mortar model with forward looking developments in the greater discovery landscape to build new tools for faceted browsing and locating specific material types. We are leveraging new infrastructure that allows subject librarians to layer annotations and additional metadata on top of library catalog records. We collaborate closely with the researchers and scientists in an iterative development cycle to build a responsive eLibrary that enables us to go beyond traditional services. In this talk we will provide an overview of our process and discuss how we are translating the delivery mechanisms of the library web presence at Cornell.

Sunday, April 7, 2013 6:30 pm - 8:30 pm

CINF Scholarship for Scientific Excellence - EVE Session
Morial Convention Center
Room: 343
Guenter Grethe, Organizer
, Presiding
  26 iBIOMES: Managing and sharing large biomolecular simulation datasets in a distributed environment with iRODS

Julien C Thibault1, julien.thibault@utah.edu, Thomas E Cheatham2,3, Julio C Facelli1,3. (1) Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah 84112, United States, (2) Department of Medicinal Chemistry, University of Utah, Salt Lake City, Utah 84112, United States, (3) Center for High-Performance Computing, University of Utah, Salt Lake City, Utah 84112, United States

During this presentation we will introduce the architecture of iBIOMES (Integrated BIOMolEcular Simulations), a distributed system for biomolecular simulation data management allowing storage and indexing of large datasets generated by Molecular Dynamics (MD) simulations, along with ab initio calculation results. The system architecture is based on iRODS, a data handling system developed by RENCI and influenced by the experience gained from the Storage Resource Broker (SRB) system. iRODS provides the tools to register, move, and look up files that are distributed over the network and stored on different types of storage (e.g., HPC servers, file servers, archive tapes). Registered files can be queried and retrieved based on system or user-defined metadata. We created customized interfaces on top of iRODS to facilitate the data registration process for biomolecular simulation datasets (e.g., AMBER, Gaussian). The process is highly customizable through XML descriptors, enabling users to choose which pieces of data should be displayed to summarize the registered experiments. Data registration does not require physical transfer of the data, which makes it a great solution for researchers who want to expose existing datasets. Input and output files can be made available for download within a collaborative network to allow replication of results or comparison between methods (e.g., different force fields). Finally, data summarization and management are facilitated through a rich web interface that offers different visualization components for 3D structures and analysis data (e.g., time series plots, heatmaps). iBIOMES represents one of the first efforts to create an infrastructure for researchers to manage their MD data locally, expose their data to the community, and create collaborative networks.
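The register-then-query pattern described above can be illustrated, independently of iRODS itself, with a minimal in-memory sketch; the class, attribute names, and paths below are invented for the example and are not the iRODS API.

```python
# Minimal illustration of metadata-driven file registration and lookup,
# in the spirit of iRODS (not its actual API). Attribute names and
# logical paths are invented for this sketch.

class Catalog:
    def __init__(self):
        self.entries = {}  # logical path -> metadata dict

    def register(self, path, **metadata):
        """Register a file by logical path; no physical transfer needed."""
        self.entries[path] = metadata

    def query(self, **criteria):
        """Return paths whose metadata match all given key/value pairs."""
        return [p for p, md in self.entries.items()
                if all(md.get(k) == v for k, v in criteria.items())]

catalog = Catalog()
catalog.register("/proj/run1/mdout.nc", package="AMBER", type="MD")
catalog.register("/proj/qm/h2o.log", package="Gaussian", type="ab initio")
```

Registration records only metadata about where a file lives, which is why (as the abstract notes) exposing an existing dataset requires no data movement.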

27 Probing the substrate selectivity of the serotonin and dopamine transporter using structure based techniques

Amir Seddik1, amir.seddik@univie.ac.at, Harald H. Sitte2, Gerhard F. Ecker1. (1) Department of Medicinal Chemistry, University of Vienna, Vienna, Austria, (2) Department of Pharmacology, Medical University of Vienna, Vienna, Austria

Previous studies revealed that (S)-fenfluramine (SFF) shows high selectivity for SERT over DAT. In this study, this compound is therefore used as probe ligand to explore the molecular basis of substrate selectivity at these two neurotransmitter transporters. A set of nine high affinity phenylethylamines (PEAs) was docked into the substrate binding site of a SERT homology model. Energy minimization, common scaffold clustering and consensus scoring resulted in a final pose which was superposed with the highest-ranked SFF-DAT complex. Results showed a similar pose in both transporters, whereby SFF's CF3 group was placed inside a pocket. SFF's low affinity for DAT could thus not be explained by steric hindrance. However, local alignment indicates a more lipophilic SERT pocket and a halogen bond donating Thr439, both of which might explain SERT selectivity of (S)-fenfluramine. We acknowledge financial support provided by the Austrian Science Fund, grants F03502 and W1232.

28 New cheminformatics microscopes: Combining semantic web technologies, cheminformatical representations, and chemometrics for understanding and predicting chemical and biological properties

Egon L Willighagen, egon.willighagen@maastrichtuniversity.nl, Department of Bioinformatics - BiGCaT, Maastricht University, Maastricht, The Netherlands

Cheminformatics is a computational microscope with which we study chemical properties. My research develops new microscopes based on cheminformatics, using semantic web and chemometrics technologies. This has resulted in and contributed to many computational methods to handle chemical structures and predict their chemical, physical, and biological properties. These methods include computational software like the Chemistry Development Kit; visualization tools like Jmol, JChemPaint, and Bioclipse; information retrieval technologies like OSCAR4; data exchange standards like the Chemical Markup Language, the CHEMINF ontology, and other semantic solutions aimed at reducing information loss; and new public chemical knowledge bases, such as the Blue Obelisk Data Repository and the NanoWiki with toxicological properties of nanomaterials. These and other tools have been used in combination with statistical and machine learning methods to predict various properties of chemical compounds, showing the importance of statistical and visual validation of the patterns found.

29 Discovery of TLR2 antagonists by virtual screening

Manuela S Murgueitio1, m.murgueitio@fu-berlin.de, Sandra Santos-Sierra2, Gerhard Wolber1. (1) Institute of Pharmacy, Pharmaceutical Chemistry, Freie Universität Berlin, Berlin, Berlin 14195, Germany, (2) Institute of Clinical Pharmacology, Medizinische Universität Innsbruck, Innsbruck, Tirol A-6020, Austria

Toll-like receptors (TLRs) represent the first barrier in innate immune response and act as key players in the development of chronic inflammatory and autoimmune diseases. Thus, interest in identifying small organic molecules that modulate TLRs has risen. In this study we present a virtual screening approach for the identification of novel TLR2 antagonists, combining ligand- and structure-based design. First, we performed a shape- and feature-based similarity search against commercially available compound collections, using TLR2 agonists from the literature and two TLR2 antagonists previously identified in-house as query structures. Second, molecular interaction fields (MIFs) of the TLR2 binding site were calculated to derive a structure-based 3D pharmacophore that was then used for virtual screening. A selection of virtual screening hits was biologically tested in a cell-based assay for TLR2 inhibition, leading to several compounds with antagonistic activity and IC50 values in the micromolar range.

Monday, April 8, 2013 8:30 am - 11:55 am

Advances in Visualizing and Analyzing Biomolecular Screening Data - AM Session Experimental Insights, Case Studies, and New Methods
Morial Convention Center
Room: 349
Cosponsored by COMP
Deepak Bandyopadhyay, Jun Huan, Organizers
Deepak Bandyopadhyay, Jun Huan, Presiding
8:30   Introductory Remarks
8:35 30 Dispensing processes profoundly impact biological assays and computational and statistical analyses

Sean Ekins1, ekinssean@yahoo.com, Joe Olechno2, Antony J Williams3. (1) Collaborations in Chemistry, Fuquay-Varina, NC 27526, United States, (2) Labcyte Inc, Sunnyvale, CA 94089, United States, (3) Royal Society of Chemistry, Wake Forest, NC 27587, United States

Dispensing processes profoundly influence estimates of the biological activity of compounds. In this study, using published inhibitor data for the tyrosine kinase EphB4, we show that IC50 values obtained via disposable tip-based serial dilution and dispensing versus acoustic dispensing differ by orders of magnitude, with no correlation between, or consistent ranking across, the datasets. Importantly, the computed EphB4 pharmacophores derived from these data differ for each dataset. Acoustic dispensing correctly highlights multiple hydrophobic features in the pharmacophore and correlates with calculated LogP values. Significantly, the acoustic dispensing-derived pharmacophore correctly identified active compounds in a test set. Subsequent analysis of crystal structures for other published EphB4 inhibitors, and automated development of pharmacophores, indicated they were comparable to those developed with acoustic dispensing data. In short, dispensing processes are another important source of error in high-throughput screening that impacts computational and statistical analyses. These findings have far-reaching implications in biological research and in drug discovery.

9:00 31 On the compound annotation and cleaning the GSK screening collection initiative: The utility of an Inhibition Frequency Index (IFI)

Subhas J Chakravorty, subhas.j.chakravorty@gsk.com, James A Chan, Juan Luengo, Nicole M Greenwood, Ioana Popa-Burke, Ricardo Macarron. CSC, Sample Technologies, GSK, Upper Providence, PA 19426, United States

High throughput screening (HTS) constitutes a critical tool for the identification of lead molecules from primary screening assays for novel targets. GlaxoSmithKline (GSK) has continuously invested in the development and curation of its HTS collection to maximize the number of quality starting points for drug discovery and reduce the number of false positives from primary screens. An Inhibition Frequency Index (IFI) has been defined as a measure of promiscuity of individual compounds in HTS primary assays based upon activities tabulated over time in GSK's exhaustive screening assay tables. In this talk, we will present our analysis of the IFI profile across the GSK HTS collection. We will characterize the IFI profile with respect to desired physical properties, will discuss obvious substructures that may be less attractive as starting points, and will describe new classes of nuisance compounds revealed by our IFI analysis. In addition, we will examine the IFI of promiscuity filters described in the literature. There are many reasons why any particular molecule might display promiscuity: physical properties of the compound, properties of the target or target class, details of the assay and the assay technology and methodology. All of these factors must be considered when deciding whether to remove or retain a compound in a curated HTS collection.
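The abstract does not give GSK's exact formula, but an inhibition frequency index of this kind can be sketched as the fraction of primary assays in which a compound scores above an inhibition cutoff. The 50% cutoff, the 0.25 flagging threshold, and the function names below are assumptions for illustration only.

```python
# Hypothetical sketch of an Inhibition Frequency Index (IFI):
# fraction of primary assays in which a compound exceeds an
# inhibition cutoff. The cutoff and threshold values are invented,
# not GSK's actual definition.

def inhibition_frequency_index(inhibitions, cutoff=50.0):
    """inhibitions: list of % inhibition values, one per assay."""
    if not inhibitions:
        return 0.0
    hits = sum(1 for x in inhibitions if x >= cutoff)
    return hits / len(inhibitions)

def flag_promiscuous(compound_assays, ifi_threshold=0.25):
    """compound_assays: dict of compound_id -> list of % inhibition values.
    Returns {compound_id: IFI} for compounds exceeding the threshold."""
    return {cid: inhibition_frequency_index(vals)
            for cid, vals in compound_assays.items()
            if inhibition_frequency_index(vals) > ifi_threshold}
```

A compound active in most assays it has ever been screened in would get a high IFI and be reviewed as a possible nuisance compound, in line with the curation goal described in the abstract.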

9:25 32 Analyzing screening and similarity searching outcome in light of multiple approaches to the same target

Tina Garyantes, garyante@optonline.net, MAXSAR Biopharma, Warren, NJ 07059, United States

Traditional candidate discovery tends to be a linear process, with sequential optimization of compound parameters and hand-offs between teams, starting with a very basic analysis of primary screening data. Often the “best” series as identified by early assays are not the “best” series for late optimization. This talk will ask how we can improve lead series and potentially identify drug candidates by improved analysis early in the lead ID process. We will look at the value of and methods for analyzing multiple assays in parallel. An example of parallel optimization will be discussed where a phenotypic assay and a targeted assay were run in parallel. Data will be shown that supports the conclusion that running the parallel assays directs the team into different chemical space than a more traditional sequential approach. In addition, a novel method for analyzing the success of series expansion will be presented in this context.

9:50   Intermission
10:10 33 Sharing chemical information from screens without revealing structures

S. Joshua J. Swamidass1, swamidass@gmail.com, Matthew Matlock1, Dimitris K. Agrafiotis2. (1) Pathology and Immunology, Washington University, St Louis, Missouri 63108, United States, (2) Johnson & Johnson Pharmaceutical Research & Development, LLC, Spring House, Pennsylvania 19477, United States

We propose a new, secure method of sharing useful chemical information from small-molecule screens, without revealing complete structures of the screen's molecules. Recently, several groups have developed and published new methods of analyzing screening data, including advanced hit-picking, economic optimization, and visualizations. Applying these methods to private screening data requires strategies to share data without revealing chemical structures. This problem has been previously examined in the ADME prediction context, and mostly dismissed as impossible. In contrast, we present a new strategy for encoding molecules---based on anonymized scaffold networks---that seems to safely share enough chemical information to be useful in analyzing screening data, while also sufficiently blinding chemical structures. We present method details, and analyses of useful information conveyed and structure security. This approach enables sharing screening data across institutions, and may securely enable collaborative analysis that can yield better insight into screening technology as a whole.
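The core idea of matching chemical information without exchanging structures can be conveyed with a toy sketch: hash scaffold identifiers one-way with a shared salt, then intersect the resulting tokens. This is only an illustration of the matching idea; the authors' actual anonymized-scaffold-network method is more elaborate, precisely because naive hashing like this can be attacked by dictionary enumeration.

```python
import hashlib

# Toy sketch: two parties share scaffold-level information without
# exchanging structures, by comparing salted one-way hashes of
# scaffold identifiers. NOT the authors' method -- an illustration only.

def anonymize_scaffold(scaffold_smiles, salt="shared-secret"):
    """One-way token for a scaffold identifier. Parties using the same
    salt can match shared scaffolds without revealing structures."""
    digest = hashlib.sha256((salt + scaffold_smiles).encode()).hexdigest()
    return digest[:16]  # truncated token; not reversible to a structure

def shared_scaffolds(tokens_a, tokens_b):
    """Intersection of anonymized scaffold tokens from two screens."""
    return set(tokens_a) & set(tokens_b)
```

Each site publishes only tokens; overlap in scaffold space is computable, but a token alone does not disclose the underlying chemistry.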

10:35 34 Characterization and visualization of compound combination responses in a high-throughput setting

Rajarshi Guha, guhar@mail.nih.gov, Lesley Mathews, John Keller, Paul Shinn, Craig Thomas, Anton Simeonov, Marc Ferrer. Preclinical Innovation, NIH National Center for Advancing Translational Sciences, Rockville, MD 20850, United States

Many disease treatments make use of a single therapeutic agent. However, single-agent therapies often produce unwanted side-effects and resistance. Combination therapies have been developed as a means to reduce side effects and avoid resistance, and are now successfully applied in diseases such as cancer, AIDS and malaria. We have recently developed a high-throughput screening platform to test pairwise compound combinations, which can rapidly and systematically identify additive, synergistic and antagonistic drug combinations. This approach can easily generate hundreds of dose-response matrices in a single study, a number that increases significantly when multiple cell lines are screened. We will present some methods to numerically compare combinations in terms of their response matrices, and visualize combination response comparisons within and across multiple cell lines. We will also describe how these techniques can be used to investigate putative polypharmacological effects that play a role in compound combination responses.
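One standard way to score such a dose-response matrix (not necessarily the metric used in this talk) is the Bliss independence model: the expected combined effect of two independently acting agents is Ea + Eb - Ea*Eb, and the observed-minus-expected excess classifies each well as synergistic, additive, or antagonistic.

```python
# Minimal sketch of scoring a pairwise combination dose-response matrix
# with the Bliss independence model. Responses are fractional effects
# in [0, 1]; the toy numbers in the test are invented.

def bliss_excess(single_a, single_b, combo):
    """combo[i][j]: observed effect of dose i of drug A with dose j of B.
    Returns the matrix of observed minus Bliss-expected effects:
    positive -> synergy, near zero -> additive, negative -> antagonism."""
    excess = []
    for i, ea in enumerate(single_a):
        row = []
        for j, eb in enumerate(single_b):
            expected = ea + eb - ea * eb  # Bliss independence
            row.append(combo[i][j] - expected)
        excess.append(row)
    return excess
```

Summing or averaging the excess matrix gives a single number per combination, which is one way to rank hundreds of matrices from a screen of this kind.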

11:00 35 Characterizing activity landscapes using network-like similarity graphs to mine antibacterial data

Veerabahu Shanmugasundaram1, veerabahu.shanmugasundaram@pfizer.com, Steven Heck1, Justin Montgomery1, Preeti Iyer2, Dilyana Dimova2, Jürgen Bajorath2. (1) WorldWide Medicinal Chemistry, Pfizer, Groton, CT 06349, United States, (2) Life Science Informatics, University of Bonn, Bonn, Germany

Understanding the structure-activity relationships (SAR) of a set of bioactive compounds is key to medicinal chemistry design. Computational techniques like statistical, pharmacophore and structure-based modeling can provide insights into SAR, but can also be misled by false assumptions. For example, one common assumption is that a series of similar compounds has a common binding mode or mechanism of action. Other assumptions include additivity of SAR from systematic changes, and the similarity principle: “similar molecules have similar biological effects”. Characterizing activity landscapes and early detection of activity cliffs are crucial to understanding global and local SAR characteristics, critical for ligand-based virtual screening or lead-optimization campaigns. Further, in datasets with a wealth of historical information, visual examination of SAR could provide novel insights and reveal new directions. We adapted network-like similarity graphs (NSGs; Bajorath and co-workers) to mine a Pfizer antibacterial dataset, and compare and contrast NSG-based visual data-mining results with a few traditional approaches.

11:25 36 From hits to leads: Data visualization of chemical scaffolds beyond traditional SAR exploration

Tyler Peryea, tyler.peryea@nih.gov, John Braisted, Ajit Jadhav, Rajarshi Guha, Noel Southall, Dac-Trung Nguyen. National Center for Advancing Translational Sciences, Division of Preclinical Innovation, Rockville, Maryland 20850, United States

Turning hits from an HTS campaign into potential leads is a critical part of early stage therapeutic discovery. Often, this amounts to distilling thousands of HTS hits into a small number of manageable candidate series (or singletons in some cases) for lead optimization. While the process is fairly straightforward, the tools involved can range anywhere from ad-hoc scripts to custom built solutions. We will describe methods that take a set of suitable seed compounds (e.g., the result of activity selection), extract a set of relevant scaffolds, and place the scaffolds in the context of high-quality external data sources. We couple the scaffold driven analytics with visualizations of scaffold structural properties and associated activities that allow efficient and intuitive exploration of candidate series. We will finally describe a software tool that implements these methods and highlight its utility on HTS data from the Molecular Libraries Program.

11:50   Concluding Remarks

Monday, April 8, 2013 8:10 am - 11:30 am

Scholarly Communication: New Models, New Media, New Metrics - AM Session
Morial Convention Center
Room: 350
David Martinsen, William Town, Colin Batchelor, Organizers
David Martinsen, Presiding
8:10   Introductory Remarks
8:15 37 Evolution of ACS DivCHED CCCE ConfChem: Gopher servers to the social semantic web

Robert E Belford1, rebelford@ualr.edu, Nitin Agarwal2, Steven Leimberg2, Jon L. Holmes3. (1) Department of Chemistry, University of Arkansas at Little Rock, Little Rock, AR 72204, United States, (2) Department of Information Science, University of Arkansas at Little Rock, Little Rock, AR 72204, United States, (3) Department of Chemistry, University of Wisconsin-Madison, Madison, WI, United States

2013 is the 20th anniversary of the online ConfChem conferences run by the ACS DivCHED Committee on Computers in Chemical Education (CCCE). This may be the oldest online conference in the chemical sciences. A brief history of ConfChem's evolution from ASCII text files being discussed over a listserver to the current platform will be provided. Archiving ConfChems and Newsletters has been a challenge for the CCCE. We will report on a project using the Drupal content management system to tackle this problem, while enhancing discovery within the archives by connecting past and present conferences through a socially generated tag filtration process that bundles individual ConfChem and Newsletter papers along a variety of folksonomy-defined themes. The folksonomy will lend a knowledge framework to facilitate discovery and innovation, enabling ConfChem to make the leap from Web 2.0 to Web 3.0 and function as scholarly communication within the social-semantic web paradigm.

8:45 38 Data enhancing the RSC Archive

Colin Batchelor1, batchelorc@rsc.org, Ken Karapetyan2, Alexey Pshenichnov2, David Sharpe1, Jon Steele1, Valery Tkachenko2, Antony Williams2. (1) Royal Society of Chemistry, Cambridge, United Kingdom, (2) Royal Society of Chemistry, Wake Forest, North Carolina 27587, United States

The Royal Society of Chemistry has an archive of published journals and books stretching back to 1841. In the past decade we have digitized this archive and semantically enriched our frontfile data with chemical structures linked to our free online chemical compound database, ChemSpider. In this talk we will survey our recent efforts to extract all kinds of data - chemical structures, experimental and bibliographic data - from both our backfile and frontfile. We will also discuss our future work to extract chemical reactions to host in our ChemSpider Reactions database and will discuss the potential applications of optical structure recognition technologies for converting structure images to structures as well as using similar techniques to convert experimental spectral data into interactive data formats. A key aspect of this project is the delivery of a crowdsourcing platform for the interactive annotation and validation of the extracted data.

9:15 39 NIST-journal cooperation to improve the quality of published experimental data: Pre-acceptance evaluation and on-line tools

Robert D. Chirico1, robert.chirico@nist.gov, Michael Frenkel1, Joseph W. Magee1, Vladimir V. Diky1, Kenneth Kroenlein1, Chris D. Muzny1, Andrei F. Kazakov1, Ilmutdin M. Abdulagatov1, Gary R. Hardin1, Theodoor W. de Loos2, John P. O'Connell3, Clare McCabe4, Joan F. Brennecke5, Paul M. Mathias6, Anthony R. H. Goodwin7, Jiangtao Wu8, Kenneth N. Marsh9, Ronald D. Weir10, William E. Acree, Jr.11, Agilio Pádua12, W. M. (Mickey) Haynes1, Daniel G. Friend1, Andreas Mandelis13, Vicente Rives14, Christoph Schick15, Sergey Vyazovkin16, Ella Chen17. (1) Applied Chemicals and Materials Division, National Institute of Standards and Technology, Boulder, Colorado, United States, (2) Department of Process and Energy, Delft University of Technology, Delft, The Netherlands, (3) Department of Chemical Engineering, University of Virginia, Charlottesville, Virginia, United States, (4) Department of Chemical and Biomolecular Engineering and Department of Chemistry, Vanderbilt University, Nashville, Tennessee, United States, (5) Department of Chemical and Biomolecular Engineering, University of Notre Dame, Notre Dame, Indiana, United States, (6) Fluor Corporation, Aliso Viejo, California, United States, (7) Schlumberger Technology Corporation, Sugar Land, Texas, United States, (8) Center for Thermal and Fluid Science, Xi’an Jiaotong University, Xian, Shaanxi, China, (9) School of Mechanical and Chemical Engineering, University of Western Australia, Crawley, Australia, (10) Department of Chemistry and Chemical Engineering, Royal Military College of Canada, Kingston, Ontario, Canada, (11) Department of Chemistry, University of North Texas, Denton, Texas, United States, (12) Laboratoire Thermodynamique et Interactions Moléculaires, Université Blaise Pascal and CNRS, Clermont-Ferrand, France, (13) Faculty of Applied Science and Engineering, University of Toronto, Toronto, Ontario, Canada, (14) Departamento de Quimica Inorganica, Universidad de Salamanca, Salamanca, Spain, (15) Institute of 
Physics, Universität Rostock, Rostock, Germany, (16) Department of Chemistry, University of Alabama at Birmingham, Birmingham, Alabama, United States, (17) Physical and Theoretical Chemistry, Elsevier, Amsterdam, The Netherlands

In 2008, the Journal of Chemical and Engineering Data, Fluid Phase Equilibria, The Journal of Chemical Thermodynamics, International Journal of Thermophysics, and Thermochimica Acta agreed to implement a new process for submission of manuscripts that include experimental thermodynamic and transport property data. For articles reporting new property data, NIST provides an initial report of relevant data sources from the NIST Archive (a Literature Report). This report is provided to Editors, who, at their discretion, forward it to reviewers and/or authors. After peer review, but before acceptance, the experimental data are captured at NIST with Guided Data Capture (GDC) software and compared against the NIST Data Archive using the dynamic-data-evaluation algorithms of the NIST ThermoData Engine (TDE) software. A Data Report is generated and typographical or data-consistency problems are resolved before acceptance. These procedures are mandatory. A review of successes and challenges will be presented, together with new online support tools.

9:45   Intermission
10:00 40 Reproducibility in cheminformatics and computational chemistry research: Certainly we can do better than this

Gregory A. Landrum, Gregory.Landrum@novartis.com, Novartis Institutes for BioMedical Research, Basel, Switzerland

Reproducibility is a central principle in scientific research. According to the American Chemical Society's “Ethical Guidelines to Publication of Chemical Research”: “An author's central obligation is to present an accurate and complete account of the research performed, absolutely avoiding deception, including the data collected or used, as well as an objective discussion of the significance of the research. Data are defined as information collected or used in generating research conclusions. The research report and the data collected should contain sufficient detail and reference to public sources of information to permit a trained professional to reproduce the experimental observations” [1]. This presentation will explore some of the implications of this for the publication of new computational methods and survey the current state of affairs in the cheminformatics/computational chemistry literature. We will close with some suggestions, drawn from scientific journals in other areas, about how we can do better.
[1] http://pubs.acs.org/userimages/ContentEditor/1218054468605/ethics.pdf

10:30 41 Reproducible research applied to cheminformatics experiments

Paul J Kowalczyk, paul.kowalczyk@scynexis.com, Department of Computational Chemistry, SCYNEXIS, Research Triangle Park, NC 27709-2878, United States

Gentleman and Temple Lang [1] define reproducible research as “…research papers with accompanying software tools that allow the reader to directly reproduce the results and employ the methods that are presented in the research paper.” We demonstrate how one might report cheminformatics experiments as instances of reproducible research, i.e., how one might author and distribute integrated dynamic documents that contain the text, code, data and any auxiliary content needed to recreate the computational results. We show how the contents of these documents, including figures and tables, can be recalculated each time the document is generated. Open-source tools are used for all document generation: the R software environment [2] is used to process chemical structures and mine and analyze biological and chemical data; the knitr [3] package is used to generate reports (PDF); the markdown [4] package is used to generate valid (X)HTML content; and the beamer [5] package is used to create slides for presentation. Specific examples are presented for the visualization, analysis and mining of publicly available antimalarial datasets, with particular attention paid to automatically generating PDF reports, slides for presentations and valid (X)HTML content. All text, code, data and auxiliary content will be made freely available. [1] Gentleman, Robert & Duncan Temple Lang, “Statistical Analyses and Reproducible Research” (May 2004) Bioconductor Project Working Papers, Working Paper 2. http://biostats.bepress.com/bioconductor/paper2 [2] R Development Core Team (2012). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. http://www.R-project.org/ [3] http://yihui.name/knitr/ [4] http://daringfireball.net/projects/markdown/ [5] https://bitbucket.org/rivanvx/beamer/wiki/Home

11:00 42 Mythbusting scientific knowledge transfer with nanoHUB.org: Collaborative research and dissemination with quantifiable impact on research and education

Gerhard Klimeck, gecko@purdue.edu, Network for Computational Nanotechnology, Purdue University, West Lafayette, IN 47907, United States

Gordon Moore's 1965 prediction of continued semiconductor device down-scaling and circuit up-scaling has become a self-fulfilling prophecy. Open-source code development and sharing of the process modeling software SUPREM and the circuit modeling software SPICE ultimately transitioned into all the electronic design software packages that power today's 280-billion-dollar semiconductor industry. Can we duplicate such multi-disciplinary software, leading to true economic impact? What technologies might advance such a process? How can we deliver such software to a broad audience? How can we train the next generation of engineers and scientists on the latest research software? This presentation will show how nanoHUB.org addresses these questions. By serving a community of 240,000 users in the past 12 months with an ever-growing collection of 3,100 resources, including over 260 simulation tools, nanoHUB.org has established itself as “the world's largest nanotechnology user facility” [1]. [1] Quote by Mikhail Roco, Senior Advisor for Nanotechnology, National Science Foundation.

Monday, April 8, 2013 1:30 pm - 5:30 pm

FoodInformatics: Applications of Chemical Information to Food Chemistry - PM Session
Morial Convention Center
Room: 349
Cosponsored by AGFD
Jose Medina-Franco, Karina Martinez Mayorga, Organizers
Jose Medina-Franco, Presiding
1:30   Introductory Remarks
1:35 43 Soft and fuzzy approach to food informatics

Gerald M Maggiora, gerry.maggiora@gmail.com, Pharmacology & Toxicology/Bio5 Institute, University of Arizona, Tucson, AZ 85721, United States and Cancer & Cell Biology Division, Translational Genomics Research Institute, Phoenix, AZ 85004, United States

Since its inception at the beginning of the computer age, chemical informatics has played a growing role in many facets of chemical research. Most of its applications are in pharmaceutical research, but it is beginning to have an impact in other fields such as materials and food science. In chemical informatics, molecules are represented by descriptors associated with their structural features or properties, which typically are specified in terms of numerical or categorical variables. Some descriptors, however, cannot be described by such variables because of their inherent uncertainty or vagueness. For example, what are the “values” associated with a variable describing taste? Such variables, called linguistic variables, take on values such as sweet, sour, and bitter, which can be represented by fuzzy sets. The talk will explore how these novel variables are defined and how they can be applied in food informatics and related fields.
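The linguistic-variable idea above can be made concrete with a toy fuzzy-set sketch: an intensity reading belongs partially to several taste terms, with degrees of membership given by simple triangular functions. The 0-10 scale, the breakpoints, and the term names below are invented for illustration, not taken from the talk.

```python
# Toy fuzzy-set sketch of a linguistic taste variable. The intensity
# scale and all breakpoints are invented for illustration.

def triangular(x, left, peak, right):
    """Triangular membership function: 0 outside (left, right), 1 at peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

def taste_memberships(intensity):
    """Map a 0-10 sweetness intensity to degrees of membership in
    three overlapping fuzzy sets (hypothetical breakpoints)."""
    return {
        "not sweet":    triangular(intensity, -1, 0, 3),
        "mildly sweet": triangular(intensity, 1, 4, 7),
        "very sweet":   triangular(intensity, 5, 10, 11),
    }
```

Unlike a categorical variable, a single reading can be, say, partly "mildly sweet" and partly "very sweet" at once, which is what lets fuzzy sets capture the vagueness of taste descriptors.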

2:00 44 Exploring the chemical space of flavors and fragrances with the chemical universe database

Jean-Louis Reymond1, jean-louis.reymond@ioc.unibe.ch, Lars Ruddigkeit1, Mahendra Awale1, Guillaume Godin2. (1) Department of Chemistry and Biochemistry, University of Berne, Berne, Switzerland, (2) rue des Jeunes 1, Firmenich SA, Geneva, Switzerland

The chemical space describes the ensemble of all organic molecules. Recently we reported the Chemical Universe Database GDB-13 enumerating 977 million molecules of up to 13 atoms of C, N, O, S and Cl. Herein we report the analysis and visualization of analogs of typical flavors and fragrance compounds found in the database. Analog searching was performed using a searchable version of GDB-13 based on MQN-similarity. The analysis illuminates a vast yet well defined chemical space that offers many opportunities to broaden the range of flavors and fragrances to new structural types.
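MQN similarity ranks molecules by the city-block (Manhattan) distance between their integer descriptor vectors. The sketch below shows that distance and a nearest-neighbor lookup; the 5-component vectors and compound labels are toy stand-ins (real MQNs have 42 components), and the function names are my own.

```python
# City-block (Manhattan) distance between MQN-style integer descriptor
# vectors, the distance underlying MQN nearest-neighbor searching.
# The short vectors and names below are toy data for illustration.

def city_block(mqn_a, mqn_b):
    """Sum of absolute component differences."""
    return sum(abs(a - b) for a, b in zip(mqn_a, mqn_b))

def nearest_analogs(query, database, k=2):
    """Return the k database entries closest to the query in MQN space.
    database: dict of name -> descriptor vector."""
    ranked = sorted(database.items(), key=lambda kv: city_block(query, kv[1]))
    return [name for name, _ in ranked[:k]]
```

Because the descriptors are small integers, this distance is cheap enough to rank hundreds of millions of database entries against a query compound.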

2:25 45 Tracing pharmacophore determinants of natural- and nutritional-like components in epigenetics and metabolism

Alberto Del Rio, alberto.delrio@gmail.com, Department of Experimental, Diagnostic and Specialty Medicine (DIMES), University of Bologna, Bologna, Bologna 40126, Italy

The pharmacology of natural and nutraceutical components is crucial in many cellular processes. Several clinical, physiopathological and epidemiological studies highlight the detrimental or beneficial role of natural/nutritional factors in conjunction with epigenetic and metabolic alterations. Furthermore, there is growing evidence that metabolism is linked to epigenetic changes, especially in cancer pathologies. Examples of natural/nutritional molecules are insulin, flavanol-rich compounds, short-chain fatty acids, indoles, and other dietary components that can be converted by cell metabolism into chemical intermediates implicated in epigenetic alterations. A deeper understanding of how metabolism and epigenetics are influenced by these components requires molecular-level knowledge encompassing several aspects, such as the polypharmacological role of these compounds. In this context, pharmacophore-based techniques are described as a valuable chemoinformatic tool for tracking down molecular determinants of nutriepigenomic and nutrimetabolomic molecular mechanisms.

2:50 46 Reverse pharmacognosy: From molecules to active ingredients

Quoc Tuan Do, quoctuan.do@greenpharma.com, Sylvain Blondeau, Philippe Bernard. Chemoinformatics, Greenpharma S.A.S., Orleans, Loiret 45100, France

A huge amount of data has been generated by decades of pharmacognosy, supported by the rapid evolution of chemical, biological, and computational techniques. How can we cope with this overwhelming mass of information? Reverse pharmacognosy was introduced with this aim in view. It proceeds from natural molecules to the organisms that contain them, via biological assays, in order to identify an activity. In silico techniques, and particularly inverse screening, are key technologies for achieving this goal efficiently. Reverse pharmacognosy allows us to identify which molecule(s) from an organism is(are) responsible for the biological activity and which biological pathway(s) are involved. An exciting outcome of this approach is that it not only provides evidence of the biological properties of plants but can also be applied to compounds from other sources. Thus, reverse pharmacognosy accelerates the R&D of active molecules and ingredients.

3:15 47 Flavor network: Exploring the principles of food pairing

Sebastian E Ahnert1, sea31@cam.ac.uk, Yong-Yeol Ahn2, Albert-Laszlo Barabasi3. (1) Department of Physics, University of Cambridge, Cambridge, United Kingdom, (2) School of Informatics and Computing, Indiana University, Bloomington, IN 47408, United States, (3) Department of Physics, Northeastern University, Boston, MA, United States

The cultural diversity of culinary practice, as illustrated by the variety of regional cuisines, raises the question of whether there are any general patterns that determine the ingredient combinations used in food today, or principles that transcend individual tastes and recipes. We introduce a flavor network that captures the flavor compounds shared by culinary ingredients. Western cuisines show a tendency to use ingredient pairs that share many flavor compounds, supporting the so-called food pairing hypothesis. By contrast, East Asian cuisines tend to avoid compound-sharing ingredients. Given the increasing availability of information on food preparation, our data-driven investigation opens new avenues towards a systematic understanding of culinary practice. In light of this we also discuss a variety of datasets on food ingredients and flavor compounds and how to combine them using large-scale data analysis.
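
The flavor network described above can be sketched in a few lines: ingredients become nodes, and an edge's weight is the number of flavor compounds two ingredients share. The ingredient-compound assignments below are invented for illustration, not taken from the talk's dataset.

```python
from itertools import combinations

# Hypothetical ingredient -> flavor-compound sets.
ingredients = {
    "tomato": {"furaneol", "hexanal", "linalool"},
    "parmesan": {"furaneol", "butanoic acid"},
    "strawberry": {"furaneol", "linalool", "hexanal"},
}

# Weighted edges: (ingredient_a, ingredient_b) -> shared-compound count.
edges = {}
for a, b in combinations(sorted(ingredients), 2):
    shared = ingredients[a] & ingredients[b]
    if shared:
        edges[(a, b)] = len(shared)

def pairing_score(recipe):
    """Mean shared-compound count over all ingredient pairs in a recipe,
    one simple way to quantify the food-pairing hypothesis."""
    pairs = list(combinations(sorted(recipe), 2))
    return sum(edges.get(p, 0) for p in pairs) / len(pairs)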

3:40   Intermission
3:50 48 USP reference standards as value-added information sources in the Food Chemicals Codex (FCC)

Christina L. Cole, clc@usp.org, Department of Foods, Dietary Supplements, and Herbal Medicines, United States Pharmacopeial Convention, Rockville, MD 20852, United States

The FCC is a compendium of internationally recognized standards for the purity and identity of food ingredients, featuring ~1100 monographs for food-grade chemicals, processing aids, foods, flavoring agents, vitamins, and functional food ingredients. With public and stakeholder guidance, the FCC establishes vetted food ingredient specifications and supporting test procedures that help manufacturers, suppliers, and regulators safeguard the food supply; USP Reference Standards are often incorporated into these monograph methods. The use of USP Reference Standards facilitates rapid and unbiased decisions on the quality and identity of food ingredients; enhances the reliability of analytical test results; and serves as a value-added information source within the context of associated FCC monographs. This talk will focus on this "extra" information contained in the Reference Standards and how it complements other FCC and USP activities, such as the Food Fraud Database and recent workshops on the identity and characterization of functional food ingredients and probiotics.

4:15 49 Reaxys as an information resource for food chemistry

David Evans1, david.evans@reedelsevier.ch, Juergen Swienty-Busch2. (1) Reed Elsevier Properties SA, Neuchâtel, Switzerland, (2) Elsevier Information Systems GmbH, Frankfurt, Germany

There is a wide and varied literature related to the food sciences, including chemistry and safety. This literature includes discussion of the chemistry of major and minor food components, food additives, contaminants, and their corresponding metabolism and toxicology. This presentation will detail how the Reaxys database supports the search and retrieval of information pertinent to food chemistry, including reaction and substance/chemical property information. To ensure we cover relevant journals, and the appropriate data within those journals, we have implemented an updated excerption strategy. We will discuss the identification and excerption of relevant data from the appropriate literature, with particular reference to the food chemistry literature.

4:40 50 Profiling the trace metal composition of wine as a function of storage temperature and packaging type

Helene Hopfer1,2, hhopfer@ucdavis.edu, Jenny Nelson3, Carolyn L Doyle1,2, Hildegarde Heymann1, Alyson E. Mitchell2,4, Susan E. Ebeler1,2. (1) Viticulture and Enology, University of California, Davis, CA 95616, United States, (2) Food Safety and Measurement Facility, University of California, Davis, CA 95616, United States, (3) Agilent Technologies Inc., Santa Clara, CA 95051, United States, (4) Food Science and Technology, University of California, Davis, CA 95616, United States

Trace metal patterns in grapes and wines are mostly studied to determine the geographical origin and authenticity using highly sensitive instrumentation such as inductively coupled plasma mass spectrometry (ICP-MS). However, ICP-MS can also be used to study possible metal contamination of wines during the winemaking and storage processes. In the present study we looked at the changes in the trace metal composition of Cabernet Sauvignon wine that had been stored in different packaging configurations (various closures and glass alternatives) for a period of 6 months. The effect of storage temperature and packaging type was studied by monitoring 15 elements using a quantitative ICP-MS method. Significant changes in the elements Cr, V, Sn and Pb were found among the different packaging types and the storage temperatures. Multivariate statistical analysis tools were used to evaluate the changing metal profiles in the wines with the various packaging treatments.

5:05 51 Mining the protein space to determine prevalence of fragments identical with allergenic epitopes - chicken egg protein fragments as an example

Piotr Minkiewicz, minkiew@uwm.edu.pl, Monika Protasiewicz, Małgorzata Darewicz, Karolina Hurman, Anna Iwaniak. Department of Food Biochemistry, University of Warmia and Mazury in Olsztyn, Olsztyn, Poland

The aim of this study was to find proteins containing fragments identical with epitopes experimentally detected in allergenic chicken egg proteins. Epitope sequences were taken from the BIOPEP database, and the WU-BLAST program was used to screen the UniProt protein sequence database. Both short fragments of 5-8 amino acid residues and longer ones of at least 9 residues were present in protein sequences. The longer fragments were found only in homologs of the chicken egg proteins, whereas the shorter fragments were also found in other proteins. The existence of common epitopes in proteins of different species may lead to cross-reactivity. The presence of such fragments indicates at least the existence of individuals whose antibodies interact with a set of proteins containing the same epitopes. This protein database screening approach may be useful in searching for new allergenic proteins containing epitopes common to known allergens.

Monday, April 8, 2013 1:00 pm - 5:30 pm

Scholarly Communication: New Models, New Media, New Metrics - PM Session
Morial Convention Center
Room: 352
Cosponsored by YCC
William Town, Colin Batchelor, David Martinsen, Organizers
Colin Batchelor, William Town, Presiding
1:00   Introductory Remarks
1:05 52 Supplementary journal article materials: Summary of the NISO/NFAIS recommendations

David P Martinsen, d_martinsen@acs.org, Publications Division, American Chemical Society, Washington, DC 20036, United States

As journals migrated from print to the Web, the nature of supplemental materials began to change. In the print world, supplemental materials were usually text or graphics that were too expensive to be formatted and printed, and so were distributed on microfiche. Datasets were also included as printouts of tables of numbers. The digital world brought the promise of overcoming the limitations of print to more easily publish and distribute supplemental materials in a more usable form. The result has been a wide degree of variation among publishers; varying expectations among authors, editors, reviewers, and readers; and an increasing volume of supplemental materials that places a growing burden on all parts of the publication process. NISO and NFAIS convened a working group to examine the current status of supplemental journal article materials and to recommend best practices for publishing these materials. A summary of the recommendations will be presented.

1:35 53 Digital research that is discoverable, citable, and linked to primary research literature: The Data Citation Index

Daphne Grecchi, Daphne.Grecchi@thomsonreuters.com, Scientific & Scholarly Research, Thomson Reuters, Philadelphia, PA 19130, United States

The worldwide growth in data repositories and the requirement by funding agencies and publishers that researchers deposit their data in them have increased the need for a comprehensive view of research data and its use. Digital scholarly data plays an important role in research, advancing important scientific discoveries through validated data points. The Data Citation Index, available on Thomson Reuters Web of Knowledge, is a citation resource that makes research data discoverable, citable, and seamlessly linked to the primary research literature. Now quality research data from repositories across disciplines and around the world can be searched and assessed from a single point of access, where data can be viewed within the context of the scholarly research it supports. This presentation reviews how the Data Citation Index connects digital research to powerful new discovery tools and how the inclusion of data and digital scholarship maximizes the benefits of the citation search capabilities and navigation features available within Web of Knowledge.

2:05 54 From inception to collaboration to publication: A complete integrated research management platform for researchers

Judy Chen, j_chen@acs.org, Editorial Office Operations, American Chemical Society, Washington, District of Columbia 20036, United States

As a result of the vast amount of information and resources available on the web coupled with rapid technological advances, a paradigm shift toward the digital world is occurring. Mobile devices add another dimension in the shift, making the digital world readily accessible from anywhere. For scientists, this shift has already resulted in the transition from physical libraries and paper journals to websites and electronic tools to conduct scholarly research and to communicate with colleagues and collaborators. While each person has his/her own research workflow, the components that make up this process are similar. We present here a complete integrated research management and collaboration suite, which features a reference management system, calendars, and task management for effectively organizing research work. This is combined with an online social profile and many other features to facilitate the complex and entropic sharing process.

2:35   Intermission
2:50 55 Evolving with our community: The RSC's approach to the challenges and opportunities of scientific communication

Richard Kidd, kiddr@rsc.org, James Milne. Royal Society of Chemistry, Cambridge, United Kingdom

We will be reporting on the RSC's approach to evolving business models, including the support to UK institutions and researchers to help prepare for the transition to OA resulting from the recent Finch recommendations and RCUK policy. We will also report on the expansion of the publishing portfolio, and on how we are developing the RSC's support for primary data. The RSC's aim is to test and evolve new technological, social and business models in parallel to provide support for the research community.

3:20 56 We're not in Kansas anymore

Roger Schenck, rschenck@cas.org, Marketing, Chemical Abstracts Service, Columbus, Ohio 43202, United States

As the only organization in the world whose objective is to find, collect, and organize all publicly disclosed substance information, CAS is challenged to keep pace with rapidly changing publication models in the scientific community. This presentation will cover ways in which CAS is handling the growing number of ahead-of-print articles, web-only publications, open access journals, the synthetic information and experimental data increasingly reported in supplementary material, and even the new visual journals where articles are actually videos of experiments. The talk will conclude with examples of how CAS is working with primary publishers and patent authorities to deliver chemistry research to the scientific community in a more integrated model.

3:50 57 Challenging, cajoling, and rewarding the community for their contributions to online chemistry

Antony J Williams1, tony27587@gmail.com, Valery Tkachenko1, Alexey Pshenichnov1, Will Russell2, Jack Rumble2, David Leeming2. (1) Cheminformatics, Royal Society of Chemistry, Wake Forest, NC 27587, United States, (2) Cheminformatics, Royal Society of Chemistry, Cambridge, United Kingdom

Chemistry online is represented in various forms, including publications, presentations, blog posts, wiki contributions, data depositions, curations, and annotations. Encouraging the community to participate in and comment on the information delivered via these various formats would likely provide a rich dialog in some cases and improved data quality in others. At the Royal Society of Chemistry we have a number of platforms that are amenable to contribution. This presentation will provide an overview of our experiences in engaging the community to interact with our various forms of content and discuss new approaches we are using to encourage crowdsourced participation.

4:20   Intermission
4:30 58 Science comedian's guide to communicating science to general audiences

Brian Malow, brian.malow@naturalsciences.org, North Carolina Museum of Natural Sciences, Raleigh, NC 27601, United States

This presentation will provide an overview of the lessons learned in communicating science for over 15 years to both general audiences and specialized groups (including NASA, NIST and the ACS). Currently working on science communications at the North Carolina Museum of Natural Sciences, Brian Malow has made a number of science videos for Time Magazine's website and contributed a number of audio essays to Neil deGrasse Tyson's radio show. He is Earth's self-proclaimed premier science comedian and this presentation is sure to both amuse and educate.

Monday, April 8, 2013 1:15 pm - 5:15 pm

Food for Thought: Alternative Careers in Chemistry - PM Session
Morial Convention Center
Room: 350
Cosponsored by PROF, YCC
Donna Wrublewski, Patricia Meindl, Organizers
Patricia Meindl, Presiding
1:15   Introductory Remarks
1:20 59 From studying block copolymers to chemical information: A journey of an alternative chemistry career as an academic science librarian

Vincent F Scalfani, vfscalfani@ua.edu, University Libraries, The University of Alabama, Tuscaloosa, AL 35487, United States

Unbeknownst to many chemistry graduate students, crucial skills relevant to a career in science informatics, information technology, and librarianship are acquired daily while working on their dissertations, both in and outside of the laboratory. For example, planning synthetic routes, analyzing data trends, and writing technical papers all require information-seeking skills. This presentation will highlight how to transition from the laboratory to the library, as well as how chemists can bring a fresh perspective to the library. My personal journey of transitioning from studying block copolymer nanomaterials as a chemistry Ph.D. student to working as the new Science Librarian at the University of Alabama will be used as one example of how to make this transition. In addition, projects started in my first year at the University of Alabama Rodgers Science and Engineering Library will be discussed.

1:50 60 Successful careers in science: Why moving away from the bench brings you closer to advancing research

Lily Khidr, l.khidr@elsevier.com, Elsevier, New York City, New York 10010, United States

This lecture will delineate why it is advantageous for active research scientists to consider the road to becoming an editor and publisher, what the competitive process of achieving these high-profile positions entails, and the value the job adds within an evolving scientific and academic landscape. Dr. Khidr is an academic research scientist who has served as an editor at both Nature and Science and currently holds the position of Publisher at Elsevier in New York City.

2:20 61 Cheminformatics career at the Royal Society of Chemistry, UK

Colin Batchelor, batchelorc@rsc.org, Royal Society of Chemistry, United Kingdom

I am a Senior Cheminformatics Analyst at the Royal Society of Chemistry in Cambridge, UK, working on text-mining, semantic web technology and ChemSpider. My doctoral work, however, was on applying multichannel quantum defect theory to the ionization dynamics of small molecules in the gas phase. In this talk I will discuss what this entails in practice, how my background in theoretical chemistry and journal publishing prepared me for it and what it's like to work in cheminformatics for a learned society.

2:50 62 Patent law as a non-traditional career in chemistry

Sarah P Hasford, shasford@mcguirewoods.com, McGuireWoods, LLP, Tysons Corner, VA 22102, United States

This presentation will focus on a patent attorney's role in protecting innovation and discuss the ups and downs of practicing patent law for those who may be interested in exploring a patent law career for themselves.

3:20   Intermission
3:35 63 Role of personal interests, motivation, and timing in the transitioning to a new career

Svetla Baykoucheva, sbaykouc@umd.edu, White Memorial Chemistry Library, University of Maryland, College Park, MD 20742, United States

This paper shows how someone with an educational and research background in chemistry and the life sciences (BS and MS in Chemistry, PhD in Microbiology) could maintain for many years parallel interests in citation indexing that led to a seamless transition to a new career in information science and librarianship. Working at the lab bench and publishing in scientific journals, joining a scientific publisher (ACS) as a librarian, and finally returning to academic life (University of Maryland, College Park) to manage a chemistry library and teach chemical information were career turns that required strong motivation and depended to a large degree on timing.

4:05 64 From the bench to the board

Rebecca Boudreaux, rebecca@oberonfuels.com, Oberon Fuels, La Jolla, CA 92038, United States

What does it take to go from chemist to company co-founder? How can you develop from scientist to startup expert?
This talk will discuss transitioning from the bench to the board, and draw on my background in chemistry, business, and leadership. As a PhD-trained chemist, I've used my scientific expertise to launch ventures designed to treat cancer and develop a cleaner alternative to diesel, among others. I will show how my educational background played a key role in helping me develop the skills necessary to pursue these opportunities, and offer advice for those interested in a similar career.

4:35 65 Political "science": Opportunities for chemists in science policy

Ticora V Jones, ticjones@usaid.gov, Office of Science & Technology, US Agency for International Development, Washington, DC, United States

This talk will describe the opportunities for chemists to involve themselves in the arena of science policy making as an alternative career. The experience of a past science policy fellow with Legislative and Executive branch experience will be discussed. The talk will include a discussion of how fellowships translated into careers in government and a discussion of how applicants can prepare competitive applications.

5:05   Concluding Remarks

Monday, April 8, 2013 8:00 pm - 10:00 pm

Sci-Mix - EVE Session
Morial Convention Center
Room: Hall D
Jeremy Garritano, Organizer
  1 Characterizing the diversity and biological relevance of the MLPCN assay manifold and screening set

Jun Huan, jhuan@ittc.ku.edu, EECS, Univ. of Kansas, Lawrence, KS 66049, United States

The NIH Molecular Libraries Probe Production Centers Network (MLPCN) aims to remediate key deficiencies in drug discovery and chemical biology, through pursuit of therapeutically feasible but unprofitable drug targets, undruggable genes of biochemical interest, and development of chemically diverse, biologically relevant screening sets. This paper evaluates the novelty of MLPCN targets, their propensity for undergoing modulations of biochemical or therapeutic relevance, the degree of chemical diversity inherent in the MLPCN screening set, and biogenic bias of the set. Our analyses suggest that MLPCN targets cover biologically interesting pathway space that is distinct from established drug targets, but may include genes whose overly complex protein interactions may obfuscate pathway effects and enable therapeutically undesirable side-effect risks. We find the MLPCN screening set to be chemically diverse, and it has greater biogenic bias than comparable collections of commercially available compounds. Biogenic enhancements such as incorporation of more metabolite-like chemotypes are suggested.

4 PubChem widgets

Lianyi Han, hanl@ncbi.nlm.nih.gov, National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894, United States

Modern interactive web and mobile applications for chemistry and biology often need to integrate information from multiple resources, such as biochemical analyses, patents, and publications. This typically requires an underlying data warehouse containing billions of chemical and bioactivity records, coupled with web services that deliver "Asynchronous JavaScript and XML" (AJAX) and JSONP ("JSON with padding") content to applications. PubChem Widgets provide a rapid development tool for creating content-rich, interactive UIs without requiring the development of such a data warehouse. These widgets show commonly requested PubChem data views, such as 1) patents associated with a PubChem compound or substance; 2) bioactivity outcomes for a PubChem compound, substance, or bioassay; and 3) literature available for a compound, substance, or bioassay. The widgets are easily embedded into your own web application or HTML pages and can also be used to access annotation data from native desktop and mobile applications. A beta release is available: http://pubchem.ncbi.nlm.nih.gov/widget/docs/widget_help.html.

  6 WITHDRAWN
16 On-line graph mining and visualization of protein-ligand interactome

Clara Ng1, lxi0003@hunter.cuny.edu, Lei Xie1,2, lxi0003@hunter.cuny.edu. (1) Department of Computer Science, City University of New York, New York, NY, United States, (2) Graduate Center, City University of New York, New York, NY, United States

Recent high-throughput screens have generated large amounts of protein-ligand interaction data; for example, over one million compounds are associated with the 4422 proteins in ChEMBL. Previous attempts to mine and visualize this large protein-ligand interaction dataset have mapped chemicals into a high-dimensional feature space and visualized it using dimensionality reduction techniques. We propose a different approach to exploring the protein-ligand interactome efficiently, effectively, and intuitively. We link all chemicals and all targets into an all-against-all chemical similarity network and target similarity network, respectively. The two networks are connected as a bipartite graph through protein-ligand interactions. Efficient graph clustering and mining algorithms are applied to identify chemical and protein patterns underlying binding promiscuity and specificity. Although building the chemical/protein similarity networks is computationally intensive, they need to be built only once and then updated regularly. As demonstrated in case studies for anti-infective drug discovery, our method may facilitate drug repurposing, side-effect prediction, and polypharmacology drug design.
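
The bipartite graph at the core of the approach above can be illustrated with a minimal sketch: one node set for ligands, one for targets, and an edge per reported interaction. The ligand and target names below are invented; a ligand's degree on the target side is one simple promiscuity measure of the kind such mining can surface.

```python
# Invented protein-ligand interaction records (ligand, target).
interactions = [
    ("ligand_A", "kinase_1"),
    ("ligand_A", "kinase_2"),
    ("ligand_B", "kinase_1"),
    ("ligand_C", "protease_1"),
]

# Bipartite adjacency: ligand -> set of targets it binds.
ligand_targets = {}
for ligand, target in interactions:
    ligand_targets.setdefault(ligand, set()).add(target)

# Degree on the target side as a naive promiscuity score; ligands hitting
# many targets are candidates for polypharmacology or repurposing studies.
promiscuity = {lig: len(t) for lig, t in ligand_targets.items()}
most_promiscuous = max(promiscuity, key=promiscuity.get)
```

In the full method the two node sets additionally carry their own all-against-all similarity edges, so clustering can run within each side as well as across the bipartite interactions.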

17 Encoded Library Technology data analysis: Finding the grain of sand you want without getting a sunburn

Kenneth E Lind, kenneth.e.lind@gsk.com, Neil R Carlson, Ninad V Prabhu, Jeff A Messer. MDR Boston, GlaxoSmithKline, Waltham, MA 02451, United States

Encoded Library Technology (ELT) is a part of GSK's integrated Hit ID strategy. ELT involves creation of large combinatorial libraries whose members (sometimes over a billion!) are encoded by a unique combination of DNA tags. Binders to a molecular target are selected from these libraries and identified using next-generation DNA sequencing. We have developed a platform for translating sequence data back to the encoded chemical warhead, detecting features that are enriched in the selection, and summarizing and annotating the selection experiment. Each week our platform processes over 100 million DNA sequences - larger than the entire human genome. Data visualization is integrated into the TIBCO Spotfire platform, allowing scientists to view summaries of the large data sets, determine the most important chemical space, and then drill down to specific results to prioritize compounds for synthesis and assays. We will describe method details and present examples to highlight our analysis and visualization tools.
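
The decode-and-count step described above can be sketched as follows. The tag sequences, tag length, and building-block names are all invented for illustration; the actual GSK platform operates at far larger scale and with error-tolerant matching.

```python
from collections import Counter

# Hypothetical DNA tag -> building-block lookup table.
tag_to_block = {
    "ACGT": "amine_17",
    "TTAG": "acid_03",
    "GGCA": "amine_02",
}

def decode(read, tag_length=4):
    """Split a read into fixed-length tags and translate each tag back to
    the building block it encodes."""
    tags = [read[i:i + tag_length] for i in range(0, len(read), tag_length)]
    return tuple(tag_to_block.get(t, "unknown") for t in tags)

# Counting decoded combinations across sequencing reads reveals which
# building-block combinations were enriched by the selection.
reads = ["ACGTTTAG", "ACGTTTAG", "GGCATTAG"]
counts = Counter(decode(r) for r in reads)
top_combination, top_count = counts.most_common(1)[0]
```

Enrichment relative to a no-target control selection, rather than raw counts, is what ultimately prioritizes compounds for off-DNA synthesis.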

19 How to highlight hits: Advances in visual data analytics tools for HTS data

Jesse A. Gordon, jesse.gordon@dotmatics.com, Jess Sager. Application Science, Dotmatics, Ltd., Woburn, MA 01801, United States

We face a huge dataset from a screening run and we want to analyze the results to pick compounds for the next screening run. How do we sift through the millions of data points to figure out which are meaningful hits, and then organize those hits into a database from which we can intelligently predict good prospects for the next screening run? We face a series of challenges in HTS data analysis which will be outlined in this presentation followed by solutions offered through modern chemoinformatics and visual data analytics tools. We look at the difference between the "Old Way" -- grid after grid in Excel with manual calculations -- and the "New Way" -- clicking on visually distinctive points highlighted in red on automatically-generated curves.
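
One common way to "highlight hits in red" automatically, in the spirit of the visual analytics described above, is to flag wells whose signal deviates strongly from the plate median. The robust Z-score sketch below uses invented data and is not the Dotmatics implementation.

```python
import statistics

def robust_z(values):
    """Robust Z-scores using the median and MAD (scaled by 1.4826 so the
    score matches a standard Z-score for normally distributed data)."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    return [(v - med) / (1.4826 * mad) for v in values]

# Raw plate signals; low values indicate strong inhibition in this assay.
signals = [100, 98, 102, 101, 40, 99, 97, 35, 100]
zscores = robust_z(signals)
hits = [i for i, z in enumerate(zscores) if z <= -3]  # wells to color red
```

Median/MAD statistics resist distortion by the very outliers being sought, which is why they are preferred over mean/standard deviation for primary-screen hit flagging.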

26 iBIOMES: Managing and sharing large biomolecular simulation datasets in a distributed environment with iRODS

Julien C Thibault1, julien.thibault@utah.edu, Thomas E Cheatham2,3, Julio C Facelli1,3. (1) Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah 84112, United States, (2) Department of Medicinal Chemistry, University of Utah, Salt Lake City, Utah 84112, United States, (3) Center for High-Performance Computing, University of Utah, Salt Lake City, Utah 84112, United States

During this presentation we will introduce the architecture of iBIOMES (Integrated BIOMolEcular Simulations), a distributed system for biomolecular simulation data management allowing storage and indexing of large datasets generated by Molecular Dynamics (MD) simulations, along with ab initio calculation results. The system architecture is based on iRODS, a data handling system developed by RENCI and influenced by the experience gained from the Storage Resource Broker (SRB) system. iRODS provides the tools to register, move, and look up files that are distributed over the network and stored on different types of storage (e.g., HPC servers, file servers, archive tapes). Registered files can be queried and retrieved based on system or user-defined metadata. We created customized interfaces on top of iRODS to facilitate the data registration process for biomolecular simulation datasets (e.g., AMBER, Gaussian). The process is highly customizable through XML descriptors, enabling users to choose which pieces of data should be displayed to summarize the registered experiments. Data registration does not require physical transfer of the data, which makes it a great solution for researchers who want to expose existing datasets. Input and output files can be made available for download within a collaborative network to allow replication of results or comparison between methods (e.g., different force fields). Finally, data summarization and management are facilitated through a rich web interface that offers different visualization components for 3D structures and analysis data (e.g., time series plots, heatmaps). iBIOMES represents one of the first efforts to create an infrastructure for researchers to manage their MD data locally, expose their data to the community, and create collaborative networks.

28 New cheminformatics microscopes: Combining semantic web technologies, cheminformatical representations, and chemometrics for understanding and predicting chemical and biological properties

Egon L Willighagen, egon.willighagen@maastrichtuniversity.nl, Department of Bioinformatics - BiGCaT, Maastricht University, Maastricht, The Netherlands

Cheminformatics is a computational microscope with which we study chemical properties. My research develops new such microscopes based on cheminformatics, semantic web, and chemometrics technologies. This work has resulted in and contributed to many computational methods for handling chemical structures and predicting their chemical, physical, and biological properties. These methods include computational software like the Chemistry Development Kit; visualization tools like Jmol, JChemPaint, and Bioclipse; information retrieval technologies like OSCAR4; data exchange standards like the Chemical Markup Language and the CHEMINF ontology, along with other semantic solutions aimed at reducing information loss; and new public chemical knowledge bases, such as the Blue Obelisk Data Repository and the NanoWiki of toxicological properties of nanomaterials. These and other tools have been used in combination with statistical and machine learning methods to predict various chemical properties, showing the importance of statistical and visual validation of the patterns found.

29 Discovery of TLR2 antagonists by virtual screening

Manuela S Murgueitio1, m.murgueitio@fu-berlin.de, Sandra Santos-Sierra2, Gerhard Wolber1. (1) Institute of Pharmacy, Pharmaceutical Chemistry, Freie Universität Berlin, Berlin, Berlin 14195, Germany, (2) Institute of Clinical Pharmacology, Medizinische Universität Innsbruck, Innsbruck, Tirol A-6020, Austria

Toll-like receptors (TLRs) represent the first barrier in the innate immune response and act as key players in the development of chronic inflammatory and autoimmune diseases. Thus, interest in identifying small organic molecules that modulate TLRs has risen. In this study we present a virtual screening approach for the identification of novel TLR2 antagonists, combining ligand- and structure-based design. First, we performed a shape- and feature-based similarity search against commercially available compound collections, using TLR2 agonists from the literature and two TLR2 antagonists previously identified in-house as query structures. Second, molecular interaction fields (MIFs) of the TLR2 binding site were calculated to derive a structure-based 3D pharmacophore that was then used for virtual screening. A selection of virtual screening hits was biologically tested in a cell-based assay for TLR2 inhibition, leading to several compounds with antagonistic activity (IC50 values in the micromolar range).

31 On the compound annotation and cleaning the GSK screening collection initiative: The utility of an Inhibition Frequency Index (IFI)

Subhas J Chakravorty, subhas.j.chakravorty@gsk.com, James A Chan, Juan Luengo, Nicole M Greenwood, Ioana Popa-Burke, Ricardo Macarron. CSC, Sample Technologies, GSK, Upper Providence, PA 19426, United States

High throughput screening (HTS) constitutes a critical tool for the identification of lead molecules from primary screening assays for novel targets. GlaxoSmithKline (GSK) has continuously invested in the development and curation of its HTS collection to maximize the number of quality starting points for drug discovery and reduce the number of false positives from primary screens. An Inhibition Frequency Index (IFI) has been defined as a measure of promiscuity of individual compounds in HTS primary assays based upon activities tabulated over time in GSK's exhaustive screening assay tables. In this talk, we will present our analysis of the IFI profile across the GSK HTS collection. We will characterize the IFI profile with respect to desired physical properties, will discuss obvious substructures that may be less attractive as starting points, and will describe new classes of nuisance compounds revealed by our IFI analysis. In addition, we will examine the IFI of promiscuity filters described in the literature. There are many reasons why any particular molecule might display promiscuity: physical properties of the compound, properties of the target or target class, details of the assay and the assay technology and methodology. All of these factors must be considered when deciding whether to remove or retain a compound in a curated HTS collection.
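The abstract does not give the exact formula for the IFI. As a minimal sketch only, assuming the index is the fraction of primary assays in which a compound's percent-inhibition readout crosses an activity threshold (both the readout type and the 50% cutoff are illustrative assumptions, not GSK's actual definition), it might look like:

```python
def inhibition_frequency_index(inhibitions, threshold=50.0):
    """Fraction of primary assays in which a compound's percent
    inhibition meets or exceeds `threshold`. The readout type and
    the default cutoff are assumptions for illustration."""
    if not inhibitions:
        raise ValueError("compound has no assay readouts")
    hits = sum(1 for x in inhibitions if x >= threshold)
    return hits / len(inhibitions)

# A frequent hitter inhibits in most assays it is tested in;
# a well-behaved compound in few.
frequent_hitter = inhibition_frequency_index([80, 95, 60, 75, 90, 55])  # 1.0
well_behaved = inhibition_frequency_index([5, 12, 0, 49, 8, 3])         # 0.0
```

A compound with an index near 1 across many assays is a candidate nuisance compound, subject to the caveats about target class and assay technology noted above.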

32 Analyzing screening and similarity searching outcome in light of multiple approaches to the same target

Tina Garyantes, garyante@optonline.net, MAXSAR Biopharma, Warren, NJ 07059, United States

Traditional candidate discovery tends to be a linear process, with sequential optimization of compound parameters and hand-offs between teams, starting with a very basic analysis of primary screening data. Often the “best” series as identified by early assays are not the “best” series for late optimization. This talk will ask how we can improve lead series and potentially identify drug candidates by improved analysis early in the lead ID process. We will look at the value of and methods for analyzing multiple assays in parallel. An example of parallel optimization will be discussed where a phenotypic assay and a targeted assay were run in parallel. Data will be shown that supports the conclusion that running the parallel assays directs the team into different chemical space than a more traditional sequential approach. In addition, a novel method for analyzing the success of series expansion will be presented in this context.

33 WITHDRAWN

36 From hits to leads: Data visualization of chemical scaffolds beyond traditional SAR exploration

Tyler Peryea, tyler.peryea@nih.gov, John Braisted, Ajit Jadhav, Rajarshi Guha, Noel Southall, Dac-Trung Nguyen. National Center for Advancing Translational Sciences, Division of Preclinical Innovation, Rockville, Maryland 20850, United States

Turning hits from an HTS campaign into potential leads is a critical part of early stage therapeutic discovery. Often, this amounts to distilling thousands of HTS hits into a small number of manageable candidate series (or singletons in some cases) for lead optimization. While the process is fairly straightforward, the tools involved can range anywhere from ad-hoc scripts to custom built solutions. We will describe methods that take a set of suitable seed compounds (e.g., the result of activity selection), extract a set of relevant scaffolds, and place the scaffolds in the context of high-quality external data sources. We couple the scaffold driven analytics with visualizations of scaffold structural properties and associated activities that allow efficient and intuitive exploration of candidate series. We will finally describe a software tool that implements these methods and highlight its utility on HTS data from the Molecular Libraries Program.

41 Reproducible research applied to cheminformatics experiments

Paul J Kowalczyk, paul.kowalczyk@scynexis.com, Department of Computational Chemistry, SCYNEXIS, Research Triangle Park, NC 27709-2878, United States

Gentleman and Temple Lang1 define reproducible research as “…research papers with accompanying software tools that allow the reader to directly reproduce the results and employ the methods that are presented in the research paper.” We demonstrate how one might report cheminformatics experiments as instances of reproducible research, i.e., how one might author and distribute integrated dynamic documents that contain the text, code, data and any auxiliary content needed to recreate the computational results. We show how the contents of these documents, including figures and tables, can be recalculated each time the document is generated. Open-source tools are used for all document generation: the R software environment2 is used to process chemical structures and mine and analyze biological and chemical data; the knitr3 package is used to generate reports (PDF); the markdown4 package is used to generate valid (X)HTML content; and the beamer5 package is used to create slides for presentation. Specific examples are presented for the visualization, analysis and mining of publicly available antimalarial datasets, with particular attention paid to automatically generating PDF reports, slides for presentations and valid (X)HTML content. All text, code, data and auxiliary content will be made freely available. 1. Gentleman, Robert & Duncan Temple Lang, “Statistical Analyses and Reproducible Research” (May 2004) Bioconductor Project Working Papers, Working Paper 2. http://biostats.bepress.com/bioconductor/paper2 2. R Development Core Team (2012). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org/ 3. http://yihui.name/knitr/ 4. http://daringfireball.net/projects/markdown/ 5. https://bitbucket.org/rivanvx/beamer/wiki/Home

46 Reverse pharmacognosy: From molecules to active ingredients

Quoc Tuan Do, quoctuan.do@greenpharma.com, Sylvain Blondeau, Philippe Bernard. Chemoinformatics, Greenpharma S.A.S., Orleans, Loiret 45100, France

A huge amount of data has been generated by decades of pharmacognosy, supported by the rapid evolution of chemical, biological and computational techniques. How can we cope with this overwhelming mass of information? Reverse pharmacognosy was introduced with this aim in view. It proceeds from natural molecules to the organisms that contain them, via biological assays, in order to identify an activity. In silico techniques, and particularly inverse screening, are key technologies for achieving this goal efficiently. Reverse pharmacognosy allows us to identify which molecule(s) from an organism is (are) responsible for the biological activity, and the biological pathway(s) involved. An exciting outcome of this approach is that it not only provides evidence of the biological properties of plants but can also be applied to compounds from other sources. Thus, reverse pharmacognosy makes it possible to accelerate the R&D of active molecules and ingredients.

59 From studying block copolymers to chemical information: A journey of an alternative chemistry career as an academic science librarian

Vincent F Scalfani, vfscalfani@ua.edu, University Libraries, The University of Alabama, Tuscaloosa, AL 35487, United States

Unbeknownst to many chemistry graduate students, crucial skills relevant to a career in science informatics, information technology and librarianship are acquired daily while working on their dissertations, both in and outside of the laboratory. For example, planning synthetic routes, analyzing data trends, and writing technical papers all require information-seeking skills. This presentation will highlight how to transition from the laboratory to the library, as well as how chemists can bring a fresh perspective to the library. My personal journey, from studying block copolymer nanomaterials as a chemistry Ph.D. student to working as the new science librarian at the University of Alabama, will be used as one example of how to make this transition. In addition, projects started during my first year at the University of Alabama Rodgers Science and Engineering Library will be discussed.

66 From virtual screening to real taste modulators: Bitter blockers and sweetness enhancers

Quoc Tuan Do2, quoctuan.do@greenpharma.com, Terry L. Peppard1, John Scire1, Philippe Bernard2. (1) Robertet Flavors, Inc., Piscataway, New Jersey 08854, United States, (2) Chemoinformatics, Greenpharma S.A.S., Orleans, Loiret 45100, France

In order to find new bitterness blockers and sweetness enhancers, a virtual screening strategy was implemented using ligand-based (pharmacophore, similarity) and protein-based (docking) approaches. A database of known blockers and enhancers was gathered from the scientific literature and from Robertet Flavors' in-house data, along with important targets involved in the sensing of the two tastes (e.g., T1R, T2R). Several candidates were identified, and the most promising ones in terms of potential activity, safety, patentability and industrialization were further evaluated by a panel of tasters according to DIN 10955:2004-06 standards.

67 Navigation through chemogenomics data with SPID

Austin B Yongye1, José L Medina-Franco2, jose.medina.franco@gmail.com. (1) Torrey Pines Institute for Molecular Studies, Port St. Lucie, Florida 34987, United States, (2) Department of Physicochemistry, Universidad Nacional Autonoma de Mexico, Mexico City, Mexico

Chemogenomics data sets play a central role in current drug discovery endeavors, including polypharmacology and drug repurposing projects. In this work, we present a general method for systematically analyzing the structure-activity relationships of large screening profile data, with emphasis on identifying structural changes that have a significant impact on the number of proteins to which a compound binds. At the core of this approach is the Structure-Promiscuity Index Difference (SPID) metric, which captures differences in the number of proteins bound relative to changes in molecular structure. The SPID measure is inspired by the Structure-Activity Landscape Index (SALI) commonly used in activity landscape modeling. We discuss applications of this approach to mine a public data set of more than 15,000 compounds from different sources screened across 100 sequence-unrelated proteins.
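The abstract defines SPID only by analogy with SALI, which divides an activity difference by one minus the pairwise structural similarity. Under that assumption (the exact formulation and similarity measure are not specified here), a minimal sketch might be:

```python
def spid(n_targets_i, n_targets_j, similarity):
    """SALI-style Structure-Promiscuity Index Difference for a
    compound pair: the difference in the number of proteins each
    compound binds, scaled by structural similarity. Identical
    structures (similarity == 1) are excluded to avoid division
    by zero. This formulation is an assumption by analogy."""
    if not 0.0 <= similarity < 1.0:
        raise ValueError("similarity must lie in [0, 1)")
    return abs(n_targets_i - n_targets_j) / (1.0 - similarity)

# Highly similar compounds with very different promiscuity form a
# "promiscuity cliff" (large SPID); dissimilar pairs score low.
cliff = spid(25, 2, 0.9)   # ~230
smooth = spid(5, 4, 0.2)   # 1.25
```

High-SPID pairs flag structural changes that sharply alter how many proteins a compound binds, just as high-SALI pairs flag activity cliffs.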

68 Inferring odor detection threshold (ODT) using chemical structure based properties

Jae Hong Shin, shin37@indiana.edu, Sebastian E. Ahnert, David J. Wild, Yong-Yeol Ahn. School of Informatics and Computing, Indiana University, Bloomington, IN 47408, United States

The odor detection threshold (ODT) of a molecule is the lowest concentration of the molecule that can be detected by human olfactory perception. Although large amounts of ODT measurement data exist, it is not yet clear whether it is possible to computationally predict ODT values from the physico-chemical properties of molecules. In this study, we aim to build a model that predicts ODT values using molecular physico-chemical descriptors. We use a random forest regression model for 350 odor molecules with physico-chemical molecular descriptors and other metadata. We obtained a correlation coefficient of R2 = 0.76, and a 2-fold cross-validated R2 = 0.64, between the observed and predicted ODTs. When the metadata are removed in order to build a purely molecular-descriptor-based model, a correlation coefficient of R2 = 0.63 and a 2-fold cross-validated R2 = 0.40 are obtained. Finally, we apply this model to build a generalized predictive model for a very large odor threshold data set containing 1885 ODT values.
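As an illustration of the reported evaluation metrics only (the study itself uses a random forest over many descriptors, not the toy one-descriptor model below), R2 and its 2-fold cross-validated variant can be computed as:

```python
def r_squared(observed, predicted):
    """Coefficient of determination between observed and predicted values."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def two_fold_cv_r2(xs, ys, fit, predict):
    """Train on each half of the data, predict the held-out half,
    then score all out-of-fold predictions together."""
    n, half = len(xs), len(xs) // 2
    preds = [0.0] * n
    for train, test in ((range(half), range(half, n)),
                        (range(half, n), range(half))):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in test:
            preds[i] = predict(model, xs[i])
    return r_squared(ys, preds)

# Stand-in model: ordinary least squares on a single hypothetical
# descriptor, in place of the study's random forest.
def fit_line(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope

def predict_line(model, x):
    intercept, slope = model
    return intercept + slope * x
```

On a noiseless linear toy relation the cross-validated R2 is 1.0; the gap between the fit R2 (0.76) and cross-validated R2 (0.64) reported above is the usual sign of mild overfitting.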

70 ChEMBL tools and services: Creating bridges between cheminformatics and bioinformatics

Mark Davies, mdavies@ebi.ac.uk, Louisa J. Bellis, A. Patricia Bento, Jon Chambers, Anna Gaulton, Anne Hersey, Yvonne Light, George Papadatos, John P. Overington. ChEMBL Group, EMBL-European Bioinformatics Institute, Cambridge, United Kingdom

ChEMBL (http://www.ebi.ac.uk/chembl) is a database of bioactive drug-like small molecules, which has seen rapid growth in content since its first release three years ago. The focus of the talk will be to provide an overview of new, freely available tools and services developed by the ChEMBL group that can be used to link chemical data to biological resources. An example of such a service is UniChem, an independent InChI-based cross-referencing service used to create links to external resources (e.g. PDBe). We have also developed new domain-focused portals that integrate ChEMBL data with comparative genomics data (e.g. Ensembl), for example to explore consequent differences in ADME properties. Further advances in creating links from the ChEMBL database have been made with the first official release of the ChEMBL-RDF data model. This transformation has made it possible to link to and query data stored in other RDF models (e.g. Gene Expression Atlas).

72 About the impact of open access bioassay data on cheminformatic approaches

Barbara Zdrazil, barbara.zdrazil@univie.ac.at, Gerhard F. Ecker. Department of Medicinal Chemistry, University of Vienna, Vienna, Austria A-1090, Austria

As a consequence of open innovation initiatives, modern drug discovery now includes the use of open access databases for the retrieval of small-compound bioactivity data. However, we recently showed for human P-glycoprotein that an uncritical interpretation of such data will lead to datasets of poor quality, owing to the broad range of assay types and setups used for determining the bioactivities [1]. Thus, broad annotation of bioassay data will be needed, especially considering current multi-targeted approaches in drug design. Going further, we are now studying the neurotransmitter sodium symporter family of proteins, trying to systematically structure the available assays (e.g., 311 different assay IDs in the ChEMBL database with approx. 6000 reported IC50 or Ki values targeting the human serotonin transporter) and to find out how different assays can be combined with each other. The goal is to build up high-quality predictive datasets for further use in cheminformatics. The research leading to these results has received support from the Innovative Medicines Initiative Joint Undertaking under Grant Agreement n° 115191 (Open PHACTS), resources of which are composed of a financial contribution from the European Union's Seventh Framework Programme (FP7/2007-2013) and EFPIA companies' in-kind contribution. [1] Zdrazil B, Pinto M, Vasanthanathan P, Williams AJ, Zander Balderud L, Engkvist O, Chichester C, Hersey A, Overington JP, and Ecker GF, Annotating human P-glycoprotein bioassay data, Mol. Inf. 2012, 31(8), 599-609.

80 PubChem BioAssay: A public database for chemical biology data

Yanli Wang, ywang@ncbi.nlm.nih.gov, National Library of Medicine (NLM), National Center for Biotechnology Information (NCBI), National Institutes of Health (NIH), Bethesda, Maryland 20894, United States

The PubChem BioAssay database (http://pubchem.ncbi.nlm.nih.gov) is a public repository for archiving biological test results for small molecules and RNAi reagents. The bioactivity data in PubChem are generated by HTS screens, medicinal chemistry studies, and chemical biology experiments, as well as by literature extraction projects. The database currently contains 600,000 bioassay depositions, 2.7 million substances, eight thousand protein targets, thirty thousand gene targets, and 190 million bioactivity outcomes. Managing rich and extremely diverse information at this scale, tracking data deposition and updates, and providing easy access and data analysis tools to the community all present great challenges to the PubChem project. This talk provides an overview of the development of the BioAssay resource and describes the bioassay data models as well as the information system for storing, retrieving and analyzing the bioactivity data.

84 Open PHACTS: Meaningful linking of preclinical drug discovery knowledge

Egon L Willighagen1, egon.willighagen@maastrichtuniversity.nl, Christian Brenninkmeijer2, Chris T Evelo1, Lee Harland3, Alasdair J.G. Gray2, Carole Goble2, Andra Waagmeester1, Antony J Williams4. (1) Department of Bioinformatics - BiGCaT, Maastricht University, Maastricht, The Netherlands, (2) University of Manchester, Manchester, United Kingdom, (3) Connected Discovery Ltd, London, United Kingdom, (4) ChemSpider, Royal Society of Chemistry, Wake Forest, United States

Recently, semantic web technologies have been adopted by the life sciences community for meaningfully linking data. However, while these new technologies provide us with methods, they do not provide us with an exact solution. Open PHACTS uses these methods to solve problems in linking preclinical knowledge from databases such as UniProt, ChEMBL, and WikiPathways. Problems that are discussed, and for which our solutions will be presented, include: 1. approaches to mapping data between the databases using the Vocabulary of Interlinked Datasets (VoID), including identifier mapping with BridgeDb, appropriate choices of mapping predicates, and ontologies to cover provenance, such as the Provenance Authoring and Versioning ontology; 2. dealing with different units for experimental data, using the Quantities, Units, Dimensions and Data Types (QUDT) ontology for (on-the-fly) quantity conversion; and 3. how all of this is linked to user-oriented graphical user interfaces.

89 Making hidden data discoverable: How to build effective drug discovery engines?

Sebastian Radestock, s.radestock@elsevier.com, Jürgen Swienty-Busch. Elsevier Information Systems GmbH, Frankfurt am Main, Hessen 60486, Germany

In a complex IT environment comprising dozens if not hundreds of databases, and likely as many user interfaces, it becomes difficult if not impossible to find all the relevant information needed to make informed decisions. Historical data get lost, non-normalized data cannot be compared, and maintenance becomes a nightmare. We will discuss a new approach to this issue, showing various examples and use cases of how in-house and public data can be integrated in various ways to address the unique and individual needs of companies seeking to keep their competitive edge.

92 Chemical science that underpins the Reaxys database

Juergen Swienty-Busch1, j.swienty-busch@elsevier.com, Pieder Caduff2, David Evans2. (1) Elsevier Information Systems GmbH, Frankfurt, Germany, (2) Reed Elsevier Properties SA, Neuchatel, Switzerland

The chemical literature is increasing year on year. New journals are launched, and existing journals broaden and deepen their coverage. Researchers are under increasing pressure to maintain an overview of the literature while also finding the data most relevant to them. Providing relevant and accurate information is of fundamental importance. Reaxys strives to provide chemistry researchers with timely, accurate, organized and relevant information. We will discuss the recent advances we have made to support the daily workflow of a research chemist. These include automated systems for the identification of chemically relevant articles for excerption, taxonomies to support the organization of data, innovative quality assurance tools, and new technologies for the classification of substances and reactions.

96 Intuitive and integrated browsing of reactions, structures, and citations: The Roche experience

Fausto Agnetti1, Michael Bensch1, Hermann Biller1, Martin Blapp1, Ben Cheikh2, Gerd Blanke1, Joerg Degen1, Bernard Dienon1, Thomas Doerner1, Gunther Doernen1, Frieda Farshchian1, Werner Gotzeina1, Peter Hilty1, Ralf Horstmoeller1, Thomas Jeker1, Brian Jones1, Michael Kappler2, mick.kappler@roche.com, Aslam Momin2, Antonio Regoli1, Denis Ribaud1, Bernard Starck1, Daniel Stoffler1, Klaus Weymann1, Padmanabha Udupa2. (1) Pharma Research and Early Development, F. Hoffmann-La Roche Ltd., Basel, Basel-Stadt 4070, Switzerland, (2) Pharma Research and Early Development, Hoffmann-La Roche Inc., Nutley, New Jersey 07110, United States

Roche has integrated proprietary reaction information within the Elsevier Reaxys product, which runs on Roche's infrastructure, inside the Roche firewall, to provide high performance and security. The incorporation and discoverability of proprietary information alongside public information significantly improves productivity. With this development, Roche researchers are able to launch a single search in Reaxys across integrated internal data and experimental data published in journals and patents, with results unified and organized in a context directly relevant to the researcher's workflow. Key points of ELN integration, data modeling, and reaction canonicalization will be discussed.

101 Withdrawn

109 Novel in silico prediction algorithms for the design of stable and more effective proteins

Francisco G Hernandez-Guzman, francisco.hernandez@accelrys.com, Velin Spassov, Lisa Yan. Department of LS Modeling and Simulations, Accelrys, San Diego, CA 92121, United States

Understanding the effects of mutation on protein stability and protein binding affinity is an important component of successful protein design. In silico approaches to predicting the effects of amino acid mutations can be used to guide experimental design and help reduce the cost of bringing biotherapeutics or new protein molecules (e.g. enzymes) to market. We have developed a number of novel methods for fast computational mutagenesis of proteins, which can be applied to calculate the energetic effect of a mutation on protein stability and on protein-protein binding affinity, with an optional pH-dependency calculation. Here, we will present these methods and associated validation results. Furthermore, we will provide a case study using a set of engineered antibodies with altered pH-selective binding, demonstrating how binding to either the neonatal receptor (FcRn) or the target antigens can be modified to tune antibody half-life in the host system.

110 Advanced structural modeling of biologics with BioLuminate

David A Pearlman, Tyler Day, Kathryn Loving, David Rinaldo, Noeris Salam, Dora Warshaviak, Kai Zhu, Woody Sherman, woody.sherman@schrodinger.com. Schrodinger, New York, NY 10036, United States

The field of biologics continues to grow in importance in the pharmaceutical industry. To address the increasing need for computational tools to model biologics, we have developed BioLuminate, which contains a broad range of task-driven applications tailored specifically to the field. Our objective was to blend an easy-to-use interface with state-of-the-art molecular simulations and de novo prediction tools. In this presentation, we describe the philosophy behind the design of BioLuminate and then focus on distinguishing features of the product, such as protein-protein docking with Piper, de novo antibody loop modeling with Prime, estimation of residue mutation effects, prediction of stabilizing mutations, and determination of aggregation hotspots. We conclude by describing the primary challenges in the field and our research efforts to address them.

118 Mining frequent itemsets: Constructing topological pharmacophores using pharmacophore feature pairs

Paul J Kowalczyk, paul.kowalczyk@scynexis.com, Department of Computational Chemistry, SCYNEXIS, Research Triangle Park, NC 27709-2878, United States

We have adapted association rule mining to the task of topological (2D) pharmacophore construction. Association rule mining is a popular and well-researched statistical approach for discovering interesting relationships between variables in large datasets; it finds joint values of variables that appear most frequently in a dataset. In this study, these variables are topological pharmacophore feature pairs (e.g., hydrogen bond donors, hydrogen bond acceptors, hydrophobes, aromatic rings, positive centers, negative centers) and the corresponding bond distances between them. Measures of significance and interest are used to score these joint pharmacophore feature pairs, with high scores identifying candidate topological pharmacophores. We demonstrate the construction of topological pharmacophores using publicly available antimalarial datasets. We also show how these topological pharmacophores may be leveraged as data mining and data visualization tools. The construction of topological pharmacophores by means of association rule mining, together with protocols for data visualization, is made freely available as scripts written in the Python and R programming languages.
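The scripts themselves are not reproduced in the abstract. As a minimal sketch of the frequent-itemset counting step, assuming each molecule is reduced to a set of (feature, feature, topological distance) "items" and support is the fraction of molecules containing an item (feature names and data below are hypothetical), one could write:

```python
from collections import Counter

def frequent_feature_pairs(molecules, min_support=0.5):
    """Mine pharmacophore feature pairs that occur in at least
    `min_support` of the molecules. Each molecule is a set of
    (feature_a, feature_b, topological_distance) triples, with the
    two feature labels in a fixed order so pairs compare equal
    regardless of direction."""
    counts = Counter()
    for mol in molecules:
        counts.update(set(mol))  # count each pair at most once per molecule
    n = len(molecules)
    return {pair: count / n
            for pair, count in counts.items()
            if count / n >= min_support}

# Hypothetical actives described by (feature, feature, bond distance) items
actives = [
    {("acceptor", "donor", 3), ("aromatic", "donor", 5)},
    {("acceptor", "donor", 3), ("aromatic", "hydrophobe", 4)},
    {("acceptor", "donor", 3), ("aromatic", "donor", 5)},
    {("acceptor", "donor", 3)},
]
```

High-support items are the candidate topological pharmacophores; the paper's additional measures of significance and interest would then be applied on top of these raw supports.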

119 Lexichem: Not another chemical nomenclature app

Edward O Cannon, ed.cannon@eyesopen.com, OpenEye Scientific Software, Santa Fe, NM 87508, United States

A novel, fast, easy-to-use desktop application has been developed for Lexichem[1], OpenEye's chemical nomenclature software[2]. The desktop application offers the ability to extract chemical names and structures from patents and to easily visualize chemical structures by dragging and dropping files, plus numerous other features. [1] E. O. Cannon, “New Benchmark for Chemical Nomenclature Software,” J. Chem. Inf. Model., 2012, 52 (5), pp 1124-1131. [2] Headquarters: OpenEye Scientific Software, 9 Bisbee Court, Suite D, Santa Fe, NM 87508

Tuesday, April 9, 2013 8:15 am - 11:55 am

Linking Bioinformatic Data and Cheminformatic Data - AM Session
Morial Convention Center
Room: 349
Ian Bruno, John Overington, Organizers
Ian Bruno, Presiding
8:15   Introductory Remarks
8:20 69 Integrating chemical and biological structural information

Gary Battle, battle@ebi.ac.uk, Jose Dana, Saqib Mir, Tom Oldfield, Sameer Velankar, Gerard Kleywegt. The European Bioinformatics Institute, The Protein Data Bank in Europe, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom

The Protein Data Bank (PDB) is the single worldwide repository of 3D structures of biological macromolecules and includes over 14,000 distinct ligands bound to proteins or nucleic acids. These structures are central to our understanding of biochemical processes and define the link between chemistry and biological macromolecules. The Protein Data Bank in Europe (PDBe; pdbe.org) is a core resource at the European Bioinformatics Institute (EBI) and a founding member of the Worldwide Protein Data Bank (wwPDB; wwpdb.org). This talk will review ongoing efforts at the PDBe to provide consistent mappings between macromolecular structure data and important biological and chemical data resources at the EBI. We will also discuss freely available web tools for mining and analysing the wealth of structural information available in the PDB using familiar biological or chemical terminology and classifications.

8:45 70 ChEMBL tools and services: Creating bridges between cheminformatics and bioinformatics

Mark Davies, mdavies@ebi.ac.uk, Louisa J. Bellis, A. Patricia Bento, Jon Chambers, Anna Gaulton, Anne Hersey, Yvonne Light, George Papadatos, John P. Overington. ChEMBL Group, EMBL-European Bioinformatics Institute, Cambridge, United Kingdom

ChEMBL (http://www.ebi.ac.uk/chembl) is a database of bioactive drug-like small molecules, which has seen rapid growth in content since its first release three years ago. The focus of the talk will be to provide an overview of new, freely available tools and services developed by the ChEMBL group that can be used to link chemical data to biological resources. An example of such a service is UniChem, an independent InChI-based cross-referencing service used to create links to external resources (e.g. PDBe). We have also developed new domain-focused portals that integrate ChEMBL data with comparative genomics data (e.g. Ensembl), for example to explore consequent differences in ADME properties. Further advances in creating links from the ChEMBL database have been made with the first official release of the ChEMBL-RDF data model. This transformation has made it possible to link to and query data stored in other RDF models (e.g. Gene Expression Atlas).

9:10 71 Pharmacological profiling of drugs by linking chemoinformatics and bioinformatics data

Olivier Taboureau, otab@cbs.dtu.dk, Department of Systems Biology - Center for Biological Sequences Analysis, Technical University of Denmark, Lyngby, Denmark

The pharmacological profiling of drugs is crucial in drug discovery. With the increasing availability of data from the “-omics” technologies and the development of computational approaches to analyze this massive amount of data, it is now possible, in academia, to evaluate drug safety and drug pharmacology not only at the molecular level but also at the level of biological systems. Integration of chemical biology data, and monitoring of perturbations at the pathway, cellular, tissue and systems levels, would improve the global understanding of compound effects on human health. Furthermore, clinical effects might be critical for the identification of genes that are important modulators of drug response, namely pharmacogenetics. Drawing on the integration of several diverse biological data sources, we will discuss how linking chemoinformatics and bioinformatics can contribute to translational informatics research by providing a deeper understanding of drug effects in drug discovery.

9:35 72 About the impact of open access bioassay data on cheminformatic approaches

Barbara Zdrazil, barbara.zdrazil@univie.ac.at, Gerhard F. Ecker. Department of Medicinal Chemistry, University of Vienna, Vienna, Austria A-1090, Austria

As a consequence of open innovation initiatives, modern drug discovery now includes the use of open access databases for the retrieval of small-compound bioactivity data. However, we recently showed for human P-glycoprotein that an uncritical interpretation of such data will lead to datasets of poor quality, owing to the broad range of assay types and setups used for determining the bioactivities [1]. Thus, broad annotation of bioassay data will be needed, especially considering current multi-targeted approaches in drug design. Going further, we are now studying the neurotransmitter sodium symporter family of proteins, trying to systematically structure the available assays (e.g., 311 different assay IDs in the ChEMBL database with approx. 6000 reported IC50 or Ki values targeting the human serotonin transporter) and to find out how different assays can be combined with each other. The goal is to build up high-quality predictive datasets for further use in cheminformatics. The research leading to these results has received support from the Innovative Medicines Initiative Joint Undertaking under Grant Agreement n° 115191 (Open PHACTS), resources of which are composed of a financial contribution from the European Union's Seventh Framework Programme (FP7/2007-2013) and EFPIA companies' in-kind contribution. [1] Zdrazil B, Pinto M, Vasanthanathan P, Williams AJ, Zander Balderud L, Engkvist O, Chichester C, Hersey A, Overington JP, and Ecker GF, Annotating human P-glycoprotein bioassay data, Mol. Inf. 2012, 31(8), 599-609.

10:00   Intermission
10:15 73 Biological target identification through combination of 3D molecular similarity and lexical similarity of clinical effects

Emmanuel R Yera, ajain@jainlab.org, Ann E Cleves, Ajay N Jain, ajain@jainlab.org. Department of Bioengineering and Therapeutic Sciences, UCSF, San Francisco, CA 94158, United States

We have previously demonstrated a probabilistic framework for combining information regarding protein-ligand interactions in order to identify off-targets for drugs through a combination of 3D molecular similarity, 2D molecular similarity, and docking computations. We have extended our framework to include a novel lexical method for computing the similarity between small molecules based on data derived from patient package inserts (PPI). Small molecules that are pharmacologically described in a similar fashion often share underlying protein targets (e.g. antagonism of the muscarinic receptor can cause dry mouth and urinary retention). By combining information from molecular similarity and from lexical similarity of a particular drug to a set of drugs sharing a known biological target, it is possible to gain synergy from the combination of orthogonal information sources in order to propose new putative targets for the drug. The results of a systematic application to a large set of drugs will be presented along with a critical analysis examining what can be learned about drug pharmacology based on different molecular similarity methods and natural language descriptions of pharmacology.

10:40 74 In silico prediction of gene expression profiles for drug-like compounds based on their structural formulae

Alexey Lagunin, alexey.lagunin@ibmc.msk.ru, Sergey Ivanov, Anastassia Rudik, Dmitry Filimonov, Vladimir Poroikov. Bioinformatics, Orekhovich Institute of Biomedical Chemistry of Russian Academy of Medical Sciences, Moscow, Russian Federation

The drug-induced gene expression profile is a major determinant of a drug's action on the cell. Experimentally determined profiles are used to solve different problems in drug development and clinical practice, such as drug repositioning and resistance, toxicity, and drug-drug interactions (DDI). Information about drugs' influence on gene expression is freely available from the Comparative Toxicogenomics Database (http://ctdbase.org/). We used these data for training and validation of a computer system that performs qualitative prediction of the gene expression profiles of drugs based on their structural formulae. SAR models were created using the PASS software, which we have developed and updated for about 20 years (http://www.pharmaexpert.ru/PASSOnline). A freely available web service for prediction of drug-induced gene expression profiles has been developed (http://www.pharmaexpert.ru/GE). Predicted gene expression profiles can be used for analysis of drug resistance, drug synergistic effects, and DDI. The work was partially supported by RFBR/NIH grant No 12-04-91445-NIH_A/RUB1-31081-MO-12, and RFBR grant No 12-07-00597-а.

11:05 75 CAS’ bioactivity and target indicators provide new insights for scientists working at the interface of chemistry and biology

Roger Schenck, rschenck@cas.org, Department of Marketing, Chemical Abstracts Service, Columbus, Ohio 43202, United States

CAS has mined its intellectually-assigned controlled vocabulary terms to create bioactivity indicators (e.g., antibiotic, antidepressant) and protein target indicators (e.g., alpha-amylase, prostate-specific antigen) that link molecular substances with biological effects and protein targets. Scientists working at the interface of chemistry and biology can search the CAS databases for drug leads to quickly discover other therapeutic indications and associated protein targets. With more than 260 bioactivity indicators and 5800 target indicators assigned to millions of substances, medicinal chemists can efficiently assess the biological relevance of a large group of molecules. This presentation will illustrate how these relationships are developed, how false positives are avoided, and end with some examples of how these new terms are used in SciFinder®.

11:30 76 Jikitou biomedical question answering system: Using multiple resources to answer biomedical questions

Michael A. Bauer1,2, mabauer@ualr.edu, Robert E. Belford3, Daniel Berleant1, Roger A. Hall1. (1) Department of Information Science, University of Arkansas at Little Rock, Little Rock, AR 72204, United States, (2) Joint Bioinformatics Program, University of Arkansas for Medical Sciences, Little Rock, AR 72204, United States, (3) Department of Chemistry, University of Arkansas at Little Rock, Little Rock, AR 72204, United States

Intelligent information retrieval systems that summarize relevant textual information while incorporating multiple sources of information can assist researchers in dealing with information and data challenges at the interface of biological and chemical sciences. Question answering (QA) is a specialized type of information retrieval with the aim of returning short answers to queries posed as natural language questions. We have developed a QA system, Jikitou (www.jikitou.com), which answers natural language questions with sentences taken from Medline abstracts that are parsed using a multiple agent search strategy. The answers that are returned are sent to the WikiHyperGlossary (hyperglossary.org) where terms associated with a glossary are linked to additional sources of information such as the UniprotKB protein information database, ChemSpider or the RCSB Protein Data Bank. Jikitou combines multiple natural language processing techniques, data resources and technologies to create a unique system to help researchers navigate the huge and growing biomedical textome.

Tuesday, April 9, 2013 8:30 am - 11:55 am

Public Databases Serving the Chemistry Community - AM Session
Morial Convention Center
Room: 350
Antony Williams, Sean Ekins, Organizers
Sean Ekins, Presiding
8:30   Introductory Remarks
8:35 77 PubChem: A community driven resource

Evan Bolton, bolton@ncbi.nlm.nih.gov, PubChem, NCBI / NLM / NIH, United States

PubChem is an open repository for chemical biology information and recently celebrated its 8th year of existence. Despite humble beginnings, PubChem continues to receive broad community support through a continued influx of new information and new information resource types. As the needs of the community have changed, so too has PubChem adapted. This talk will provide an overview of recent significant changes to the PubChem system and detail new ways PubChem is providing for and adapting to the needs of the community.

9:05 78 NCI/CADD chemical structure Web services

Markus Sitzmann, sitzmann@helix.nih.gov, Alexey V. Zakharov, Laura Guasch Pàmies, Marc C. Nicklaus. Chemical Biology Laboratory, Center for Cancer Research, Frederick National Laboratory for Cancer Research, National Cancer Institute, National Institutes of Health, DHHS, Frederick, MD 21702, United States

Over the course of the last 15 years, the NCI/CADD group has made publicly available a series of small-molecule-centric Web services and databases at its Web server, http://cactus.nci.nih.gov, e.g., the Chemical Structure Lookup Service (CSLS), the Chemical Identifier Resolver (CIR), and the NCI Enhanced Database Browser. We present an overview of recent work on a tighter integration of these resources, on improvements to their programmatic accessibility, and on better usability with mobile and touchscreen devices. Furthermore, we will discuss recent enhancements of our Chemical Structure DataBase (CSDB), which is used as the central data repository for all of our services. The most recent version of CSDB indexes approximately 300 million chemical structure records representing approximately 120 million unique chemical structures. We will also present a new Web service allowing for the prediction of physicochemical and biological properties of small molecules.

9:35 79 ChemSpider: Disseminating data and enabling an abundance of chemistry platforms

Antony J Williams1, williamsa@rsc.org, Valery Tkachenko1, Ken Karapetyan1, Alexey Pshenichnov1, Dmitry Ivanov1, Colin Batchelor2, Jon Steele2, David Sharpe2. (1) Cheminformatics, Royal Society of Chemistry, Wake Forest, NC 27587, United States, (2) Cheminformatics, Royal Society of Chemistry, Cambridge, United Kingdom

ChemSpider is one of the chemistry community's primary public compound databases. Containing tens of millions of chemical compounds and their associated data, ChemSpider now serves data to many tens of websites and software applications. This presentation will provide an overview of the expanding reach of the ChemSpider platform and the nature of the solutions it helps to enable. We will also discuss some of the envisaged future directions for the project and how we intend to continue expanding the platform's impact.

10:05   Intermission
10:20 80 PubChem BioAssay: A public database for chemical biology data

Yanli Wang, ywang@ncbi.nlm.nih.gov, National Library of Medicine (NLM), National Center for Biotechnology Information (NCBI), National Institutes of Health (NIH), Bethesda, Maryland 20894, United States

The PubChem BioAssay database (http://pubchem.ncbi.nlm.nih.gov) is a public repository for archiving biological test results for small molecules and RNAi reagents. The bioactivity data in PubChem are generated by high-throughput screening (HTS), medicinal chemistry studies, and chemical biology experiments, as well as by literature extraction projects. The database currently contains 600,000 bioassay depositions, 2.7 million substances, eight thousand protein targets, thirty thousand gene targets, and 190 million bioactivity outcomes. Managing rich and extremely diverse information at this scale, tracking data deposition and updates, and providing easy access and data analysis tools to the community all present great challenges to the PubChem project. This talk provides an overview of the development of the BioAssay resource and describes the bioassay data models as well as the information system for storing, retrieving, and analyzing the bioactivity data.

10:50 81 Chemistry-related resources at the Protein Data Bank in Europe

Gary Battle, battle@ebi.ac.uk, Gerard Kleywegt, Sameer Velankar, Tom Oldfield, Swanand Gore, Saqib Mir, Jose Dana. The European Bioinformatics Institute, Protein Data Bank in Europe, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom

The 3-dimensional structures of protein-ligand complexes determined using X-ray diffraction provide a window into the world of protein structure and function. The Protein Data Bank is the single worldwide repository of 3D structures of biological macromolecules and includes over 14,000 distinct ligands bound to proteins or nucleic acids. Information on the geometry of these small molecules and their interactions with proteins is crucial to our understanding of biochemical processes and is vital for structure-based drug design. The Protein Data Bank in Europe (PDBe; pdbe.org) is a core resource at the EBI and a founding member of the Worldwide Protein Data Bank (wwPDB; wwpdb.org). This talk will review the freely available chemistry-related resources provided by PDBe. We will also discuss recent initiatives to assess and improve the quality of ligands in the PDB archive and continuing efforts to help chemists understand how to retrieve and interpret 3D structural information.

11:20 82 Architecture for an open science molecular compound database

Egon L Willighagen, egon.willighagen@maastrichtuniversity.nl, Department of Bioinformatics - BiGCaT, Maastricht University, Maastricht, The Netherlands

The past few years have seen a tremendous leap forward in public compound databases. However, the exact open nature of a "public" database is not always crystal clear: for example, how the Open Data parts of public databases can be used, modified, and redistributed (the three cornerstones of Open Science). We present an architecture in which semantic web technologies, the InChI, and cheminformatics tools are used to create a Panton Principles-compliant compound database. Standards proposed in the Open PHACTS community will be used to specify links between this new resource and other databases, and to provide compound properties. All this input will be available with provenance on the origin of the data, as separately downloadable files, using ontologies to provide explicit meaning. Applications in the areas of metabolomics and toxicology, built on ontologies such as ChEBI and CHEMINF, will be presented.
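
A compound record in such a database might be serialized as RDF along the following lines. This is only a sketch: the `ex:` namespace is hypothetical, and the CHEMINF term ID shown should be checked against the ontology before use:

```python
def compound_to_turtle(local_id, inchi):
    """Emit a minimal Turtle fragment attaching an InChI to a compound.

    The ex: namespace is a placeholder, and CHEMINF_000113 is used here
    illustratively as the InChI descriptor term; verify against the
    published CHEMINF ontology.
    """
    return (
        "@prefix ex: <http://example.org/compound/> .\n"
        "@prefix cheminf: <http://semanticscience.org/resource/> .\n"
        f'ex:{local_id} cheminf:CHEMINF_000113 "{inchi}" .\n'
    )

print(compound_to_turtle("1", "InChI=1S/CH4/h1H4"))
```

Because the triples carry explicit ontology terms, a consumer can discover what each value means without out-of-band documentation, which is the point of the provenance-and-meaning requirements above.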

11:50   Concluding Remarks

Tuesday, April 9, 2013 1:55 pm - 5:30 pm

Linking Bioinformatic Data and Cheminformatic Data - PM Session
Morial Convention Center
Room: 349
Ian Bruno, John Overington, Organizers
John Overington, Presiding
1:55 83 Linking chemical biology information within PubChem

Evan Bolton, bolton@ncbi.nlm.nih.gov, PubChem, NCBI / NLM / NIH, United States

PubChem is an open repository for chemical biology information. PubChem contains a wealth of information, including more than 100 million substance descriptions, 35 million unique small molecules, and 200 million biological testing result outcomes from more than 200 contributors. PubChem chemical structures link to 40% of the known biomedical literature, and more than 10% of all biologically tested molecules have links to the patent literature. Given the combinatorial number of "links" available in PubChem, enabling access to such information through ontologies (e.g., ChEBI, GO, BAO, etc.) and classification schemes (MeSH, LipidMaps, KEGG BRITE, etc.) provides new ways to navigate it effectively. This talk will detail some of the new ways PubChem is organizing and providing links to the chemical biology community.
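
One programmatic route into these linked records is PubChem's PUG REST interface. As a small illustration (not part of the talk itself), the sketch below builds a PUG REST URL that resolves a compound name to its CIDs; fetching it over HTTP is left out:

```python
from urllib.parse import quote

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

def cids_by_name_url(name, fmt="JSON"):
    """Build a PUG REST URL resolving a compound name to CIDs."""
    return f"{PUG_REST}/compound/name/{quote(name)}/cids/{fmt}"

print(cids_by_name_url("aspirin"))
# https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/aspirin/cids/JSON
```

The returned CIDs can then be used with further PUG REST paths to walk from a structure to its assays, literature, or patent links.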

2:20 84 Open PHACTS: Meaningful linking of preclinical drug discovery knowledge

Egon L Willighagen1, egon.willighagen@maastrichtuniversity.nl, Christian Brenninkmeijer2, Chris T Evelo1, Lee Harland3, Alasdair J.G. Gray2, Carole Goble2, Andra Waagmeester1, Antony J Williams4. (1) Department of Bioinformatics - BiGCaT, Maastricht University, Maastricht, The Netherlands, (2) University of Manchester, Manchester, United Kingdom, (3) Connected Discovery Ltd, London, United Kingdom, (4) ChemSpider, Royal Society of Chemistry, Wake Forest, United States

Semantic web technologies have recently been adopted by the life sciences community for linking disparate data. However, while these new technologies provide us with methods, they do not provide us with a complete solution. Open PHACTS uses these methods to solve problems in linking preclinical knowledge from databases like UniProt, ChEMBL, and WikiPathways. Problems that are discussed, and for which our solutions will be presented, include: 1. approaches to mapping data between the databases using the Vocabulary of Interlinked Datasets (VoID), including identifier mapping with BridgeDb, appropriate choices of mapping predicates, and ontologies to cover provenance, such as the Provenance Authoring and Versioning ontology; 2. dealing with different units for experimental data using the Quantities, Units, Dimensions and Types (QUDT) ontology for (on-the-fly) quantity conversion; and 3. linking all of this to user-oriented graphical user interfaces.
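
Point 2, on-the-fly quantity conversion, reduces to rescaling through a common base unit once each unit's multiplier is known. A minimal sketch of the idea, with an illustrative concentration-unit table rather than the QUDT ontology itself:

```python
# Multipliers to a common base unit (molar); illustrative subset only.
# In QUDT each unit carries an analogous conversion multiplier.
TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9}

def convert(value, from_unit, to_unit):
    """Convert between concentration units via the shared base unit."""
    return value * TO_MOLAR[from_unit] / TO_MOLAR[to_unit]

print(convert(1500.0, "nM", "uM"))  # approximately 1.5
```

With bioactivity values normalized this way, IC50s reported in nM and µM by different sources become directly comparable.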

2:45 85 Extracting more value from data silos: Using the semantic web to link chemistry and biology for innovation

Derek Scuffell1, derek.scuffell@syngenta.com, Philip Ashworth2. (1) R&D, Syngenta, Bracknell, Berks RG42 6ET, United Kingdom, (2) Top Quadrant, London, United Kingdom

In order to maximize the chances of finding novel crop protection molecules that are safe for the environment, it is necessary to bring together biological and chemical information from both inside and outside an organisation. The integrated use of biological data can help eliminate false-positive molecular candidates and improve the chances of finding the correct candidates for development. Information about the biological activity of compounds is captured in disparate systems within Syngenta and in the public domain. This presentation will show how highly curated bioactivity data from ChEMBL were linked to the Syngenta corporate chemical catalogue, along with other Syngenta research data and commercial patent indexes, using the Resource Description Framework (RDF). The resulting linked data were then used to support mode-of-action, spectrum, and selectivity competency questions used in herbicide discovery. This is a great example of using a semantic web approach to link biological activity data with cheminformatics data to ease research. Data domains covered: bioactivity, small molecule, chemical properties, and chemotype.

3:10 86 Roundtripping between small-molecule and biopolymer representations

Noel M O'Boyle1, noel@nextmovesoftware.com, Evan Bolton2, Roger A Sayle1. (1) NextMove Software, Cambridge, United Kingdom, (2) National Center for Biotechnology Information, Bethesda, Maryland MD 20894, United States

Existing cheminformatics toolkits provide a mature set of tools to handle small-molecule data, from generating depictions, to creating and reading linear representations (such as SMILES and InChI). However, such tools do not translate well to the domain of biopolymers where the key information is the identity of the repeating unit and the nature of the connections between them. For example, a typical all-atom 2D depiction of all but the smallest protein or oligosaccharide obscures this key structural information.
We describe a suite of tools which allow seamless interconversion between appropriate structure representations for small molecules and biopolymers (with a focus on polypeptides and oligosaccharides). For example:
SMILES: OC[C@H]1O[C@@H](O[C@@H]2[C@@H](CO)OC([C@@H]([C@H]2O)NC(=O)C)O)[C@@H]([C@H]([C@H]1O)O[C@@]1(C[C@H](O)[C@H]([C@@H](O1)[C@@H]([C@@H](CO)O)O)NC(=O)C)C(=O)O)O
Shortened IUPAC format: NeuAc(a2-3)Gal(b1-4)GlcNAc
I will discuss the challenge of supporting a variety of biopolymer representations, handling chemically-modified structures, and handling biopolymers with unknown attachment points (e.g. from mass spectrometry).
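
The shortened IUPAC example above shows why the residue-and-linkage view is so much easier to manipulate than the all-atom SMILES. A minimal sketch of parsing that linear shorthand (simple unbranched chains only, not the authors' toolkit) might look like:

```python
import re

def parse_glycan(shorthand):
    """Split a shortened IUPAC glycan string into residues and linkages.

    Handles only the simple linear case shown; branches, modifications,
    and unknown attachment points are out of scope for this sketch.
    """
    tokens = re.split(r"\(([ab]\d+-\d+)\)", shorthand)
    residues = tokens[0::2]   # residue names sit between the linkages
    linkages = tokens[1::2]   # captured linkage codes, e.g. "a2-3"
    return residues, linkages

print(parse_glycan("NeuAc(a2-3)Gal(b1-4)GlcNAc"))
# (['NeuAc', 'Gal', 'GlcNAc'], ['a2-3', 'b1-4'])
```

Once a structure is in this residue/linkage form, the repeating units and their connections, the key information the abstract refers to, are directly accessible.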

3:35   Intermission
3:50 87 Representing and registering antibody-drug conjugates

Keith T Taylor, keith.taylor@accelrys.com, Burton L Leland, William L Chen, Young-Mi Kwon. Accelrys Inc, San Ramon, California 98543, United States

Biologics are providing a large and growing contribution to drug pipelines. The majority of biologic-based therapies include significant chemical modifications; antibody-drug conjugates (ADCs) are the major focus. ADCs have the most challenging representation needs, with modified residues, custom linkers to the drug payload, and variable levels of glycosylation. In particular, the payload is attached statistically to the available sites in the antibody. Variable attachment to cysteines brings an added complication: the payload is attached to reduced cysteines while the unloaded cysteines retain disulfide bridges. These issues will be discussed, and a representation that allows the capture of ADCs with variable loading will be described. This representation enables the registration and retrieval of ADCs, supports substructure searches, and captures the sequence, the chemical modifications, and the correct formula and formula weight. Activity profiles for a series of ADCs can be compared, facilitating optimization.

4:15 88 Mining chemical and biological data for trends: Visualizing structured numeric data from ELNs

Philip J Skinner, philip.skinner@perkinelmer.com, Phil McHale, Amy Kallmerton, Megean Schoenberg, Anis Khimani, Kate Blanchard, Michael Swartz. Informatics, PerkinElmer, Cambridge, MA 02140, United States

Scientists have always kept notebooks; they are the natural place to record information about experiments. However, structured data, such as biological assay results captured on drug development compounds, have traditionally been kept in other databases outside of notebooks. Such results normally follow an assay-centric hierarchy which conflicts with the experiment-centric hierarchy within notebooks. To provide true property-based, cross-experiment search, results must be transcribed or mapped through the creation of SQL queries. This limits flexibility: adding new assays requires programming, and ad hoc querying is impaired. This presentation will describe a new methodology and architecture that provides the benefits of a structured assay-based hierarchy within a familiar experiment-based hierarchy. Merging this assay information with chemical property data, and the ease of exposing these data to visualization and data mining tools without the use of SQL, will be explored.

4:40 89 Making hidden data discoverable: How to build effective drug discovery engines?

Sebastian Radestock, s.radestock@elsevier.com, Jürgen Swienty-Busch. Elsevier Information Systems GmbH, Frankfurt am Main, Hessen 60486, Germany

In a complex IT environment comprising dozens if not hundreds of databases, and likely as many user interfaces, it becomes difficult if not impossible to find all the relevant information needed to make informed decisions. Historical data get lost, non-normalized data cannot be compared, and maintenance becomes a nightmare. We will discuss a new approach to this issue, showing various examples and use cases of how in-house data and public data can be integrated in various ways to address the unique and individual needs of companies seeking to keep their competitive edge.

5:05 90 Applying the law of parsimony in molecular design

Alberto Del Rio, alberto.delrio@gmail.com, Department of Experimental, Diagnostic and Specialty Medicine (DIMES), University of Bologna, Bologna, Bologna 40126, Italy

Molecular design has the ambition not only to explain and answer fundamental biological questions but also to find new chemical tools that can be effectively used to interfere with physiological and pathological networks. Several techniques belonging to different disciplines, such as a broad range of computer-aided techniques, chemical data mining, bioinformatics, and methods from statistics to artificial intelligence, are currently used to guide and speed up the early-stage development of new bioactive compounds. This massive array of techniques, all aiming at linking chemical and biological data, has multiplied research efforts but has raised the question of whether new applications may result in over-complicated and compartmentalized research paradigms. On the other hand, some examples highlight the usage of parsimonious models, but their application in all the steps of molecular design is far from trivial. Here we present examples and suggest practical possibilities to encourage the adoption of the lex parsimoniae principle for linking biological and chemical data in molecular design.

Tuesday, April 9, 2013 2:00 pm - 5:25 pm

Public Databases Serving the Chemistry Community - PM Session
Morial Convention Center
Room: 350
Antony Williams, Sean Ekins, Organizers
Antony Williams, Presiding
2:00   Introductory Remarks
2:05 91 Local and remote tracking of molecular dynamics data for global dissemination

Julien C Thibault1, julien.thibault@utah.edu, Thomas E Cheatham2,3, Julio C Facelli1,3. (1) Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah 84112, United States, (2) Department of Medicinal Chemistry, University of Utah, Salt Lake City, Utah 84112, United States, (3) Center for High-Performance Computing, University of Utah, Salt Lake City, Utah 84112, United States

With recent advances in hardware, it is now possible to run complex molecular dynamics simulations and to reach time scales that are biologically significant. Each run can easily generate terabytes of data on disk, usually distributed among multiple remote resources, requiring new methods for data storage, management, and tracking. Our current efforts include the development of new tools to index and present biomolecular simulation data at different levels of granularity: at the level of the local directory where the data are stored, at the storage resource level, and eventually at the global level, involving multiple resources distributed across institutions. At the directory level, file parsers for popular MD and QM packages (e.g., AMBER, Gaussian) can be used to generate experiment summaries and file descriptions as text, XML, or HTML. These descriptors are stored at the root of the directory containing the data for a particular experiment. They provide a quick summary of the experiment that was run and the files that are present in the folder. For resource-level indexing we developed a simple web interface (iBIOMES Lite), automatically populated by the existing descriptors generated at the lower level. Actual data files are not made readable, except for analysis summaries such as plots or 3D structure snapshots. The aim of this tool is to provide easy access to experiment summaries and the latest data analysis results, not only to the owner of the data but also to other group members. Finally, at the global level, experiments and related files can be registered into a large-scale distributed system: iBIOMES (integrated BIOMolEcular Simulations). Registered data can be queried across multiple resources using experiment metadata (e.g., method parameters, force field, residue chain). Both simulation input and output files can be made available for download, either for data dissemination within a collaborative network or for public access.
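
A directory-level descriptor of the kind described might be generated as follows. The element and attribute names here are placeholders for illustration, not the actual iBIOMES schema:

```python
import xml.etree.ElementTree as ET

def experiment_descriptor(name, files):
    """Build a minimal XML summary for an experiment directory.

    `files` is a list of (filename, role) pairs; element names are
    illustrative only, not the iBIOMES descriptor format.
    """
    root = ET.Element("experiment", name=name)
    for fname, role in files:
        ET.SubElement(root, "file", name=fname, role=role)
    return ET.tostring(root, encoding="unicode")

xml_text = experiment_descriptor(
    "dna_duplex_md",
    [("md.in", "input"), ("md.out", "output"), ("traj.nc", "trajectory")],
)
print(xml_text)
```

A web layer such as the iBIOMES Lite interface described above could then be populated purely from such descriptors, without opening the (potentially terabyte-scale) data files themselves.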

2:35 92 Chemical science that underpins the Reaxys database

Juergen Swienty-Busch1, j.swienty-busch@elsevier.com, Pieder Caduff2, David Evans2. (1) Elsevier Information Systems GmbH, Frankfurt, Germany, (2) Reed Elsevier Properties SA, Neuchatel, Switzerland

The chemical literature is increasing year on year: new journals are launched, and existing journals broaden and deepen their coverage. Researchers are increasingly pressured to maintain an overview of the literature while also finding the data most relevant to them. Providing relevant and accurate information is of fundamental importance. Reaxys strives to provide chemistry researchers with timely, accurate, organized, and relevant information. We will discuss the recent advances we have made in order to support the daily workflow of a research chemist. These include automated systems for the identification of chemically relevant articles for excerption, taxonomies to support the organization of data, innovative quality assurance tools, and new technologies for the classification of substances and reactions.

3:05 93 ChemReact: A free database containing more than 524,000 reactions available at your fingertips

Valentina Eigner-Pitto, ve@infochem.de, Hans Kraut, Heinz Saller, Heinz Matuszczyk, Josef Eiblmaier, Peter Loew. InfoChem GmbH, Munich, Germany

With the acquisition of the SPRESI database in 1989, which at that time contained 1.8 million reactions, InfoChem was forced to conceive concepts for the selection of meaningful subsets of reaction databases. Based on a high quality reaction center detection module (ICMAP), InfoChem developed a sophisticated reaction type classification application (CLASSIFY) that is still unique to this day. Using CLASSIFY and applying tailored filters on reaction attributes like yield, relevance of journal, and number of examples per ClassCode, InfoChem generated ChemReact, a subset of the SPRESI data collection. This database contains over 524,000 unique reactions, each of them representing one distinct reaction type. With the development of the free iPad and iPhone app, InfoChem decided to make ChemReact publicly available free of charge. This talk will briefly outline the history of the database and present the free app SPRESImobile that enables easy and intuitive access to this valuable data collection.

3:35   Intermission
3:50 94 Navigating between patents, papers, abstracts, and databases using public sources and tools

Christopher Southan1, Sean Ekins2, ekinssean@yahoo.com. (1) Department of Informatics, ChrisDS Consulting, Göteborg, Vastra Götland 41266, Sweden, (2) Collaborations in Chemistry, Fuquay-Varina, NC 27526, United States

Engaging with chemistry in the biosciences requires navigation between journals, patents, abstracts, databases, and Google results, and connecting across millions of structures specified only in text. The ability to do this in public sources has been revolutionised by several trends: a) ChEMBL's capture of SAR from journals; b) the deposition of three major automated patent extractions (SureChem, IBM, and SCRIPDB) in PubChem, covering over 15 million structures; c) open tools such as chemicalize.org, OPSIN, and OSCAR that enable the conversion of IUPAC names or images to structures; and d) the indexing of chemical terms (e.g., InChIKeys) that turns Google searches into a merged global repository of 40 to 50 million structures. Details of these trends, including PubChem intersection statistics, will be presented, along with practical examples from selected tools. New structure-sharing trends will also be considered, such as patent crowdsourcing, Dropbox, blogs, figshare, and open lab notebooks.
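
Trend d) works because an InChIKey has a fixed, easily indexed shape: three hyphen-separated blocks of 14, 10, and 1 uppercase letters. A cheap syntactic check for that layout, useful before treating an arbitrary token as a structure search key, can be written as:

```python
import re

# Standard InChIKey layout: 14 + 10 + 1 uppercase letters, hyphenated
INCHIKEY_RE = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")

def looks_like_inchikey(text):
    """Cheap syntactic check; does not verify the key resolves to a structure."""
    return INCHIKEY_RE.fullmatch(text) is not None

print(looks_like_inchikey("BSYNRYMUTXBXSQ-UHFFFAOYSA-N"))  # True (aspirin)
print(looks_like_inchikey("aspirin"))                      # False
```

Because the pattern essentially never occurs in ordinary prose, hits for a key in a web search engine can be merged across sources with high confidence that they refer to the same structure.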

4:20 95 ChemSpider reactions: Delivering a free community resource of chemical syntheses

Valery Tkachenko1, tkachenkov@rsc.org, Colin Batchelor2, Ken Karapetyan1, David Sharpe2, Antony J Williams1. (1) Cheminformatics, Royal Society of Chemistry, Wake, NC 27587, United States, (2) Cheminformatics, Royal Society of Chemistry, Cambridge, United Kingdom

There are dozens of public compound databases now available online, some of them providing access to tens of millions of chemical compounds. However, very little effort has been put into the delivery of databases of chemical reactions, with the majority of large resources being commercial in nature. In our five years of delivering chemistry data resources to the community, one of the primary requests has been that chemists want to know how to synthesize many of the chemicals they are researching. This presentation will provide an overview of our concerted efforts to enhance access to freely available chemistry data and will discuss ChemSpider Reactions as an integrating hub of content, including data extracted from US patents, from RSC journals and databases, and from our micro-publishing platform ChemSpider Synthetic Pages (CSSP).

4:50 96 Intuitive and integrated browsing of reactions, structures, and citations: The Roche experience

Fausto Agnetti1, Michael Bensch1, Hermann Biller1, Martin Blapp1, Ben Cheikh2, Gerd Blanke1, Joerg Degen1, Bernard Dienon1, Thomas Doerner1, Gunther Doernen1, Frieda Farshchian1, Werner Gotzeina1, Peter Hilty1, Ralf Horstmoeller1, Thomas Jeker1, Brian Jones1, Michael Kappler2, mick.kappler@roche.com, Aslam Momin2, Antonio Regoli1, Denis Ribaud1, Bernard Starck1, Daniel Stoffler1, Klaus Weymann1, Padmanabha Udupa2. (1) Pharma Research and Early Development, F. Hoffmann-La Roche Ltd., Basel, Basel-Stadt 4070, Switzerland, (2) Pharma Research and Early Development, Hoffmann-La Roche Inc., Nutley, New Jersey 07110, United States

Roche has integrated proprietary reaction information within the Elsevier Reaxys product, which runs on Roche's infrastructure and inside the Roche firewall to provide high performance and security. The incorporation and discoverability of proprietary information along with public information significantly improves productivity. With this development, Roche researchers are able to launch a single search in Reaxys across integrated internal data and experimental data published in journals and patents, with results unified and organized in a context directly relevant to the researcher workflow. Key points of ELN integration, data modeling, and reaction canonicalization will be discussed.

5:20   Concluding Remarks

Wednesday, April 10, 2013 8:30 am - 11:50 am

Balancing Chemistry on the Head of a Pin: Multi-Parameter Optimization - AM Session
Morial Convention Center
Room: 349
Edmund Champness, Matthew Segall, Organizers
Edmund Champness, Matthew Segall, Presiding
8:30   Introductory Remarks
8:35 97 Exploiting a more polar property space in the design of brain penetrant molecules

Anabella Villalobos, anabella.villalobos@pfizer.com, Travis T Wager, Xinjun J Hou, Patrick R Verhoest. Department of Neuroscience Medicinal Chemistry, Pfizer Inc., 700 Main Street, Cambridge, MA 02139, United States

In our efforts to increase the survival of drug candidates, we undertook a detailed study of the chemical space for Central Nervous System (CNS) molecules. Ultimately, we were interested in optimizing the number of design cycles and in vivo toxicology testing needed to advance candidates from idea to proof of concept clinical studies. We focused on understanding the relationships between physicochemical properties, in vitro absorption, distribution, metabolism, and excretion (ADME) and safety attributes, and binding efficiencies for over 200 marketed CNS drugs and Pfizer CNS candidates. This analysis together with medicinal chemistry knowledge was used to create and validate a prospective design tool which used an overall desirability score for drug-likeness. The novel CNS multi-parameter optimization desirability (CNS MPO Desirability) algorithm, based on six physicochemical parameters, showed that 74% of marketed CNS drugs displayed a high desirability score (>4, using a scale of 0-6). In addition, a relationship between an increasing desirability score and alignment of key in vitro ADME and safety attributes was seen in the marketed CNS drug set, the Pfizer candidate set, and a Pfizer proprietary diversity set. The CNS MPO Desirability score is thus an algorithm in the medicinal chemistry toolbox that may be used prospectively at the design stage to accelerate the identification of compounds with increased probability of success. Furthermore, application of this tool to new clinical drug candidates has challenged the long-held notion that CNS molecules need to be highly lipophilic with low polar surface area, moving the CNS design field in a new direction.
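
The desirability-sum idea behind a 0-6 score can be sketched generically. The ramp shapes and cutoff values below are placeholders chosen for illustration; they are not the published CNS MPO desirability functions or parameter ranges:

```python
def ramp(x, lo, hi):
    """Desirability 1.0 at or below lo, 0.0 at or above hi, linear between."""
    if x <= lo:
        return 1.0
    if x >= hi:
        return 0.0
    return (hi - x) / (hi - lo)

def mpo_like_score(clogp, clogd, mw, tpsa, hbd, pka):
    """Sum of six per-parameter desirabilities, giving a 0-6 scale.

    Cutoffs are illustrative placeholders, not the published CNS MPO values
    (the real functions also include hump-shaped terms, e.g. for TPSA).
    """
    return (ramp(clogp, 3, 5) + ramp(clogd, 2, 4) + ramp(mw, 360, 500)
            + ramp(tpsa, 90, 120) + ramp(hbd, 0.5, 3.5) + ramp(pka, 8, 10))

print(mpo_like_score(clogp=2, clogd=1, mw=300, tpsa=80, hbd=0, pka=7))
```

The appeal of such a score at the design stage is that each proposed structure collapses to a single number that can be thresholded (e.g., > 4) before any compound is made.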

9:05 98 Multi-criteria drug discovery: Recent results in building predictive models, combining predictions, and generating new chemistry ideas

Brian B Masek, brian.masek@certara.com, Fabian Boes, Richard Cramer, Roman Dorfman, Stephan Nagy, Lei Wang, Bernd Wendt. Certara USA, Inc., St. Louis, MO 63144, United States

A successful drug candidate will need to overcome a variety of hurdles, including adequate potency and selectivity, as well as acceptable ADME, physical, and safety properties. This presents several challenges to discovery scientists:
· Understanding and balancing the competing SARs for each of the multiple criteria a successful drug candidate must meet
· How to create predictive models for ALL of the parameters relevant to a successful clinical outcome
· How to identify the scaffolds and R-groups that will optimize or satisfy the potency, selectivity, physical properties, ADME properties, and safety profile
Examples will be presented to show how modern CADD methods are addressing these challenges.

9:35 99 Implementation of multi-criteria decision making (MCDM) tools in early drug discovery processes

Marie Ledecq, marie.ledecq@ucb.com, UCB NewMedicines, UCB Pharma, Braine-l'Alleud, Belgium

The current trend in medicinal chemistry is to focus on high-quality ligands from the very beginning of the drug design process in order to reduce the drug attrition rate in later stages. Accordingly, medicinal chemistry practices are evolving from potency-centered drug design strategies towards a much more integrated vision in which critical properties are optimized in parallel. From this perspective, specific MCDM tools can be used to discover better-balanced lead compounds. These tools include Derringer's desirability functions and Pareto-front-based optimizers. In this presentation, it will be shown how these tools can be implemented at several levels of the drug design process: to follow project progression and make informed decisions about series, and to help in data analysis and the design of new compounds.
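Pareto-front selection, one of the MCDM tools mentioned in several of these abstracts, can be sketched in a few lines: a candidate survives if no other candidate is at least as good in every objective and strictly better in at least one. This is a generic sketch (all objectives assumed to be maximized; exact-duplicate points eliminate each other), not any specific vendor's optimizer.

```python
def pareto_front(points):
    """Return the non-dominated points from a list of objective tuples.

    A point p is dominated if some other point q is >= p in every
    objective and differs from p (i.e., is strictly better somewhere).
    """
    front = []
    for p in points:
        dominated = any(
            q != p and all(qi >= pi for qi, pi in zip(q, p))
            for q in points
        )
        if not dominated:
            front.append(p)
    return front
```

For example, with objectives (potency, solubility), the point (2, 2) dominates (1, 2) and (2, 1), so only the genuinely incomparable trade-off points remain on the front for a project team to choose among.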

10:05   Intermission
10:20 100 Being suitably sensitive: Balancing competing performance criteria for in silico models

Robert D Clark, bob@simulations-plus.com, Marvin Waldman, Jinhua Zhang, Adam C. Lee, Michael S. Lawless. Simulations Plus, Inc., Lancaster, CA 93534, United States

Given the large number of descriptors and modeling tools available, identifying the “best” model from among several can be bewildering. When only one performance statistic is relevant, this is relatively straightforward. In many cases, however, it is not clear a priori which criterion should dominate and this problem becomes one of multi-parameter optimization. Here we investigate the effect of manipulating the balance between sensitivity and specificity on the overall performance of artificial neural net ensemble (ANNE) classification models and present a Pareto approach to integrating alternative performance criteria for them.

10:50 101 Withdrawn
11:20 102 Finding multi-parameter rules for successful optimization

Matthew Segall, matt.segall@optibrium.com, Iskander Yusof, Edmund Champness. Optibrium Ltd., Cambridge, ... CB25 9TL, United Kingdom

Multi-parameter optimization (MPO) is increasingly used in drug discovery to prioritize compounds against a profile of properties required for success. But how do we know what profile to use? The property criteria will depend on the ultimate objective of the project and are typically based on the subjective opinion of the project team. In this presentation we will describe computational approaches, known as rule induction, that guide this process by analysing historical data. These identify objective multi-parameter rules that distinguish successful compounds for a chosen goal, e.g. efficacy, pharmacokinetics, or safety. The resulting rules are interpretable and modifiable, allowing experts to understand and adjust them based on their knowledge of the underlying biology and chemistry. Furthermore, the importance of each criterion can be identified, allowing the most critical data to be prioritized in order to make effective compound selection decisions.

Wednesday, April 10, 2013 8:30 am - 11:55 am

Public Databases Serving the Chemistry Community - AM Session
Morial Convention Center
Room: 350
Antony Williams, Sean Ekins, Organizers
Antony Williams, Sean Ekins, Presiding
8:30   Introductory Remarks
8:35 103 Universal SMILES: Finally, a canonical SMILES string?

Noel M O'Boyle, baoilleach@gmail.com, Analytical and Biological Chemistry Research Facility, University College Cork, Cork, Co. Cork, Ireland

The SMILES line notation is widely used for storage and interchange of chemical structures. Although for a single structure many different SMILES strings may be written, most cheminformatics toolkits provide the ability to generate a canonical SMILES representation so that the same structure will always yield the same SMILES string. Unfortunately there is no standard way to generate canonical SMILES and different toolkits generate different canonical SMILES for the same structure.
Another widely used line notation is the InChI identifier, which provides a canonical identifier for chemical structures. I describe how to use the InChI's canonical labels to derive a canonical SMILES string in a straightforward way, either incorporating the InChI normalisations (Inchified SMILES) or not (Universal SMILES). I discuss the performance of these methods on a test set of compounds from PubChem and ChEMBL, the challenges remaining, and the benefits to the community of a standard method of generating canonical SMILES.

9:05 104 Analysis of tautomerism in databases of commercially available compounds

Laura Guasch, lguasch@helix.nih.gov, Markus Sitzmann, Marc C Nicklaus. Chemical Biology Laboratory, Center for Cancer Research, Frederick National Laboratory for Cancer Research, National Cancer Institute, National Institutes of Health, DHHS, Frederick, Maryland 21702, United States

We have conducted a tautomerism analysis in a large database of commercially available compounds. The goal of this analysis is two-fold: to investigate how many cases of the same chemical being sold as different products (at possibly different prices) may occur in aggregated screening sample databases; and to test the tautomerism definition of the widely used chemoinformatics toolkit CACTVS. We applied the default CACTVS transforms to the publicly accessible Aldrich Market Select (AMS) database from ChemNavigator/Sigma-Aldrich, which currently comprises over 8 million unique chemicals available from hundreds of suppliers worldwide. We found thousands of cases where at least two products listed as different compounds in the AMS were declared as tautomeric forms of the same compound by CACTVS. We report on our efforts to address the question of the true tautomeric overlap by selecting a number of tautomer pairs (or larger tuples) from the AMS, and analyzing their structural identity or difference by, e.g., NMR.

9:35 105 RSC chemical validation and standardization platform: A potential path to quality-conscious databases

Ken Karapetyan1, karapetyank@rsc.org, Valery Tkachenko1, Colin Batchelor2, David Sharpe2, Antony Williams1. (1) Cheminformatics, Royal Society of Chemistry, Wake Forest, NC 27587, United States, (2) Cheminformatics, Royal Society of Chemistry, Cambridge, United Kingdom

High-quality chemical databases are struggling to protect their data from the flow of machine-generated chemistry and lower-quality data. The era of primarily human curation prior to deposition in a database is gone, and quality-conscious databases need to rely heavily on automated validation checks. An automated chemical validation system is being developed by the cheminformatics team at the Royal Society of Chemistry to be the “quality gatekeeper” of databases at the point of deposition. ChemSpider is leading a community-wide standardization approach starting with our support of the Open PHACTS semantic web project, an Innovative Medicines Initiative. The Chemical Validation and Standardization Platform (CVSP) is being designed as an open, flexible platform that validates and standardizes chemical records. This presentation will review the existing beta version of the system and work in progress.

10:05   Intermission
10:20 106 Challenges and recommendations for obtaining chemical structures of industry-provided repurposing candidates

Christopher Southan1, Anthony J Williams2, Sean Ekins3, ekinssean@yahoo.com. (1) ChrisDS Consulting, Göteborg, Sweden, (2) Royal Society of Chemistry, Wake Forest, NC 27587, United States, (3) Collaborations in Chemistry, Fuquay-Varina, NC 27526, United States

There is expanding interest in drug repurposing and in optimizing in silico methods to assist it. Recent repurposing project tendering calls by the National Center for Advancing Translational Sciences (US) and the Medical Research Council (UK) have included compound information and pharmacological data. However, none of the internal company development code names were assigned to chemical structures in the official documentation. This not only abrogates in silico analysis but also necessitates arduous data gathering to assign structures. We describe here the methods, results, and challenges associated with this, as well as the in silico predictions for the mapped structures. Because ~40% of the code names remain completely blinded, we suggest ways by which their structure mappings could be released earlier into the public domain and with more uniform provenance.

10:50 107 One size fits all or how to find the needle in the haystack?

Juergen Swienty-Busch, j.swienty-busch@elsevier.com, Elsevier Information Systems GmbH, Frankfurt, Germany

In an ever-growing and dynamic information environment, proprietary and public information and free and paid services live next to each other, making it very difficult to navigate a landscape of patchwork information resources, find trusted and reliable information, and ultimately make informed decisions. Increasing pressure is put on scientists to stay up to date with the latest information in a given research domain, and they are looking for systems that answer their questions quickly and precisely. We will describe a system that addresses these needs by applying an optimized computer-aided abstraction process and by being able to integrate other data sources, and we will present use cases and applications.

11:20 108 Pistoia Alliance AppStore: Apps for life sciences R&D

Alex M Clark, aclark@molmatinf.com, R&D, Molecular Materials Informatics, Montreal, Quebec H3J2S1, Canada

The recent industry trend toward "appification" of software is starting to affect the domain of life sciences R&D. This involves reimagining conventional cheminformatics and bioinformatics tools and repackaging them as modular apps designed to provide optimal functionality on a mobile device (e.g. smartphone, tablet). For vertical markets such as the pharmaceutical industry, it is easy for these specialized apps to be lost in the forest of consumer-oriented apps. In order to address this problem, and many others, the Pistoia Alliance has undertaken the task of building a storefront dedicated exclusively to apps for life sciences R&D. The advantages of an industry-specific storefront are many: the selection of apps is tightly focused (hundreds rather than hundreds of thousands) and the curation criteria are based on the needs of the industry. The Pistoia Alliance AppStore is supported by active discussion forums, and the ability of vendors to contact the users of their products is a key differentiator compared to the anonymity of general-purpose appstores. Apps are available for both iOS and Android devices, and the apps can be made available for free, or licenses can be negotiated directly between vendors and customers without incurring a toll. This presentation will discuss the benefits of the new appstore, and some of the early experiences and lessons learned during its implementation. The Pistoia Alliance is also working with TM Forum to augment its appstore with server-side support, which is intended to allow apps to make use of large datasets and intensive calculations in a secure cloud-hosted environment. Progress towards the design and construction of this service will be described.

11:50   Concluding Remarks

Wednesday, April 10, 2013 1:30 pm - 3:10 pm

Computational De novo Protein and Peptide Design - PM Session
Morial Convention Center
Room: 349
Cosponsored by COMP
Rachelle Bienstock, Organizers
Rachelle Bienstock, Presiding
1:30   Introductory Remarks
1:45 109 Novel in silico prediction algorithms for the design of stable and more effective proteins

Francisco G Hernandez-Guzman, francisco.hernandez@accelrys.com, Velin Spassov, Lisa Yan. Department of LS Modeling and Simulations, Accelrys, San Diego, CA 92121, United States

Understanding the effects of mutation on protein stability and protein binding affinity is an important component of successful protein design. In silico approaches to predict the effects of amino acid mutations can be used to guide experimental design and help reduce the cost of bringing biotherapeutics or new protein molecules (e.g. enzymes) to market. We have developed a number of novel methods for fast computational mutagenesis of proteins, which can be applied to calculate the energy effect of a mutation on protein stability and on protein-protein binding affinity, with an optional pH-dependency calculation. Here we will present those methods and associated validation results. Furthermore, we will provide a case study using a set of engineered antibodies with altered pH-selective binding, demonstrating how binding to either the neonatal Fc receptor (FcRn) or to their target antigens can be modified to tune their half-life in the host system.

2:10 110 Advanced structural modeling of biologics with BioLuminate

David A Pearlman, Tyler Day, Kathryn Loving, David Rinaldo, Noeris Salam, Dora Warshaviak, Kai Zhu, Woody Sherman, woody.sherman@schrodinger.com. Schrodinger, New York, NY 10036, United States

The field of biologics continues to grow in importance in the pharmaceutical industry. To address the increasing need for computational tools to model biologics we have developed BioLuminate, which contains a broad range of task-driven applications tailored specifically to the field of biologics. Our objective was to blend an easy-to-use interface with state-of-the-art molecular simulations and de novo prediction tools. In this presentation, we describe the philosophy behind the design of BioLuminate and then focus on distinguishing features of the product, such as protein-protein docking with Piper, de novo antibody loop modeling with Prime, estimation of residue mutation effects, prediction of stabilizing mutations, and determination of aggregation hotspots. We conclude by describing the primary challenges in the field and our research efforts to address them.

2:35 111 Virtual mutagenesis for optimizing antibody binding affinity: A prospective study

Enrico O. Purisima, enrico.purisima@nrc.ca, Victor Vivcharuk, Traian Sulea, Denis L'Abbé, Yves Durocher, Jason Baardsnes, Maureen O'Connor. Human Health and Therapeutics Portfolio, National Research Council of Canada, Montreal, Quebec H4P 2R2, Canada

Antibodies are emerging as an important new class of therapeutics that offers many advantages over small-molecule drugs. However, raising antibodies in animals requires a significant investment in resources and time with limited control over the definition of epitopes targeted or the level of binding affinities obtained. Computer-aided molecular design has the potential to speed up the process of affinity maturation. We used virtual mutagenesis to redesign an existing antibody that has dual weak affinities to VEGF-A and HER2. We used a combination of three methods - SIE, FoldX and Rosetta - to design sequences for improved affinities. Forty antibody mutants, each containing up to 4 amino acid mutations, were designed. These were cloned and expressed and their affinities measured by SPR. We will discuss the results of this study and the implications for computational approaches to virtual affinity maturation.

3:00   Concluding Remarks

Wednesday, April 10, 2013 1:30 pm - 4:50 pm

Advances in Virtual High-Throughput Screening - PM Session
Morial Convention Center
Room: 350
Joel Freundlich, Sean Ekins, Organizers
Sean Ekins, Presiding
1:30   Introductory Remarks
1:35 112 Setting up a discovery pipeline in KNIME and PipelinePilot: High-throughput de novo design utilizing gigantic virtual chemistry spaces

Carsten Detering, detering@biosolveit.com, BioSolveIT Inc, Bellevue, WA 98008, United States

Today's drug discovery is under a lot of pressure. A crowded patent space, tightened regulation by the FDA, and the increased risk of putting compounds into the clinic call for new pathways into unexplored and, moreover, larger areas of chemical space. The need for synthetically viable compounds rendered de novo design difficult until recently. With this contribution we present a way to explore new chemical space that utilizes existing in-house chemistry, renders retrosynthesis unnecessary, and generates new chemical entities while keeping the physicochemical properties of the query compounds the same. Activity is thus likely to be similar, but the hit compound will likely enter a different area of chemical space. This screening workflow can easily be set up in either of the popular workflow tools KNIME or PipelinePilot, taking full advantage of the synergy between the workflow tool's functionality and the embedded software. The presentation will highlight a few example workflows for both KNIME and PipelinePilot as well as the scientific background of the software used within them.

2:00 113 New targets addressed by DEKOIS 2.0: Demanding evaluation kits for objective in-silico screening

Frank M. Boeckler, frank.boeckler@uni-tuebingen.de, Matthias R Bauer, Tamer M. I. M. Abdelrehim, Simon M. Vogel. Department of Pharmacy & Biochemistry, Eberhard Karls University, Tuebingen, Germany

With DEKOIS we have created an automated workflow to efficiently generate decoy sets based on a certain number of actives for any target. Physicochemical similarity should be maximized between decoys and actives in order to yield challenging sets for benchmarking, while exact mimicking of potentially active substructures should be avoided to omit latent actives in the decoy set (LADS). Overall, the diversity of actives and decoys should be maximized to avoid artifacts based on clusters. Applying this philosophy, we have added more details to describe the physicochemical space and applied this protocol to generate sets for targets which had not been accessible before. These DEKOIS 2.0 sets are available online (www.dekois.com) for benchmarking and development of new tailored scoring functions. Further extension toward additional targets can facilitate a systematic comparison of the virtual screening performance of docking tools and scoring functions in a target-dependent way.
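The property-matching idea behind decoy selection can be sketched as a nearest-neighbor search in a pre-normalized physicochemical descriptor space: for each active, keep the candidate decoys that sit closest to it. This is an illustration of the general approach only, not the DEKOIS protocol, which adds substructure (LADS) and diversity filters on top.

```python
import math

def nearest_decoys(active, candidates, k=3):
    """Rank candidate decoys by Euclidean distance to an active in a
    (pre-normalized) property space and keep the k closest.

    `active` and each candidate are tuples of descriptor values,
    e.g. (logP, MW, rotatable bonds), already scaled to comparable units.
    """
    ranked = sorted(candidates, key=lambda c: math.dist(active, c))
    return ranked[:k]
```

In a full workflow one would run this per active against a large vendor library, then discard the near neighbors that share potentially active substructures, which is exactly the LADS concern the abstract raises.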

2:25 114 PubChem3D: A virtual screening platform

Evan Bolton, bolton@ncbi.nlm.nih.gov, PubChem, NCBI / NLM / NIH, United States

Virtual screening is a critical component of drug discovery, used to reduce the cost and improve the success of a given biological assay screening campaign. Decisions need to be made rapidly about which compounds to purchase or which chemicals to synthesize from a large number of possibilities. Similarly, one must prioritize which high-throughput screening “hits” to pursue. PubChem contains a huge wealth of information, including data from numerous medicinal chemistry projects and many (if not most) purchasable chemicals. Tools within PubChem (some of which are very new) allow one to quickly locate chemicals with similar bioactivity and similar structural features. This talk will provide an overview of the tools oriented towards virtual screening available in PubChem, with an emphasis on key advancements and newly introduced capabilities.

2:50   Intermission
3:05 115 Dual-event machine learning models to accelerate drug discovery

Sean Ekins1,2, ekinssean@yahoo.com, Robert C Reynolds3,4, Hiyun Kim5, Mi-Sun Koo5, Marilyn Ekonomidis5, Meliza Talaue5, Steve Paget5, Lisa Woolhiser6, Anne J Lenaerts6, Barry A Bunin1, Nancy Connell5, Joel S Freundlich5,7. (1) Collaborative Drug Discovery, Burlingame, CA 94010, United States, (2) Collaborations in Chemistry, Fuquay-Varina, NC 27526, United States, (3) Southern Research Institute, Birmingham, AL 35205, United States, (4) University of Alabama at Birmingham, Birmingham, AL 35294-1240, United States, (5) Department of Medicine, Center for Emerging and Reemerging Pathogens, UMDNJ – New Jersey Medical School, Newark, NJ 07103, United States, (6) Colorado State University, Fort Collins, CO 80523, United States, (7) Department of Pharmacology & Physiology, UMDNJ, Newark, NJ 07103, United States

The identification of novel leads represents a significant challenge in the resource-limited setting of drug discovery. This hurdle is magnified in neglected diseases such as tuberculosis, characterized by ~2 million deaths annually and a need for shorter therapeutic regimens addressing drug resistance. We have leveraged high-throughput screening data, a multi-year and multi-million dollar investment by public and private institutions, to experimentally validate single- and dual-event Bayesian models. We virtually screened a commercial library and experimentally confirmed actives with hit rates exceeding typical rates by 1-2 orders of magnitude. The first dual-event Bayesian model identified compounds with antitubercular whole-cell activity and low mammalian cell cytotoxicity from a published set of antimalarials. The most potent hit exhibits the in vitro activity and in vitro/in vivo safety profile of a drug lead. These machine learning models offer significant economies in time and cost while being broadly applicable to drug discovery.
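The "dual-event" idea, requiring a compound to satisfy two models at once, can be illustrated with a minimal sketch. Assuming two models each emit a probability per compound (one for whole-cell activity, one for low cytotoxicity), candidates can be ranked by the joint score; the function and score dictionaries below are hypothetical, not the authors' Bayesian implementation.

```python
def dual_event_rank(compounds, p_active, p_nontoxic):
    """Rank compounds by the product of two model scores, treating the
    two predicted events (active AND non-toxic) as independent."""
    return sorted(
        compounds,
        key=lambda c: p_active[c] * p_nontoxic[c],
        reverse=True,
    )
```

The product penalizes compounds that excel in only one dimension, which is why a dual-event filter enriches for safe actives rather than merely potent ones.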

3:30 116 Virtual high-throughput screening of novel pharmacological agents based on PASS predictions

Vladimir V. Poroikov1,2, vladimir.poroikov@ibmc.msk.ru, Dmitry A. Filimonov1, Alexey A. Lagunin1, Tatyana A. Gloriozova1, Olga A. Tarasova1, Pavel V. Pogodin1,2, Marc C. Nicklaus3. (1) Department for Bioinformatics, Orekhovich Institute of Biomedical Chemistry of Russian Academy of Medical Sciences, Moscow, Russian Federation, (2) Medical-Biological Faculty, The Russian National Research Medical University named after N.I. Pirogov, Moscow, Russian Federation, (3) Chemical Biology Laboratory, National Cancer Institute, National Institutes of Health, Frederick, MD, United States

Among the numerous tools currently used for virtual screening, PASS (http://pharmaexpert.ru/passonline) occupies a special place. PASS predicts 6400 biological activities of drug-like compounds with a mean accuracy of about 95%. Its training set consists of 330,000 biologically active compounds. Since PASS calculations for 50,000 structures take a few minutes on an ordinary PC, PASS is applicable to chemical libraries containing millions of compounds. Based on PASS predictions, novel pharmaceutical agents have been discovered with anxiolytic, anti-inflammatory, antihypertensive, anticancer and other actions. To find new anticancer agents, we have analyzed tens of millions of structures from ChemNavigator and selected a few dozen for biological testing. Two out of eleven tested compounds were found to be potent anticancer NCEs, which are now in preclinical studies. We also present recent results of virtual screening for HIV-1 microbicides. Acknowledgement: This work was partially supported by FP7 grant No. LSHB-CT-2007-037590 and RFBR/NIH grant No. 12-04-91445-NIH_a/RUB1-31081-MO-12.

3:55 117 How GPUs can find your next hit: Accelerating virtual screening with OpenCL

Simon Krige1, simon@cresset-group.com, Mark Mackey1, Simon McIntosh-Smith2, Richard Sessions2. (1) Cresset Biomolecular Discovery, United Kingdom, (2) University of Bristol, Bristol, United Kingdom

The use of virtual screening to find new hits and leads has become commonplace within the pharmaceutical industry. 2D methods have largely been replaced by 3D ligand-based methods and by structure-based methods (docking) where a reliable protein structure is available. Cresset's Blaze V10 virtual screening algorithm has been shown to significantly outperform DOCK on a wide range of targets, both in terms of raw enrichment rates and in terms of enrichment of novel chemotypes. However, the cost of calculating 3D molecular similarities is much higher than that for 2D similarity methods, and therefore large amounts of computing power are needed to screen a reasonable number of compounds on a useful time scale. In recent years, graphics processing units (GPUs) have become very popular for some high-performance computing applications as they have a very good cost-to-performance ratio. Various frameworks have been developed and are now sufficiently mature to consider using in production environments. GPUs are therefore an ideal solution for computationally intense problems such as virtual screening. In collaboration, the University of Bristol and Cresset have ported the Blaze V10 virtual screening code to OpenCL, a framework for writing programs that execute across heterogeneous platforms (both CPU and GPU). We present results showing that the OpenCL port can provide up to a 40-fold speed increase and more accurate results when run on an off-the-shelf latest-generation GPU, compared to a contemporary multi-core CPU. This not only reduces the time required to obtain results but also saves hardware cost and space, with a single cheap GPU performing as well as a cluster of dozens of CPUs. We discuss some of the difficulties encountered in reworking the Blaze V10 algorithms to fit into a heterogeneous computing environment, present hardware comparisons, and give guidance on how to maximize performance while retaining full precision.

4:20 118 Mining frequent itemsets: Constructing topological pharmacophores using pharmacophore feature pairs

Paul J Kowalczyk, paul.kowalczyk@scynexis.com, Department of Computational Chemistry, SCYNEXIS, Research Triangle Park, NC 27709-2878, United States

We have adapted association rule mining to the task of topological (2D) pharmacophore construction. Association rule mining is a popular and well-researched statistical approach for discovering interesting relationships between variables in large datasets. This approach finds joint values of variables that appear most frequently in a dataset. In this study, these variables are topological pharmacophore feature pairs (e.g., hydrogen bond donors, hydrogen bond acceptors, hydrophobes, aromatic rings, positive centers, negative centers) and the corresponding bond distances between them. Measures of significance and interest are used to score these joint pharmacophore feature pairs, with high scores identifying candidate topological pharmacophores. We demonstrate the construction of topological pharmacophores using publicly available antimalarial datasets. We also show how these topological pharmacophores may be leveraged as data mining and data visualization tools. The construction of topological pharmacophores by means of association rule mining and the protocols for data visualization are made freely available as scripts written in the Python and R programming languages.
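The support-counting step at the core of association rule mining can be sketched generically: treat each molecule as a "transaction" of items (here, items could be (feature-pair, topological distance) tokens) and keep the item pairs whose co-occurrence frequency clears a support threshold. This is a minimal illustration of frequent-itemset counting, not the authors' released scripts.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_support):
    """Return {item_pair: support} for pairs of items that co-occur in
    at least min_support (a fraction, 0-1) of the transactions."""
    counts = Counter()
    for items in transactions:
        # Sort so each unordered pair is counted under one canonical key.
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}
```

Scoring the surviving pairs with interest measures (e.g. lift or confidence) is then what promotes frequently co-occurring feature pairs into candidate topological pharmacophores.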

4:45   Concluding Remarks

Thursday, April 11, 2013 8:00 am - 10:45 am

General Papers - AM Session
Morial Convention Center
Room: 349
Jeremy Garritano, Organizers
Jeremy Garritano, Presiding
8:00   Introductory Remarks
8:05 119 Lexichem: Not another chemical nomenclature app

Edward O Cannon, ed.cannon@eyesopen.com, OpenEye Scientific Software, Santa Fe, NM 87508, United States

A novel, fast, easy-to-use desktop application has been developed for Lexichem [1], OpenEye's chemical nomenclature software [2]. The desktop application offers the ability to extract chemical names and structures from patents and to easily visualize chemical structures by dragging and dropping files, plus numerous other features. [1] E. O. Cannon, "New Benchmark for Chemical Nomenclature Software", J. Chem. Inf. Model., 2012, 52(5), pp 1124-1131. [2] OpenEye Scientific Software, 9 Bisbee Court, Suite D, Santa Fe, NM 87508

8:30 120 Teach our naming tool to be bilingual: Chinese name-to-structure conversion

David Deng, ddeng@chemaxon.com, Daniel Bonniot. ChemAxon LLC, Cambridge, MA 02138, United States

Chinese patent filings have risen sharply during the past decade. In 2011, China overtook the U.S. to become the world's top patent filer. Text mining of Chinese patents, including chemical patents, is therefore of increasing importance, and an application to convert Chinese chemical names to structures is urgently needed for Chinese chemical patent analysis. ChemAxon has developed a mature English name-to-structure conversion tool. In this presentation, we will demonstrate how this tool can now also convert Chinese chemical names to structures. It has great potential in other text mining fields, e.g. extracting chemical information from Chinese documents and webpages.

8:55 121 Withdrawn
9:20   Intermission
9:30 122 Withdrawn
9:55 123 Withdrawn
10:20 124 Algorithm for efficient conformational equivalence testing without a priori atomic correspondence or connectivity information

Gregory R. Magoon1,2, gmagoon@aerodyne.com, William H. Green2. (1) Center for Aero-Thermodynamics, Aerodyne Research, Inc., Billerica, MA 01821, United States, (2) Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, United States

An algorithm is described for comparing two sets of three-dimensional molecular coordinates to assess whether they correspond to the same conformer (within some small tolerance). The algorithm does not require or make use of connectivity information, and does not require a priori atomic correspondence information, though it will identify one or more viable atomic mappings if the two conformers are equivalent within the user-specified tolerance. In contrast to typical approaches (e.g. Kabsch algorithm) that make use of an RMSD metric, the algorithm uses an error metric based on maximum deviations between intraconformer atom-pair distances. The algorithm scales well with molecule size, avoiding the N! explosion of potential atomic mappings and achieving O(N^2) scaling in the best case. Preliminary tests of the algorithm are described. The algorithm could be used in approaches to enumerate conformer ensembles, and is made available on the web through the open-source MoleCoor package.
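The atom-pair distance metric lends itself to a simple necessary-condition check: if two coordinate sets are the same conformer under some atom mapping, their sorted multisets of intraconformer atom-pair distances must agree within the tolerance, regardless of atom ordering. The sketch below implements only that screening step (with brute-force O(N^2) distance enumeration); the full algorithm in the abstract additionally identifies the viable atomic mappings.

```python
import math

def pair_distances(coords):
    """Sorted list of all intraconformer atom-pair distances for a
    list of (x, y, z) coordinates."""
    n = len(coords)
    return sorted(
        math.dist(coords[i], coords[j])
        for i in range(n) for j in range(i + 1, n)
    )

def maybe_same_conformer(c1, c2, tol=1e-3):
    """Necessary (not sufficient) condition for conformer equivalence
    without a priori atomic correspondence: the sorted distance
    multisets must match element-wise within tol (a max-deviation
    metric, as in the abstract)."""
    if len(c1) != len(c2):
        return False
    d1, d2 = pair_distances(c1), pair_distances(c2)
    return max(abs(a - b) for a, b in zip(d1, d2)) <= tol
```

Because interatomic distances are invariant to rotation and translation, no superposition (Kabsch-style) step is needed before the comparison, which is part of the appeal of a distance-based metric.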