CINF Symposia

ACS Chemical Information Division (CINF)
Spring, 2011 ACS National Meeting
Anaheim, CA (March 27-31)

R. Bienstock, Program Chair


Section A
Anaheim Convention Center
213 C

50 Years of Computers in Organic Chemistry: Symposium in Honor of James B. Hendrickson - Cosponsored by ORGN
M. Walker, Organizer, Presiding
9:00   Introductory Remarks.
9:10 1 James Hendrickson: A life-long quest for systematizing organic synthesis.
G. Grethe
Self employed, 352, CA, United States

During his long academic tenure, James Hendrickson was interested in applying logic and systematic characterization of molecules and reactions to organic synthesis. Starting in the early 1970s, his work gradually evolved from a mathematical presentation of the structural and functional features of molecules and their reactions to the development of systematic signatures for organic reactions. In this presentation we will discuss the individual steps along the way, illustrated by examples. Some recent developments by other groups in the area of reaction classification will also be mentioned.
10:00 2 Reaction classification, an enduring success story.
V. Eigner-Pitto, H. Kraut, H. Saller, H. Matuszczyk, P. Loew, G. Grethe
InfoChem GmbH, Munich, Germany; None, United States

Beginning in the late 1980s InfoChem started to develop a deep understanding of the storage and handling of chemical structure and reaction information. The first major project was the development of an electronic version of the printed abstract series “ChemInform” published by FIZ CHEMIE Berlin. Then in 1989 InfoChem acquired an exclusive license to a reaction database (SPRESI) of (initially) 2.3 million records. Since the reaction database management systems (REACCS and ORAC) commercially available at that time could not handle more than 500,000 records, InfoChem was forced to conceive a concept for the selection of meaningful subsets of SPRESI. Based on a high quality reaction center detection module, InfoChem's sophisticated reaction type classification application, “Classify”, remains unique to this day. This concept allowed the generation of widely used reaction type databases such as ChemReact (400,000 reaction types) and ChemSynth (100,000 reaction types). Classify also enables reaction type searching, and clustering of reaction databases, and, in particular, it is the only way of linking different reaction databases. The world's major vendors of chemical information have adopted this technology to enhance the reaction retrieval capabilities of their products. More recent developments at InfoChem have resulted in a processing tool for detecting name reactions in any reaction database, and the retrosynthesis tool ICSYNTH, both of which are based on the company's earlier fundamental work. This talk will briefly present the background and technology of these software modules and their efficient use in the field of modern reaction planning.
10:30   Intermission.
10:40 3 Back to the future of synthesis planning: How new technology and new resources revitalize the vision of computer aided synthesis design.
J. Law, M. Mirzazadeh, A. P. Cook, O. Ravitz, P. A. Johnson, A. Simon
SimBioSys Inc., Toronto, Ontario, Canada; School of Chemistry, University of Leeds, Leeds, United Kingdom

Sophisticated systems like LHASA and SYNGEN were regarded in the late 1980s as a great promise to the field of organic synthesis. Their intent, as Hendrickson stated, was “not to replace art ... but to show where real art lies”. Sparked by the introduction of retrosynthetic analysis, the newborn field of computer aided synthesis design proved that chemical perception and synthetic thinking can be formulated in an algorithmic fashion. However, the vision of routine use of such tools has not materialized, and research in that area came to a lull in the early 1990s. The major obstacle was the difficulty of generating high quality and up-to-date databases of synthetic transforms. We show how our retrosynthetic analysis system, ARChem, capitalizes on the advent of comprehensive reaction databases and the dramatic progress in computing capabilities to automatically generate expansive synthetic rule-sets, which pave the way to representation and application of synthetic strategies.


Section B
Anaheim Convention Center
211 B

Integration of Combinatorial Chemistry with Cheminformatics: Current Trends and Future Directions in Drug Discovery and Material Science
J. Medina-Franco, Organizer
M. Haranczyk, Organizer, Presiding
1:00   Introductory Remarks.
1:05 4 Experimental design for high throughput materials development.
J. N. Cawse
Cawse and Effect LLC, Pittsfield, MA, United States

High-throughput methods of chemical experimentation present a challenge to experimental planning. Experiments run in arrays of dozens to hundreds require rethinking of the classic methods of Design of Experiments. This talk will review the adaptation of classical methods and improvisation of new methods for high throughput systems. These methods are becoming more important as laboratories for chemistry and materials science are being equipped with the robots and high-speed analytical tools for the acceleration of research. In particular, the use of these methods for effective protection of a chemical patent will be discussed.
1:30 5 High-throughput strategies for synthesis and characterization of metal-organic frameworks for CO2 capture.
K. Sumida
Department of Chemistry, University of California, Berkeley, Berkeley, CA, United States; Materials Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA, United States

High-throughput methodologies are a tremendously versatile platform for the discovery of next-generation materials (metal-organic frameworks) for CO2 capture. However, the considerable impact that the reaction conditions employed in the synthetic step can have on the material properties results in a large number of synthetic trials, which result in tremendous quantities of data from powder X-ray diffraction and gas adsorption experiments. An ideal computational support system in this regard would allow rapid, automated identification of the highest performance materials, and provide feedback to the high-throughput synthetic step, such that the preparation of a material may be more rigorously optimized, and new target materials that might show high CO2 capture performance can be identified. Here, we discuss our overall progress towards this goal, and present a number of examples in which the system has been employed to discover the optimal synthetic conditions for the preparation of new metal-organic frameworks for CO2 capture.
1:55 6 Combinatorial library design revisited: Finding new uses for old tools.
D. K. Agrafiotis, V. S. Lobanov
Informatics, Johnson & Johnson Pharmaceutical Research & Development, LLC, Spring House, PA, United States

In the 15 years since our first publication on diversity analysis and library design, the field of combinatorial chemistry has traversed the entire length of the hype curve, from the initial excitement, to the peak of inflated expectations, to the trough of disillusionment, and finally to the plateau of productivity. Along the way, many of the tools that were originally developed for analyzing massive virtual libraries were either forgotten or adapted to the realities of modern pharmaceutical research. While the need to mine massive combinatorial libraries is no longer there, the tools have found a new life in supporting and automating smaller parallel synthesis efforts in lead generation and lead optimization. In this talk, we review some of these earlier technologies and describe their adaptation and integration in today's discovery workflows.
2:20 7 How to screen 10^14 cores per second.
P. S. Shenkin, K. P. Lorton
Schrodinger, New York, NY, United States

We describe Schrodinger's attachment-based core-hopping method and present results achieved using it. The method starts with a template compound in which core and side-chains are identified. The core is replaced by new cores from a library while maintaining side-chain positions as well as possible. No receptor is required, but if a docked pose is available, receptor interactions can be conserved. Several scores are computed. These include a synthesizability score as well as a score reflecting how well side-chain positions are maintained. A combination of GPU processing, multithreading, and automatic linker addition leads to an overall screening rate in excess of 1.0e14 unique cores per second.
2:45   Intermission.
3:00 8 Synergies of combinatorial chemistry and fragment-based drug design for efficient generation of focused virtual libraries.
L. Meireles, G. Mustata, I. Bahar
Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, United States

While combinatorial chemistry used to emphasize rapid synthesis and screening of large libraries of compounds, the current trend is to synthesize much smaller focused compound libraries. In this talk, we present our recently developed computational strategy that combines combinatorial chemistry and fragment-based drug design techniques, fragment linking and fragment growing, to generate focused virtual libraries more efficiently. Once combinatorial chemistry scaffolds are placed in the binding site, fragments can be grown from and/or linked to the scaffold side chains to maximize favorable interactions with the target protein. Different methods for placing the scaffold on the binding site will be discussed along with rules that are essential for effective filtering. One advantage offered by our strategy is that it can also be universally applied to design compounds that replicate onto combinatorial chemistry scaffolds the essential binding features of proteins, peptides and small molecules. The application of the methodology to designing inhibitors of c-Myc-Max protein interaction will be presented.
3:25 9 Chemical library design: From diversity, similarity, and multicriterion optimization to a versatile cheminformatics content management system (CCMS).
W. Zheng
Pharmaceutical Sciences, North Carolina Central University, Durham, North Carolina, United States

Combinatorial chemistry and high throughput screening research often involve the generation, storage and analysis of large datasets. These data are often complex and heterogeneous in nature. To enable the most efficient design of chemical libraries and biological assays, various computational methods have been developed in the past 15 years. More recent research in chemical genomics and systems chemical biology requires the integration of different data sources and computational tools. For example, target family- and pathway-based library design may require information about biological targets and pathways. These requirements call for an integrated system that can organize data, models and computational tools in a flexible and extensible fashion. In this talk, I will first briefly review some concepts for library design, and then describe our effort to develop a flexible cheminformatics content management system (with tagging, sharing as well as user uploading of data and tools).
3:50 10 Six years of collaborative drug discovery in the cloud.
B. Bunin, S. Ekins, M. Hohman, K. Gregory, B. Prom, S. Ernst
Collaborative Drug Discovery (CDD, Inc.), Burlingame, CA, United States

Collaborative Drug Discovery hosts a widely used drug discovery data cloud platform with advanced collaborative capabilities for distributed researchers. The CDD Vault, Collaborate, and Public together host private, collaborative (selectively shared), and public data spanning the competitive, precompetitive, and neglected disease domains, including publicly disclosed collaborations with GlaxoSmithKline, Pfizer, and the Bill & Melinda Gates Foundation, as well as with hundreds of academic and biotech startup companies. CDD provides a novel, collaborative approach for integrating experimental and computational screening with distributed data collection, storage, visualization and analysis, balancing privacy and security with encouraging collaborations when desired. Experiences will be shared with researchers using the “CDD Vault”, a secure, private, industrial-strength database combining traditional drug discovery informatics (registration and SAR) with social networking capabilities. CDD Collaborate enables real-time collaboration by securely exchanging selected confidential data. Traditional drug discovery capabilities include the ability to import/export to Excel™ and sdfiles, Boolean queries for potency, selectivity, and therapeutic windows for small molecule enzyme, cell, and animal data, substructure and Tanimoto similarity search, physical chemical property search, as well as IC50 calculation/curve generation, heat-maps, and Z/Z' statistics for archived data (protocols, molecules, plates, hyperlinked files). CDD Public has unique, constantly growing drug discovery SAR content.
4:15 11 Managing giant combinatorial chemistry spaces in silico.
C. Detering, H. Claussen, M. Lilienthal, C. Lemmen
BioSolveIT, Sankt Augustin, NRW, Germany

We will introduce a method which catches the two aforementioned birds (chemical complexity and chemical universe) with one stone: by cleverly searching a fragment space on the fly without the need to enumerate compounds, the computational overhead is kept to a minimum, and thus search times are low (minutes for 10^10 molecules). Secondly, if the fragment space is composed of the in-house available chemistry, results obtained are much more likely to be synthesizable, as the chemical reaction protocol is automatically delivered together with the hits. We will show a few validation cases from the industry, and look at the properties of one publicly available fragment space which contains 12 billion molecules.

Section A
Anaheim Convention Center
213 C

50 Years of Computers in Organic Chemistry: Symposium in Honor of James B. Hendrickson - Cosponsored by ORGN
M. Walker, Organizer, Presiding
1:30 12 Toward the ideal synthesis: The role of step economy and function oriented synthesis in first-in-class approaches to HIV eradication, overcoming cancer resistance and treating Alzheimer's disease.
P. A. Wender
Department of Chemistry, Stanford University, Stanford, CA, United States

Jim Hendrickson has had a major impact on how we think about synthesis. He was also an inspiring influence on my early career. Evolving from that time are programs in our group directed at the eradication of HIV (Science 2008, 649), overcoming resistant cancer, the major cause of chemotherapy failure (PNAS 2008, 12128), and novel strategies for treating Alzheimer's disease (Neurobiology of Disease 2009, 332). A major aspect of these programs is the singular importance of step economy in synthesis and how that can be achieved by computational analysis, new reactions and function oriented synthesis (Accounts 2008, 40). In this lecture we will show three case studies of how step economy provides a key to addressing major therapeutic challenges of our time.
2:20 13 Aiming for the ideal synthesis.
P. S. Baran
Department of Chemistry, Scripps Research Institute, La Jolla, CA, United States

Our laboratory is focused on the practical total synthesis of complex natural products such as alkaloids and terpenes by aiming to achieve the “ideal synthesis”. Hendrickson defined such a synthesis in 1975, stating: “The ideal synthesis creates a complex molecule ... in a sequence of only construction reactions involving no intermediary refunctionalizations, leading directly to the target, not only its skeleton but also its correctly placed functionality.” (JACS 1975, 97, 5784). In order to achieve this level of efficiency one must minimize superfluous refunctionalization steps such as protecting group and non-strategic redox chemistry. Such considerations require exquisite control of chemoselectivity by the invention of chemistry and logical frameworks to aid in the planning of such routes. This invention-oriented approach to total synthesis will be illustrated with several case studies from our laboratory.
3:10   Final introduction.
3:25 14 Half a century of computers in chemistry.
J. B. Hendrickson
Department of Chemistry, Brandeis University, Waltham, MA, United States

My half-century of chemistry and computers may be divided into three areas. The first was to calculate the lowest-energy conformations of the 6-10-membered cycloalkane rings, and then their pseudorotation energies, to assist in synthesis planning. The second area was to define a process to seek the optimal plans for efficient synthesis design. We developed a process to find just the few shortest synthesis routes to any input target structure and this has resulted in the SynGen program. This effort led to the third area, the development of a general system to afford a unique, linear string to describe any organic reaction, defined by its input reactant and product structures, irrespective of mechanism or number of operational steps in the reaction. This has afforded a program to assign a unique “signature” for any given reaction and has the important feature of providing searchable indexing for any reaction database.


Section A

CINF Scholarship for Scientific Excellence Financially supported by Accelrys
G. Grethe, Organizer
6:30 - 8:30
15 Exhaustive docking protocol with SAR-based pose selection.
F. Klepsch, G. F. Ecker
Department of Medicinal Chemistry, University of Vienna, Vienna, Austria

The polyspecific nature of the transmembrane drug efflux pump P-glycoprotein (P-gp) represents a great impediment for standard docking protocols. Furthermore, a large (~6000 Å^3) transmembrane binding cavity consisting of several binding sites, the high flexibility of P-gp and the lack of structural information make the correct ranking of docking poses a challenging task. Thus, we present a docking protocol that combines exhaustive conformational sampling of propafenone-type P-gp inhibitors with common scaffold clustering and SAR-based pose selection. The resultant binding hypotheses are in agreement with experimental data, which strengthens the validity of this approach. Analogous protocols were applied to other membrane proteins, such as the GABA(A) receptor and the serotonin transporter. We acknowledge financial support provided by the Austrian Science Fund, grant F03502.
16 Comparison of weighted and unweighted consensus approaches in QSAR/QSPR.
D. Zhuang, A. Lee, R. Fraczkiewicz, M. Waldman, B. Clark, W. Woltosz
Life Science, Simulations Plus, Inc, Lancaster, CA, United States

Two flavors of consensus categorical prediction in QSAR/QSPR, the 'unweighted consensus' and the 'weighted consensus' approaches, were compared on several datasets using ADMET Predictor(TM). While the unweighted method gives equal weight to every member model, the weighted method implicitly assigns different weights to the outcomes of its member models. To find out whether there is any benefit in using one approach over the other, we constructed several datasets with different structural characteristics (balanced, imbalanced, diverse, non-diverse, etc.) and built predictive models from them. The performances of the two approaches on these datasets were compared head-to-head using a paired t-test. Our results show that the performances of the two approaches on the selected datasets are statistically equal, and thus in general there is no clear advantage in using one approach over the other. Possible reasons for the observation will be discussed.
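
The distinction between the two consensus schemes, and the paired comparison of their performance, can be illustrated with a minimal Python sketch. It is not the ADMET Predictor implementation: the function names, the explicit numeric weights, and the tie-breaking rule are assumptions made for illustration only (the abstract says the weights are assigned implicitly).

```python
from math import sqrt
from statistics import mean, stdev


def unweighted_consensus(votes):
    """Majority vote over binary (0/1) member-model predictions; ties go to 1."""
    return 1 if 2 * sum(votes) >= len(votes) else 0


def weighted_consensus(votes, weights):
    """Weighted vote: each 0/1 prediction counts in proportion to its weight."""
    positive = sum(w for v, w in zip(votes, weights) if v == 1)
    return 1 if 2 * positive >= sum(weights) else 0


def paired_t_statistic(a, b):
    """t statistic for paired samples a and b (e.g. per-dataset accuracies)."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Hypothetical example: three member models vote on one compound.
votes = [1, 0, 0]
weights = [0.9, 0.3, 0.3]  # illustrative weights, not taken from the abstract
print(unweighted_consensus(votes))         # two of three members predict class 0
print(weighted_consensus(votes, weights))  # the heavily weighted member prevails
```

The t statistic would then be compared against a Student t distribution with n - 1 degrees of freedom, where n is the number of datasets.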
17 When is chemical similarity significant? The statistical distribution of chemical similarity scores and its extreme values.
P. Baldi, R. J. Nasr
Department of Computer Science, University of California, Irvine, Irvine, CA, United States

As repositories of chemical molecules continue to expand and become more open, it becomes increasingly important to develop tools to search them efficiently and assess the statistical significance of chemical similarity scores. Here, we develop a framework for modeling, predicting, and approximating the distributions of chemical similarity scores and their extreme values in large databases. From the distributions of the scores and their analytical forms, Z-scores, E-values, and p-values are derived to assess the significance of similarity scores. In addition, the framework also allows one to predict the value of standard chemical retrieval metrics, such as sensitivity and specificity at fixed thresholds, or receiver operating characteristic (ROC) curves at multiple thresholds, and to detect outliers in the form of atypical molecules. Numerous and diverse experiments that have been performed, in part with large sets of molecules from the ChemDB, show remarkable agreement between theory and empirical results.
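
As a rough illustration of scoring the significance of a similarity value, the sketch below computes Tanimoto similarities and an empirical Z-score against a background sample. This is a simplification under stated assumptions: the abstract derives Z-scores from analytical score distributions, whereas this example uses the sample mean and standard deviation, and the fingerprints and function names are hypothetical.

```python
from statistics import mean, pstdev


def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def z_score(score, background):
    """Standardize a similarity score against a background sample of scores."""
    return (score - mean(background)) / pstdev(background)


# Hypothetical fingerprints (sets of on-bit indices): a query and a tiny database.
query = {1, 2, 3, 4}
database = [{1, 2, 3, 5}, {2, 7, 8}, {9, 10}, {1, 4, 6, 11}]

scores = [tanimoto(query, fp) for fp in database]
best = max(scores)
print(scores, z_score(best, scores))
```

With a realistic database the background sample would contain many more scores, and an extreme-value treatment like the one the abstract describes would be needed to judge the top hit.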
18 Reaction prediction as ranking molecular orbital interactions.
M. A. Kayala, C. A. Azencott, J. H. Chen, P. Baldi
Department of Computer Science, University of California, Irvine, Irvine, CA, United States

Being able to predict the course of chemical reactions is essential to the practice of chemistry. While computational approaches to this problem have been extensively studied in the past, a fast, accurate, and scalable solution has yet to be described. Here, we propose a novel formulation of reaction prediction as a machine learning ranking problem: given a set of molecules and a description of conditions, learn a ranking over potential filled to unfilled molecular orbital (MO) interactions approximating the corresponding transition state energy ranking. Using an existing rule-based expert system (ReactionExplorer), we derive a restricted chemistry dataset consisting of 1300 full multi-step reactions with 2200 distinct starting materials and intermediates. This yields 3600 predicted MO interactions and 14 million unpredicted MO interactions. A two-stage machine learning scheme is used to learn the model. First, we train reactive site predictors using a combination of topological and real-valued global features to filter out 61% and 44% of non-predicted filled and unfilled MOs with a 0.0001% error rate. Then various ranking models are trained on the MO interactions using features engineered to approximate transition state entropy and enthalpy. Using cross-validation, current best models recover a perfect-ranking 61% of the time and recover a within-4-ranking 95% of the time.
19 Re-examining the tubulin-binding conformation of antitumor epothilones using QSAR and crystallographic refinement.
S. A. Johnson, A. J. Smith, J. P. Snyder, K. N. Houk
Department of Chemistry and Biochemistry, University of California, Los Angeles, Los Angeles, CA, United States; Department of Chemistry, Emory University, Atlanta, GA, United States

Several different bioactive conformations of epothilones, potent anti-tumor compounds, have been reported in the literature. We proposed to provide additional support to one of these conformations using a QSAR-based approach. By assuming a common pharmacophore for a set of epothilone analogs, we clustered conformations of these analogs using dihedral angles responsible for orienting functional groups with known SAR effects. We identified clusters common among the most active compounds, and developed simple QSAR models that relate the experimental IC50 values to the conformational strain energy. The resulting epothilone conformer that minimizes strain energy in the active epothilone analogs is different from previously proposed conformers. This conformation demonstrates good agreement when refined in the experimental electron crystallographic density for tubulin-bound epothilone.
20 Efficient core structure searches using various fingerprinting methodologies: Advantages, particularities and pitfalls.
S. M. Furrer, D. J. Wild
School of Informatics and Computing, Indiana University, Bloomington, IN, United States; Science & Technology, Givaudan Flavors Corp, Cincinnati, OH, United States

The complexity of medicinal chemistry patent applications, as well as the number of compounds enumerated as examples, has increased spectacularly in recent years. Finding the structures of major interest using traditional methods is often a difficult task. Molecular fingerprinting methods are excellent tools to rapidly organize chemical information. Different fingerprinting methods, however, represent structural characteristics in different ways. Multiple fingerprinting methodologies were evaluated for their capacity to differentiate and isolate core compounds in chemical patents. It was found that the fingerprint designs as well as medicinal chemistry approaches have a significant impact on the overall performance: different tools shed different "lights" over the molecular landscape. Modal fingerprints were investigated to focus on core compounds in patents, through a relative over-expression of co-occurring molecular features. Concrete examples will be given based on several major patent cases.
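
One common way to build a modal fingerprint is to keep the bits that are set in at least a chosen fraction of a compound set. The Python sketch below follows that threshold-based definition as an assumption; the abstract does not specify the authors' construction, and the example fingerprints, threshold, and function names are hypothetical.

```python
from collections import Counter


def modal_fingerprint(fingerprints, threshold=0.5):
    """Return the bits set in at least `threshold` of the given fingerprints."""
    counts = Counter(bit for fp in fingerprints for bit in fp)
    cutoff = threshold * len(fingerprints)
    return {bit for bit, n in counts.items() if n >= cutoff}


def modal_overlap(fp, modal):
    """Fraction of the modal bits that are present in a single fingerprint."""
    return len(fp & modal) / len(modal) if modal else 0.0


# Hypothetical fingerprints (sets of on-bit indices) for compounds in one patent.
patent_examples = [{1, 2, 3, 8}, {1, 2, 3, 9}, {1, 2, 4}, {1, 3, 5}]
modal = modal_fingerprint(patent_examples, threshold=0.75)

# Rank compounds by how much of the shared feature set they carry.
ranked = sorted(patent_examples, key=lambda fp: modal_overlap(fp, modal), reverse=True)
print(modal, ranked[0])
```

Compounds that carry most of the modal bits are candidates for the core structures of the patent, while low-overlap compounds are more peripheral examples.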
21 DockingDB: A cyberinfrastructure for computer-aided drug design based on ChemDB.
P. M. Rigor
School of Information and Computer Sciences, University of California in Irvine, Irvine, CA, United States

Although there are several open-source and commercially available computational tools for virtual high-throughput drug screening -- including DOCK, Autodock and Schroedinger's Maestro -- there is still a lack of a more general, tool-agnostic and scalable framework that is able to leverage the advantages offered by readily available docking and molecular dynamics programs in a high-performance computing (HPC) environment. We have developed a cyber-infrastructure built on top of an HPC pipeline and existing proteomics and chemical informatics tools -- such as ChemDB and SCRATCH -- to support an iterative computer-aided drug design methodology. We have applied our approach to two biological problems and describe preliminary results. Moreover, ongoing extensions to the pipeline and related tools are discussed.


Section A
Anaheim Convention Center
207 C

Natural Products and Drug Discovery: Cheminformatics and Computational Chemistry
R. Bienstock, Organizer
X. Wang, Organizer, Presiding
8:30   Introductory Remarks.
8:35 22 Protein Fold Topology: Will it aid drug discovery or is it the reason natural products have drug properties?
R. J. Quinn, E. Kellenberger
Eskitis Institute, Griffith University, Brisbane, Queensland, Australia; Université de Strasbourg, Illkirch, France

Natural products are made by nature through interacting with biosynthetic enzymes. Natural products also exert their effect as drugs by interaction with proteins. We have explored the question of whether the recognition of the natural product by biosynthetic enzymes translates to recognition of the therapeutic target. Molecular modeling of flavonoid biosynthetic enzymes and protein kinases with a series of natural product kinase inhibitors led to the development of the concept of Protein Fold Topology (PFT). PFT describes cavity recognition points unrelated to protein fold similarity. The topology or spatial properties are preserved even though there is deformation of the protein elements that participate in the protein-ligand interactions. We observe helices or β-sheets as equivalent in providing the invariant topology for protein-ligand interaction and, as such, are seeking to find automated methods to interrogate these interactions.
9:05 23 Screening of herbs used in traditional Indonesian medicine for inhibitors of aldose reductase.
D. Barlow, S. Naeem, P. Hylands
Pharmacy, King's College London, London, London, United Kingdom

Virtual screening of phytochemical constituents of herbs used in traditional Indonesian medicine has been performed to search for novel leads active against the enzyme aldose reductase (AR). The screening was performed using the docking software MolDock, and the activities (IC50s) of the docked compounds were predicted using an artificial neural network (ANN) trained using the crystallographic data for AR complexes involving inhibitors of known potency. The ANN gave a mean accuracy of ~98% for the activities of those compounds involved in the known protein crystal structures. The trained ANN was used to predict the IC50s for all carboxyl-containing compounds in the database of Indonesian herbal constituents, and the predicted IC50 values ranged from 17 nM to 118 mM. Selected hits were subsequently tested in vitro against human recombinant AR and while some of these proved to be about as active as predicted, others proved significantly less potent than predicted.
9:35 24 Common cold and flu: Computational strategies for the identification of antiviral leads from nature.
J. M. Rollinger, J. Kirchmair, U. Grienke, D. Schuster, K. R. Liedl, M. Schmidtke
Institute of Pharmacy and Center for Molecular Biosciences, University of Innsbruck, Innsbruck, Austria; Institute of Theoretical Chemistry and Center for Molecular Biosciences, University of Innsbruck, Innsbruck, Austria; Institute of Virology and Antiviral Therapy, Friedrich Schiller University, Jena, Germany

The search for new drug leads against respiratory viruses remains an area of active investigation. In this regard natural products offer a tremendous potential as a source for antivirals. In our lab several virtual screening campaigns on 3D natural product databases, such as pharmacophore searches, similarity-based approaches and docking, have proven to be highly efficient for the target-oriented identification of bioactive candidates. Integration of these heuristic approaches with empirical ones, like ethnopharmacology and in vitro extract screening, is a helpful strategy for prioritizing compounds to be isolated from natural sources and pharmacologically tested. Here we demonstrate the application of different in silico techniques for the discovery of new anti-rhinoviral and anti-influenza virus natural compounds using well defined molecular targets, such as the hydrophobic pocket in the rhinoviral capsid and the influenza virus neuraminidase.
10:05   Intermission.
10:20 25 Chemoinformatic analysis of natural products: Towards the discovery of DNA methyltransferase inhibitors of natural origin.
J. Medina-Franco, F. López-Vallejo, R. Guha, A. Bender, D. Kuck, F. Lyko
Torrey Pines Institute for Molecular Studies, Port St. Lucie, Florida, United States; NIH Chemical Genomics Center, Rockville, Maryland, United States; Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Cambridge, United Kingdom; Division of Epigenetics, Deutsches Krebsforschungszentrum, Heidelberg, Germany

A comparative diversity analysis of natural products, drugs, the Molecular Libraries Small Molecule Repository (MLSMR), and combinatorial libraries is presented in this work. To this end, a multiple criteria strategy was employed including physicochemical properties, scaffolds and different fingerprints as molecular descriptors. The approach enabled a comprehensive analysis of property space coverage, the degree of overlap between collections, scaffold and structural diversity and overall structural novelty. Since several natural products contained in dietary products are implicated in the inhibition of DNA methyltransferases (DNMTs), which are emerging targets for the treatment of cancer, we conducted a docking-based virtual screening of a natural product database with a homology model of the catalytic domain of DNMT1. Herein we discuss the results of the virtual screening that represents a first step towards the systematic screening of compounds with natural origin targeting DNMTs.
10:50 26 Lessons from covalent inhibitor modeling.
O. Eidam, S. Bonazzi, S. Guttinger, J. Wach, I. Zemp, U. Kutay, K. Gademann
Chemical Synthesis Laboratory, EPFL, Lausanne, VD, Switzerland; Department of Pharmaceutical Chemistry, UCSF, San Francisco, CA, United States; Institut fur Biochemie, ETHZ, Zurich, ZH, Switzerland

Leptomycin B (LMB) has antifungal, antibacterial and anti-tumor activity and is an important “tool compound” in cell biology. It inhibits the export of certain proteins from the nucleus through specific alkylation of Cys528 of human CRM1. The recently published x-ray structure of CRM1 motivated us to model LMB to rationalize the activity of recently discovered LMB analogues. A manual modeling approach combined with all-atom energy minimizations was used. We found that modeling was largely guided by the structural environment, and steric and geometric restraints imposed both from the binding site and the ligand. Mechanistic considerations of covalent inhibitor binding highlight important residues in the binding site, and the internal energy of the ligand may play a crucial role in the binding mode of covalent inhibitors. Perhaps the most important lesson is that manual modeling can generate models useful for the design of future analogues.

Section B
Anaheim Convention Center
202 A

Open Data, Open Science, Open Knowledge - Financially supported by Chemical Structure Association Trust
P. Rusch, Organizer
I. Sens, Organizer, Presiding
9:00   Introductory Remarks.
9:10 27 Open Data and the Panton Principles.
P. Murray-Rust
Department of Chemistry, University of Cambridge, Cambridge, Cambridgeshire, United Kingdom

Although an increasing amount of chemical data is becoming visible on the Internet it cannot be re-used without explicit permission to avoid potentially breaking copyright. The Open Knowledge Foundation and Science Commons have collaborated on a definition of Open Data and produced a set of principles and practices (Panton Principles) to help authors and publishers assert that their published data is truly Open. An example of fully Open Data is shown in Crystaleye http://wwmm.ch.cam.ac.uk/crystaleye with over 200,000 crystallographic datasets from the literature. Several publishers are adopting Panton, and this presentation will show the advantages of doing so.


9:35 28 Making priors a priority.
M. D. Segall, A. Chadwick
Optibrium Ltd., Cambridge, United Kingdom; Tessella plc., Burton upon Trent, Staffs, United Kingdom

When we build a predictive model of a drug property we rigorously assess its predictive accuracy, but we are rarely able to address the most important question, “How useful will the model be in making a decision in a practical context?” To answer this requires an understanding of the prior probability distribution and hence prevalence of negative outcomes due to the property. We will illustrate the importance of the prior to assess the utility of a model to select or eliminate compounds for further investigation. A better understanding of the prior probabilities of adverse events due to key factors will improve our ability to make good decisions in drug discovery, finding higher quality molecules more efficiently. As the data necessary to estimate these priors does not include proprietary compound structures, this presents an opportunity for collaboration to improve the basis for good decision-making for all.
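The dependence on the prior can be illustrated with Bayes' theorem. The following is a minimal sketch and is not taken from the presentation: the function name and the sensitivity, specificity and prevalence values are hypothetical, chosen only to show how one and the same model gives very different positive predictive values as the prevalence of the adverse outcome changes.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Return P(compound truly has the liability | model flags it).

    Applies Bayes' theorem; `prevalence` plays the role of the prior
    probability of the adverse outcome in the compound population.
    """
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Hypothetical model: 90% sensitivity and 90% specificity (illustrative values).
for prevalence in (0.5, 0.1, 0.01):
    ppv = positive_predictive_value(0.9, 0.9, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}")
```

With these illustrative numbers, a flagged compound is a true positive 90% of the time when half of all compounds carry the liability, but only 50% of the time when one in ten does, so discarding every flagged compound would eliminate as many good compounds as bad ones.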

10:00   Intermission.
10:10 29 Ensuring sustainability of a comprehensive and highly curated scientific data resource.
I. J. Bruno, C. R. Groom
CCDC, Cambridge, Cambridgeshire, United Kingdom

The Cambridge Crystallographic Data Centre (CCDC) has been established as the primary repository for the experimentally determined 3D structures of organic and organometallic compounds for over 45 years. Individual data sets are available to the scientific community free of charge through CCDC's structure request service. Additionally, structures are made available as part of the Cambridge Structural Database (CSD). Structures in the CSD are expertly curated by editorial staff so as to facilitate reliable and sophisticated retrieval, visualisation and analysis by software that the centre also develops. The CSD and associated software are made available on a subscription basis with significant discounts applied for academic institutions. The income generated from subscriptions has ensured until now the sustainability of a comprehensive and highly curated scientific resource. This presentation will discuss the implications that increasing throughput and scientific complexity have for the way CCDC must operate, opportunities for alternative distribution models that respond to evolving expectations of the scientific community, and the pitfalls we must avoid to ensure sustainability in the years ahead.

10:35 30 Visual search in scientific research data.
I. Sens, O. Koepler
German National Library of Science and Technology, Hannover, Germany

In recent discussions among research institutions and research funding agencies, scientific research data has been identified as being of strategic interest. As a consequence, there are ongoing efforts to establish an infrastructure to support storage, long-term preservation, and access to scientific research data. Registration of datasets with DOI names makes research data citable and searchable. To date, a number of operational Digital Library systems for scientific research data already exist. Datasets often comprise numeric data on continuous or discrete scales and are often associated with textual metadata including data description, author and origin information. While searching in textual metadata is commonly available, content-based access to the research data remains an open challenge, even though visualisation and visual analysis of numeric data are common when processing scientific research data. To close this gap in the information retrieval process we report on a concept and first implementations to support visual retrieval and exploration in a specific class of primary research data, namely, time-oriented data. The concept discusses relevant challenges for a general approach to scientific primary data and we present first implementations on a real-world dataset.


Section A
Anaheim Convention Center
204 C

Natural Products and Drug Discovery: Cheminformatics and Computational Chemistry
X. Wang, Organizer
R. Bienstock, Presiding
1:30 31 Specific targeting of the G-quadruplex in the c-Myc promoter with ellipticine.
T. A. Brooks, V. Gokhale, R. Brown, L. H. Hurley
College of Pharmacy, University of Arizona, United States; Arizona Cancer Center, University of Arizona, United States; BIO5 Institute, University of Arizona, United States

Previous studies have shown that the G-quadruplex in the c-Myc promoter is the silencer element for transcriptional control. More recent studies have shown the involvement of NM23-H2 and nucleolin in the activation and silencing of c-Myc transcription. Using a computational overlay of c-Myc G-quadruplex-binding compounds and virtual screening, we have identified ellipticine as a potential G-quadruplex-interactive compound. Then, by taking advantage of a Burkitt's lymphoma cell line in which only the non-translocated allele is under the direct control of the promoter containing the G-quadruplex, we were able to show that the c-Myc-lowering effect is directly due to interaction with the G-quadruplex. In follow-up studies using CADD we designed further ellipticine analogs. These studies provide the best available cellular evidence not only for the presence of G-quadruplex in the promoter elements of oncogenes such as MYC but also that inhibition of specific transcription can be mediated by small molecules that bind to this promoter element.
2:00 32 Exploring natural products for drug discovery by mining biomedical information resources.
N. Baker, N. Rice, D. Fourches, E. Muratov, A. Tropsha
Laboratory for Molecular Modeling, UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States; Laboratory of Theoretical Chemistry, Department of Molecular Structure, Bogatsky Physical-Chemical Institute NAS of Ukraine, Odessa, Ukraine

Parallel screening of Natural Products (NPs) is a typical approach for identifying drug candidates and their targets. However, biomolecular targets of NPs are often discovered serendipitously. We report on the use of Chemotext, a database of assertions extracted from biomedical literature that link chemicals, targets, and diseases [J Biomed Inform 2010, 43:510-9] to rationalize the search for NP targets in the context of the Systems Chemical Biology paradigm [Nat Chem Biol 2007, 3:447-50]. We have identified similar biochemical pathways that NPs are known to interact with in both plants and humans. Through this analysis, we can deduce novel compound-target-disease associations as well as novel molecular targets for NP-derived compounds. Using Chemotext, we have collected and integrated cross-species NP-target associations. We present the case studies of Diabetes mellitus for predicting new compound-target interactions and Tacrolimus-Binding Proteins for detecting similar biochemical pathways in both plants and animals/humans.
2:30 33 In silico strategies in natural product research to combat inflammation and lifestyle diseases: Identification of FXR-inducing triterpenes from Ganoderma lucidum.
U. Grienke, J. Mihály-Bison, D. Schuster, D. Guo, B. R. Binder, G. Wolber, H. Stuppner, J. M. Rollinger
Institute of Pharmacy and Center for Molecular Biosciences, University of Innsbruck, Innsbruck, Austria; Center of Biomolecular Medicine and Pharmacology, Department of Vascular Biology and Thrombosis Research, Medical University of Vienna, Vienna, Austria; Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China

Farnesoid X receptor (FXR) is a ligand-activated transcription factor. The available structural information and the importance of FXR in controlling endogenous pathways related to inflammation and lifestyle diseases, like metabolic syndrome, dyslipidemia, atherosclerosis and type 2 diabetes, render FXR an attractive target for computational approaches. Virtual screenings of our in-house Chinese Herbal Medicine database with structure-based pharmacophore models revealed mainly triterpenes of the famous TCM fungus Ganoderma lucidum Karst. as putative FXR ligands. Ganoderma fruit body extracts verified the predicted FXR-inducing effect in a reporter gene assay, which prompted us to determine their bioactive constituents. Five out of 25 secondary metabolites from G. lucidum, i.e. ergosterol peroxide, lucidumol A, ganoderic acid TR, ganodermanontriol, and ganoderiol F, dose-dependently induced FXR in the low micromolar range. To rationalize the binding interactions, additional molecular docking studies were performed, which allowed us to establish a first structure-activity relationship of the investigated triterpenes.
3:00   Intermission.
3:15 34 Discovery of natural product-derived 5HT-1A receptor binders by QSAR modeling of known inhibitors, virtual screening and experimental validation.
X. S. Wang
Department of Pharmaceutical Sciences, Howard University, Washington, DC, United States

The 5-Hydroxytryptamine receptor subtype 1A (5-HT1A) has been an attractive target to treat mood disorders such as anxiety and depression. In this study we have developed combinatorial Quantitative Structure-Activity Relationship (QSAR) models for 105 5-HT1A binders and 61 non-binders retrieved from the Psychoactive Drug Screening Program (PDSP) Ki database. Three advanced methods, k-Nearest Neighbor (kNN), Random Forest (RF) and Support Vector Machine (SVM), were employed for model building. The robust QSAR models of 5-HT1A binders were then used to mine major natural product libraries such as the TimTec Natural Product Library (NPL) and Natural Derivatives Library (NDL). Multiple potential hits were identified and are currently being examined by the PDSP for experimental validation. The success ratios, chemical diversities and structural novelties of the natural product libraries for the purpose of virtual screening were further explored in comparison with other types of screening libraries, i.e. drug-like libraries, targeted libraries and diversity libraries.
3:45 35 Traditional medicine patents lead to enhanced drug discovery derived from natural products.
J. Zabilski, R. Schenck
Content Planning, CAS, Columbus, OH, United States

Since ancient times natural products have provided relief from numerous ailments. Hippocrates, the father of modern medicine, noted that powder derived from the bark of the willow tree helped heal pain and headaches. In the 1800s, chemists isolated the beneficial substance as salicylic acid and refined it by buffering sodium salicylate with acetyl chloride to create acetylsalicylic acid or aspirin. In more recent years, Traditional Medicine patents have increasingly delved into the rich vein of natural products for potential drug discovery. The CAS databases have mined this wealth by adding more than 50,000 new traditional patent records from several countries. This presentation will illustrate the vast content available and methods to easily explore it by using SciFinder or STN.

Section B
Anaheim Convention Center
204 A

Data Archiving, E-Science, and Primary Data
R. McFarland, N. Xiao, Organizers
L. Solla, Organizer, Presiding
1:30   Introductory Remarks.
1:40 36 Librarian2.0: Synthesizing data management and subject expertise.
B. Blanton-Kent, S. Lake, A. Sallans
University of Virginia Library, Charlottesville, VA, United States

The University of Virginia Library is working to support new data management requirements in science and engineering by developing a model that first draws upon close collaboration between data experts and subject librarians, and culminates in policy and infrastructure recommendations to the University's Office of the Vice President for Research (VPR) and the Office of the Vice President/Chief Information Officer (VP/CIO). This model begins with a data interview to assess the researcher's data management practices and needs and to establish a baseline awareness of current practice. After collecting this information, the results are furnished to the institutional repository team and NSF Data Management Plan working group to inform their processes. In aggregate form, this information is provided to the VPR and VP/CIO as policy and infrastructure recommendations. Ultimately, the entire process cycles back to the researcher. This presentation will offer a case study following a chemist/chemical engineer through this process.
2:05 37 Anatomy of a PubChem project.
S. Swamidass, B. Calhoun, M. Browning
Department of Pathology and Immunology, Washington University in St Louis, St Louis, MO, United States

More raw data from high-throughput screens is made available to the public every day, often through repositories like PubChem. This data, however, is often unorganized and incompletely annotated. Of particular interest, several screens are often components of a larger project. Each screen is a step in the project's workflow, its anatomy. Knowledge of the project's workflow includes non-obvious but valuable information, for instance, the scaffolds the project team chose to pursue and how exactly compounds were chosen for follow-up testing. Although these details are not well annotated in PubChem projects, it is possible to infer them from the raw screening data using a collection of statistical techniques. Moreover, inferred workflows can be used to automatically discover additional active molecules, inform useful views of screening data, and identify methodological errors.
2:30 38 Evolution of the University of Minnesota Libraries' approach to e-scholarship.
M. Lafferty, L. Johnston
Science and Engineering Library, University of Minnesota, Minneapolis, MN, United States

Libraries have struggled with how best to respond to the challenges of e-science since the middle of the last decade. The University of Minnesota Libraries' approach to e-science and other cyberinfrastructure issues has changed multiple times since our initial response in 2006; it has primarily taken the form of groups rather than a dedicated position. We have more recently expanded our focus beyond e-science to e-scholarship in order to include areas such as the digital humanities. The talk will address the evolution of group structures and their primary emphases over the past 5 years, the rationales for different changes, and potential future directions.
2:55   Intermission.
3:05 39 Hosting a compound centric community resource for chemistry data.
A. J. Williams, V. Tkachenko, R. Kidd
ChemSpider, Royal Society of Chemistry, Wake Forest, NC, United States; Informatics, Royal Society of Chemistry, Cambridge, United Kingdom

Laboratories around the world continue to generate immense amounts of data that are non-proprietary and of value to the community. If available, these data could dramatically reduce costs by minimizing rework and ultimately facilitating faster research. High quality reference data collections of chemical compound dictionaries, properties and spectra have been generated over many decades. With the advent of social networking tools and platforms such as Wikipedia, the community has an opportunity to contribute. The ChemSpider platform hosted by the Royal Society of Chemistry is a compound centric database with associated data. It is already populated with almost 25 million unique compounds, and the community can deposit and host their own data, and curate and annotate existing data, including those generated in Open Notebook Science efforts. This presentation will provide an overview of progress to date and outline the vision of this community platform for chemistry and for ensuring the longevity of chemistry reference data.
3:30 40 Library data services in the social sciences: Lessons for science?
K. Peter
University of Southern California Libraries, University of Southern California, Los Angeles, CA, United States

Social science data have a rich history within universities: aggregate statistical publications, such as Statistical Abstract of the United States, and even more detailed U.S. decennial census results, have long held a place within academic depository library collections. Following the development of Machine Readable Data Files, social science data archives were established within several universities across the United States, notably the Inter-university Consortium for Political and Social Research and the Roper Center for Public Opinion Research. Although differences between social science and science data are not insignificant (for example, average file size), as data librarians we face similar obstacles to outreach, access, archiving and management and, in general, to effectively creating a place within libraries for data and data services. This presentation will outline current library services and service models for social science data in hopes of launching a dialog and skill-share between social science and sciences data professionals.
3:55 41 Using Data Curation Profiles (DCPs) as a means of raising data management awareness.
J. R. Garritano
Purdue University, West Lafayette, IN, United States

While one can discuss data management plans in a general sense, there is no single solution for managing the diverse data generated by various disciplines and projects. Therefore one possible solution is to determine best practices for individual data management plans guided by a more general Data Curation Profile (DCP). The DCPs were created at Purdue University and the University of Illinois Urbana-Champaign through a grant from the Institute of Museum and Library Services. Using a DCP, librarians and/or researchers explore various data management issues. Once a profile has been completed, not only will the librarian have a richer understanding of the kind and quantity of data that might have to be curated and archived, but the researcher will have a better understanding of their data preferences related to sharing and intellectual property, regardless of where the data ultimately resides. Current applications of the DCP at Purdue will be discussed.


Section A
Anaheim Convention Center
Hall B

R. Bienstock, Organizer
8:00 - 10:00 16. See previous listings.
42 Synthesis of 3-halo-2-butanones.
J. Porter
Transylvania University, United States

This study is attempting to find out the effect of adding a halide group to a ketone. The main molecules I worked with were 3-halo-2-butanones. I used ether as a solvent and performed Grignard reactions under nitrogen, adding ethynyl Grignards as the nucleophiles. I measured diastereomeric ratios using GC-MS, 1H and 13C NMR, and GC. Unexpectedly, results showed that ratios were similar to those found using LiAlH4 as the nucleophile. Future experiments will work with larger nucleophiles as well as larger ketones.
43 Visualizing molecule similarity.
K. Boda
OpenEye Scientific Software, Santa Fe, New Mexico, United States

Similarity searching based on fingerprint similarity is one of the most common approaches for virtual screening. The main advantage of the method is that it provides a rapid calculation of similarity scores to identify molecules that are similar to the reference structure. However, most fingerprint methods do not provide any insight into molecule similarity beyond a single numerical score. The poster will present a method where molecular graphs are highlighted using a color gradient scheme that emphasizes shared fragments encoded into fingerprints. This representation not only makes molecular similarity immediately apparent but also reveals information about the underlying fingerprint method. The method is utilized to analyze the hit-lists using different fingerprint methods on datasets of previously published benchmarks. The 2D graphics are generated using OpenEye's Ogham package that provides a framework to construct molecular diagrams. The poster will also present various Ogham functionalities that allow the customization of molecule depiction.
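The "single numerical score" mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not OpenEye's API and not the method of the poster: the abstract does not name a similarity measure, so the widely used Tanimoto coefficient is assumed here; fingerprints are represented as plain Python sets of on-bit indices, and the function names and example bit values are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    if not a and not b:
        return 0.0  # convention chosen for this sketch when both are empty
    return len(a & b) / len(a | b)


def shared_bits(fp_a, fp_b):
    """Bits set in both fingerprints: the information a single score hides."""
    return sorted(set(fp_a) & set(fp_b))


# Hypothetical fingerprints; real ones would be derived from molecular graphs.
reference = {1, 4, 7, 9, 15}
candidate = {1, 4, 8, 9, 20}

print(tanimoto(reference, candidate))
print(shared_bits(reference, candidate))
```

The score alone says how much the two bit sets overlap, while the list of shared bits indicates which encoded fragments are responsible, which is the kind of information a highlighted depiction would map back onto the molecular graph.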


Section A
Anaheim Convention Center
204 A

Internet and Chemistry: Social Networking - Cosponsored by YCC
H. Rzepa, Organizer
S. Bachrach, Organizer, Presiding
8:25   Introductory Remarks.
8:30 44 Collaborative agile Internet projects: The Green Chain Reaction.
P. Murray-Rust, S. E. Adams, L. Hawizy, D. M. Jessop
Department of Chemistry, University of Cambridge, Cambridge, Cambridgeshire, United Kingdom

An Open Science project was designed, implemented and completed within a month to investigate whether chemical reactions were using "greener" solvents than formerly. Ten volunteers wrote or implemented code to extract recipes from European patents. The recipes were analysed by OSCAR and chemical Natural Language processing using medium-depth parsing to extract solvents with high precision. The volunteers crawled the patent website, analysed over 100,000 recipes and posted the results to a communal, Open server, using the Lensfield "make/build" philosophy. The solvent information was then aggregated and presented for the years 2000 to 2010. There is no obvious trend showing that "green" solvents are becoming more common.
9:10 45 Re-imagining scientific communication for the 21st century: Is chemistry low hanging fruit or the worst-case scenario?
C. Neylon
ISIS Neutron Source, Science and Technology Facilities Council, Didcot, United Kingdom

We are told that “the web changes everything” but scientific communication still owes more to the 17th century than to the 20th. The central problem with current practice is the view of “the paper” as a monolithic object, and the only form of communication that is rewarded. We need to both technically enable the publication of many different research objects and to create tools to aggregate these together into large narrative works that retain the structure and meaning of internal links. Along with this we need both technical and social infrastructure to help us filter and discover this large range of items. I will argue that chemistry, and in particular synthetic organic chemistry, is a special case with its own particular difficulties, but that the inherent structure and regularity of synthetic research makes it a good target for testing and demonstrating new approaches to scholarly communication.
9:50 46 Quixote: An Internet project to build a distributed Open Knowledgebase for quantum chemistry.
P. Murray-Rust, J. Thomas, P. Echenique, J. Estrada, M. D. Hanwell, S. E. Adams, W. Phadungsukanan, L. Westerhoff
Department of Chemistry, University of Cambridge, Cambridge, Cambridgeshire, United Kingdom; Computational Science and Engineering Department, Science and Technology Facilities Council, Daresbury Laboratory, Daresbury, Cheshire, United Kingdom; Instituto de Química Física "Rocasolano", CSIC, Madrid, Spain; Department of Scientific Visualization, Kitware, Inc, Clifton Park, NY, United States; Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, Cambridgeshire, United Kingdom; QuantumBio Inc., State College, PA, United States

Quixote is a distributed semantic knowledgebase for quantum chemistry, deliberately prototyped within a month by distributed volunteers. It uses a wide range of existing Open Source tools, such as those from the Blue Obelisk collection, to translate conventional QC files (log, punch, archive, input) into semantic form. The semantics are controlled by per-program dictionaries which are created by program experts. The process is controlled by and rests heavily on modern Internet approaches such as Etherpad, Skype, Wiki, REST, HTTP, RDF and SPARQL. Parsing is through ANTLR and recursive descent. Semantics are provided by namespaced dictionaries, elements and attributes, allowing lossless transmission of information. The system is completely Open/free and allows anyone to clone and run a node on a peer-to-peer system with as much or as little security as desired.
10:30   Intermission.
10:40 47 Catching the mobile wave.
S. M. Muskal
Eidogen-Sertanty, Oceanside, CA, United States

With the explosive growth of mobile computing environments, including the iPhone, Android-based devices, the iPad, and its fast-followers, it has become important for scientific software companies to enable technology and content access on these ubiquitous devices. Coupled with cloud computing environments (e.g. Amazon's EC2 and RDS environments), these platforms represent the new frontier for scientific computing. We will describe both technical and business challenges and lessons learned as we developed our mobile apps - iKinase, iKinasePro, iProtein, and MobileReagents.
11:20 48 Chemistry in your pocket: Shrinking cheminformatics applications for mobile devices.
A. M. Clark
Molecular Materials Informatics, Montreal, Quebec, Canada

Internet resources are now a routine part of the workflow of a research chemist, and in recent years many of these services have been made accessible from ultra-portable devices such as smartphones and tablet computers. Efforts have been hampered by the need to draw chemical structures to access certain functionality, e.g. searching databases by structure. To a large extent mobile devices have been limited to use for content consumption. Implementing a chemical structure sketching interface on a tiny device is difficult, because the traditional paradigm requires an accurate pointing device, such as a mouse. A finger on a touchscreen is simply too clumsy for standard structure drawing techniques, and many devices lack a pointing device entirely. This presentation will describe a new approach to drawing 2D chemical structures, which reevaluates the traditional drawing techniques in order to make them work well with input-constrained devices. This is accomplished by using a high degree of automation and inference, which is provided by newly developed algorithms. The end result is a mobile application which can be used to create publication quality 2D sketches with a small number of steps, which is convenient to use on a variety of current smartphones and tablets, including BlackBerry, iPhone and iPad devices. Also discussed will be some of the internet-based applications which are possible now that a viable structure editor is available. With this hurdle removed, a large number of desktop-based cheminformatics applications can be migrated to smaller devices by splitting the interface between a mobile client and web-based services. Mobile devices can now be used for creating, managing, viewing and sharing chemical information.


Section A
Anaheim Convention Center
204 A

Internet and Chemistry: Social Networking - Cosponsored by YCC
H. Rzepa, Organizer
S. Bachrach, Organizer, Presiding
1:30 49 chemicalize.org: Adding chemistry to Web pages and predicted data and links to structures.
A. Allardyce, A. Stracz, D. Bonniot, F. Csizmadia
ChemAxon, Budapest, Hungary

chemicalize.org is a new free online service developed by ChemAxon which adds chemistry to Web pages as well as data and Web pages to structures. The primary use is to parse chemical names from Web page text and serve an annotated Web page version which includes structure images hyper-linked from the chemical name source. By storing structures and Web page URLs we can search the database to find those Web pages containing any given structure query. For each structure, users can also generate structure-based prediction results within a user-customizable report; predictions include logP, pKa, logD etc. Current developments center around user profiles, 'tracking' structures in newly chemicalized pages and presenting chemicalize.org user activity to give a snapshot of current Web pages and structures that are interesting to chemists online. This presentation will outline the aims of the development, describe the service and current developments, and give an overview of use and user feedback.
2:10 50 Using Campus Guides for leveraging Web 2.0 technologies and promoting the chemistry and life sciences information resources.
S. Baykoucheva
White Memorial Chemistry Library, University of Maryland, College Park, MD, United States

The introduction of Campus Guides and a “lighter” version of this program, Lib Guides, in the last few years has created many exciting opportunities for science librarians to promote the chemistry and life sciences information resources in a new way using multimedia and social networking tools. The flexibility and the wide range of solutions these programs provide have tempted librarians to use them in many innovative ways, which has not been possible to do in static web pages controlled by rigid rules and other external factors. This presentation will show how users have responded to the new dynamic information environment created with Campus Guides and what the statistical data show about their preferences toward particular information resources in chemistry and the life sciences.
2:50   Intermission.
3:00 51 How the web has weaved a web of interlinked chemistry data.
A. J. Williams
ChemSpider, Royal Society of Chemistry, Wake Forest, NC, United States

The internet has provided access to unprecedented quantities of data. In the domain of chemistry specifically over the past decade the web has become populated with tens of millions of chemical structures and related properties of assays together with tens of thousands of spectra and syntheses. The data have, to a large extent, remained disparate and disconnected. In recent years with the wave of Web 2.0 participation, any chemist can contribute to both the sharing and validation of chemistry-related data whether it be via Wikipedia, the online encyclopedia, or one of the multiple public compound databases. This presentation will offer a perspective of what is available today, our experiences of building a public compound database to link together the internet, and a suggested path forward for enabling even greater integration and connectivity for chemistry data for the masses to both use and participate in developing.
3:40 52 What is the Internet doing to chemistry and our brains?
S. Heller
NIST, Gaithersburg, Maryland, United States

The Internet, like any technology, has good, bad, and ugly sides to it. This lecture will attempt to talk about these aspects with examples in chemistry that should both enlighten and disturb.


Section A
Anaheim Convention Center
204 B

Internet and Chemistry: Social Networking - Cosponsored by YCC
H. Rzepa, Organizer
S. Bachrach, Organizer, Presiding
8:30 53 Bridging the gap: Publishing and consuming the scientific literature in a digital, device-agnostic world.
D. P. Martinsen
American Chemical Society, Washington, DC, United States

Scientific publishing has seen a steady transition from the primarily paper-based model of the pre-2000 era to the digital world of the late 1990s and now the first decade of the 21st century. While usage analysis, as well as end-user studies, indicate that paper, or at least PDF files printed out on paper, are still the preferred way for most scientists to interact with the scholarly literature, there is a growing percentage of scientists who are asking for more. New data formats, new devices, and new applications present a challenge for publishers as well as authors and readers. Publishers try to keep up with the demands of authors and readers who want to push the technology, while at the same time addressing the more modest concerns of the majority of scientists who just want to get the article text and not be bothered with bells and whistles. While some call for a revolution in publishing, the reality is a much slower evolution. Publishers, authors, editors, reviewers, and readers all make inputs into the ecosystem, and each responds, sometimes in unexpected ways, to the changes that are made. As the journal of the future and the article of the future emerge from the old models, it is useful to consider the impact of those changes.
9:10 54 Open access in chemistry: Information wants to be free?
J. Kuras, B. Vickery, D. Kahn
Chemistry Central, London, United Kingdom

The open access (OA) publishing movement was motivated by a desire to increase visibility and dissemination of scientific information. Electronic publishing and the advent of the Internet helped establish and accelerate the growth of OA in the early 2000s. Acceptance and uptake were significant among, for example, the high-energy physics and biomedical research communities, as demonstrated by the success of initiatives such as ArXiv, BioMed Central, and the Public Library of Science. In chemistry, the growth of OA has been more conservative. This presentation will review the development of OA in chemistry, examine the current situation with reference to recent studies, and look forward to future directions, in particular with the emergence of other open data initiatives and Web technologies.
9:50   Intermission.
10:00 55 OpenTox: An open-source web-service platform for toxicity prediction.
D. A. Gallagher, B. Hardy, S. Chawla
CAChe Research LLC, Beaverton, Oregon, United States; Douglas Connect, Zeiningen, Switzerland; Seascape Learning LLC, Cupertino, California, United States

The new European Union (EU) REACH chemical legislation will require 3.9 million additional test animals if no alternative methods for toxicity prediction are accepted. However, the number of test animals could be significantly reduced by utilizing existing experimental data in conjunction with (Quantitative) Structure Activity Relationship ((Q)SAR) models. To address the challenge, the European Commission has funded the OpenTox (www.OpenTox.org) project to develop an open source web-service-based framework that provides unified access to experimental toxicity data, in silico models (including (Q)SAR), and validation/reporting procedures. Now, in the final year of the initial three-year project, the current state of architecture, Open API, algorithms, ontologies, and approach to web services will be presented. Our experiences with current collaborative approaches aiming to combine OpenTox with other systems such as CERF, Bioclipse, CDK, and SYNERGY to create “super-interoperable K-infrastructure” will be discussed both in terms of conceptual promise and implementation reality.
10:40 56 CAS Registry: Maintaining the gold standard for chemical substance information.
R. Schenck, J. Zabilski
Department of Content Planning, Chemical Abstracts Service, Columbus, Ohio, United States

CAS has traditionally built its databases from the journal and patent literature. With the advent of the Internet, CAS now has another major source of chemical substance information. This presentation will discuss these internet resources and how CAS evaluates them for inclusion in CAS REGISTRY, while maintaining its quality standards. Since 1965, the scientific experts at CAS have identified more than 56 million organic and inorganic substances. This presentation will examine the sources of this growth and illustrate what CAS is doing to keep pace with this explosion in small molecule chemistry.
11:20 57 Evolution of the science journal and the chemical publication.
H. S. Rzepa
Department of Chemistry, Imperial College London, London, United Kingdom

The concept of a modern scientific journal becomes 346 years old in 2011 (DOI: 10.1098/rstl.1665.0001), although only since 1994 has the journal article been embedded in the Internet and Web era (DOI: 10.1039/C39940001907). Although the structure of the article itself morphed little during the first part of the Internet age, there are now signs that many aspects of its creation and dissemination are starting to evolve more rapidly. Here, several potential future enhancements are reviewed, including the role of the scientific blog in augmenting the effectiveness of the peer-review processes, the role of data-integrity within the article, integration of Web-enhanced and other data-rich and functional objects, the role of open digital repositories, article semantification, and delivery and re-functionalisation of the re-invented article via new generations of mobile personal devices.

Section B
Anaheim Convention Center

General Papers
R. Bienstock, Organizer, Presiding
9:00   Introductory Remarks.
9:05 58 Collaborative QSAR analysis of Ames mutagenicity.
E. Muratov, D. Fourches, A. Artemenko, V. Kuz'min, G. Zhao, A. Golbraikh, P. Polischuk, E. Varlamova, I. Baskin, V. Palyulin, N. Zefirov, L. Jiazhong, P. Gramatica, T. Martin, F. Hormozdiari, P. Dao, C. Sahinalp, A. Cherkasov, T. Oberg, R. Todeschini, V. Poroikov, A. Zaharov, A. Lagunin, D. Filimonov, A. Varnek, D. Horvath, G. Marcou, C. Muller, L. Xi, H. Liu, X. Yao, K. Hansen, T. Schroeter, K. Muller, I. Tetko, I. Sushko, S. Novotarskyi, N. Baker, J. Reed, J. Barnes, A. Tropsha
University of North Carolina, Chapel Hill, NC, United States; A.V. Bogatsky Physical-Chemical Institute NAS of Ukraine, Odessa, Ukraine; Moscow State University, Moscow, Russian Federation; University of Insubria, Varese, Italy; US Environmental Protection Agency, Cincinnati, OH, United States; Simon Fraser University, Burnaby, Canada; University of British Columbia, Vancouver, Canada; University of Kalmar, Kalmar, Sweden; University of Milano-Bicocca, Milan, Italy; Institute of Biomedical Chemistry RAS, Moscow, Russian Federation; University of Strasbourg, Strasbourg, France; Lanzhou University, Lanzhou, China; Technical University of Berlin, Berlin, Germany; Institute for Bioinformatics, Nuremberg, Germany; BioWisdom Ltd, Cambridge, United Kingdom

We report the results of a collaborative QSAR modeling project among 15 teams to develop predictive computational QSAR models of in vitro Ames mutagenicity induced by organic compounds. The Ames dataset consisted of 6542 compounds (after curation). In total, 32 predictive classification QSAR models were developed using different combinations of chemical descriptors and machine learning approaches, representing the most extensive combinatorial QSAR modeling study ever done in the cheminformatics field in the public domain. The resulting consensus model had the highest external predictive power, nearly reaching the experimental reproducibility of 85% for the Ames test. In addition, we found published evidence indicating that 31 of 130 outliers (29 mutagens and 2 non-mutagens) were erroneously annotated in the original dataset. This work presents a model of collaboration that integrates the expertise of participating laboratories to establish the best practices and most reliable solutions for difficult problems in chemical and computational toxicology.
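As an illustration only: the abstract does not state how the consensus model combines the individual models, so the sketch below assumes a simple majority vote over binary predictions (1 = mutagen, 0 = non-mutagen). The function name and the toy predictions are hypothetical.

```python
def consensus_predict(predictions):
    """Majority vote across models.

    predictions: list of per-model lists of 0/1 labels, all the same length.
    Returns one 0/1 label per compound; ties count as 0 (assumption).
    """
    n_models = len(predictions)
    consensus = []
    for votes in zip(*predictions):  # one tuple of votes per compound
        consensus.append(1 if sum(votes) * 2 > n_models else 0)
    return consensus


# Hypothetical predictions from three models for four compounds
model_a = [1, 0, 1, 0]
model_b = [1, 1, 0, 0]
model_c = [0, 0, 1, 0]
consensus = consensus_predict([model_a, model_b, model_c])
```

Other combination rules (for example, averaging predicted probabilities) would fit the same interface.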
9:25 59 How (not) to build a toxicity model.
A. C. Lee, R. Clark, M. Waldman, J. Chung, R. Fraczkiewicz, W. S. Woltosz
Department of Life Sciences, Simulations Plus, Inc., Lancaster, CA, United States

When a seemingly well-curated chemical data set hits the press, a modeler's first impulse is to apply their preferred QSAR method to the data in hopes of building a model that exhibits superior statistics to other published models. Occasionally, the results appear too good to be true. Are these models useful? This work details a procedure for building a useful and well-validated model, using respiratory sensitization data. We highlight the do's and don'ts of data selection, pre- and post- data curation, QSAR methodologies, and validation strategies implemented from 1984 to present. The examples demonstrate how to identify a narrow sampling of chemical space by examining good-looking models, applying a model to (believable) real-world data in order to determine its usefulness both inside and outside the model's applicability domain, and techniques that modelers (should) use to validate as well as assess the robustness of a model.
9:45 60 Metabolic site prediction using artificial neural network ensembles.
M. Waldman, R. Fraczkiewicz, J. Zhang, R. D. Clark, W. S. Woltosz
Simulations Plus, Inc., Lancaster, CA, United States

Hepatic first-pass metabolism of drugs and prodrugs plays a key role in oral bioavailability, and the cytochrome P450 enzymes are responsible for metabolism of most drugs. Knowledge of likely sites of metabolic attack in a drug molecule can aid in designing out unwanted metabolic liabilities early on in the drug discovery process as well as in the design of prodrugs where metabolic transformation is desired. Using datasets constructed from literature compilations and commercially available databases, we have constructed models based on artificial neural network ensembles that predict one or more likely sites of metabolism for a given molecule for several CYP isoforms including 2C9, 2D6, and 3A4. The models employ atomic descriptors describing charge, reactivity, steric accessibility, and other properties of the candidate atom and its local environment. Model performance will be shown based on various statistical criteria as well as specific examples demonstrating scope and limitations.
10:05 61 Withdrawn.
10:25   Intermission.
10:35 62 Use and results of using an online chemistry laboratory package in a large general chemistry course.
R. L. Nafshun
Department of Chemistry, Oregon State University, Corvallis, Oregon, United States

In addition to traditional on-campus general chemistry courses, the Department of Chemistry at Oregon State University has been offering an online general chemistry sequence since 2003. We have struggled to identify a method of facilitating an appropriate distance laboratory program. We have investigated a "kitchen" chemistry kit and various online virtual toolboxes. We are currently using a virtual laboratory package (www.onlinechemlabs.com) which presents the user with a split screen: one side contains chemistry laboratory tools and the other is text. The tools include standard experimental equipment such as an analytical balance, flasks, pipettes, and reagents, as well as more complex analytical instruments or reaction equipment such as an absorbance spectrophotometer, calorimeter, NMR, and a combustion chamber. The logical progress (or flow) of these tools in experiments is analogous to that in classroom labs. The tools incorporate both random and systematic error, providing data simulations where detailed error analyses can be performed that are analogous to those in classroom laboratory experiments. Each of these features allows for a significant enhancement in instructional capabilities, and could integrate very well with the instructional modalities of models and argumentation that have been recently developed and outlined in more detail below. Results of the use of the online chemistry laboratory package in three different modes (fully online/hybrid/supplemental) and methods of use will be discussed.
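A minimal sketch of how a simulated instrument might combine the two kinds of error mentioned above. This is not the package's actual implementation; it assumes a fixed systematic offset plus Gaussian random noise, and the function name and all numbers are hypothetical.

```python
import random


def simulated_reading(true_value, systematic_offset, random_sd, rng):
    """Return one simulated reading: true value + fixed bias + Gaussian noise."""
    return true_value + systematic_offset + rng.gauss(0.0, random_sd)


# Hypothetical balance: true mass 10.000 g, bias +0.005 g, noise sd 0.002 g
rng = random.Random(0)  # seeded so the run is repeatable
readings = [simulated_reading(10.000, 0.005, 0.002, rng) for _ in range(10000)]
mean_reading = sum(readings) / len(readings)
```

Averaging many readings shrinks the random component but leaves the systematic offset in place, which is the distinction a student error analysis would need to uncover.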
10:55 63 Reaction prediction as ranking molecular orbital interactions.
M. A. Kayala, C. A. Azencott, J. H. Chen, P. Baldi
Department of Computer Science, University of California, Irvine, Irvine, CA, United States

Being able to predict the course of chemical reactions is essential to the practice of chemistry. While computational approaches to this problem have been extensively studied in the past, a fast, accurate, and scalable solution has yet to be described. Here, we propose a novel formulation of reaction prediction as a machine learning ranking problem: given a set of molecules and a description of conditions, learn a ranking over potential filled to unfilled molecular orbital (MO) interactions approximating the corresponding transition state energy ranking. Using an existing rule-based expert system (ReactionExplorer), we derive a restricted chemistry dataset consisting of 1300 full multi-step reactions with 2200 distinct starting materials and intermediates. This yields 3600 predicted MO interactions and 14 million unpredicted MO interactions. A two-stage machine learning scheme is used to learn the model. First, we train reactive site predictors using a combination of topological and real-valued global features to filter out 61% and 44% of non-predicted filled and unfilled MOs with a 0.0001% error rate. Then various ranking models are trained on the MO interactions using features engineered to approximate transition state entropy and enthalpy. Using cross-validation, current best models recover a perfect ranking 61% of the time and recover a within-4 ranking 95% of the time.
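The abstract does not define its ranking metrics precisely. As a sketch under one simplifying assumption (each case has a single correct interaction, and a case counts as recovered when that interaction appears among the first k ranked candidates), a recovery rate could be computed as follows. The function name and toy data are hypothetical.

```python
def within_k_rate(ranked_lists, true_items, k):
    """Fraction of cases whose true item is among the first k ranked candidates."""
    hits = sum(
        1 for ranked, true in zip(ranked_lists, true_items) if true in ranked[:k]
    )
    return hits / len(true_items)


# Hypothetical ranked candidate interactions for three cases;
# the correct interaction is "a" in every case.
ranked = [
    ["a", "b", "c", "d", "e"],
    ["b", "a", "c", "d", "e"],
    ["e", "d", "c", "b", "a"],
]
truth = ["a", "a", "a"]
top1 = within_k_rate(ranked, truth, 1)
top4 = within_k_rate(ranked, truth, 4)
```

With k = 1 this reduces to asking whether the top-ranked candidate is the correct one; larger k tolerates near misses.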


Section A
Anaheim Convention Center
204 B

Internet and Chemistry: Social Networking - Cosponsored by YCC
H. Rzepa, Organizer
S. Bachrach, Organizer, Presiding
1:40 64 Automated semantic data embargo and publication by the CLARION project.
S. E. Adams, N. Day, J. Downing, B. Brooks, P. Murray-Rust
Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Cambridge, Cambridgeshire, United Kingdom

The CLARION project has created the infrastructure to enable research chemists to make selected data available as Open Data, shared over the Semantic Web, without requiring technical expertise themselves. Data is automatically collated from central services, such as the Departmental Crystallographic Service, and chemists' Electronic Lab Notebooks. An Embargo Manager application presents research groups with a view of the data they own, and allows them to set embargo conditions and add further metadata. Once the embargo period expires, data is automatically semantified and deposited as Open Data in a public Chem# repository.
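To make the embargo step concrete, here is a minimal sketch of selecting records whose embargo has expired. It is not the CLARION code; the record layout, field names, and identifiers are hypothetical, and it assumes a record becomes releasable on its embargo date.

```python
from datetime import date


def releasable(records, today):
    """Return the ids of records whose embargo date is on or before today."""
    return [r["id"] for r in records if r["embargo_until"] <= today]


# Hypothetical records as an embargo manager might hold them
records = [
    {"id": "xtal-001", "embargo_until": date(2011, 1, 1)},
    {"id": "xtal-002", "embargo_until": date(2011, 12, 31)},
    {"id": "xtal-003", "embargo_until": date(2011, 3, 27)},
]
ready = releasable(records, date(2011, 3, 27))
```

A real system would then pass the selected records to the conversion and deposit steps described in the abstract.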
2:20 65 Chemical eCommerce.
K. Gubernator
eMolecules, Inc., Solana Beach, CA, United States

Chemists are late adopters of the internet. The main obstacle is that search engines and eCommerce systems are text-based and as such inherently inadequate to handle chemical structures. Also, chemical nomenclature and names are poorly standardized and inconsistently used by both suppliers and buyers of chemicals. Therefore, only the combination of a chemical search engine and a chemical eCommerce system can address the needs of the market. Such a system has to handle millions of chemical structures, return results in seconds, and provide tools to handle lists of thousands of molecules. In addition, user expectations are created by their experiences with Amazon and eBay: prices and availability should be online. The purchasing process is expected to be predictable: you get what you order on time. Implementing and operating a chemical eCommerce system therefore requires a paradigm shift in the quality of the entire purchasing process.
3:00   Intermission.
3:10 66 Waiting on the Chemical Internet.
S. M. Bachrach
Department of Chemistry, Trinity University, San Antonio, TX, United States

The chemical internet dates back roughly to 1994. Over that time the impact of the Internet and the web on society in general has been overwhelming. Businesses have come and gone; communication has evolved from web sites to blogs to tweets. But for chemists, the impact has been of much less significance. The talk will present some of the causes of the slow uptake of the Internet by chemists and what the future might hold for us.
3:50 67 Rapid dissemination of chemical information for people and machines using Open Notebook Science.
J. Bradley, A. S. Lang
Department of Chemistry, Drexel University, Philadelphia, PA, United States; Department of Mathematics, Oral Roberts University, Tulsa, OK, United States

This presentation will cover methods and tools used to collect, record and disseminate chemical information using Open Notebook Science, the practice of making a laboratory notebook and all associated raw data available publicly in as close to real time as possible. Both solubility measurements and organic chemistry reactions are handled in this way. The recording of laboratory data is handled primarily using free and hosted services such as Wikispaces and Google Spreadsheets. The information is made discoverable using redundant communication channels, including Google, Google Scholar, Wikipedia and other vehicles. The abstraction of key elements from the solubility measurements and the chemical reactions allows for the use of live machine-readable feeds and web services. The implications for the future of the automation of the scientific process based on Open Data and Open Services will be discussed.