Technical Program (abstracts)

ACS Chemical Information Division (CINF)
Spring, 2012 ACS National Meeting
San Diego, CA (March 25 - 29)

CINF Symposia

R. Bienstock, Program Chair

SUNDAY MORNING

Section A
San Diego Convention Center
Room 27A

Drug Polypharmacology Prediction and Design Cosponsored by LIFE
S. Zhang, Organizer, Presiding
9:00   Introductory Remarks.
9:05 1 Polypharmacology, drug repurposing and collaborative drug discovery: Shining light or flash in the pan?
Christopher A Lipinski, clipinski@meliordiscovery.com, Scientific Advisory Board, Melior Discovery, Waterford, CT 06385-4122, United States
Technical and scientific arguments strongly support polypharmacology, drug repurposing and collaborative drug discovery approaches. In opposition are people / cultural issues that tend to a pessimistic viewpoint. Societal value of efforts in academia versus industry in these areas is unclear. Is there an aspect of academic culture that directs to a higher error rate and lack of confidence compared to work performed in industry? Bias and error in academic biology is well documented as are errors in public chemistry databases. Both academic target identification errors and public database chemistry structural errors are common. Peer reviewed publication pressure induces bias and error. “Hypothesis driven research” can select for high error rates. Unknown is how industry compares to academia in terms of bias and error. How do the internal pressures of metrics, stage gates and timelines in industry compare to the pressures of publish or perish in academia?
9:25 2 High accuracy polypharmacology models for large datasets
S. Joshua Swamidass, swamidass@gmail.com, Bradley Calhoun, Department of Pathology and Immunology, Washington University in St Louis, St Louis, MO 63108, United States
To predict the targets and off-targets of molecules, we developed predictive models of the small molecule inhibitors of several hundred proteins. Surprisingly, support vector machine (SVM) predictive models could not reliably separate known inhibitors from a set of a half million commonly screened molecules. In contrast, carefully designed predictors that specifically encode critical pieces of chemistry knowledge (like the similarity principle) yield more powerful models that can reliably extrapolate to large, diverse sets of molecules. In most cases, these models correctly identify, with accuracy greater than 95%, inhibitors in the same test that the SVM fails. This study highlights the pitfalls of relying on models outside their domain of applicability, but also suggests that predictive models specifically designed to incorporate chemistry knowledge can dramatically outperform generic predictive algorithms.
9:45 3 QSARome of GPCRs
Eugene Muratov1,2, murik@email.unc.edu, Guiyu Zhao1, Denis Fourches1, Chris Grulke1, Alexander Tropsha1. (1) University of North Carolina, Chapel Hill, NC 27599, United States (2) A.V. Bogatsky Physical-Chemical Institute NAS of Ukraine, Odessa, Ukraine
Many marketed drugs for treating CNS disorders have complex G Protein-Coupled Receptor (GPCR) polypharmacology leading either to favorable pharmaceutical outcomes or undesired adverse events. We have curated and integrated binding data for thousands of GPCR ligands extracted from both ChEMBL and PDSP databases. We have developed and extensively validated a panel of Quantitative Structure Activity Relationships (QSAR) classification models for 34 different receptors (i.e., the QSARome). Most models achieved high prediction performance according to a rigorous 5-fold external validation. The QSARome was applied to assess the GPCR binding profiles of 13 external drugs not present in the modeling set, reaching an external prediction accuracy of 70.5%. The QSARome was also used to identify novel compounds with unique target-selective GPCR binding profiles. The QSARome panel is integrated within the Chembench web portal (chembench.mml.unc.edu), providing an effective in silico means to search for novel molecules with the desired GPCR polypharmacology.
10:05   Intermission.
10:15 4 Gaussian ensemble screening (GUESS): A new approach to polypharmacology and virtual screening
Violeta Isabel Perez Nueno, violeta.pereznueno@inria.fr, Vishwesh Venkatraman, Lazaros Mavridis, David W. Ritchie, Orpailleur Team, INRIA Nancy – Grand Est, Vandoeuvre-lès-Nancy, France
We previously introduced a spherical harmonic (SH) approach to compare the 3D shapes of ligands and target binding pockets [1][2][3]. Here, we present a novel extension of this approach to predict relationships between drug classes, which we call Gaussian Ensemble Screening (GUESS). This allows promiscuous ligands and targets to be predicted rapidly without requiring thousands of bootstrap comparisons as in current promiscuity prediction approaches [4]. When using GUESS to find relationships between drug classes in a subset of the MDDR, our approach detects interesting relationships between targets such as GABA A and tyrosine-specific protein kinase, and ACE and neutral endopeptidase, for example, whose dual inhibitors have both been confirmed experimentally [5][6]. Hence, GUESS is a useful way to study polypharmacology relationships, and could provide a novel approach for drug repositioning.
1. Using spherical harmonic surface property representations for ligand-based virtual screening. Pérez-Nueno, V. I.; Venkatraman, V.; Mavridis, L.; Clark, T.; Ritchie, D. W. (2011) Molecular Informatics 30, 151-159.
2. Using Consensus-Shape Clustering to Identify Promiscuous Ligands and Protein targets and to Choose the Right Query for Shape-Based Virtual Screening. Pérez-Nueno, V. I.; Ritchie, D. W. (2011) J. Chem. Inf. Model. 51, 1233-1248.
3. Predicting drug polypharmacology using a novel surface property similarity-based approach. Pérez-Nueno, V. I.; Venkatraman, V.; Mavridis, L.; Ritchie, D. W. (2011) Journal of Cheminformatics 3 (Suppl 1), O19.
4. Predicting new molecular targets for known drugs. Keiser, M. J. et al. (2009) Nature 462, 175-181.
5. Regulation of GABAA receptor by protein tyrosine kinases in frog pituitary melanotrophs. Castel H, Louiset E, Anouar Y, Le Foll F, Cazin L, Vaudry H. (2000) J Neuroendocrinol. 12, 41-52.
6. Dual ACE and Neutral Endopeptidase Inhibitors: Novel Therapy for Patients with Cardiovascular Disorders. Tabrizchi, Reza (2003) Drugs 63, 2185-2202.
10:35 5 Physical binding site modeling for quantitative prediction of biological activities
Rocco Varela, rocco.varela@ucsf.edu, Ajay N Jain, Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, California 94158, United States
Prediction of polypharmacology can be useful at the level of binary putative target identification. Additional utility derives from estimation of the potency of an off-target effect. The latter goal is challenging: in the surprising cases of off-target effects, the off-target ligands will be dissimilar to on-target ligands being designed. Many QSAR methods have utility in making predictions within highly related chemical series, but cannot generally be fruitfully applied for off-target activity quantification due to limited domains of applicability. The Surflex-QMOD approach has been demonstrated to produce accurate and scaffold-independent predictions of binding affinity by constructing an interpretable physical model of a binding site based only on the structures and activities of ligands. Results will be presented establishing that QMOD-derived models produce accurate predictions in cases ranging from limited data of heterogeneous scaffolds to ample data containing related scaffolds. The potential for quantifying off-target effect potencies will also be examined.
10:55 6 Virtual screening of multi-target agents from large chemical libraries by machine learning approach
Yu Zong Chen, phacyz@nus.edu.sg, Pharmacy, National University of Singapore, Singapore, Singapore
Selective multi-target agents have been increasingly explored for enhanced therapeutic efficacy. Because they are more sparsely distributed in the chemical space, more efficient methods are needed for searching them.
We have explored machine learning methods for searching active compounds from large chemical libraries. Here we present our recent work in exploring machine learning methods for searching dual-target kinase inhibitors and serotonin reuptake inhibitors from large chemical libraries. The dual inhibitor yields, target selectivity, and false hit rates of our methods, trained on individual target inhibitors, are 25%-57% (majority >36%), 95%-99% (against inhibitors of other family members), and 0.007%-0.1% (against 13.56M-17M PubChem and 168K MDDR compounds) and 0.0%-3% (against MDDR compounds similar to the dual inhibitors). They outperformed Surflex-Dock, DOCK Blaster, kNN and PNN in searching 1.02M ZINC clean-leads or MDDR dataset.
Machine learning methods are potentially useful to complement conventional methods for facilitating multi-target drug lead discovery from large chemical libraries.
11:15 7 Structure-based identification of a dual FAAH/COXs inhibitor: Tackling inflammation with a single molecule acting synergistically on multiple proteins
Angelo D Favia, angelo.favia@iit.it, Andrea Cavalli, Marco De Vivo, Department of Drug Discovery and Development, Istituto Italiano di Tecnologia, Genoa, GE 16163, Italy
The modulation with a single compound of diverse proteins involved in a complex disease represents one of the frontiers of drug discovery programs [1, 2]. Here, the rational structure-based identification of a dual-target hit that simultaneously inhibits Fatty Acid Amide Hydrolase (FAAH) and Cyclooxygenases (COXs) is reported. The dual hit, identified through an interdisciplinary drug discovery effort and active in the low μM range versus both FAAH and COXs, represents a suitable starting point for the rational design of a novel drug with a superior therapeutic profile. Progress on the computer-assisted stepwise growth of the hit is also reported.
References
1. Morphy, R.; Rankovic, Z. Fragments, network biology and designing multiple ligands. Drug Discov. Today 2007.
2. Bottegoni, G.; Favia, A. D.; Recanatini, M.; Cavalli, A. The role of fragment-based and computational methods in polypharmacology. Drug Discov. Today 2011.

Section B
San Diego Convention Center
Room 25C

Instructional Tools for Chemical Information
C. Huber, Organizer, Presiding
8:15   Introductory Remarks.
8:20 8 Embedding chemistry information literacy skills into the curriculum at James Madison University
Meris A Mandernach1, manderma@jmu.edu, Barbara A Reisner2, reisneba@jmu.edu. (1) Libraries & Educational Technologies, James Madison University, Harrisonburg, VA 22807, United States (2) Department of Chemistry and Biochemistry, James Madison University, Harrisonburg, VA 22807, United States
Historically, chemistry information literacy skills have been taught in an independent course at James Madison University. As part of program assessment, we discovered that students were not making significant gains during the course because much of the content was covered in earlier courses. In 2011, a chemistry librarian and chemistry faculty member revised the course content in order to map critical information literacy skills, as identified by both ACS and Special Libraries Association, into the core chemistry curriculum. Through the use of online tutorials and web guides, content was integrated into courses where students use the information. In this presentation, we will describe how these skills and content have been mapped into individual courses. We will detail the creation of online content and its delivery to several courses with learning management software. We will also discuss successes and roadblocks we have encountered and preliminary assessment data.
8:40 9 Resources for introducing crystal structure information into undergraduate teaching
Gary Battle, battle@ccdc.cam.ac.uk, Cambridge Crystallographic Data Centre, Cambridge, United Kingdom
Visualizations and analyses of crystallographically-determined 3D molecular structures can greatly enhance student learning, and are ideally suited to teaching fundamental chemistry concepts including stereochemistry, conformation, chirality and reaction mechanisms. In spite of this, crystallography has historically been poorly represented in University teaching curricula and at best is simply noted at high school level. As a consequence, there is sometimes a limited understanding of how to retrieve and interpret crystallographic information. The Cambridge Structural Database (CSD) serves as the worldwide repository of experimentally-measured 3D crystal structures. Built over 45 years and containing more than half-a-million structures the CSD is a vast and ever growing compendium of accurate 3D molecules and is finding increasing application in chemical education. This talk will focus on continuing efforts to broaden the use of crystallographic data, and in particular to make CSD structures accessible and comprehensible to students and teachers. We will showcase a teaching subset of more than 500 CSD structures created specifically to illustrate key chemical concepts, and a growing collection of teaching materials that make use of this subset in classroom and laboratory environments.
These educational tools are freely available on the web and their utility has recently been recognised by the Chemistry Division of the Special Libraries Association and ACS Division of Chemical Information in their publication Information Competencies for Chemistry Undergraduates: the elements of information literacy. 2nd ed. Sept 2011.
9:00 10 If they build it, will they use it: Using input from students in a chemical literature class in the redesign of the library's chemistry webpages
Allan K Hovland1, akhovland@smcm.edu, Rob C Sloan2. (1) Department of Chemistry and Biochemistry, St. Mary's College of Maryland, St. Mary's City, MD 20686, United States (2) Library, St. Mary's College of Maryland, St. Mary's City, MD 20686, United States
The chemical information resources located on the college library's webpages were arranged in a less than user-friendly fashion. In searching for models for an improved interface, the chemical information instructor (AKH) found several interesting websites and noticed that the common denominator was the use of the Springshare software LibGuides. Coincidentally, the college had just obtained a license to use this software. In discussions with the library's science liaison (RCS), it was decided to have the introduction to chemical literature class participate in building the new chemistry webpages. A first class assignment had the students go to websites of schools using the LibGuides software and required them to identify features they liked and didn't like. An objective for this project is to have the students design pages that will be most useful to their fellow chemistry classmates. Our thought is that if they build it, they will use it.
9:20 11 Explore chemical information teaching resources (XCITR)
Guenter Grethe1, ggrethe@att.net, Grace Baysinger2, Rene Deplanque3, Gregor Fels4, Ira Fresen3, Andrea Twiss-Brooks5, Gregor Zimmermann3. (1) Consultant, Alameda, CA 94502, United States (2) Swain Chemistry and Chemical Engineering Library, Stanford University, Stanford, CA 94305, United States (3) FIZ Chemie Berlin, Berlin, Germany (4) Department of Chemistry, University of Paderborn, Paderborn, Germany (5) John Crerar Library, University of Chicago, Chicago, IL 60637, United States
Several years ago, the Division of Chemical Information of the American Chemical Society and the Division of Computer-Information-Chemie of the German Chemical Society established a Collaborative Working Group to foster a transnational dialogue in order to develop a shared approach for the access, exchange and management of chemical information. Within the larger context of the overall approach, the working group developed XCITR, an international repository of chemical information educational material to be used by librarians and instructors in chemical information. XCITR makes full use of features in Web 2.0 technology and is meant to be a hub in which instructors at all levels can deposit and access important teaching materials. We will discuss the history and organization of XCITR, describe technical details and provide examples from this freely available source.
9:40 12 Engaging the wired generation
Jessica A Parr1, parr@usc.edu, Norah Xiao2. (1) Department of Chemistry, University of Southern California, Los Angeles, CA 90089, United States (2) Science and Engineering Library, University of Southern California, Los Angeles, CA 90089, United States
Undergraduates are used to being plugged in and having all desired information at their fingertips. How often has a first-year student's lab report had Wikipedia as a reference? This is not a bad place to start, but it is not the best source for scientific information. An Information Literacy program has been developed to introduce new students to the scientific databases and the resources available to them in the Science and Engineering Library. With a combination of lectures from the Chemistry librarian and interactive activities, the students had a good time while learning the basics of gathering scientific information. An online tutorial has also been developed to help students, faculty and staff become acquainted with the information resources available to them. This talk will report the details and results of this program, as well as plans to expand it to a larger audience.
10:00   Intermission.
10:10 13 Chemical information instruction at ETH Zurich: Review and trends
Martin P. Braendle1, braendle@chem.ethz.ch, Engelbert Zass1, Lukas Korosec2, Peter A Limacher2, Hans P. Luethi2. (1) Chemistry Biology Pharmacy Information Center, ETH Zuerich, Zuerich, Switzerland (2) Laboratory of Physical Chemistry, ETH Zuerich, Zuerich, Switzerland
Given curricula tightly packed with subject matter, it is often difficult for librarians to obtain time for scientific information instruction. At ETH, we have developed a two-pillar strategy that tries to meet the students where they have information needs, and to provide tools offering improved access to sources.
We will review our approach, which includes problem-oriented units integrated in lab courses (Bachelor course). They are complemented by supporting material for major databases on the web site and individual end-user support. We also investigated ways to improve how students assess information and report on a large study with second-semester students who rated the German Wikipedia and Roempp Online chemistry encyclopedia content with regard to chemical thermodynamics.
Because our instruction is focused on the most important sources, it is complemented with information services that support the user in locating appropriate sources, e.g. a recently introduced textbook portal connecting to our library navigator.
10:30 14 One-shot wonder: Integrating chemical information literacy throughout the curriculum
Linda M Galloway, galloway@syr.edu, Library, Syracuse University, Syracuse, NY 13244, United States
Integrating chemical information literacy into an undergraduate chemistry program via a series of guest librarian presentations is proposed as a viable alternative to a formal course. A recent large-scale analysis conducted by Hong Kong Baptist University demonstrated a statistically significant correlation between student performance and library instruction, but only if a certain minimum amount of instruction is provided. This finding, coupled with desired proficiencies articulated in Information Competencies for Chemistry Undergraduates by the Special Libraries Association, Chemistry Division, spurred the development of a set of “one-shot” instruction sessions tied to explicit information competencies. This skill-specific sequence of instruction, attached to defined classes, will enable chemistry students to develop a thorough understanding of chemical literature and how it fits into scholarly communication in the sciences. This paper will detail a plan to systematically integrate, at point-of-need, information literacy skills into the chemistry undergraduate curriculum.
10:50 15 Feedback and training examples from user communities using Elsevier's Reaxys
Christine Flemming, c.flemming@elsevier.com, Elsevier, New York, NY, United States
Securing the maximum institutional value for any chemistry resource is reliant on the successful introduction to, and training of, the entire user community on the appropriate use of the e-resource. This presentation will show some methods and examples developed based on user feedback and will highlight how professionals in academia use Reaxys as a teaching tool for chemistry. Included in this presentation will be examples of the use of social networks, online forums, and other current practices in developing and supporting user communities.
11:10 16 Teaching new graduate students: Chemical information as a research tool
Bonnie L. Fong1, bonnie.fong@rutgers.edu, Darren B Hansen2, dbhansen@rutgers.edu. (1) John Cotton Dana Library, Rutgers University, Newark, NJ 07102, United States (2) Department of Chemistry, Rutgers University, Newark, NJ 07102, United States
Chemistry majors do not always have the opportunity to learn information-seeking skills while they are undergraduates. However, as graduate students, they are expected to know how to find chemical information. This session will discuss a collaborative effort between a chemistry professor and a physical sciences librarian at Rutgers University to design a mini-course that helps the students achieve this goal. It will focus on course development, such as: identifying which resources to include (e.g., reference materials, databases), selecting hot topics to discuss (e.g., data management), crafting appropriate assignments, deciding on any supporting readings, etc. Attempts at encouraging collaborative learning, with the assistance of an online site, will also be addressed.
11:30 17 SpringerMaterials: The world's largest resource for chemical and physical properties in materials science
Mikail Shaikh, email.mikail@gmail.com, eProduct Management, Springer Science and Business Media, New York, New York 10013, United States
As part of “Springer Databases”, SpringerMaterials is currently one of the world's largest resources for materials properties in chemistry and physics, based on the famous Landolt-Börnstein book series. It contains over 100,000 documents about 250,000 compounds and 3000 compounds in a database format, ranging from nuclear and molecular data to multi-phase systems and advanced materials! With a collection of such magnitude, the ability to efficiently search and navigate becomes almost as important as the content, if not more. Springer Science is actively committed to interacting with researchers to aid the learning curve through sample searches, social media discussion, conference participation, newsletters, flash tutorials, webinars, contests, etc., and with librarians through newsletters, surveys, statistics, etc. We also constantly strive to follow the cadence of science, and develop our content in sync. Find out more about how having a scientifically trained e-Product team encourages a three-way conversation between the scientific, information and publishing communities!
11:50   Concluding remarks.
Drug Discovery Receptors Not Big-box Stores
Sponsored by COMP, Cosponsored by BIOL, CINF, and MEDI
Molecular Mechanics Electrostatics and Polarization. The New Black
Sponsored by COMP, Cosponsored by BIOL, CINF, MEDI, and PHYS

SUNDAY AFTERNOON

Section A
San Diego Convention Center
Room 27A

Drug Polypharmacology Prediction and Design Cosponsored by LIFE
S. Zhang, S. Ekins, Organizers, Presiding
1:30   Introductory Remarks.
1:35 18 Develop novel predictive polypharmacology models with high-quality data
Shuxing Zhang, shuzhang@mdanderson.org, Department of Experimental Therapeutics, MD Anderson Cancer Center, Houston, TX 77054, United States
Prediction of drug polypharmacology is of great importance. To this end, we embarked on the construction of high quality chemical and biological databases. Starting from 11,863 co-crystallized ligands, we curated through a rigorous workflow 643 multi-targeting ligands available in 2,131 high-resolution crystal structures with known binding affinities. These data were then employed to build polypharmacology models using our novel diffusion-based target prediction (DTaP) algorithm, which accounts for the whole multi-targeting profile of each ligand. With these models, we demonstrated that at least one real target is ranked in the top 5 for 46 known ligands, while at least two real targets are ranked in the top 5 for 31 ligands. Interestingly, we also found that dasatinib is ranked at the top as an ACK1 inhibitor, and a follow-up binding assay confirmed this discovery. These results demonstrated the predictive power of our polypharmacology modeling method trained with high quality data.
1:55 19 Predicting drug polypharmacology using secondary structure element information
Oliver Koch, oliver.koch@msd.de, BioChemInformatics, Intervet Innovation GmbH, Schwabenheim, Germany; MOLISA GmbH, Magdeburg, Germany
The protein interactions in protein-ligand binding and protein-protein interfaces can be regarded based on structural similarity of the secondary structure elements. The most prominent example is the protein fold of a protein domain that is more conserved than the amino acid sequence. Proteins with similar fold but dissimilar sequence and function can bind similar ligands and interact with similar proteins. The next level corresponds to the spatial arrangement of the secondary structure elements around the ligand binding site ("ligand-sensing cores") or in the protein interface ("interface-sensing surfaces"). These similarities in otherwise unrelated proteins can be useful in predicting drug polypharmacology. The successful applications in drug design using predicted polypharmacology in protein-ligand binding will be shown and the analogy in the design of protein-protein interface inhibitors and the potential of polypharmacology prediction will be discussed.
Reference: Koch, O. Future Med. Chem. 2011; 3(6): 699-708.
2:15 20 Assessing drug target association using semantic linked data
David Wild, djwild@indiana.edu, Bin Chen, School of Informatics and Computing, Indiana University, Bloomington, IN 47401, United States
The rapidly increasing amount of public data in chemistry and biology relating to drug discovery provides new opportunities for large-scale data mining for drug discovery. Systematic integration of these heterogeneous sets and provision of algorithms to data mine the integrated sets permits investigation of complex mechanisms of action of drugs. In this work we integrated and annotated data from public datasets relating to drugs, chemical compounds, protein targets, diseases, side effects and pathways, building a semantic linked network consisting of over 200,000 nodes and 1.5 million edges. We developed a statistical model to assess the association of drug target pairs based on their relation with other linked objects. Validation experiments demonstrate the model can identify direct drug target pairs with high precision (AUROC=0.92). Indirect drug target pairs (for example drugs which change gene expression level) are also identified but not as strongly as direct pairs. We further calculated the association scores for 174 drugs from 10 disease areas against 1683 human targets, and measured their similarity using a 174 × 1683 score matrix. The similarity network indicates that drugs from the same disease area tend to cluster together in ways that are not captured by structural similarity, with several potential new drug pairings being identified. This work thus provides a novel, validated alternative to existing drug target prediction algorithms.
2:35   Intermission.
2:45 21 3D Pharmacophore-based activity profiling for multitarget screening
Gerhard Wolber1, gerhard.wolber@fu-berlin.de, Fabian Bendix2, Goekhan Ibis2, Thomas Seidel2. (1) Department of Pharmacy, Freie Universitaet Berlin, Berlin, Germany (2) Inte:Ligand GmbH, Vienna, Austria
With the improved performance of computer hardware, virtual screening methods tend to aim at increasing throughput rather than improving prediction quality. While high restrictivity is suitable for single-target screening campaigns, sensitivity becomes more important in the case of activity profiling, where the virtual screening process is reversed, i.e. a small number of molecules is screened versus a large number of targets. Current enrichment metrics are no longer suitable for measuring prediction performance and need to be investigated in a more differentiated way. We discuss the challenge of multi-target screening using 3D pharmacophore based activity profiling and suggest methods, protocols and visualization techniques to use activity prediction model collections for virtual screening against multiple targets.
3:05 22 Where have all the good drugs gone?
Gisbert Schneider, gisbert.schneider@pharma.ethz.ch, Department of Chemistry and Applied Biosciences, ETH Zurich, HCI H411, Zurich, Switzerland
It has been realized that 'druglike' compounds often bind to multiple macromolecular targets. Consequently, predicting target profiles of hit and lead structure candidates has become both a challenge and an opportunity for computer-assisted drug design. No longer is ligand interaction with an individual target the sole objective in molecular design, but multi-dimensional functions guide in silico compound assembly and selection. We will present and discuss the concept of 'adaptive' fitness landscapes and their potential for drug discovery. The computational framework of this approach is based on data projection and Gaussian density estimation, which results in a probabilistic multi-dimensional 'SAR landscape' as a visual aid for compound prioritization. A related method employs self-organizing map representations of chemical space, which were successfully used for a selection of compounds exhibiting desired activity. Target profile prediction represents a consequent next step towards fully automated de novo drug design that satisfies multiple objectives in parallel.

[1] Reutlinger, M.; Guba, W.; Martin, R. E.; Alanine, A. I.; Hoffmann, T.; Klenner, A.; Hiss, J. A.; Schneider, P.; Schneider, G. Neighborhood-preserving visualization of adaptive structure-activity landscapes and application to drug discovery. Angew. Chem. Int. Ed. 2011, doi: 10.1002/anie.201105156.
[2] Schneider, G.; Geppert, T.; Hartenfeller, M.; Reisen, F.; Klenner, A.; Reutlinger, M.; Hähnke, V.; Hiss, J. A.; Zettl, H.; Keppner, S.; Spänkuch, S.; Schneider, P. Reaction-driven de novo design, synthesis and testing of potential type II kinase inhibitors. Future Med. Chem. 2011, 3, 415-424.
[3] Schneider, P.; Stutz, K.; Kasper, L.; Haller, S.; Reutlinger, M.; Reisen, F.; Geppert, T.; Schneider, G. Target profile prediction for a Biginelli-type dihydropyrimidine compound library and practical evaluation. Pharmaceuticals 2011, 4, 1236-1247.
[4] Schneider, G.; Tanrikulu, Y.; Schneider, P. Self-organizing molecular fingerprints: a ligand-based view on drug-like chemical space and off-target prediction. Future Med. Chem. 2009, 1, 213-28.

3:25 23 Finding promiscuous old drugs for new uses
Sean Ekins1,2, ekinssean@yahoo.com, Antony J Williams3. (1) Collaborations in Chemistry, Fuquay Varina, NC 27526, United States (2) Collaborative Drug Discovery, Burlingame, CA 94010, United States (3) Royal Society of Chemistry, Wake Forest, NC 27587, United States
In the last 6 years, high-throughput screening has been used to identify FDA approved drugs that are active against multiple targets (also termed promiscuity). We have identified 34 studies that have screened libraries of FDA approved drugs against various whole cell or target assays. Each study has identified one or more compounds with a new bioactivity that had not been previously described. Thirteen of these drugs were active against more than one additional disease, thereby suggesting a degree of promiscuity. The 109 molecules identified by screening in vitro were statistically more hydrophobic than orphan designated products with at least one marketing approval for a common disease indication or one marketing approval for a rare disease (FDA rare disease research database). We have created a database of in vitro data on old drugs for new uses that could be applied for repositioning these or other molecules for neglected and rare diseases.
3:45   Concluding remarks.

Section A
San Diego Convention Center
Room 27A

CINFlash
R. Guha, Organizer, Presiding
4:00   Discussion.

Section B
San Diego Convention Center
Room 25C

Instructional Tools for Chemical Information
C. Huber, Organizer, Presiding
1:30 24 Use of course reserves as a gentle introduction to the chemical literature
Donna T. Wrublewski, dtwrublewski@ufl.edu, George A. Smathers Libraries, University of Florida, Gainesville, Florida 32611, United States
For most upper level chemistry undergraduates, the Physical/Biophysical chemistry laboratory course is their first required exposure to reading and citing chemical literature, specifically with regard to preparing experimental reports. A basic library literacy lecture, designed to assist them with these tasks, is scheduled early in the semester. Working with the rotating team of instructors who teach these courses, the librarian checked all references in the laboratory manuals for accuracy, updated them where needed, and verified that they were accessible through online subscriptions or the course reserves system. Specific course reserves system instruction was included in the literacy lecture as a way to guide students to the recommended references and familiarize them with accessing library resources. This ease of access, in conjunction with targeted in-class instruction, should lead to (1) increased library usage and (2) improved quality of students' reports.
1:50 25 Faculty-librarian collaboration yields innovative chemistry seminar program
Valerie K. Tucci1, vtucci@tcnj.edu, Benny Chan2, chan@tcnj.edu, Lynn Bradley2, Stephanie Sen2, Abby R. O'Connor2. (1) Library, The College of New Jersey, Ewing, New Jersey 08628-0718, United States (2) Chemistry, The College of New Jersey, Ewing, New Jersey 08628-0718, United States
Faculty-librarian collaboration at The College of New Jersey has created an innovative Chemistry Seminar Program which enhances the real-world skills of undergraduate chemistry majors and raises the students' awareness of the value of a commitment to lifelong professional and personal enrichment. The seminar program consists of three interrelated segments: Advising, Chemical Information Literacy and Good Laboratory Practice (GLP). Advising begins as prescriptive advising and grows into developmental advising with emphasis on resume writing, interviewing skills and career options. Chemical information literacy consists of formalized instructional sessions followed by three assessments. The first assessment evaluates student knowledge of basic library skills, the second assessment concentrates on evaluating SciFinder searching capabilities and the third assessment features patent searching using the USPTO database and SciFinder. The CAS Learning Solutions tutorials are integrated into the instructional sessions along with online practice. GLP begins with teaching fundamental principles and often culminates in supporting faculty-student research.
2:10 26 Blind assessment: The unexpected benefits of peer review in a classroom setting
Judith N. Currano, currano@pobox.upenn.edu, Chemistry Library, University of Pennsylvania, Philadelphia, Pennsylvania 19104-6323, United States
Peer review is an important part of the scholarly communication process but one in which students receive relatively little formal training. Students in the University of Pennsylvania's graduate-level chemical information course are required to prepare a guide to the literature on a subject of their choice, with the goal of teaching someone to search Penn's resources. The addition of double-blind peer review to the project, in which students reviewed one another's term projects as though they were scientific reviewers for a journal, gave the students practice assessing one another's work. The assignment had the unexpected result of improving the overall quality of the term projects.
2:30 27 Faculty-librarian partnership for a student research presentation in a physical chemistry laboratory course
Donna T Wrublewski1, dtwrublewski@ufl.edu, Mine G Ucak-Astarlioglu2. (1) George A. Smathers Libraries, University of Florida, Gainesville, Florida 32611, United States (2) Department of Chemistry, University of Florida, Gainesville, Florida 32611, United States
For most upper level chemistry undergraduates, the Physical/Biophysical chemistry laboratory course is their first experience searching and citing chemical literature. A basic library literacy lecture is given early in the semester. At the invitation of one of the class instructors, the chemistry librarian participated in the planning and evaluation of a new library component for the Fall 2011 semester. Currently students are asked to prepare a talk on a paper from the chemical literature; the new component requires students to give a second talk to explain paper selection, describe search strategies, and evaluate additional subject competency gained. Based on the second talk, expected student outcomes are (1) increased chemical literature literacy and general subject knowledge; (2) improved abilities using search engines and correct keywords; (3) improved library course content from developing and adapting more efficient search techniques; (4) improved education quality and student empowerment from fostering closer chemistry department/library collaborations.
2:50   Intermission.
3:00 28 CAS learning solutions: Training at the point of need
Jayne A. Knoop, jknoop@cas.org, Department of Learning and Support Solutions, CAS, Columbus, OH 43204, United States
As the world's authority for chemical information, CAS analyzes and organizes complex content for an international customer base. SciFinder is the chemistry research tool of choice for students at more than 1,800 academic institutions and research scientists at government and commercial organizations worldwide. Patent experts rely on the advanced search and analysis capabilities of STN. To most effectively serve the needs of its diverse customers, CAS offers a variety of training options, including in-product Help, e-learning tutorials, webinars, virtual classes, workshops, patent forums, and custom private training. Session attendees will discover the substantial value that CAS training adds at the point of need for SciFinder and STN customers.
3:20 29 Chemistry Reference Resolver: A tool to simplify reference retrieval
Oleksandr Zhurakovskyi, oleksandr.zhurakovskyi@chem.ox.ac.uk, Department of Chemistry, University of Oxford, Oxford, United Kingdom
High quality research requires thorough literature screening and numerous databases are available to ease this process. However, chemists regularly need to find a paper manually from its reference details. We have developed Chemistry Reference Resolver (http://chemsearch.kovsky.net) as a tool to facilitate this process. This tool accepts, as input, a reference in a number of styles (ACS, NPG style, DOI, etc) and redirects the user directly to the corresponding online abstract/PDF link. As currently configured, the Resolver has a number of plugins available for major browsers.
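As a rough illustration of the simplest input case mentioned above (a DOI), the following Python sketch turns a reference string containing a DOI into a link on the public doi.org resolver. It is not the Resolver's own code: the function name `doi_to_url`, the regular expression, and the example DOI are assumptions made for illustration, and parsing of journal-style references (ACS, NPG) is not attempted.

```python
import re
from typing import Optional

# Assumed pattern for modern DOIs: "10.", a 4-9 digit registrant code, "/",
# then a suffix that runs until whitespace, a quote, or an angle bracket.
DOI_PATTERN = re.compile(r"\b(10\.\d{4,9}/[^\s\"<>]+)")


def doi_to_url(reference: str) -> Optional[str]:
    """Return a doi.org link if the reference text contains a DOI, else None."""
    match = DOI_PATTERN.search(reference)
    if match is None:
        return None
    # Heuristic: trailing punctuation usually belongs to the sentence,
    # not to the DOI itself, so it is stripped.
    doi = match.group(1).rstrip(".,;)")
    return "https://doi.org/" + doi
```

For example, `doi_to_url("doi:10.1000/xyz123.")` gives `https://doi.org/10.1000/xyz123` (a made-up DOI), while a reference with no DOI returns `None` and would need a style-specific parser.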
3:40 30 Using LibGuides to enhance large-enrollment chemistry lab courses
Jeremy R Garritano, jgarrita@purdue.edu, Purdue University Libraries, Purdue University, West Lafayette, IN 47906, United States
At a large research university, teaching chemical information literacy skills to the thousands of students in chemistry lab courses can be difficult. In the fall of 2011, using LibGuides, the M.G. Mellon Library of Chemistry at Purdue University created a site to support students in first- and second-year chemistry labs and to help with knowledge retention between the two years and beyond. Built with feedback from faculty, teaching assistants, advisors, and others within the Chemistry Department, the site is a collaborative effort designed to bring together information on writing, communication, and information seeking skills related to the chemistry lab courses. This site helps to supplement sections of the lab manual and presents a single, stable resource for the different versions of first- and second-year chemistry lab courses taught at the University. Implementation, usage statistics and user feedback will be shared.
4:00 31 Learning about cheminformatics through an education wiki
Martin A Walker1, walkerma@potsdam.edu, Aileen E Day2, Antony J Williams2, Lorna M Thomson2. (1) Department of Chemistry, State University of New York at Potsdam, Potsdam, New York 13676, United States (2) Royal Society of Chemistry, Cambridge, United Kingdom
The Royal Society of Chemistry (RSC) recently unveiled a new chemical education wiki, called RSC LearnChemistry:Share. This site is designed to bring the features of RSC ChemSpider into an education site, as well as to create a workspace where educators can share their resources (quizzes, lab experiments, tutorials, etc.).
The wiki is built around a cheminformatics platform, and educators will naturally be exposed to basic cheminformatics concepts as they use the site and contribute content. For example, InChIs can be generated in the site and entered by teachers, to define answers to their quiz questions. Substance pages are found by InChIKey-directed structure searches. These and other pages can display “live” ChemSpider data, including spectra. As a community of educators develops on the site, these active contributors will learn to use cheminformatics tools in the best way – by using them to educate their students.
4:20 32 Bringing faculty, students and librarians together: Lessons and opportunities for ACS on campus after two years
S. Sara Rouhi, s_rouhi@acs.org, Library Relations, ACS Publications, Washington, DC 20009, United States
ACS on Campus, the Publications Division's campus outreach program, is now almost two years old. The curriculum was developed by ACS Publications Library Relations to reintroduce faculty and students to their libraries' resources. As the program enters its third year, questions arise: Is ACS on Campus meeting the needs of the librarian community? To what extent will librarians continue to embrace a program that requires outreach work on their part? Does the lack of chatter about the program in the librarian community indicate weaknesses within the program that need to be addressed?
This presentation provides an overview of the program including feedback from librarians, students, and faculty. It will examine potential weaknesses within the program and opportunities for improvement in light of expansion opportunities. The presenter will be looking for direct feedback from session participants about the relevance of the program in light of their current situations at their institutions.

Chemical Networks in Biology
Sponsored by LIFE, Cosponsored by BIOL, BIOT, CINF, and MEDI
Collaborative Drug Discovery for Neglected Diseases
Sponsored by COMP, Cosponsored by BIOL, BIOT, CINF, MEDI, TOXI, and YCC
Drug Discovery Target-based is Sooooo Cool
Sponsored by COMP, Cosponsored by BIOL, CINF, and MEDI

SUNDAY EVENING

Section A
San Diego Convention Center
Hall D

CINF Scholarship for Scientific Excellence
G. Grethe, Organizer, Presiding
6:30 - 8:30
  33 Use of screening results to validate a diversity subset of an HTS library
Rohan S Patil1,2, rspatil@umail.iu.edu, Maureen Beresini1, Nicholas Skelton3. (1) Department of Biochemical and Cellular Pharmacology, Genentech, South San Francisco, California 94080, United States (2) Department of Chemical Informatics, Indiana University, Bloomington, Indiana 47408, United States (3) Department of Computational Chemistry and Cheminformatics, Genentech, South San Francisco, California 94080, United States
High-throughput screening (HTS) is an important element in the discovery of small molecule leads. We have routinely screened the full Genentech library with over 1 million compounds. It is not always possible or desirable to screen the full library. Screening a portion of the library may speed the screening and streamline the hit triage process. Additionally, there may be cases in which reagent limitations or lower assay throughput warrant a smaller compound set. Consequently, an effort was undertaken to select five diverse subsets of the entire library, then compare the performance of each of them with that of others as well as to that of the entire library. Finally, we identify the diversity compound set that could be used in a subset screen in place of a whole library screen.
  34 Development of a screening informatics system at the UNM Center for Molecular Discovery, an NIH MLP specialty center
Jeremy J Yang1,2,3, jjyang@salud.unm.edu, Oleg Ursu1,2, Stephen L Mathias1,2, Cristian G Bologa1,2, Anna Waller2, Annette M Evangelisti2, Gergely Zahoransky-Kohalmi1,2, Tudor I Oprea1,2. (1) Department of Biochemistry and Molecular Biology, University of New Mexico, Albuquerque, NM 87131, United States (2) Center for Molecular Discovery, University of New Mexico, Albuquerque, NM 87131, United States (3) School of Informatics & Computing, Indiana University, Bloomington, IN 47405, United States
Robotic, automated, molecular screening against biological targets is not new, nor is the need for supporting informatics systems. However, continued advances in contributing technologies, plus increased expectations, pose continuing challenges for informatics systems developers. These advances include: (1) new methodology, such as high-content and multiplex bioassays, (2) more relevant public data, (3) new privacy and collaboration models, and (4) advances in cheminformatics and bioinformatics methodology. In this poster we present the screening informatics system developed at the University of New Mexico Center for Molecular Discovery, which combines industry-standard commercial and open-source software components with custom code developed at UNM. The choices made as to components and overall design were rationally and pragmatically driven, with time and resource constraints combining with technical objectives, resulting in a novel, hybrid solution. A functioning system was required continuously, for ongoing projects, necessitating an evolutionary approach. These challenges are both difficult and typical for advancing informatics systems at productive organizations, where upgrading infrastructure takes place in a context of operational imperatives.
  35 Thermodynamical properties of small Pd clusters on the stoichiometric and defective TiO2 (110) surfaces studied with first-principle methods
Jin Zhang, jinzhang@chem.ucla.edu, Anastassia Alexandrova, Department of Chemistry and Biochemistry, UCLA, Los Angeles, California 90095, United States
Using first-principle methods, we studied the adsorption properties of sub-nano Pd clusters on the stoichiometric and the defective titania surfaces. In particular, we mapped the potential energy surfaces (PES) of a Pd atom on three types of titania surfaces. With the data obtained from these calculations, we constructed a square lattice model describing the movement of Pd monomers with the Monte Carlo method to simulate the cluster growth and sintering processes at various temperatures. We found that on the stoichiometric surface or a surface with a Ti-interstitial atom, the Pd monomers tend to sinter into larger clusters, whereas the Pd dimer, trimer and tetramer appear to be relatively stable below 600 K. This result agrees with the standard sintering model of transition metal clusters and experimental observations.
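The general idea of a square-lattice Monte Carlo model for monomer hopping can be sketched as follows. This is a generic Metropolis scheme under stated assumptions, not the authors' model: the single nearest-neighbour `bond_energy` parameter stands in for the DFT-derived potential energy surface, the lattice is periodic, and all function names are hypothetical.

```python
import math
import random

K_B = 8.617333e-5  # Boltzmann constant in eV/K


def neighbours(site, size):
    """Four nearest neighbours of a site on a periodic square lattice."""
    x, y = site
    return [((x + 1) % size, y), ((x - 1) % size, y),
            (x, (y + 1) % size), (x, (y - 1) % size)]


def site_energy(site, occupied, size, bond_energy):
    """Energy of an atom at `site`: bond_energy per occupied neighbour (assumed model)."""
    return bond_energy * sum(1 for n in neighbours(site, size) if n in occupied)


def metropolis_step(occupied, size, bond_energy, kT, rng):
    """Try to hop one randomly chosen atom to a random neighbouring site."""
    old = rng.choice(sorted(occupied))
    new = rng.choice(neighbours(old, size))
    if new in occupied:
        return  # target site is taken; reject the move
    occupied.remove(old)
    delta = (site_energy(new, occupied, size, bond_energy)
             - site_energy(old, occupied, size, bond_energy))
    # Metropolis criterion: always accept downhill moves, uphill with exp(-dE/kT).
    if delta <= 0 or rng.random() < math.exp(-delta / kT):
        occupied.add(new)
    else:
        occupied.add(old)
```

Calling `metropolis_step` repeatedly with `kT = K_B * 600` and a negative (attractive) `bond_energy` lets atoms aggregate over time; the number of atoms is conserved by construction.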
  36 Development of a hybrid method combining quantum mechanical calculations and discrete molecular dynamics for metallo-protein modeling
Manuel Sparta, sparta@chem.ucla.edu, Anastassia N Alexandrova, Department of Chemistry and Biochemistry, UCLA, Los Angeles, California 90095-1569, United States
Natural metallo-enzymes are known for their outstanding catalytic dexterity. High-quality modeling of metallo-enzymes is highly desirable, both to understand the mechanism of their proficiency and to eventually design artificial metallo-enzymes. However, the challenge of this modeling is great, because it must account for both the cooperative dynamic motions of the protein and the electronic structure of the metal. We report a new method, QM/DMD, that combines extensive statistical mechanical sampling of the protein, achieved with Discrete Molecular Dynamics (DMD), with a quantum mechanical (QM) description of the active site. The method is fast and robust. Testing of QM/DMD at several levels was done on Rubredoxin. QM/DMD successfully predicts the structures of the Fe(II) and Fe(III) forms of Rubredoxin, and their mutants, properly describes the unusual weak H-bonds between SCys and aliphatic C-H groups near the active site, and captures the response of the RedOx potential to mutations.
  37 Learning to predict more chemical reactions: Model extensions and an expanded training set
Matthew A Kayala, mkayala@ics.uci.edu, Pierre Baldi, Department of Computer Science, University of California, Irvine, Irvine, California 92627, United States
In previous work, we introduced a machine learning approach to predict productive mechanistic reactions. The representations and methods presented allowed for practical mechanistic reaction prediction over the set of polar organic reactions. Here we describe several improvements to our previous approach. First, we describe extensions to the orbital interaction representation to cover pericyclic, radical, and stereospecific reactions. Next we describe a new larger and more diverse training set of chemical reactions, derived from extending the use of the Reaction Explorer expert system along with manual curation from literature and graduate level texts. Then, we show how our general machine learning approach, with the new representations and expanded dataset, exhibits excellent performance results. Finally, a multi-step pathway prediction application is made available.
  38 COBRA: Computational brewing approach to predicting the molecular composition of organic aerosols
David R Fooshee1, dfooshee@uci.edu, Tran B Nguyen2, Sergey A Nizkorodov2, Julia Laskin3, Alex Laskin4, Pierre Baldi1. (1) Department of Computer Science, University of California, Irvine, Irvine, CA 92697, United States (2) Department of Chemistry, University of California, Irvine, Irvine, CA 92697, United States (3) Chemical and Materials Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States (4) Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
We introduce a novel Computational Brewing Approach (COBRA) to model oligomerization chemistry stemming from repetitive condensation and addition reactions of monomers in isoprene photooxidation organic aerosols. COBRA takes two sets of data as input: a list of the initial chemical structures making up the starting molecular pool, and a list of reaction rules defining potential chemical transformations within the system. The reactions are propagated through several iterations, with products of previous iterations serving as reactants for the next one. A set of four reactions including esterification, aldol condensation, and hemiacetal formation, along with 27 seed molecules, were used to predict products of oligomerization in isoprene photooxidation secondary organic aerosol (SOA). The simulation generated thousands of unique structures in the mass range of 120-500 Da, and correctly predicted greater than 70% of the peaks observed by high-resolution mass spectrometry (HR-MS) of isoprene SOA. Selected structures predicted by the simulation were confirmed with tandem mass spectrometry (MSn). The model aids in structure elucidation from tandem mass spectrometry by offering up to a 100-fold reduction in the number of possible isomers for a given molecular formula. COBRA is not limited to atmospheric aerosol chemistry; it can also be applied to the prediction of reaction products in other environmental complex mixtures for which reasonable reaction mechanisms and seed molecules can be supplied by experimental or theoretical methods.
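The iterative propagation described above, in which products of one round become reactants for the next, can be sketched in a few lines of Python. This is a deliberately simplified toy and not the COBRA implementation: molecules are represented only by their masses rather than by structures, the two rules (`condensation`, which loses one water, and `addition`) are stand-ins for real reaction rules, and all names and values are hypothetical.

```python
from itertools import combinations_with_replacement

WATER_MASS = 18.011  # monoisotopic mass of H2O in Da, rounded


def condensation(a, b):
    """Toy rule: two species combine with loss of one water molecule."""
    return a + b - WATER_MASS


def addition(a, b):
    """Toy rule: two species combine with no loss of mass."""
    return a + b


def brew(seed_masses, rules, iterations, max_mass):
    """Apply every rule to every pair in the pool, feeding products back in."""
    pool = {round(m, 3) for m in seed_masses}
    for _ in range(iterations):
        products = set()
        for a, b in combinations_with_replacement(sorted(pool), 2):
            for rule in rules:
                mass = round(rule(a, b), 3)
                if mass <= max_mass:  # keep only products in the mass window
                    products.add(mass)
        if products <= pool:
            break  # nothing new was formed; the pool has converged
        pool |= products
    return pool
```

A call such as `brew([100.0], [condensation, addition], 3, 500.0)` grows the pool over three rounds; the predicted masses could then be compared against observed mass-spectral peaks, which is the role the full structure-based method plays in the abstract.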
  39 High-throughput 3D structure prediction of small molecules
Peter Sadowski1,2, peter.j.sadowski@uci.edu, Arlo Randall1,2, Pierre Baldi1,2. (1) Department of Computer Science, University of California Irvine, Irvine, CA 92697, United States (2) Institute for Genomics and Bioinformatics, Irvine, CA 92697, United States
Although databases such as PubChem contain 3D structures for millions of molecules, there is a much larger space of small drug-like molecules that researchers wish to explore. Next-generation drug discovery projects require a high-throughput way of predicting accurate 3D structures of virtual molecules. State-of-the-art density functional theory (DFT) methods are accurate but slow, while lower levels of theory such as molecular mechanics models are insufficient for describing the complex bonding of organometallic molecules. A recent system named COSMOS has demonstrated its ability to quickly predict 3D structures for millions or billions of virtual molecules given a large library of precomputed rigid fragments. Here we present an extensible library of 100,000 unique, highly accurate fragment structures that we have produced from isomeric SMILES codes, using a combination of pattern matching, molecular mechanics, and DFT.
  40 Predicting inactive and active conformations of the dopamine D2 receptor
Fan Liu, liufan.brooks@gmail.com, Ravinder Abrol, Dennis A Dougherty, William A Goddard III, Chemistry, California Institute of Technology, Pasadena, California 91125, United States
G-protein coupled receptors (GPCRs) achieve their functional versatility by adopting various structural conformations defined by different orientations of their characteristic seven transmembrane helices. We predicted inactive and active ensembles of conformations for the dopamine D2 receptor using GEnsemble, which efficiently samples trillions of possible conformations based on different helix orientations. Using the predicted conformational ensembles for inactive and active receptor states, we predicted the binding sites for dopamine, which provides insights into potential receptor activation mechanisms. The binding sites suggest mutagenesis experiments involving the D2 receptor that will provide validation to the binding sites and the activation mechanism proposed based on those sites.
  41 Impact of retractions on the chemical literature
Elsa Alvaro1,2, ealvaro@indiana.edu. (1) School of Library and Information Science, Indiana University, Bloomington, Indiana 47405, United States (2) Chemistry Library, Indiana University, Bloomington, Indiana 47405, United States
Article retractions have recently attracted a lot of scholarly and popular interest. While most of the studies are focused on the biomedical literature, an analysis of the extent, impact, and causes of retractions in the chemical literature is still missing.
In this work, we report a longitudinal study of retractions across chemistry journals starting from 1990. We have carried out statistical analysis on data collected from retraction notices, including reasons for retraction, agents, and rate, and studied potential correlations with parameters such as the impact factor of the journal. We have also performed bibliometric studies and applied network theory principles to understand the impact and propagation of invalid research in the chemical literature. The results of this work show that while some of the findings are consistent with those reported in other fields, others appear to be specific to the chemical literature.
  42 Cheminformatic modeling of human CC chemokine receptorome
Terry-Elinor Reid1, yasmanii@hotmail.com, Huzefa Rangwala2, Samantha McCullough1, Muhammad Habib1, Simon Wang1. (1) Pharmaceutical Sciences, Howard University, United States (2) George Mason University, United States
CC chemokine receptors (CCRs) represent one subfamily of chemokine receptors. Among them, CCR2 is implicated in inflammatory responses, while CCR5 acts as the primary co-receptor by which HIV infects human T cells. Thus CCRs represent important targets for modern drug discovery. To study the complex binding profiles of CCRs we have collected data sets of structurally diverse molecules with known affinities for the whole human CC chemokine receptorome. The data sets were rigorously curated prior to molecular descriptor calculation. Externally predictive cheminformatic models were developed using multiple algorithms and further validated by five-fold external validation. We also employ advanced principles such as semi-supervised learning and multi-task learning in order to capture the inter-target (CCR subtype) information. The multitude of predictive models at the receptorome scale provides valuable tools for virtual screening of chemical libraries to identify structurally novel ligands as well as to address the complex selectivity issues.
  43 Evolutionary computational modeling of β-diketo acids for virtual screening of HIV-1 integrase inhibitors
Gene M Ko1, gko@sciences.sdsu.edu, A. Srinivas Reddy2, Rajni Garg1, Sunil Kumar3, Ahmad R Hadaegh4. (1) Computational Science Research Center, San Diego State University, San Diego, CA 92182-1245, United States (2) La Jolla Bioengineering Institute, San Diego, CA 92121, United States (3) Department of Electrical and Computer Engineering, San Diego State University, San Diego, CA 92182-1309, United States (4) Department of Computer Science and Information Systems, California State University San Marcos, San Marcos, CA 92096-0001, United States
We have used a differential evolution-binary particle swarm optimization (DE-BPSO) feature selection method to develop a QSAR model for 91 structurally diverse β-diketo acids which are potent HIV-1 integrase inhibitors. DE-BPSO is a novel feature selection method which requires fewer generations than BPSO to select a good small subset of descriptors. These descriptors are then used for developing QSAR models. A total of 387 constitutional, geometrical, topological, electrostatic, and quantum-chemical descriptors were computed for each of the 91 structures and QSAR models were developed. The top ranked model satisfying predictive statistical constraints (r2 > 0.6, r2validation > 0.5, r2test > 0.5) was considered for analysis of the physicochemical features of β-diketo acids conducive to inhibition of HIV-1 integrase. The model suggests that molecular volume of the chemical compounds plays a dominant role in the inhibition of HIV-1 integrase. We also used this model successfully as a virtual screening tool, predicting biological activities close to the experimental values for 18 2-pyrrolinone derivatives and 32 rhodanine-containing compounds. This model can be used to identify novel compounds which may have similar structural properties to β-diketo acids with inhibitory effects towards HIV-1 integrase. We believe that DE-BPSO, as a novel feature selection method, can also be applied to QSAR model development for other chemical compounds.
  44 Molecular dynamics of the Hsp70 chaperone in response to nucleotide and substrate: A coarse-grained perspective
Ewa I. Golas1,2, ewa.golas.chem@gmail.com, Gia G. Maisuradze2, Patrick Senet2,3, Stanislaw Oldziej2,4, Cezary Czaplewski1,2, Harold A. Scheraga2, Adam Liwo1,2. (1) Department of Chemistry, University of Gdansk, Gdansk, Poland (2) Baker Laboratory of Chemistry and Chemical Biology, Cornell University, Ithaca, NY 14853-1301, United States (3) Laboratoire Interdisciplinaire Carnot de Bourgogne, Universite de Bourgogne, Dijon Cedex, France (4) Laboratory of Biopolymer Structure, Intercollegiate Faculty of Biotechnology, University of Gdansk, Gdansk, Poland
The 70 kDa heat-shock (Hsp70) proteins form a class of chaperones recognized for their diverse and essential roles in the domain of protein repair, folding assistance, and agglomerate prevention. The focus of the present work was to determine and characterize the motion of the bacterial Hsp70 from Escherichia coli via canonical molecular dynamics simulation. The UNRES forcefield was used to model the whole chaperone, with the implicit placement of nucleotide in the nucleotide binding domain (NBD), and the explicit introduction of a guest peptide in the substrate binding domain (SBD). The definition of an 'implicit' nucleotide was achieved through the application of harmonic restraints on the NBD. The characterization of the observed motions included an analysis of internal angles, distances, and inter-domain interactions. The behavior of the chaperone was also compared with earlier simulations performed without substrate. Two systems with different guest peptides were independently studied.
  45 Searching putative targets in silico for anti-prion compounds
Jorge Valencia1, lip09jmv@sheffield.ac.uk, Beining Chen2, Val Gillet1. (1) Information School, The University of Sheffield, Sheffield, South Yorkshire S1 4DP, United Kingdom (2) Chemistry, The University of Sheffield, Sheffield, South Yorkshire S3 7HF, United Kingdom
Transmissible Spongiform Encephalopathies are fatal neurological disorders caused by a proteinaceous infectious particle (PrPSC). The PrPSC transforms the normal prion isoform (PrPC) to the infective conformation by a mechanism which remains unknown. In previous work we identified a set of anti-prion compounds with EC50 in the range 1–10 nM in a mouse cellular model; however, the target(s) of these compounds is unknown. In this project, we describe an in silico protocol for target prediction using inverse docking. A set of 333 differentially expressed genes involved in the transformation and identified using microarray analysis was collected from the literature; 168 corresponding structures were downloaded from the PDB; and a diverse set of five anti-prion compounds was docked to the proteins using GOLD. From the results, we have identified a set of putative targets shared by the compounds. Next, we aim to corroborate the results through a proteomics analysis.
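One simple way to pick out targets shared by several compounds in an inverse docking run is to take each compound's top-ranked proteins and intersect the lists. The Python sketch below shows that idea only; it is not the protocol used in the abstract. The function names, the cutoff `n`, and the assumption that a higher docking score means a better fit are all illustrative choices.

```python
def top_targets(scores, n):
    """Return the n proteins with the highest docking score for one compound.

    Assumes higher score = better predicted fit.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:n])


def shared_targets(docking, n):
    """Proteins that appear in the top-n list of every docked compound.

    `docking` maps compound name -> {protein identifier: docking score}.
    """
    top_lists = [top_targets(scores, n) for scores in docking.values()]
    if not top_lists:
        return set()
    return set.intersection(*top_lists)
```

With hypothetical input such as `{"cpd1": {"P1": 70, "P2": 55, "P3": 40}, "cpd2": {"P1": 62, "P2": 30, "P3": 58}}` and `n = 2`, only `P1` is in both top-2 lists. In practice the cutoff and any score normalisation across proteins would need careful thought, since raw docking scores are not directly comparable between binding sites.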
  46 On the accuracy of chemical structures found on the internet
Andrew D. Fant1, andrew.fant@unc.edu, Eugene Muratov1, Denis Fourches1, Antony J. Williams2, Alexander Tropsha1. (1) Division of Chemical Biology and Medicinal Chemistry, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599-7363, United States (2) Royal Society of Chemistry, Wake Forest, NC 27587, United States
The Internet has been widely lauded as a great equalizer of information access. However, the absence of any central authority on content places the burden on the end-user to verify the quality of the information accessed. We have examined the accuracy of the chemical structures of ca. 150 major pharmaceutical products that can be found on the internet. We have demonstrated that while erroneous structures are commonplace, it is possible to determine the correct structures by utilizing a carefully defined structure validation workflow. In addition, we and others have shown that the use of un-curated structures affects the accuracy of cheminformatics investigations such as QSAR modeling. Furthermore, models built for carefully curated datasets can be used to correct erroneously reported biological data. We posit that chemical datasets must be carefully curated prior to any cheminformatics investigations. We summarize best practices developed in our groups for data curation.
  47 Spectral clustering of chemical data: A Lanczos-based approach
Sonny Gan1, liq09sg@shef.ac.uk, Valerie J Gillet1, Eleanor J Gardiner1, David A Cosgrove2. (1) Information School, University of Sheffield, Sheffield, United Kingdom (2) AstraZeneca, Alderley Park, United Kingdom
The application of traditional clustering algorithms to partition chemical datasets is well established. Recently, clustering methods which partition a dataset based upon the eigenvectors of an input matrix have gained considerable attention in computer vision, providing excellent results for a variety of tasks. Despite this, their application to chemical data has been limited. A non-overlapping spectral clustering approach (L-NOSC), which utilizes a modified Lanczos algorithm to identify the eigenpairs of a matrix, has been developed. L-NOSC affords considerable computational advantages when compared with other spectral clustering methods that rely on a full matrix diagonalization procedure. The ability of the L-NOSC algorithm to cluster several activity datasets, described by five different descriptors, has been evaluated using the Quality Clustering Index. Finally, the performance of the algorithm has been compared to both the leading traditional clustering methods and a spectral clustering algorithm which uses a full matrix diagonalization, with promising results.
  48 Structure based pharmacophore screening for new P-gp inhibitors
Freya Klepsch1, freya.klepsch@univie.ac.at, Katharina Prokes1, Zahida Parveen2, Peter Chiba2, Gerhard F Ecker1. (1) Department of Medicinal Chemistry, University of Vienna, Vienna, Vienna 1090, Austria (2) Institute of Medical Chemistry, Medical University of Vienna, Vienna, Vienna 1090, Austria
Overexpression of the xenotoxin transporter P-glycoprotein (P-gp) is one major reason for the development of multidrug resistance (MDR) leading to the failure of antibiotic and cancer therapies. Inhibitors of P-gp have thus been advocated as promising candidates for overcoming the problem of MDR.
By applying an exhaustive docking protocol that incorporated SAR information into the pose selection process, a validated binding hypothesis for propafenone analogs in P-gp could be determined. The docking complex was further used for the generation of a structure-based pharmacophore model that comprised important interaction points and exclusion volumes. The model was validated by our in-house data set as well as by spiked DUD sets. Furthermore, screening the Life Chemicals database retrieved a number of hits that were tested experimentally for their P-gp inhibiting activity. Among those, four compounds showing new chemical scaffolds were found to be active in the micromolar range.

MONDAY MORNING

Section A
San Diego Convention Center
Room 27A

Computer-Aided Drug Design: Hopes, Reality and Prospects. How Has Computational Chemistry Transformed Drug Discovery, and What Can Increase Its Impact? Cosponsored by COMP
C. Corbeil, J. Cross, Organizers
O. Ravitz, Organizer, Presiding
8:15   Introductory Remarks.
8:20 49 Perspective in computational approaches applied to drug discovery problems
Christine Humblet, christine.humblet@comcast.net, Discovery Chemistry Research & Technologies, Lilly Research Laboratories, Indianapolis, IN 46285, United States
Computational chemistry and cheminformatics have seen considerable progress over the past decades. This presentation will highlight progress seen over time, illustrate current best practices, and build a forward-looking perspective based on a practitioner's point of view.
9:05 50 Rational, data-driven approach to lead optimization
Dan J Warner, Dan.Warner@astrazeneca.com, R&D Montreal, AstraZeneca, St. Laurent, QC H4S 1Z9, Canada
At the very heart of the role of a medicinal chemist or drug designer is the ability to link chemical structure to molecular properties. The traditional approach to deriving these structure-property relationships (SPR) has been to encode known compounds in the form of molecular 'descriptors' and link them to experimentally determined properties in a quantitative fashion. At AstraZeneca, we have been at the forefront of the emerging field of matched molecular pairs analysis (MMPA) or inverse-QSAR, with a number of recent publications in the area [1,2]. As the name implies, this turns the traditional approach of investigating these relationships on its head by identifying changes in structure that correspond to desirable changes in properties [3]. The presentation will attempt to summarize the current state-of-the-art with respect to the literature, AstraZeneca's in-house system for MMPA, and how we can use it to expedite the multi-objective optimization of chemical leads.
1. Griffen, E.; Leach, A. G.; Robb, G. R.; Warner, D. J. Matched Molecular Pairs as a Medicinal Chemistry Tool. J. Med. Chem. Perspectives. 2011, A.S.A.P.
2. Warner, D. J.; Griffen, E. J.; St-Gallay, S. A. WizePairZ: A Novel Algorithm to Identify, Encode, and Exploit Matched Molecular Pairs with Unspecified Cores in Medicinal Chemistry. J. Chem. Inf. Model. 2010, 50, 1350-1357.
3. Leach, A. G.; Jones, H. D.; Cosgrove, D. A.; Kenny, P. W.; Ruston, L.; MacFaul, P.; Wood, J. M.; Colclough, N.; Law, B. Matched Molecular Pairs as a Guide in the Optimization of Pharmaceutical Properties; a Study of Aqueous Solubility, Plasma Protein Binding and Oral Exposure. J. Med. Chem. 2006, 49, 6672-6682.
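The idea behind matched molecular pairs analysis can be sketched in a few lines, assuming the compounds have already been split into a shared core and a single varying substituent. The sketch below is not AstraZeneca's in-house system or the WizePairZ algorithm; the function name, the core and substituent labels, and the property values are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean

def matched_pair_deltas(records):
    """Average property change for each substituent swap on a shared core.

    `records` is an iterable of (core, substituent, property_value) tuples;
    the fragmentation into core and substituent is assumed to be done already.
    """
    by_core = defaultdict(list)
    for core, substituent, value in records:
        by_core[core].append((substituent, value))

    deltas = defaultdict(list)
    for members in by_core.values():
        # Sorting fixes the direction of each transformation (e.g. F -> H).
        for (sub_a, val_a), (sub_b, val_b) in combinations(sorted(members), 2):
            if sub_a != sub_b:
                deltas[(sub_a, sub_b)].append(val_b - val_a)
    return {swap: mean(changes) for swap, changes in deltas.items()}

# Invented example data: two cores, each with an H and an F analogue.
records = [
    ("core_1", "H", 1.0), ("core_1", "F", 1.5),
    ("core_2", "H", 2.0), ("core_2", "F", 2.7),
]
print(matched_pair_deltas(records))
```

Aggregating the same structural change across many cores is what lets the analysis suggest which modification is likely to move a property in the desired direction.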
9:50   Intermission.
10:05 51 WOMBAT and WOMBAT-PK: Ten years
Tudor I Oprea, tudor@sunsetmolecular.com, Sunset Molecular Discovery LLC, Santa Fe, NM 87505, United States
In 2001, the first version of WOMBAT was assembled (over 20,000 entries). This database, now with over 300,000 entries, indexes medicinal chemistry literature. WOMBAT is used for predictive chemical biology efforts. Some use cases for WOMBAT (from literature) will be highlighted. WOMBAT-PK, initially centered on pharmacokinetics data, was developed in 2003 to take advantage of the wealth of information for approved drugs. The use of WOMBAT-PK, most recently in relationship to BDDCS (Biopharmaceutics Drug Disposition and Classification System) and its application in drug discovery and development, will be detailed.
10:50 52 Developing “Best Practices” in predictive cheminformatics for drug-discovery applications
Curt M Breneman1, brenec@rpi.edu, Michael Krein1, Margaret McLellan1, Tao-wei Huang1, Lisa Morkowchuk1, Dimitris K. Agrafiotis2. (1) Department of Chemistry & Chemical Biology, Rensselaer Polytechnic Institute, Troy, NY 12180, United States (2) Johnson & Johnson Pharmaceutical Research & Development, LLC, Spring House, PA 19477, United States
After decades of development and nearly continuous use (and abuse) in support of drug discovery efforts, QSAR and related descriptor-based statistical learning methods have earned mixed reviews throughout their checkered past. Why is this? In practice, it is necessary for users to choose from a long list of available descriptors and then select a machine learning method with the hope that the resulting model will have a chance of representing the physical effects that actually control the endpoint of interest. All too often, the resulting models might have been over-trained using small datasets, or could have limited applicability domains. Inappropriate descriptor choices can also doom such efforts. Why and when does this happen? In this talk, the evolution of “Best Practices” in predictive cheminformatics will be illustrated by way of a series of example scenarios, concluding with some guidelines and recommendations for both model builders and end users.

Section B
San Diego Convention Center
Room 25C

Joint CINF-CSA Trust Symposium Beyond Small Molecules: Pushing the Envelope for Chemical Structure Representation Financially supported by Chemical Structure Association Trust
K. Taylor, Organizer, Presiding
8:00   Introductory Remarks.
8:05 53 Cheminformatics for material discovery: Representation, searching and screening of porous materials
Richard L Martin, richardluismartin@lbl.gov, Maciej Haranczyk, Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
We summarize the recent advancements in material discovery facilitated by the application of cheminformatics concepts to crystalline porous materials. We discuss challenges involved in developing descriptors and comparison techniques for these structures, and comment on the role of the similar property principle in material analysis. Our approach is to focus on void space, rather than the overall structure – this allows us to describe a material from the point of view of a guest molecule. We perform the Voronoi decomposition to obtain a periodic graph representation of a material's void space, and construct novel descriptors – Voronoi holograms – by automatically inspecting the graph. Through application of a modified Tanimoto similarity coefficient and MaxMin diversity selection, we illustrate the calculation of (dis)similarity using this descriptor, and the retrieval of a diverse and representative subset of promising candidate materials for CO2 capture not obtainable through the use of existing structural descriptors.
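The similarity and diversity-selection steps mentioned above can be sketched as follows, assuming each material is described by a set of integer feature identifiers. This uses the standard Tanimoto coefficient rather than the modified form referred to in the abstract, and the feature sets are invented stand-ins for Voronoi holograms, not real descriptors.

```python
def tanimoto(a, b):
    """Standard Tanimoto (Jaccard) similarity between two feature sets."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 1.0

def maxmin_select(fingerprints, k, seed_index=0):
    """Greedy MaxMin selection: repeatedly add the item whose nearest
    already-selected neighbour is farthest away (distance = 1 - Tanimoto)."""
    selected = [seed_index]
    target = min(k, len(fingerprints))
    while len(selected) < target:
        best_index, best_distance = None, -1.0
        for i, candidate in enumerate(fingerprints):
            if i in selected:
                continue
            nearest = min(1.0 - tanimoto(candidate, fingerprints[j])
                          for j in selected)
            if nearest > best_distance:
                best_index, best_distance = i, nearest
        selected.append(best_index)
    return selected

# Invented feature sets standing in for per-material descriptors.
materials = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {1, 2, 3, 4}]
print(maxmin_select(materials, 3))
```

The greedy loop costs one pass over the remaining candidates per pick, which is usually acceptable for subsets that are small relative to the full collection.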
8:30 54 New strategies to normalize chemical structure representations and weed-out impractical small molecules
Evan Bolton, bolton@ncbi.nlm.nih.gov, National Center for Biotechnology Information, Bethesda, Maryland 20894, United States
What is a reasonable chemical structure representation? A typical chemist's reply is: one that I can accurately and unambiguously interpret. In practice, however, a chemist's interpretation and a computer's interpretation may be at distinct odds when defining what “reasonable” means. Many implicit vs. explicit aspects of a chemist's interpretation are lost when a computer interprets a chemical structure. There are missing formal charges (e.g., nitro groups represented by “*N(=O)O”), implied resonance by using double bonds (e.g., carboxylic acid represented by “*C(=O)=O” rather than “*C(=O)O”), and so on. A chemist might overlook these aspects, implicitly understanding and correcting them, but a computer without substantial chemical intuition programmed into its interpretation is often confused and happy to reject a chemical structure as being unreasonable. Conversely, a chemical structure may seem completely reasonable to a computer by all known valence rules, etc., but be completely rejected by a chemist as being impossible to exist (e.g., “OOOOOOOOO”, poly-peroxides).
This presentation will outline new strategies being explored at PubChem to determine whether a chemical structure is unreasonable or impractical. The basic approach is to take advantage of a comprehensive survey of first, second, and third order nearest-neighbor environments and develop a self-learning and automated statistical-based approach to reliably predict the likelihood a chemical is reasonable (or its better representation!) without the need for developing and maintaining a set of transforms (e.g., SMIRKS).
8:55 55 Efficient perception of proteins and nucleic acids from atomic connectivity
Roger A Sayle, roger@nextmovesoftware.com, NextMove Software, Cambridge, Cambridgeshire CB4 0EY, United Kingdom
A common problem in the conversion of molecular data file formats is the annotation of amino acid and nucleic acid residues not explicitly represented in “small molecule” file formats describing only element type and 3D co-ordinates or atomic connectivity. This problem has limited the interoperability between chemical information processing programs and has led to the situation where molecular graphics programs currently treat the same molecule differently depending upon the file format that it is stored in. An algorithm has been developed to rapidly identify polypeptides and nucleic acids from simple connectivity that can assign standard atom names, residue names, residue numbers and chain identifiers to each atom, and bond orders to each bond. One of the features of this algorithm is a very efficient method for identifying a sidechain from a set of rooted graphs, which has running time linear in the number of atoms in the sidechain.
9:20 56 Organization and analysis of information for biotherapeutics research
Hugo O Villar1, hugo@altoris.com, Mark R. Hansen1, Eric Feyfant2. (1) Altoris, Inc., La Jolla, CA 92037, United States (2) Global Biotherapeutic Technologies Dept, Pfizer, Inc., Cambridge, MA 02140, United States
For decades, different chemoinformatics tools have guided the identification of structure activity relationships in chemical series towards the development of new therapeutic agents. Now, active research aimed towards the development of biopolymers as therapeutic agents has generated the need for new tools to organize and identify structure activity relationships (SAR) in the large volumes of data being generated, as has been done historically in medicinal chemistry. Many of the concepts and visualization tools used in chemoinformatics can be adapted to deal with the challenges presented by peptides, proteins and nucleic acids. We will present a new desktop application, SARvision|Biologics, to mine and visualize trends in data generated in biologics research. We will show how some concepts used in cheminformatics research need to be tailored and can be adapted for biologics research.
9:45   Intermission.
9:55 57 Markush structure usability in patent and combinatorial chemistry: New approaches and software tools
Wei Deng, ddeng@chemaxon.com, Szabolcs Csepregi, ChemAxon, United States
Markush structures are widely used in combinatorial libraries and patents. However, the flexibility and complexity of Markush structures make them difficult to create, index, visualize, search, and enumerate. Recent improvements to various ChemAxon applications that tackle this problem will be introduced in this presentation.
It will be shown how Markush structures can be created automatically from a library of specific chemical structures, their static and dynamic structures visualized (Markush Viewer, Enumeration and Reduction) and searched. Interactive navigation and searching of Thomson Reuters patent content will also be described, including Markush and specific structures and other patent data. Recent developments make the handling of these databases easier, faster and more accurate. The query features and various visualization options of search results all help casual and more experienced users to understand the vast amount of complex data contained in the patent literature.
10:20 58 Rendering the stages of structure elucidation: ACD/Labs Markush representation
Andrey Yerin, erin@acdlabs.ru, Ian Peirson, Advanced Chemistry Development, Inc. (ACD/Labs), Toronto, Ontario M5C 1T4, Canada
The Markush structure is a favourite tool of patents, allowing a large number of discrete structures to be covered by the definition of a single object. While recent cheminformatics tools make it possible to work with Markush structures, they remain almost exclusively a tool for patents, outside everyday chemical applications. The workflow of metabolite identification or impurity and degradant profiling now demands the rendering of stages corresponding to the specific degree of knowledge about the chemical structure.
ACD/Labs have developed several tools that allow Markush-style encoding and visualization of multiple variable substitution points and of mass and formula modifications. The ability to create and search a database of such structures extends the possibilities to retain, extract and leverage knowledge in the organization. The implemented structure representation can be encoded by traditional structure formats and may become a standard tool for the exchange of partially defined structures between various chemical applications.
10:45 59 New developments in Markush structure searching
Donald Walter, don.walter@thomsonreuters.com, Thomson Reuters, New York, NY 10036, United States
Patents protect compounds that are specifically disclosed (e.g. butylated hydroxytoluene) and those disclosed as families of compounds (e.g. hindered phenols), called Markush structures. Since a single Markush structure can represent an enormous number of specific structures, representing and searching Markush structures present special challenges. Furthermore, determining which embodiments of the Markush match your query is often a time-consuming process. This talk will focus on powerful new ways to search Markush structures and analyze them, all as part of an integrated cheminformatics system to speed discovery and legal evaluation.
11:10 60 Representing and retrieving non specific structures
Keith T Taylor, keith.taylor@accelrys.com, Accelrys Inc, San Ramon, California 94583, United States
The valence model has allowed the encoding of most of the structures used in life science research. In some cases, it is necessary to standardize on a form for a substructure, but these do not present significant challenges to understanding. Substances that are used in the industrial and consumer products industries often present significant challenges to simple representations; examples include polymers and substances derived from nature, such as vegetable oils. The growing importance of biotherapeutics brings significant challenges due to their size and their natural and unnatural post-translational modifications.
The management of substances and their representation in electronic systems will be reviewed and remaining challenges identified.
11:35   Concluding remarks.
Drug Discovery Data Alchemy
Sponsored by COMP, Cosponsored by BIOL, CINF, and MEDI
Drug Discovery Structural Bioinformatics: Modeling Protein-Protein Interactions and Novel Drug Targets
Sponsored by COMP, Cosponsored by BIOL, CINF, and MEDI
Perspectives in Applied Computational Methods
Sponsored by COMP, Cosponsored by CINF and MEDI

MONDAY AFTERNOON

Section A
San Diego Convention Center
Room 27A

Computer-Aided Drug Design: Hopes, Reality and Prospects Cosponsored by COMP
C. Corbeil, J. Cross, Organizers
O. Ravitz, Organizer, Presiding
1:00   Introductory Remarks.
1:05 61 Toward a computational pipeline from antibody homology modeling to docking to design
Jeffrey J Gray, jgray@jhu.edu, Chemical & Biomolecular Engineering, Johns Hopkins University, Baltimore, MD 21218, United States
This talk will discuss the successes and limitations of homology modeling and protein structure prediction methods in general, what impact they have had on drug discovery, and how they will transform drug design in the future.
1:50 62 MD simulations in pharmaceutical research - examples and lessons learnt
Hannes G Wallnoefer1,2, Klaus R Liedl2, Clara Christ1, Daniel Seeliger1, Thomas Fox1, thomas.fox@boehringer-ingelheim.com. (1) Computational Chemistry, Lead Identification and Optimization Support, Boehringer Ingelheim Pharma GmbH & Co KG, Biberach, Germany (2) Institute of General, Inorganic and Theoretical Chemistry, University of Innsbruck, Innsbruck, Austria
Due to methodological advances and increased computer power, molecular dynamics (MD) based techniques have become feasible for application in drug design. MD simulations and Free Energy calculations can be used to investigate conformational variability of target proteins and to estimate ligand-receptor binding free energies.
We will show examples of how the results of such simulations can be used in lead identification and optimization of NCEs. In the NBE field, we use MD simulations to study the conformational dynamics of antibodies and nanobodies, in particular to assess the flexibility of CDRs. The trajectories, for instance, reveal the average solvent accessibility of residues that are prone to post-translational modification, e.g. oxidation of methionine or deamidation of asparagine. Insights obtained from these analyses influence sequence optimization considerations.
Whether these techniques will become standard tools in drug discovery will depend not only on the accuracy of their results, but also on the availability of standardized and reliable setup and analysis procedures, as often human time rather than computing time turns out to be the bottleneck.
2:35   Intermission.
2:50 63 Fragment-to-lead using fragment molecular orbital QM calculations
Richard J Law, richard.law@evotec.com, Osamu Ichihara, Michael P Mazanetz, Michelle Southey, Mark Whittaker, David Hallett, Evotec, United Kingdom
Screening of low molecular weight weak binders, “fragments”, and obtaining hits is a well understood process that can be achieved by many different assay techniques. Less well defined is how to proceed once a hit is obtained. Computational chemistry, and the application of multiple techniques, plays a vital role in understanding and ranking the many potential routes for fragment expansion design. Protein-ligand interactions are routinely investigated by docking and the results are often ranked using molecular mechanics (MM) based scoring functions. MM scoring functions have many limitations and as a consequence scoring functions do not adequately predict ligand binding affinity nor do they describe the interactions in sufficient detail as to accurately and illustratively guide medicinal chemistry. To rationalize binding at a quantum level, we demonstrate the application of the fragment molecular orbital (FMO) method as a novel computational methodology and its use in structure-based drug design to guide medicinal chemistry. As well as using FMO to prioritize fragment hits for expansion and rank docking results, it can also be used to perform virtual fragment expansion to help guide subsequent rounds of fragment-to-lead chemistry. The method can also be applied to the scoring of molecular probes, such as water, to assess the nature of unoccupied pockets within proteins in order to further guide compound design.
3:35 64 Docking: This might be heaven or this might be…
Martha S Head, Martha.S.Head@gsk.com, Computational and Structural Chemistry, GlaxoSmithKline Pharmaceuticals, United States
GSK has somewhat famously conducted and published an evaluation of docking programs. In that publication, we argued that in general docking programs can find well-docked poses but cannot reliably score those poses, can for at least some protein targets identify actives in a virtual screen but cannot a priori be expected to do so for any new protein targets, and cannot rank order molecules by potency. And yet, I continue to assert that docking is a technology that can have impact in structure-based lead optimization. This talk will discuss that apparent paradox and will discuss the role of expertise in maximizing the utility of less-than-perfect computational tools.
4:20   Concluding Remarks.
4:25   Intermission.
4:30   CINF Open Meeting

Section B
San Diego Convention Center
Room 25C

Mobile Space and E-Books
R. Apodaca, Organizer, Presiding
12:45   Introductory Remarks.
12:50 65 Having a mobile app presence - necessary or nice to have?
Steven M Muskal, smuskal@eidogen-sertanty.com, Eidogen-Sertanty, Inc., Oceanside, CA 92056, United States
The growth of mobile computing devices from smart-phones to tablets has been explosive, routinely enabling 24x7x365 connectivity and access into workflows previously constrained to the office. Coupled with cloud computing environments (e.g. Amazon's EC2 and RDS environments), mobile devices and their respective apps have become necessary tools in day-to-day communication and scientific workflow. We will discuss the advantages and disadvantages of native- vs. web-based apps running on mobile devices as well as lessons learned over the last two years after having developed and deployed several mobile apps including iKinase(Pro), iProtein, MobileReagents, Reaction101, Yield101, and others.
1:25 66 Molecular visualization apps in education and research
Jason Vertrees, Blaine Bell, Woody Sherman, woody.sherman@schrodinger.com, Schrodinger, Inc., New York, NY 10036, United States
The power and ubiquity of mobile devices have increased dramatically in recent years, allowing for high-performance molecular visualization deployable to the masses. Here, we present the current status of molecular visualization on mobile devices, with a focus on two apps. The first, Ball & Stick, is geared toward middle school and high school education. Common tasks are made very simple and a wizard-like workflow manager guides students through lessons. The second, Mobile PyMOL, is a recent development to port the most powerful and highly used desktop molecular visualization software to mobile devices. Mobile PyMOL is geared toward college-level education and research at all levels. We discuss the advantages of these apps for various tasks and provide a roadmap for each app moving forward. We also discuss the challenges of developing in mobile environments and potential ways to overcome those challenges.
2:00 67 Building a mobile app ecosystem for chemistry collaboration
Alex M. Clark, aclark.xyz@gmail.com, R&D, Molecular Materials Informatics, Montreal, Quebec H3J2S1, Canada
The number of mobile apps for chemistry has grown rapidly in the last year. Some of these apps feature cheminformatics capabilities that are powerful and mature enough for integration into a real world workflow. Apps have many mechanisms for sharing data, including interprocess communication, remote procedure calls, mailing attachments, and cloud data storage. This presentation will focus on recent new features and new apps, with a particular emphasis on data sharing, collaboration, and finding ways to modularly substitute mobile apps for traditional cheminformatics tools.
2:35   Intermission.
2:45 68 Chemistry made mobile – the expanding world of chemistry in the hand
Antony Williams, williamsa@rsc.org, Department of Informatics, Royal Society of Chemistry, Wake Forest, NC 27587, United States
Mobile devices are now mainstream handheld computers providing access to computational power and storage that a decade ago was available only on desktop computers. In chemistry informatics, the majority of capabilities that were previously found only on desktop computers are fast migrating to mobile devices, making use of a combination of powerful visualization capabilities, fast cloud-based calculations, websites optimized for mobile platforms, and “apps”. This presentation will provide an overview of how access to chemistry continues to be made increasingly mobile, and specifically of how the Royal Society of Chemistry is contributing to this computing environment.
3:20 69 ChemDoodle Mobile: Leveraging mobile apps in chemistry
Kevin J Theisen, kevin@ichemlabs.com, iChemLabs, LLC., Piscataway, NJ 08854, United States
The mobile devices market introduces new dynamics for start-up and established companies alike. Benefiting from this market requires significant investment. ChemDoodle Mobile is an HTML5 based mobile app that is popular on both the Apple iTunes Store and the Android Market. The tools involved in the development and deployment of ChemDoodle Mobile are discussed, with a focus on four libraries, ChemDoodle Web Components, jQuery Mobile, Sencha Touch and PhoneGap. After an app is created, it is still a difficult task to make it a successful product. Given the limited scientific market, alternative means of revenue generation are discussed. We will review some data for ChemDoodle Mobile. If done in an affordable manner, mobile apps can provide a company with a significant product to attract and satisfy customers.
3:55 70 Mobile apps for drug discovery
Antony J Williams1, Sean Ekins2, ekinssean@yahoo.com, Alex Clark3. (1) Royal Society of Chemistry, Wake Forest, NC 27587, United States (2) Collaborations in Chemistry, Fuquay Varina, NC 27526, United States (3) Molecular Materials Informatics, Montreal, Quebec H3J 2S1, Canada
Mobile hardware and software technology continues to evolve very rapidly and presents drug discovery scientists with new platforms for accessing data and performing data analysis. Smartphones and tablet computers can now be used to perform many of the operations previously addressed by laptops or desktop computers. Although the smaller screen sizes and requirements for touch screen manipulation can present user interface design challenges, especially with chemistry related applications, these limitations are driving innovative solutions. We will present an introduction to some of the mobile apps we have been involved with most closely. One example is the Green Solvents app, which utilizes data created by the ACS Green Chemistry Institute Pharmaceutical Roundtable. We will also describe a wiki to capture information about scientific mobile apps (www.scimobileapps.com) and provide our perspective on what mobile platforms may provide the drug discovery scientist in the future as this disruptive technology takes off.
Drug Discovery Structural Bioinformatics: Exploring Structure-Function Relationships
Sponsored by COMP, Cosponsored by BIOL, CINF, and MEDI
Perspectives in Applied Computational Methods
Sponsored by COMP, Cosponsored by CINF and MEDI

MONDAY EVENING

Section A
San Diego Convention Center
Hall D

Sci-Mix
R. Bienstock, Organizer
8:00 - 10:00
  1 See previous listing
  4 See previous listing
  5 See previous listing
  7 See previous listing
  12 See previous listing
  14 See previous listing
  16 See previous listing
  17 See previous listing
  18 See previous listing
  19 See previous listing
  24 See previous listing
  27 See previous listing
  29 See previous listing
  33 See previous listing
  34 See previous listing
  37 See previous listing
  38 See previous listing
  40 See previous listing
  42 See previous listing
  43 See previous listing
  44 See previous listing
  45 See previous listing
  47 See previous listing
  48 See previous listing
  53 See previous listing
  66 See previous listing
  71 Efficient one-pot preparations of PI3Kd inhibitors using algorithmic network detection
Chris M. Gothard, cgothard@northwestern.edu, Nosheen A. Gothard, Siowling Soh, Bartosz A. Grzybowski, Department of Chemistry, Northwestern University, Evanston, IL 60208, United States
One-pot reactions are central to the development of efficient chemical syntheses of complex and biologically important substrates in modern industrial-scale chemistry. Algorithmic detection of 'one-pot' reaction sequences can assist synthetic chemists in developing novel chemical processes and serve as a good starting point in the search for tandem reactions. We have identified novel multistep one-pot routes to medicinally important targets, such as PI3Kd inhibitors. Optimization of the synthetic routes has led to the efficient and high yielding preparation of these compounds.

  72 Statistical analysis of microarray gene expression data from a mouse model of Toxoplasmosis
Shrikant d Pawar, pawar1550@gmail.com, Claire Rinehart, Cheryl Davis, Department of Biology, Western Kentucky University, Bowling Green, KY 42101, United States
Toxoplasmosis, caused by the protozoan parasite Toxoplasma gondii, is a major cause of morbidity and mortality in patients with AIDS and an important cause of miscarriage, stillbirth and congenital disease in newborns. Previous studies have provided evidence that dietary supplementation with vitamin E and selenium is harmful during experimental toxoplasmosis in mice, whereas a diet deficient in vitamin E and selenium results in decreased numbers of tissue cysts in the brain and dramatically reduced brain pathology. The overall goal of the present study was to determine the impact of dietary supplementation with antioxidants on gene expression in the brains of non-infected mice and in mice infected with T. gondii using microarray analysis. RNA was isolated from the brains of C57BL/6 mice, and an Agilent Oligo Whole Mouse Genome Microarray (Agilent Technologies, Inc.) analysis was performed. A total of 48 chips were normalized by calculating Z scores. Differentially expressed genes were identified by performing ANOVA and forming patterns. These differentially expressed genes and their respective fold change ratios were used in Ingenuity Pathway Analysis (IPA) software to analyze the pathways involved with these genes.
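The Z-score normalization step can be sketched per chip as below. This is a minimal illustration, not the authors' pipeline: the abstract does not say whether intensities were log-transformed first or which standard deviation estimator was used, so the sample standard deviation and the raw example values here are assumptions.

```python
from statistics import mean, stdev

def z_scores(intensities):
    """Normalize one chip: subtract the chip mean, divide by its spread.

    Uses the sample standard deviation (an assumption; the abstract does
    not specify the estimator).
    """
    mu = mean(intensities)
    sigma = stdev(intensities)
    if sigma == 0:
        raise ValueError("all intensities are identical; Z scores are undefined")
    return [(value - mu) / sigma for value in intensities]

# Invented probe intensities for a single chip.
chip = [1.0, 2.0, 3.0]
print(z_scores(chip))
```

Applying the same function to each chip independently puts all chips on a common scale before genes are compared across conditions.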
  84 See later listing
  88 See later listing
  91 See later listing
  122 See later listing
  132 See later listing
  133 See later listing

TUESDAY MORNING

Section A
San Diego Convention Center
Room 27A

Recent Advances in Reaction Searching
R. Schenck, D. Evans, Organizers, Presiding
8:30   Introductory Remarks.
8:35 73 Synthetic information challenges for the medicinal chemist
Haiying He, he_haiying@wuxiapptec.com, Department of Medicinal Chemistry, WuXi AppTec Co., Shanghai, Waigaoqiao Free Trade Zone 200131, China
As a contract research organization, our chemists may be responsible for many aspects of the drug development process: from the synthesis of lead candidates to the start of clinical trials. Since it can take less time to design a drug-like molecule than to synthesize it in the lab, much of our chemists' time is spent with electronic information products developing a synthetic action plan. With a broad range of problems to solve, our chemists need information sources that cover a broad range of chemistry. This talk will cover the information sources that we have acquired and discuss how they are used to solve the problems that our synthetic chemists face on a daily basis.
9:00 74 SOS 4.0: Advances in text, structure, and reaction searching
M. Fiona Shortt de Hernandez1, fiona.shortt@thieme.de, Rolf Hoppe1, Guido F. Herrmann1, Peter Loew2. (1) Thieme Publishers Stuttgart, Stuttgart, Germany (2) InfoChem GmbH, Munich, Germany
We have taken a major chemistry reference work in print and designed an interactive Web version from first principles. The online product combines full-text browsing functionality together with InfoChem's modern structure and reaction search capabilities. Science of Synthesis is a unique, structure/reaction searchable, full-text resource that provides the user with expert-evaluated methods and reactions.
Science of Synthesis covers synthetic methodology developed from the early 1800s to date for the entire field of organic and organometallic chemistry. World-renowned chemists have chosen important molecular transformations for a class of organic compounds and elaborated on their scope and limitations. The logical, structured order of content within Science of Synthesis means that it is simple to gain an overview of the wider context in a particular subject field.
The user can search specifically defined full-text fields, and advanced text searching options are available. It is possible to search the manually prepared named reaction index, which associates transformations with specific named reactions even if they are not mentioned as such in the full text (i.e., deep indexing).
A number of filters, called fields, are available, allowing the hitlist to be refined. The hitlist generated by a text search can be filtered according to where in the full text the result occurs, e.g. in the title, full text, or references. It may also be sorted by relevance (using an algorithm that weights the text results) or by publication date.
The hitlist generated by a structure search may be filtered by the role of the structure e.g. product, catalyst, or solvent. It may also be filtered by the best match criteria (exact or substructure result). The hitlist itself is ordered by relevance, using an internal algorithm that weights exact structure and substructure results. Sophisticated search operators (rating!) allow for efficient and convenient reaction searching.
9:25 75 Automated extraction of reactions from the patent literature
Daniel M Lowe, daniel_lowe_uk@yahoo.co.uk, Peter Murray-Rust, Robert C Glen, Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Cambridge, Cambridgeshire CB2 1EW, United Kingdom
We have created a pipeline of recently enhanced open source components for extracting chemical reactions from full text chemical literature. OSCAR4 is used to recognise chemical entities and resolve to structures where appropriate. OPSIN is used to resolve systematic chemical names to structures. Chemical Tagger performs part of speech tagging allowing the interpretation of phrases in chemical syntheses. The final output is a semantic representation (chemical components and their roles, reaction conditions, actions including workup, yield and properties of the product). We then attempt to map all atoms in the product(s) to reactants. If successful we also attempt to calculate the stoichiometry of the reaction. The system has been deployed on over 56,000 USPTO patents published since 2008. The level of recall is useful and most extracted reactions make chemical sense. The pipeline is generally applicable to reactions in chemical literature including journals and theses.
9:50   Intermission.
10:00 76 Efficient searching and similarity of unmapped reactions: Applications to pharmaceutical ELN analysis
Roger A Sayle1, roger@nextmovesoftware.com, Thierry Kogej2, David Drake3. (1) NextMove Software, Cambridge, United Kingdom (2) DECS, AstraZeneca, Molndal, Sweden (3) RDI, AstraZeneca, Alderley Park, Cheshire, United Kingdom
Complex queries of reaction databases typically require every database entry to be atom-to-atom mapped. Such atom mapping provides insight into the reaction mechanism and conveniently allows determination of the location of the reaction center and which bonds are made or broken during the course of a reaction. Unfortunately, such annotations are not routinely captured in some real world applications, such as Electronic Lab Notebooks (ELNs), precluding ready analysis of the reaction transformations described and reducing the knowledge exploitation from these data sources. Although automated atom mapping algorithms exist, their performance on difficult (e.g. unbalanced) reactions and on reactions with ambiguous alternative mappings makes methods for processing unmapped reactions desirable. We describe several algorithms for efficiently searching such “noisy” reaction databases, including a measure of reaction similarity that does not require prior explicit atom mapping. This measure is used to cluster synthesis experiments in an ELN, identifying areas of related or novel chemistry.
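To make the idea of a mapping-free reaction similarity concrete, here is a minimal sketch of one generic approach: subtract the fragment counts of the reactants from those of the products and compare the resulting difference vectors. This is not the authors' algorithm, which the abstract does not describe. The fragment labels, the function names, and the choice of cosine similarity are all assumptions for illustration.

```python
from collections import Counter
from math import sqrt


def difference_fingerprint(reactants, products):
    """Signed fragment-count change (products minus reactants).

    Each molecule is a dict of hypothetical fragment labels -> counts.
    No atom-to-atom mapping is needed, only per-molecule fragment counts.
    """
    diff = Counter()
    for mol in products:
        diff.update(mol)
    for mol in reactants:
        diff.subtract(mol)
    return {frag: n for frag, n in diff.items() if n != 0}


def cosine(u, v):
    """Cosine similarity between two sparse signed count vectors."""
    dot = sum(n * v.get(frag, 0) for frag, n in u.items())
    norm = sqrt(sum(n * n for n in u.values())) * sqrt(sum(n * n for n in v.values()))
    return dot / norm if norm else 0.0


# Two amide couplings and one esterification, written without the water
# by-product (i.e. unbalanced, as ELN entries often are).
amide_1 = difference_fingerprint(
    [{"COOH": 1, "CH3": 1}, {"NH2": 1, "Ph": 1}],
    [{"CONH": 1, "CH3": 1, "Ph": 1}],
)
amide_2 = difference_fingerprint(
    [{"COOH": 1, "Ph": 1}, {"NH2": 1, "CH3": 2}],
    [{"CONH": 1, "Ph": 1, "CH3": 2}],
)
ester = difference_fingerprint(
    [{"COOH": 1, "CH3": 1}, {"OH": 1, "CH3": 1}],
    [{"COO": 1, "CH3": 2}],
)

sim_amides = cosine(amide_1, amide_2)
sim_mixed = cosine(amide_1, ester)
print(sim_amides, sim_mixed)
```

Spectator fragments cancel in the subtraction, so the two amide couplings receive the same fingerprint despite their different substituents, while the esterification shares only the consumed acid group with them.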
10:25 77 Novel tools and techniques in reaction searching: “Name Reaction” and “All-In-One” reaction searches
Valentina Eigner-Pitto, ve@infochem.de, Hans Kraut, Heinz Saller, Heinz Matuszczyk, Peter Loew, InfoChem GmbH, Munich, Germany
Beginning in the late 1980s InfoChem started to develop a profound understanding of the storage and handling of chemical structure and reaction information. A major challenge in reaction searching emerged in 1989, when InfoChem acquired an exclusive license to a reaction database (SPRESI) of (initially) 1.8 million reaction records. Since the reaction database management systems (REACCS and ORAC) commercially available at that time could not handle more than 500,000 records, InfoChem was forced to conceive concepts for the selection of meaningful subsets of reaction databases and for performing reaction search algorithms. Based on a high quality reaction center detection module (ICMAP), InfoChem developed a sophisticated reaction type classification application (CLASSIFY) that is still unique to this day. Besides clustering of reaction databases and linking of different reaction databases, this software allows a reaction 'similarity' search (RTS). The world's major vendors of chemical information have adopted this technology to enhance the reaction retrieval capabilities of their products. More recent developments at InfoChem have resulted in a processing tool for the algorithmic detection of name reactions in any reaction database, and the development of smart reaction search operators such as the “All-in-one” reaction search, both of which are based on the company's fundamental software and technology assets. This talk will briefly outline the background and technology of these algorithms, and present some end-user orientated applications derived from these technologies in detail: “ICNameRXN” and “all-in-one” reaction search (RSA).
10:50 78 Catalyzing information retrieval for organometallic and metal-mediated reactions
Judith N. Currano, currano@pobox.upenn.edu, Chemistry Library, University of Pennsylvania, Philadelphia, PA 19104-6323, United States
Organometallic substances can be extremely difficult to find in many databases, given the inconsistencies in data entry. This makes retrieving reactions involving organometallics even more challenging, particularly organic reactions that are catalyzed by a metal-containing species. This paper presents some interesting ways of getting around the limitations of today's tools and finding reactions catalyzed by classes of organometallic substances, as well as introducing methods of going beyond substructure searching to locate reactions in which both starting material and product contain metals.
11:15 79 Finding synthetic chemistry in global literature and patents
Kurt Zielenbach1, kzielenbach@cas.org, Jeffrey M Wilson2, Jeffrey D Schloss2, Bryan J Harkleroad2. (1) Marketing, CAS, Columbus, Ohio 43202, United States (2) Product Development, CAS, Columbus, Ohio 43202, United States
CAS, the world's authority in chemical information, has seen steady growth in chemists' need for synthetic pathways. With more than 50 million chemical reactions now available in the CAS databases, chemists are asking for useful ways to navigate through large answer sets to find the best reaction. This talk will focus on the features that SciFinder provides to solve the synthetic research problems typically faced by chemists. Among the features that chemists need, and this presentation will illustrate, are relevancy ranking, direct access to experimental procedures, and tools for organizing search results, creating synthetic schemes from individual reactions, and communicating proposed reaction schemes with their peers.

Section B
San Diego Convention Center
Room 25C

Systems Chemical Biology and Other "Systems" Approaches in Chemistry and Biology
T. Oprea, J. Kuras, Organizers, Presiding
8:00   Introductory Remarks.
8:05 80 WITHDRAWN
8:35 81 Development of a human diet interactome map
Irene Kouskoumvekaki, irene@cbs.dtu.dk, Department of Systems Biology, Technical University of Denmark, Kgs Lyngby, Denmark
Similar to pharmaceuticals, food contains compounds that act as modifiers of biological functions. However, the level of complexity is increased by the simultaneous presence of a variety of components, with diverse chemical structures and numerous biological targets. Nowadays, it is widely recognized that systems chemical biology has the potential to increase our understanding of how small molecules interact with biological systems. A fruitful strategy to approach and explore the field of nutritional research is, therefore, to borrow methods that are well established in pharmaceutical research.
We have recently initiated a project at CBS/DTU, where we used text mining to construct a unique database with state-of-the-art information concerning food and its molecular components. During the talk, I will present the steps we followed for developing a database that consists of 1,500 food types and 35,000 small molecules and I will highlight applications through linking the nutritional chemical space with the human proteome and disease.
9:05 82 Studying the chemical interactome space between the human host and the genetically defined metabotypes of our gut
Gianni Panagiotou, gpa@bio.dtu.dk, Department of Systems Biology, Technical University of Denmark, Lyngby, Europe 2800, Denmark
The bacteria that colonize the gastrointestinal tracts of mammals represent a highly selected metagenome that has a profound influence on human physiology by shaping the host's metabolic and immune network activity. Despite recent advances in understanding the biological principles that underlie microbial symbiosis in the gut of mammals, the mechanistic contributions of the gut microbiome, and the links between variations in the metabotypes and host health, remain obscure. Here we mapped the entire biosynthetic potential of the gut microbiome based on metagenomic sequencing data derived from fecal samples of 267 European individuals. These metabolic signatures were used to study the signaling cascade triggered in humans through chemical (bacterial metabolites)-protein interaction networks and provide evidence of how specific changes in the gut microbial community/metabolism might affect or counteract the development of IBDs, obesity and related diseases.
9:35 83 Comparative study of small molecule inhibition of Mycobacterium tuberculosis and Francisella tularensis
Sandra V Bennun, Elebeoba E May, eemay@sandia.gov, Nanobiology Department, Sandia National Laboratories, Albuquerque, NM 87185, United States
The ability of intracellular pathogens to persist depends on their capacity to biochemically adapt to changes in the host's intracellular environment modulated by immune response mechanisms. Targeted introduction of small molecules can interfere and reduce the pathogen's metabolic capacity. We use computational systems chemical biology (SCB) methods to comparatively investigate the effects of inhibitory molecules on the tricarboxylic acid cycle (TCA) of M. tuberculosis and F. tularensis (Ft), two pathogens that infect host macrophages. The reconstructed metabolic pathway for Ft is missing reactions present in Mtb, a slower growing pathogen. Differences in metabolic capacity may impact several factors including intracellular localization, response to oxidative stress, and potentially response to small molecule inhibitors. We evaluate inhibition of isocitrate lyase (ICL) during aerobic and oxidative stress conditions, and simulate the metabolic consequence of ICL disruption in both systems. We will discuss observations regarding Mtb and Ft's response given the variations in their representative metabolic models.
10:05   Discussion.
10:20   Intermission.
10:30 84 Identifying druggable targets by mining open chemical biology data
Yanli Wang, ywang@ncbi.nlm.nih.gov, National Institutes of Health, National Center for Biotechnology Information, Room 5S506, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, United States
Molecular target identification plays a central role in drug discovery and small molecule probe development. An in silico approach is presented to suggest molecular targets and chemical-target association networks based on comparing and combining multiple bioactivity endpoints from chemical biology experiments and target annotations available in open-access databases. A clustering analysis based on bioactivity profile similarity reveals strong correlations between chemical structures, across-panel biological responses, and chemical-target associations, suggesting novel compound candidates with desired pharmacological properties may be identified by bioactivity profile comparison. A computational approach was further developed based on the BioActivity Profile Similarity Search (BASS) to mutually identify compound-target associations among neighbor compounds with similar bioactivity spectra. An overall success rate of 45% was obtained for the predicted compound-target associations. Analysis shows that BASS not only could identify structurally similar compounds but also could suggest novel chemical scaffolds for the intended targets.
11:00 85 Exploiting semantic networks of public data for systems chemical biology
David J Wild, djwild@indiana.edu, School of Informatics and Computing, Indiana University, Bloomington, IN 47405, United States
We have developed a systems chemical biology data resource called Chem2Bio2RDF (www.chem2bio2rdf.org) that integrates publicly available datasets pertaining to chemical compounds, drugs, drug side effects, targets, genes, pathways, diseases and scholarly publications. The dataset is semantically annotated using ontologies including a new chemogenomic ontology called Chem2Bio2OWL. We have developed a variety of graph-based and other network algorithms to look for chemogenomic and other associations in this data, including association search tools, integration of the literature with a novel BioLDA topic modeling method, a method called SLAP for missing link prediction, and rule-based inference of new relationships. In this talk, I will describe the Chem2Bio2RDF resource and give an overview of the algorithms and how they are being applied in drug discovery problems.
11:30 86 Enhancing chemoinformatics with pathway analysis tools: An integrated approach to drug discovery
Tatiana Khasanova, tatiana.khasanova@thomsonreuters.com, Eugene Myshkin, Sirimon O'Charoen, Yuri Nikolsky, Svetlana Bureeva, svetlana.bureeva@thomsonreuters.com, IP and Science, Thomson Reuters, Carlsbad, California 92008, United States
The modern approach to drug discovery encompasses multifaceted, integrated consideration of both chemical and biological processes. Systems biology methods provide the tools necessary to analyze complicated relationships between molecular entities in both normal and disease states. This approach enables understanding of the fine details of a drug's mode of action, the study of which keeps shifting from consideration of a single target to analysis of a whole affected pathway. Integrating the systems biology approach with chemoinformatics methods is key to successfully applying this new approach to the drug discovery process. Over the last four years, GeneGo, a Thomson Reuters company, has created a unique systems pharmacology suite (MetaDrug™) that leverages the power of systems biology (target CVs, OMICs data analysis), flexibility of classical chemical tools (QSAR, metabolic rules) and reliability of a comprehensive manually curated “knowledge base” for analysis of biological effects of new and known small molecules. The multi-step analysis workflow proceeds from compound targets to affected pathways, from pathways to associated diseases and toxicities. The same approach enables researchers to go backwards and identify targets for compounds with known phenotype or effect on pathway level. In our presentation, we will discuss several use cases that illustrate how this approach can be used for drug repositioning, solving mechanism of action and discovery of synergistic drug combinations.
Drug Discovery Looking for a Few Good Methods? We Got 'Em Here
Sponsored by COMP, Cosponsored by BIOL, CINF, and MEDI
Perspectives in Applied Computational Methods
Sponsored by COMP, Cosponsored by CINF and MEDI

TUESDAY AFTERNOON

Section A
San Diego Convention Center
Room 27A

Recent Advances in Reaction Searching
R. Schenck, D. Evans, Organizers, Presiding
1:30   Introductory Remarks.
1:35 87 Updated tools, techniques and data sources for effective reaction retrieval in support of synthetic methodology and drug discovery
Matthew A Kellett, matthew.kellett@thomsonreuters.com, Life Sciences Editorial Team, Thomson Reuters, Philadelphia, PA 19130, United States
Chemical reaction databases are a key element in support of synthetic chemistry research and multiple stages of the drug discovery and development process. While substructure searching provides useful results, newer tools in conjunction with the integration of enhanced indexing allow for more effective analysis of results. One example is the inclusion of reaction mapping and synthesis planning capabilities which enable the incorporation of these results directly into the workflow process. The use of separate databases that include unique features such as: detailed condition information, keywords, biological activity indexing, reference citations, multiple examples of new methodologies, and/or specific drug and natural product syntheses improve the overall retrieval of relevant chemical literature and synthetic methods.
1:55 88 Extremely rapid searching of in-house reaction databases: Turning ELN data into a searchable library
Philip J Skinner, philip.skinner@perkinelmer.com, Scott Flicker, Joshua Wakefield, Sean Greenhow, Megean Schoenberg, Kate Blanchard, Phil McHale, Sandra Sessoms, Robin Smith, PerkinElmer Informatics, United States
Electronic Laboratory Notebooks (ELNs) are increasingly used to capture experiments, with a majority of early deployed systems capturing chemical reactions. However, ELNs optimized for data capture are less well suited to data searching, so that the organizations with the largest databases, which have the most to gain from their in-house ELN reaction libraries, are those least able to access them. To facilitate reaction searching we developed a novel fast algorithm applied to data extracted from an ELN. We use fragment-based fingerprinting to determine transformation fingerprints. Substructure search algorithms, used to determine fragments within a molecule, generate the transformation fingerprints used in searches as strings. Searches compare target reactions with pre-calculated transformations to find similar reactions. Lookups for transformation strings are performed in a cascading similarity order. Results are bucket-sorted, displaying closest matches first. This results in extremely rapid return of relevant search results, allowing researchers to quickly mine the ELN for information hitherto unavailable.
2:15 89 Understanding search results: From a single reaction to scope and limitations of the reaction route in the ChemInform Reaction Library (CIRX)
Yana Steudel, steudel@fiz-chemie.de, Ulrike Schramke, ChemInform Databank, FIZ CHEMIE Berlin, Berlin, Germany
The main goal of a synthetic chemist is to design new compounds with specific properties. Most databases deliver synthetic procedures for the preparation of known compounds. ChemInform with its unique full reaction schemes can be used to plan the preparation of yet unknown derivatives. The reaction scheme provides the full information on the scope and limitations of a given reaction with one glimpse and imparts a better understanding of the general synthetic value of a reaction to the chemist. The presentation will demonstrate the possibilities offered by ChemInform to implement reaction search results in the current research projects.
2:35 90 Dealing with chemical reality: Handling reactions plus associated data and branching reaction schemes
Jonathan S Brecher, Harold Helson, Phil J McHale, phil.mchale@perkinelmer.com, PerkinElmer Informatics, Cambridge, MA 02140, United States
Experiments in chemical ELNs include the synthetic route, text, and other compounds or reactions involved (e.g. a side reaction or by-product). Intelligent retrievability requires storing this information as a unified entity, so the structures, information and context can be found. Many systems store these pieces of information separately, thus losing their inter-relationships. Another complication is branched synthetic routes. Most systems can handle single-step reactions and linear multi-step reactions, but branched reactions are problematic. We describe methods for unified indexing of reactions and associated data and structures and for handling branched reaction schemes in the following categories:
• Divergent: reaction product is a reactant in two or more different reactions;
• Convergent: compound is the product of two or more different reactions;
• Cyclic: reaction scheme starts and ends with the same compound.
We have implemented these methods in a chemical cartridge and exposed them through a chemistry ELN.
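The three branching categories listed above can be illustrated by treating a reaction scheme as a small directed graph. The sketch below is not the cartridge implementation described in the abstract. The data layout (a list of reactant-set and product-set pairs) and the function name classify_branching are assumptions for illustration.

```python
from collections import defaultdict


def classify_branching(reactions):
    """Label branch points in a scheme given as (reactants, products) pairs."""
    produced_by = defaultdict(set)  # compound -> indices of reactions forming it
    consumed_by = defaultdict(set)  # compound -> indices of reactions using it
    successors = defaultdict(set)   # compound -> compounds made directly from it
    for i, (reactants, products) in enumerate(reactions):
        for p in products:
            produced_by[p].add(i)
        for r in reactants:
            consumed_by[r].add(i)
            successors[r].update(products)

    def downstream(start):
        """All compounds reachable from start through one or more reactions."""
        seen, stack = set(), list(successors.get(start, ()))
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(successors.get(node, ()))
        return seen

    return {
        # Divergent: a reaction product used as a reactant in two or more reactions.
        "divergent": {c for c in produced_by if len(consumed_by.get(c, ())) >= 2},
        # Convergent: a compound that is the product of two or more reactions.
        "convergent": {c for c, rxns in produced_by.items() if len(rxns) >= 2},
        # Cyclic: a compound that can be regenerated from itself.
        "cyclic": {c for c in list(successors) if c in downstream(c)},
    }


# Hypothetical scheme: A -> B, B -> C, B -> D, C -> E, D -> E, and a loop X <-> Y.
scheme = [
    ({"A"}, {"B"}),
    ({"B"}, {"C"}),
    ({"B"}, {"D"}),
    ({"C"}, {"E"}),
    ({"D"}, {"E"}),
    ({"X"}, {"Y"}),
    ({"Y"}, {"X"}),
]
result = classify_branching(scheme)
print(result)
```

Here B is reported as divergent, E as convergent, and X and Y as cyclic. Keeping reactions and compounds in one structure is what allows these relationships to be queried together.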
2:55   Intermission.
3:05 91 Reaction searching for compounds which do not even exist yet
Carsten Detering, detering@biosolveit.de, Christian Lemmen, Marcus Gastreich, BioSolveIT GmbH, St. Augustin, NRW 53757, Germany
We present an industry-proven method which – extremely rapidly – searches through textbook or corporate reaction recipes and assembles novel molecules similar to any real or virtual query/starting molecule.
This is remarkable because, owing to the combinatorial nature of reactions, the search space is gigantic while the computation time required is very short. A single reaction (for example, 10 acids A1-A10 combined with 10 different amines N1-N10) can already formally form 100 products (here, amides); the search space amounts to as many as 10^13 virtual molecules, which are searched in only a few minutes.
The technology thus generates new intellectual property (IP) along with the recorded reaction information, which proposes how to carry out the associated synthesis.
The paper will explain the basics of the similarity concept (FTrees-FS [1]) and the search technology and highlight dozens of successful examples including many from the pharmaceutical industry.
[1] www.biosolveit.de/FTrees
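The combinatorial arithmetic in this abstract (10 acids and 10 amines formally giving 100 amides) can be checked with a short sketch. The product labels and the helper space_size are hypothetical and serve only to show how quickly such a space grows. This is not the FTrees-FS search itself, which avoids enumerating products.

```python
from itertools import product
from math import prod

acids = [f"A{i}" for i in range(1, 11)]    # A1..A10
amines = [f"N{i}" for i in range(1, 11)]   # N1..N10

# Every acid paired with every amine gives one formal amide product.
amides = [f"{acid}-{amine}" for acid, amine in product(acids, amines)]
print(len(amides))


def space_size(reagent_counts):
    """Number of formal products when one reagent is picked from each pool."""
    return prod(reagent_counts)


# Hypothetical three-component reaction with 1,000 reagents per pool.
print(space_size([1000, 1000, 1000]))
```

Because the size is the product of the pool sizes, a handful of reactions with realistic reagent lists is enough to reach a space far too large to enumerate explicitly.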
3:25 92 Helping you make the right choices for your next synthetic route!
Juergen Swienty-Busch1, J.Swienty-Busch@elsevier.com, David A. Evans2. (1) Elsevier Information Systems, Frankfurt, Germany (2) Elsevier Properties SA, Neuchatel, NE 2000, Switzerland
With a steadily increasing amount of data chemists today face the challenge of finding the best possible path to a desired molecule and to optimize the synthesis. While the size of a reaction database is important to find enough examples for making a good choice it is equally important that this information can be clustered, ranked, categorized and visualized intuitively to get to the desired result quickly. This talk will discuss a few examples showing how Reaxys addresses these challenges.
3:45 93 Advanced reaction searching: A comprehensive treatment of stereoselectivity in reactions
Peter Johnson1, p.johnson@leeds.ac.uk, Anthony P Cook1, James Law2, Aniko Simon2, Orr Ravitz2. (1) Department of Chemistry, University of Leeds, Leeds, W Yorks LS17 8JQ, United Kingdom (2) Simbiosys Inc, Toronto, Ontario M9W 6V1, Canada
The ARChem system for automated retrosynthetic analysis makes use of rules describing retrosynthetic transformations which are generated by automated mining of reaction databases. The application of these rules in a controlled fashion minimises the potential combinatorial explosion of possible routes to a target structure. In the past the system has ignored any stereochemical designators in the target molecule and the retrosynthetic rules were devoid of any stereochemical information, a major omission, given the huge advances in enantioselective synthesis in the past few decades. Recent work addresses this problem through:
a) comprehensive perception of stereochemical information in targets, available starting materials, and reactants and products of literature examples
b) inclusion of the stereochemical course of each reaction into an abstracted rule so that retrosynthetic searches provide stereochemically sound suggestions
Details of the work will be given, as will examples of the retrosynthetic analysis of stereochemically complex targets.
4:05 94 Algorithmic network detection of reaction sequences: From novel "one-pot" reactions to unanticipated synthetic routes to chemical weapons
Chris M. Gothard, cgothard@northwestern.edu, Nosheen A. Gothard, Siowling Soh, Bartosz A. Grzybowski, Department of Chemistry, Northwestern University, Evanston, IL 60208, United States
Using the known network of chemical reactions, computational searches across the network have enabled us to discover (1) novel tandem reactions (e.g. 'one-pot' syntheses) and (2) unanticipated synthetic routes to dangerous substances (e.g. chemical weapons). One-pot reactions are central to the development of efficient chemical syntheses of complex and biologically important substrates in modern industrial-scale chemistry. By screening reactions for both compatibility among functional groups and reaction conditions, we have identified multistep one-pot synthetic routes to medicinally important PI3Kd inhibitors. Optimization of these reaction sequences has led to efficient and high yielding preparations of PI3Kd inhibitors. In addition to medicinal targets, we have also employed network-based detection to uncover unanticipated routes to dangerous substances. Although common precursors to chemical weapons are well regulated, there exist alternative routes that utilize only unregulated substances. Detection of these unanticipated routes is an important strategy in limiting access to chemical weapons.
4:25   Concluding Remarks.

Section B
San Diego Convention Center
Room 25C

Systems Chemical Biology and Other "Systems" Approaches in Chemistry and Biology
T. Oprea, J. Kuras, Organizers, Presiding
1:30 95 Framework for systematic prediction of pharmacologically relevant targets of small molecules
Emmanuel R Yera, eyera@ucsf.edu, Ann E Cleves, Ajay N Jain, Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA 94158, United States
Drug discovery requires the design of molecules that modulate the activity of specific biological targets with minimal effects on other targets. To systematically predict pharmacologically relevant relationships between small molecules and protein targets, including undesirable off-targets, we have developed a probabilistic framework based on molecular similarity. Small molecules may be quantitatively compared based on 2D or 3D characteristics, the latter being directly related to binding. Given a new molecule along with a set of molecules sharing some biological effect, a single score based on the comparison to the known set is produced, reflecting either 2D similarity, 3D similarity or their combination. The results of a systematic application to a large set of drugs will be presented along with a critical analysis examining what can be learned about drug pharmacology based on different molecular similarity methods. The potential for association of phenotypic effects with specific biological targets will also be discussed.
2:00 96 Designing ligands against multi-target profiles
Andrew L Hopkins, a.hopkins@dundee.ac.uk, College of Life Sciences, University of Dundee, Dundee, Scotland DD1 5EH, United Kingdom
The prospect of multi-target drug design has recently been advanced by the development of computational polypharmacology prediction methods. We describe a new approach for the automated design of ligands against profiles of multiple drug targets. The method is demonstrated by the evolution of an approved enzyme inhibitor drug into brain-penetrable ligands with specific polypharmacology or exquisite selectivity profiles for G-protein-coupled receptors. Overall, 800 ligand-target predictions for prospectively designed ligands were tested experimentally, of which 75% were confirmed correct. The method demonstrates that automated design can be a useful way to solve the complexity of optimising multiple structure-activity relationships. The validated method shows promise as a potential source of drug leads where multi-target profiles are required to achieve either selectivity over other drug targets or a desired polypharmacology.
2:30 97 CARLSBAD (Confederated Annotated Research Libraries for Small molecule BioActivity Data): A database and its platform
Gergely Zahoranszky-Kohalmi, GZahoransky-Kohalmi@salud.unm.edu, Jeremy J Yang, Cristian G Bologa, Stephen L Mathias, Oleg Ursu, Jarrett Hines-Kay, Tudor I Oprea, Department of Biochemistry and Molecular Biology, University of New Mexico, Albuquerque, New Mexico 87131, United States
Identifying key interactions and the determinant structural patterns between small molecules and their biological targets remains a major challenge in the field of drug discovery and drug repurposing. Here we introduce CARLSBAD, a platform and database designed to guide the network-based discovery of such complex patterns with the help of maximal overlapping substructures and hierarchical scaffolds. Bioactivity data are delivered on a basis of consensus of leading bioactivity databases (IUPHAR-DB, PDSP, WOMBAT, PubChem, ChEMBL) enhanced by systematic confidence annotations. A web application and a Cytoscape plugin provide a convenient interface for exploring and analyzing interactions between millions of molecules and thousands of biological targets. Applications using the Cytoscape plugin based on CARLSBAD data will be presented.
3:00   Intermission.
3:10 98 Drug combinations to reduce adverse drug reactions and improve intrapatient differences in response
John P Overington, jpo@ebi.ac.uk, Computational Chemical Biology, EMBL-EBI, Hinxton, Cambs CB10 1SD, United Kingdom
Adverse drug reactions (ADRs) are a major cause of hospitalization and a large cost burden to healthcare systems. A further complicating factor is that many ADRs are rare, but serious, events, dependent on factors such as metabolic state, drug-drug interactions (DDIs) and pharmacogenetic variation within the patient population. We have applied a simple systems-based theoretical model of poly-/network-pharmacology and PK/PD models, combined with a data-mining approach, to the generation of a new strategy to identify drug combinations that have improved safety profiles. This drug combination approach is also theoretically predicted to reduce intra-patient pharmacogenetic-based differences in drug response/efficacy, as an additional emergent property. The theoretical basis for this approach will be outlined, along with representative examples of specific drug combinations with synergistic safety features. Finally, we outline our plans to test this hypothesis across a variety of cardiovascular diseases.
3:40 99 Integrating targets, drugs and clinical outcomes into systems medicine
Tudor I Oprea, toprea@salud.unm.edu, Department of Biochemistry and Molecular Biology, UNM School of Medicine, Albuquerque, NM 87131, United States
For the therapeutic management of chronic diseases, the systems medicine approach shows great promise. Starting from such therapeutic indications that are relevant for complex, multifactorial chronic diseases (e.g., diabetes, asthma, cancer), we examine the relationship between drugs, targets, current indications and counter-indications, as well as serious adverse reactions. The analysis is aided by the use of controlled vocabularies such as those available in MedDRA and ICD-10, and benefits from manual curation. Practical aspects such as single vs. multiple (chronic) dosing, as well as temporal patterns (evolution of disease states) will be considered. Such aspects are likely to pave the way towards a systems medicine approach and improve our understanding of the therapeutic control of chronic diseases.
4:10   Panel discussion.
4:55   Concluding remarks.
Drug Discovery Methods Make Us Smile
Sponsored by COMP, Cosponsored by BIOL, CINF, and MEDI
Molecular Mechanics Methodologies: That is All.
Sponsored by COMP, Cosponsored by BIOL, CINF, MEDI, and PHYS
Perspectives in Applied Computational Methods
Sponsored by COMP, Cosponsored by CINF and MEDI

WEDNESDAY MORNING

Section A
San Diego Convention Center
Room 27A

InChI Symposium
A. Tropsha, Organizer
A. J. Williams, Organizer, Presiding
8:30   Introductory Remarks.
8:35 100 IUPAC InChI project: A status report
Stephen Heller, steve@hellers.com, IUPAC, Silver Spring, MD 20902, United States
The current status and use of the InChI algorithm will be presented. Future use and extension of the algorithm will be described.
9:05 101 Great promise of navigating the internet using InChIs
Antony J Williams, williamsa@rsc.org, Department of Informatics, Royal Society of Chemistry, Wake Forest, NC 27587, United States
The InChI, the International Chemical Identifier, has been the basis of both indexing and deduplication of the ChemSpider database since the inception of the platform. When the InChI was adopted we envisaged a future whereby the identifier would proliferate across journals, databases and the internet in general providing us a basis for “structure searching the internet”. This presentation will provide an overview of how the InChI has facilitated the integration of ChemSpider to chemistry on the internet, some of the surprising findings that have resulted from this work and extrapolate the influence of InChIs into the future for a chemically enabled web.
9:35 102 InChI names and keys: Do they add value to commercial software and databases?
Keith T Taylor, keith.taylor@accelrys.com, Carmen Nitsche, Accelrys Inc, San Ramon, California 94583, United States
InChI names and keys provide a general identifier for a chemical structure that can be used to correlate chemical information in public and private data repositories. They are exposed, generated, and processed in many of Accelrys' software products and they are recorded in Accelrys' commercial databases. Although they are often thought of as unique identifiers, in many cases this is not true. How their uniqueness (or lack of uniqueness) creates opportunities and problems will be discussed.
10:05   Intermission.
10:25 103 Use of InChI in wikis
Martin A Walker1, walkerma@potsdam.edu, Aileen Day2. (1) Department of Chemistry, State University of New York at Potsdam, Potsdam, New York 13676, United States (2) Royal Society of Chemistry, Cambridge, United Kingdom
The International Chemical Identifier (InChI) provides a useful shortcut representation of chemical structures. On Wikipedia the InChI (and then InChIKey) was originally provided as a service to chemists, but now it can draw traffic from web searches (for example, from ChemSketch). It also provides a reference point, intersecting article with structure when performing structure validations. In ShareChemistry, the new RSC education wiki, new extensions have been written to make the wiki “structure-friendly”. In conjunction with the Ketcher drawing tool, the InChI is used to provide full structure searching within the wiki. In addition, the InChI is used in “predict the product” quiz questions or similar, where the student has to draw the product structure; this also utilizes the InChI structure to identify “nearly correct” answers and provide appropriate feedback. By allowing quizzes to go beyond simple multiple choice questions, the use of InChI can greatly enhance student learning.
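One way to picture the "nearly correct" feedback described above is to compare InChI layers separately. The following is a minimal sketch under that assumption, not the ShareChemistry implementation; the function names and feedback strings are hypothetical, and the simple layer split ignores isotopic and fixed-hydrogen layers.

```python
# Hypothetical sketch: grade a drawn structure against the expected product
# by comparing InChI layers. Not the ShareChemistry implementation.

STEREO_PREFIXES = ("b", "t", "m", "s")  # double-bond and tetrahedral stereo layers


def split_layers(inchi):
    """Return (skeleton layers, stereo layers) of an InChI string."""
    parts = inchi.split("/")[1:]  # drop the "InChI=1S" version prefix
    skeleton = [p for p in parts if not p.startswith(STEREO_PREFIXES)]
    stereo = [p for p in parts if p.startswith(STEREO_PREFIXES)]
    return skeleton, stereo


def grade(expected, answer):
    """Classify an answer as correct, nearly correct, or incorrect."""
    if expected == answer:
        return "correct"
    if split_layers(expected)[0] == split_layers(answer)[0]:
        return "nearly correct: check stereochemistry"
    return "incorrect"


L_ALANINE = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
D_ALANINE = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m1/s1"
GLYCINE = "InChI=1S/C2H5NO2/c3-1-2(4)5/h1,3H2,(H,4,5)"
```

Here a student who draws D-alanine when L-alanine is expected would be told that only the stereochemistry differs, while glycine would be marked incorrect.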
10:55 104 InChIKey collision safety: Experimental estimation for algorithmically generated structure libraries
Andrey Yerin, erin@acdlabs.ru, Kirill Blinov, Advanced Chemistry Development, Inc. (ACD/Labs), Toronto, Ontario M5C 1T4, Canada
The InChIKey is a hash-based fixed length representation of the IUPAC International Chemical Identifier (InChI) and has growing importance in chemical informatics as a basis for searching and indexing chemical structures. Since it is composed of 22 variable letters, the InChIKey theoretically has an extremely low collision rate but certainly cannot uniquely encode the whole of chemical space. While InChIKey collisions have already been reported, experimental tests of collision rates for extremely large databases have not yet been performed. A protocol allowing for the generation of InChIKeys for algorithmically created virtual structure databases has been launched at ACD/Labs. We will report on our work analyzing large generated data sets and provide reliable statistical estimations of InChIKey collisions.
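As a theoretical baseline for such experiments, the collision rate of an ideal hash can be estimated with the birthday approximation. The sketch below assumes the hash behaves like a uniform random function; the hash width is a parameter, and the 65-bit value used in the example is an assumption about the width of the connectivity block, not a figure taken from the abstract.

```python
import math


def expected_colliding_pairs(n_structures, hash_bits):
    """Expected number of colliding pairs among n uniformly random hashes."""
    pairs = n_structures * (n_structures - 1) / 2
    return pairs / 2 ** hash_bits


def collision_probability(n_structures, hash_bits):
    """Birthday approximation of P(at least one collision)."""
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-expected_colliding_pairs(n_structures, hash_bits))


# Illustrative only: one billion structures hashed into an assumed 65-bit block.
pairs_for_billion = expected_colliding_pairs(10**9, 65)
```

Comparing such an idealized estimate with the observed collision counts indicates whether the hash behaves as uniformly as intended on real structure libraries.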
11:25 105 InChI here, InChI there, InChIs everywhere
Juergen Swienty-Busch1, J.Swienty-Busch@elsevier.com, David A. Evans2. (1) Elsevier Information Systems, Frankfurt, Germany (2) Elsevier Properties SA, Neuchatel, NE 2000, Switzerland
We will describe how Elsevier is using InChIs every day in SciVerse ScienceDirect and Reaxys.

Section B
San Diego Convention Center
Room 25C

Beyond the Database: New Models of Scholarship in an eScience World
P. Bourne, Organizer, Presiding
9:00   Introductory Remarks.
9:05 106 New searching paradigms in drug discovery enabled by semantic integration of public data
David J Wild1, djwild@indiana.edu, Erik A Stolterman1, Michael S Lajiness2. (1) School of Informatics and Computing, Indiana University, Bloomington, IN 47405, United States (2) Eli Lilly, Indianapolis, IN 46285, United States
The recent explosion of publicly available sources of data relating to drug discovery, along with electronic access to journal articles offers many new possibilities for knowledge discovery, but navigating the millions of data points relating to compounds, drugs, targets, genes, diseases and publications spread over hundreds of online datasets can be overwhelming. We will present work done at Indiana to semantically integrate many of these public data sources to provide a framework for identifying networks of information spread across datasets and publications of interest to a particular researcher, research question or hypothesis (for example, what can we find in public data that pertains to the relationship of Ibuprofen to Parkinson Disease?). Examples will be given of how prototype tools developed at Indiana and Eli Lilly can be used to effect these kinds of search, and what this might reveal about the future of search tools and paradigms for drug discovery. We will also discuss some of the barriers to performing these kinds of advanced search effectively, including data quality issues, properly accessing data in the text of journal articles, and extracting important relationships from “background noise”.
9:25 107 Collaborative computational technologies for biomedical research: An enabler of more open drug discovery
Sean Ekins1, ekinssean@yahoo.com, Antony J Williams2. (1) Collaborations in Chemistry, Fuquay Varina, NC 27526, United States (2) Royal Society of Chemistry, Wake Forest, NC 27587, United States
The current paradigm in the pharmaceutical industry is that products can only be created and developed by massive collaborative teams. Each company has to build its own costly R&D platforms and IT infrastructure. Other research industries realized decades ago that they had to share data and methods because of cost. The pharmaceutical industry has been slow to realize this. Expanding beyond our recent book (Collaborative Computational Technologies for Biomedical Research), in which a growing number of technologies, consortia, precompetitive initiatives and complex collaboration networks are described, we suggest a more open drug discovery is being enabled by collaborative computational technologies. Academia, however, is not training the next generation of scientists to practice open science or even to collaborate; this presents both challenges and opportunities. We will describe our observations and make recommendations that impact everyone from technology developers to granting agencies. This may enable future discoveries to be made outside traditional institutions.
9:45 108 Enabling biomolecular simulation data sharing across institutions using a Grid architecture
Julien C Thibault1, julien.thibault@utah.edu, Thomas E Cheatham2, Julio C Facelli1,3. (1) Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah 84112, United States (2) Department of Medicinal Chemistry, University of Utah, Salt Lake City, Utah 84112, United States (3) Center for High-Performance Computing, University of Utah, Salt Lake City, Utah 84112, United States
Our work presents a Grid-based infrastructure that can enable biomolecular simulation data sharing across research labs. Raw and derived data include molecular dynamics atom trajectories and energies as a function of time. This would facilitate the development of new models (e.g. coarse-grain representations, force fields) and their assessment. Each data node is managed by the iBIOMES system, which creates a virtual data warehouse at the researcher's site by managing distributed file servers. The current implementation of iBIOMES offers a command-line interface that can be used to register simulation files into the system. An interactive web client is used to present simulation data to the researchers. External systems can query iBIOMES nodes through the Java API or the RESTful web service interface. Data queries are metadata-driven and supported by the iRODS framework. Using the caGrid toolkit, we will connect iBIOMES nodes together and enable federated queries across institutions.
10:05   Intermission.
10:15 109 Representing chemical information by URLs: The chemical identifier resolver as a general chemoinformatics tool
Marc C Nicklaus, mn1@helix.nih.gov, Markus Sitzmann, NCI-Frederick, National Cancer Institute, NIH, Frederick, MD 21702, United States
Traditional chemistry databases are siloed off as far as the data are concerned, and require, as an additional component, a specific user interface to access these data. Furthermore, this access is often an either-or proposition in that it either allows usage by a human on a compound-by-compound basis, or download of datasets in bulk from computer to computer. The Chemical Identifier Resolver (CIR) of the NCI/CADD Group, in contrast, allows direct access into its entire data store of currently 120 million structure records as well as to its chemical transformation capabilities. Access is by straightforward URLs that can be put together as easily by a human as by other web services, programming packages or scripting languages. We will present how CIR can be used to represent chemical structure data from, or through, InChI[Keys], chemical names, IUPAC names, SMILES, tautomeric forms, many different chemical file formats as well as calculated (physicochemical) properties.
10:35 110 Publication@Source: The Lab as a database
Jeremy G Frey1, j.g.frey@soton.ac.uk, Mark I Borkum1, Simon J Coles1, Tim Parkinson2. (1) Department of Chemistry, University of Southampton, Southampton, Hants SO17 1BJ, United Kingdom (2) Department of Electronics and Computer Science, University of Southampton, Southampton, Hants SO17 1BJ, United Kingdom
All data exists within a context: the human or machine that generated the data; the processes that were enacted; and the environments in which it occurred. The capture and dissemination of the context of data is of vital importance, affording the data new potential for creating fresh value. In contrast to top-down software development approaches, where the structure and semantics of the data are informed by the implementation of the software applications that manage the data, the Smart Research Frameworks (SRF) project provides a suite of data-agnostic software applications and frameworks. Used in conjunction with a novel scientific methodology that focuses on the explicit description of scientific intent and action, this suite facilitates the automated and semi-automated capture and dissemination of richly structured data in context. By providing an appropriate query mechanism (a SPARQL endpoint), the laboratory notebooks collectively form, in effect, a super database of raw and analysed data.
10:55   Concluding remarks.
Computational Approaches to Spectroscopy Analysis Spectroscopy of Small Things
Sponsored by COMP, Cosponsored by ANYL, CINF, and PHYS
Drug Discovery No Madness, Just Methods
Sponsored by COMP, Cosponsored by BIOL, CINF, and MEDI
Molecular Mechanics Methodologies are Still Cool
Sponsored by COMP, Cosponsored by BIOL, CINF, MEDI, and PHYS

WEDNESDAY AFTERNOON

Section A
San Diego Convention Center
Room 27A

InChI Symposium
A. J. Williams, Organizer
A. Tropsha, Organizer, Presiding
1:00 111 InChIs as building blocks for complex substance identifiers
Yulia Borodina, yulia.borodina@fda.hhs.gov, Lawrence Callahan, Frank Switzer, Office of the Commissioner, FDA, Silver Spring, MD 20993, United States
The FDA Substance Registration System (SRS) registers and assigns UNique Ingredient Identifiers (UNII) to substances, which may be simple chemicals or complex substances such as chemically modified biopolymers or synthetic polymers. These complex substances cannot be identified by a single InChI string; however, InChI strings may be used to identify the structural elements of which these complex substances are composed. These structural elements include monomers, modifying agents and fragments. We will discuss the possibility of using InChI strings as building blocks for creating the more complex identifiers needed for registration of these complex substances.
1:30 112 Accessing NCI/CADD web resources by InChI
Markus Sitzmann, sitzmann@helix.nih.gov, Marc C. Nicklaus, Chemical Biology Laboratory, Center for Cancer Research, National Cancer Institute/Frederick, NIH, DHHS, Frederick, MD 21702, United States
IUPAC's International Chemical Identifiers (InChIs/InChIKeys) are a vital tool to enable web-based linking between different sources of chemical content. We present an overview of how InChI/InChIKeys can be used to access our NCI/CADD web services, which are part of our Chemical Identifier Resolver (CIR). The service is publicly available at http://cactus.nci.nih.gov/chemical/structure and provides a simple and programmatic URL API to access a broad range of chemical structure information and chemical structure representation formats linked to a specific InChI/InChIKey.
At the time of writing, the database utilized by CIR indexes approx. 120 million structure records which have been aggregated from various small-molecule databases and which, after careful structure normalization including calculation of our NCI/CADD Chemical Structure Identifiers, comprise a set of approx. 80 million unique chemical structures. For the entire set of normalized structures, Standard InChI/InChIKeys and those using various non-standard sets of configuration flags of the InChI algorithm have been calculated. On basis of these different identifier sets, we will discuss the differences in structure identification between the InChI/InChIKey and NCI/CADD identifier sets that we have observed in the database used by CIR, and what these discrepancies can tell us about definition and design, scope, limitations and problems of chemical structure identifiers.
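To illustrate what access by URL might look like, the sketch below builds resolver URLs from the base address given above. The path layout `<base>/<identifier>/<representation>` and the representation names `smiles` and `stdinchikey` are assumptions for illustration; the helper `resolver_url` is hypothetical and no network request is made.

```python
from urllib.parse import quote

BASE_URL = "http://cactus.nci.nih.gov/chemical/structure"  # address quoted in the abstract


def resolver_url(identifier, representation):
    """Build a URL of the assumed form BASE/<identifier>/<representation>."""
    # Percent-encode the identifier so spaces in chemical names are URL-safe.
    return f"{BASE_URL}/{quote(identifier, safe='')}/{representation}"


# Standard InChIKey of aspirin, asking for an assumed "smiles" representation.
url = resolver_url("BSYNRYMUTXBXSQ-UHFFFAOYSA-N", "smiles")
```

Because the whole request is a plain string, the same URL can be typed by a person, embedded in a web page, or generated by a script.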
2:00 113 InChI vs IUPAC nomenclature: Aspects to be aware of when using Standard InChI
Daniel M Lowe, daniel_lowe_uk@yahoo.co.uk, Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Cambridge, Cambridgeshire CB2 1EW, United Kingdom
Features of IUPAC nomenclature that cannot be represented in Standard InChI will be examined to draw caution to cases where the use of standard InChI (and even in some cases non-standard InChI) may result in a loss of information. These areas include the representation of tautomers and mixtures of stereoisomers.
2:10 114 InChI adoption at the Royal Society of Chemistry
Richard Kidd, kiddr@rsc.org, Royal Society of Chemistry, Cambridge, Cambs. CB4 0WF, United Kingdom
This flash talk will cover how the InChI standard has been used within the RSC and our contribution to the development of the standard.
2:20 115 Registration system of mcule: InChI is the key
Ferenc Szalai, rkiss@mcule.com, Robert Kiss, Mark Sandor, mcule.com, Budapest, Hungary
Mcule provides virtual screening services on the web to help identify novel drug candidates by screening different databases. For these databases, it is essential to have a robust molecule registration system that does not depend on different drawing conventions, tautomeric states, etc. It is critical to ensure that the same compounds get the same IDs and, most importantly, that different compounds never get the same ID. To the best of our knowledge, InChI provides the best solution for this problem. In this presentation we would like to summarize how InChI is implemented in the mcule registration system and how it is used effectively with our vendor database and open registration services.
2:30   Intermission.
2:45 116 "UniChem": A prototype unified chemical structure cross-referencing and identifier tracking system
Jon Chambers, jon.chambers@ebi.ac.uk, Anna Gaulton, Anne Hersey, Mark Davies, John P Overington, Computational Chemical Biology Group, European Bioinformatics Institute, Cambridge, CAMBS CB10 1SD, United Kingdom
ChEMBL is an online database of bioactivity data for a large number of organic, drug-like compounds. These data are abstracted from the primary published literature, and are utilized to address a wide range of drug-discovery and chemical biology problems. Chemical structures within ChEMBL are standardized using a series of business rules, and replicated compound structures from different publications and sources are normalized on the basis of identical standard InChIs. Cross-referencing of these structures with identical structures in other chemistry databases is useful for the purposes of comparison and integration (for example, the creation of web links between database interfaces). Unfortunately, however, the process of creating and maintaining these cross-references often involves script-based, semi-manual steps. To assist in the automation of this process within our own institution, we have developed a prototype system (called 'UniChem') for archiving and cross-referencing of chemical structures and their identifiers from multifarious sources. The design of the system is modeled on that of the UniParc database, which serves a similar cross-referencing and archival function for protein sequences. UniChem uses the standard InChI as a means of normalizing between different sources, and in addition to providing up to date cross-linking information, is also able to track changes in identifier assignments over time.
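The core idea of normalizing between sources on the Standard InChI can be sketched as a simple grouping step. This is an illustrative sketch only, not the UniChem design: the source names, identifiers and function names are hypothetical, and tracking of identifier changes over time is left out.

```python
from collections import defaultdict


def build_cross_references(records):
    """Group (source, identifier, standard_inchi) records by Standard InChI."""
    by_inchi = defaultdict(set)
    for source, identifier, inchi in records:
        by_inchi[inchi].add((source, identifier))
    return by_inchi


def cross_links(by_inchi, source, identifier):
    """Return entries from other sources that share a structure with the given one."""
    links = set()
    for members in by_inchi.values():
        if (source, identifier) in members:
            links |= {m for m in members if m[0] != source}
    return links


# Hypothetical records from three made-up sources.
records = [
    ("source_a", "A-1", "InChI=1S/CH4/h1H4"),
    ("source_b", "B-77", "InChI=1S/CH4/h1H4"),
    ("source_c", "C-5", "InChI=1S/H2O/h1H2"),
]
index = build_cross_references(records)
```

With this index, `cross_links(index, "source_a", "A-1")` yields the matching entry in `source_b`, which is the kind of link a database interface could then render.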
3:15 117 Update on project to introduce InChI to researchers in the Department of Chemistry at Louisiana State University
William W Armstrong, notwwa@lsu.edu, Karen L Salazar, LSU Libraries, Louisiana State University, Baton Rouge, Louisiana 70803, United States
In order to be effective, InChI must be understood and used by researchers, educators, and publishers on a large scale. At the 2011 ACS meeting in Denver, the authors provided details of their plans to create a teaching methodology successful at enabling researchers at Louisiana State University to understand InChI and the need it fills as a universal, non-proprietary method of identifying chemical compounds that can take full advantage of new web-based communication and search technologies. Equally important, this teaching methodology will be designed to help researchers bridge the gap from understanding to application and actually begin to integrate InChI into their regular workflows. In this talk, the authors will review their progress and provide an analysis and evaluation of the results to date, along with the next steps. The final product can be employed at similar institutions worldwide.
3:45 118 Past, present and future of the InChI Trust
Jason N Wilde, j.wilde@nature.com, Nature Publishing Group, London, United Kingdom
The InChI Trust was established in 2009 with the aim of developing and supporting the non-proprietary IUPAC InChI standard and promoting its use to the scientific community. Over the last 3 years the Trust has approved new versions of the InChI algorithm and established working parties to investigate the technical solutions required to tackle some of the more complex problems related to chemical structure representation, e.g. Markush, polymers and mixtures. This talk summarises the impact and success of this work and sets out a road map for future development of the InChI standard.
4:15 119 InChIKey insertion technique for compound-specific and any-compound proximity search
Stephen K. Boyer, skboyer@gmail.com, Thomas Griffin, Alfredo Alba, Su Yan, Ying Chen, Scott Spangler, Eric Louie, Jeff Kreulen, Almaden Research Center, IBM Research, San Jose, California 95120, United States
The combined technologies of text analytics and name-to-structure conversions for reading and processing molecular structures provide researchers the ability to build large databases of structures and derive important relationships previously inaccessible, a capability important to discovery and innovation. Our previous work took this approach to produce SMILES strings that represented chemical structures used as input for subsequent applications, rendering the scientific and patent literature searchable by structure/substructure programs. We now report the additional ability to detect, normalize, and replace chemical names in documents with InChIKeys and then index the combined text and embedded InChIKeys using SOLR, a Lucene-based full text-indexing engine. The resulting index supports Boolean combinations of chemical compounds and regular text words and phrases. It also supports proximity searching. The net result is that we can now perform searches for exact chemical structures or even unspecified chemical structures within a specified context.
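The insertion and proximity idea can be pictured with a small in-memory sketch. It is not the authors' pipeline: a fixed, hypothetical lookup table stands in for name-to-structure conversion, and a token-distance check stands in for the proximity operator of a full-text engine.

```python
import re

# Hypothetical lookup table; a real system would derive keys from structures.
NAME_TO_KEY = {
    "aspirin": "BSYNRYMUTXBXSQ-UHFFFAOYSA-N",
    "caffeine": "RYYVLZVUVIJVGH-UHFFFAOYSA-N",
}
NAME_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, NAME_TO_KEY)) + r")\b", re.IGNORECASE
)
INCHIKEY_PATTERN = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def insert_keys(text):
    """Replace known chemical names with their InChIKeys."""
    return NAME_PATTERN.sub(lambda m: NAME_TO_KEY[m.group(1).lower()], text)


def near(tokens, is_a, is_b, window):
    """True if a token matching is_a lies within `window` tokens of one matching is_b."""
    pos_a = [i for i, t in enumerate(tokens) if is_a(t)]
    pos_b = [i for i, t in enumerate(tokens) if is_b(t)]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)


text = insert_keys("Low-dose aspirin relieved headache in most patients.")
tokens = re.findall(r"[\w-]+", text)


def is_headache(token):
    return token.lower() == "headache"


# Compound-specific search: one particular key near the word "headache".
specific = near(tokens, lambda t: t == NAME_TO_KEY["aspirin"], is_headache, 3)
# Any-compound search: anything shaped like an InChIKey near "headache".
any_compound = near(tokens, INCHIKEY_PATTERN.match, is_headache, 3)
```

Because the keys are ordinary tokens once inserted, the same proximity check serves both the compound-specific and the any-compound case; only the token predicate changes.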
4:35 120 Exploring almost every InChI of nature.com
Laura J Croft, l.croft@nature.com, Nature Publishing Group, London, United Kingdom
With the launch of Nature Chemistry in April 2009 the number of chemical structures published on nature.com has increased rapidly. Since then we have made efforts to increase the discoverability for our readers of information relating to chemical structures both within our own article pages and elsewhere on the web. This talk will showcase some of the trials and tribulations of using InChIs to represent chemical structures on nature.com.
4:55   Concluding remarks.

Section B
San Diego Convention Center
Room 25C

Libraries and Institutional Research Evaluation
L. Solla, Organizer
A. Twiss-Brooks, Organizer, Presiding
1:30   Introductory remarks.
1:35 121 Finding the future: Using research analytical tools with journal article databases and social media data to identify high-impact research leaders and programs
Elizabeth A. Brown, ebrown@binghamton.edu, Libraries, Binghamton University, Binghamton, NY 13902-6012, United States
Research administrators need to identify promising researchers and future growth areas and recognize established research innovators. Journal article indexes measure past research accomplishments through citation and reference counts. This data is inadequate for identifying and ranking future high-impact research activities and programs. Research analytical tools built from literature databases can readily identify future high-impact activities, institutional peers and potential collaborators and recruits. Libraries can use collections data to assess and identify future collections priorities. Examples of how the libraries can implement and use these research analytical tools will be shown. Informal scholarship activities provide data on research impact. This includes the blogosphere, shared data sites, social networks, subject digital repositories, and popular media presence. Site metrics provide data on high-impact activities among peers and supplement research and library database tools. Examples of these activities and their impact will be shown.

2:00 122 Providing comparative data on published research impact (internally and externally)
Donna T. Wrublewski, dtwrublewski@ufl.edu, Denise B. Bennett, Valrie I. Davis, Michelle Leonard, George A. Smathers Libraries, University of Florida, Gainesville, Florida 32611, United States
Science librarians at the University of Florida have recently been involved in two data collection projects requested by university departments. The first required the collection and evaluation of comparative indicators of research productivity, with a five-day turnaround, comparing UF with selected Chemical Engineering departments at five other institutions. The second involved the development of publication lists for the University of Florida Clinical and Translational Science Institute to use in their Executive Council meetings to identify and analyze output of affiliated Institute researchers. Both projects required negotiated understanding of the needs and expectations of each department request and involved manual manipulation and analysis of data. These studies established rough guidelines for procedures that can be used in the future for institutional requests, and the methodological issues encountered are broadly applicable to many “fact-finding” scenarios.

2:25 123 Social networking tools as public representations of a scientist
Antony J Williams, williamsa@rsc.org, Department of Informatics, ChemSpider, Royal Society of Chemistry, Wake Forest, NC 27587, United States
The web has revolutionized the manner by which we can represent ourselves online by providing us the ability to expose our data, experiences and skills online via blogs, wikis and other crowdsourcing venues. As a result it is possible to contribute to the community while developing a social profile as a scientist. At present many scientists are still measured by their contributions using the classical method of citation statistics, and a number of freely available online tools are now available for scientists to manage their profile. This presentation will provide an overview of tools including Google Scholar Citations and Microsoft Academic Search and will discuss how these and other tools, when integrated with the ORCID identifier, may more fully recognize the collective contributions to science. I will also discuss how an increasingly public view of us as scientists online will likely contribute to our reputation above and beyond citations.

2:50   Intermission.
3:00 124 Next era of research productivity evaluation: A multidimensional research assessment framework
Daniel Calto, d.calto@elsevier.com, Atyab Tahir, Elsevier Inc., United States
There is a clear need for performance measures related to research productivity. But which types of content and metrics should be included in such measures? What do reasonable performance measures look like, and how should they be applied? Many advocates of new metrics state that measurements need to be more quantitative and objective. Others argue that qualitative metrics capture critical aspects about a researcher's overall effectiveness.
We propose that no single metric is adequate to capture the true dynamics of a researcher's productivity, but that an approach combining qualitative and quantitative metrics can lead to a reasonable and standards-based approach to evaluating research productivity. We would like to speak on a matrix created by Elsevier's bibliometric expert that provides the evaluator with a flexible framework to identify which elements are measured and which metrics to use, and shows how the purpose of the evaluation helps to determine the structural elements of the assessment.

3:25 125 Measuring research: Beyond H
Daniel Hook, daniel@symplectic.co.uk, Symplectic Limited, London, United Kingdom
Research information management (RIM) systems give us more possibilities than ever before to capture and analyze data about the research taking place in academic institutions. The key challenge in making RIM systems successful is ensuring that data is collected efficiently. To ensure quality, faculty are often involved in this process: there needs to be a tangible benefit to them and assurances that data, once collected, won't be used negatively.
A concerning issue for many faculty is the use of the H-Index. This is symptomatic of a more general fear of inappropriate or uninformed use of bibliometric measures. Classic bibliometric measures, such as H-Index or Impact Factor, might typically be thought of as the equivalent of "mean averages". This causes unease in faculty as these averages lack context. We will consider the role of the library in data collection and in educating faculty about bibliometric measures, which implicitly contextualise data and go beyond H.
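For readers unfamiliar with the measure, the H-Index is the largest number h such that a researcher has h papers with at least h citations each. The short sketch below computes it and shows how two very different citation profiles collapse to the same value; the citation counts are invented for illustration.

```python
def h_index(citations):
    """Largest h such that h papers each have at least h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Invented profiles: one with two highly cited papers, one with uniformly modest counts.
profile_a = [100, 50, 3, 1]
profile_b = [3, 3, 3, 0]
```

Both profiles give an H-Index of 3, which is one concrete sense in which a single summary number lacks context.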
3:50 126 Methods and solutions for measuring and benchmarking the impact of research
Daphne Grecchi, Daphne.Grecchi@thomsonreuters.com, IP & Science, Thomson Reuters, Philadelphia, PA 19130, United States
Academic institutions are increasingly tasked with demonstrating research productivity via objective measurements. The library can play an important role in quantitative research evaluation, in terms of implementing and educating. Bibliometrics has been the primary method for decades. Traditional indicators focused on journal literature are most applicable to the sciences. The analysis of books and patents as further research outputs can augment traditional journal metrics and new approaches such as network-based metrics offer additional perspectives on the research landscape. Bibliometrics should be used in context and in conjunction with other research performance measures. Thomson Reuters has worked in the field of citation indexing and analysis for over 50 years, beginning with our roots as the Institute for Scientific Information. We offer a suite of options from web-based evaluation tools like InCites and Research In View, to customized reports and engineered systems. Examples of metrics and methods from these tools will be discussed.

Computational Approaches to Spectroscopy Analysis Spectroscopy of Slightly Bigger Things
Sponsored by COMP, Cosponsored by ANYL, CINF, and PHYS
Drug Discovery Stomping Bugs, Drug Style (Anti-infectives)
Sponsored by COMP, Cosponsored by BIOL, CINF, and MEDI
Material Science
Sponsored by COMP, Cosponsored by CINF, PHYS, PMSE, and POLY
Molecular Mechanics Applications: No Abbreviations
Sponsored by COMP, Cosponsored by CINF, MEDI, and PHYS

THURSDAY MORNING

Section A
San Diego Convention Center
Room 27A

General Papers Chemical Databases, Drug Discovery, and Chemical Structure Representation
R. Bienstock, Organizer, Presiding
8:30 127 Can we really do computer-aided drug design?
Matthew D Segall, matt.segall@optibrium.com, Optibrium Ltd., Cambridge, United Kingdom
We will explore the accuracy of current computational methods in drug discovery, including 2D and 3D QSAR, docking, pharmacophore, molecular dynamics and quantum mechanical approaches. Based on this, we will address the question of whether we are truly operating in a drug design paradigm. We will compare this with the application of computational methods to the discovery of new drugs. From this alternative perspective, computational methods can add significant value to guide decisions about which chemistry to pursue and which can be rejected with confidence; focussing resources on the chemistry that is most likely to succeed, while avoiding missed opportunities. This is particularly important in the multi-parameter optimisation of high quality drug candidates that require a balance of many properties to succeed downstream.
8:50 128 Where screening starts: Effective preprocessing of chemical libraries
Matthias Hilbig, Adrian Kolodzik, Sascha Urbaczek, Matthias Rarey, rarey@zbh.uni-hamburg.de, Center for Bioinformatics, University of Hamburg, Hamburg, Germany
Today, every scientist has electronic access to more than 13 million commercially available compounds via ZINC [1] and even more when structure collections like PubChem [2] are used. Although computing speed still increases substantially, modeling tasks like docking, pharmacophore and 3D-similarity searching remain demanding. Furthermore, they all rely on high quality structures including correct protonation and tautomeric states. Preprocessing chemical libraries, especially filtering them by restricting simple scalar properties and applying substructure-based exclusion rules, is therefore a frequently applied task. This step is often performed with scripts or pipelining tools, making the adaptation of the library to the individual target and modeling task difficult. We propose to use a more interactive process in order to tailor compound collections on a case-by-case basis. We developed a tool named Mona supporting this process by handling a multitude of descriptors for large data sets in an efficient database so that compound collections can be customized on the fly.
[1] ZINC library, http://zinc.docking.org
[2] PubChem, http://pubchem.ncbi.nlm.nih.gov
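The kind of script-based preprocessing the abstract refers to (scalar property limits plus substructure-based exclusion rules) might look roughly like the sketch below. It is not Mona and does not use any Mona interface; the Compound record, the preprocess function, the thresholds, the alert names and all descriptor values are hypothetical, and substructure matching is assumed to have been done upstream by a cheminformatics toolkit.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    ident: str
    mol_weight: float              # precomputed descriptor (assumed available)
    logp: float                    # precomputed descriptor (assumed available)
    alerts: frozenset = frozenset()  # names of substructure rules matched upstream


def preprocess(compounds, max_weight, max_logp, excluded_alerts):
    """Keep compounds within the scalar limits that match no excluded rule."""
    excluded = set(excluded_alerts)
    return [
        c for c in compounds
        if c.mol_weight <= max_weight
        and c.logp <= max_logp
        and not (c.alerts & excluded)  # any shared alert name excludes the compound
    ]


# Hypothetical library entries with made-up descriptor values.
library = [
    Compound("cmpd-1", 320.4, 2.1),
    Compound("cmpd-2", 612.7, 3.0),                               # too heavy
    Compound("cmpd-3", 280.3, 1.5, frozenset({"acyl_halide"})),   # excluded rule
]

kept = preprocess(library, max_weight=500.0, max_logp=5.0,
                  excluded_alerts=["acyl_halide"])
print([c.ident for c in kept])
```

Changing the limits or the rule list means editing and rerunning such a script, which is the friction an interactive tool is meant to reduce.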
9:10 129 Toward a gold standard: Improving the quality of public domain chemistry databases
Antony J Williams1, Sean Ekins2, ekinssean@yahoo.com. (1) Royal Society of Chemistry, Wake Forest, NC 27587, United States (2) Collaborations in Chemistry, Fuquay Varina, NC 27526, United States
In recent years there has been a dramatic increase in the number of freely accessible online databases serving the chemistry community such that the internet now has a rich array of chemistry data. This is useful for data-mining, computer modeling, and for integrating into other systems to expand data accessibility and aid drug discovery. With this improved data accessibility comes a responsibility to ensure that it is as high quality as possible. This will prevent scientists from wasting time performing erroneous searches, creating flawed computational models etc. Improved discoverability of online resources should not be marred by the delivery of incorrect data. We will describe our experiences with multiple chemical compound databases and other online resources. We will suggest approaches to collaborate to deliver definitive reference data sources for researchers and additionally describe the creation of a new wiki for the community to contribute and rank databases (www.scidbs.com).
9:30   Intermission.
9:40 130 ChemSpider as a knowledge base
Valery Tkachenko1, tkachenkov@rsc.org, Antony Williams1, Aileen Day2, Jon Steel2. (1) Department of Informatics, ChemSpider, Royal Society of Chemistry, Wake Forest, NC 27587, United States (2) Department of Informatics, ChemSpider, Royal Society of Chemistry, Cambridge, United Kingdom
The amount of information on the internet is proliferating at such a speed that it is difficult to comprehend how much data will be available online in the coming years. The domain of chemistry is surprisingly complex, using its own multiple languages of chemical names, chemical structures, terminologies and ontologies. Data encoded in these forms is already on the web. Numerous efforts have been made to capture and host these data and enable them to be discoverable. ChemSpider has previously focused on being a "structure-centric database" but efforts are now afoot to extend the system into a "chemistry knowledgebase". Traditionally machine-to-machine communications were facilitated by the use of web services, but the diversity of chemistry-related information makes it hard to provide a comprehensive web services layer for knowledge bases. An alternative approach using semantic web technologies such as SPARQL has been implemented and this presentation will report our work.
10:00 131 ChemSpider as a chemical term resolver
Valery Tkachenko, Antony Williams, WilliamsA@rsc.org, Department of Informatics, ChemSpider, Royal Society of Chemistry, Wake Forest, NC 27587, United States
In recent years, in parallel with the general broad trend of information proliferation, many tens of public chemical databases have been created and made available using internet technologies. In many cases fluent data exchange has occurred between these various databases as they source information from one another. While this has the advantage of linking together multiple data sources, the results also include the proliferation of errors across the various databases. The lack of a public authority to resolve such errors significantly affects the quality of freely accessible chemical information. While ChemSpider has previously allowed a crowdsourcing approach to curation, efforts have now migrated to addressing this problem using a "federated resolver" approach. This presentation will report on our work in this area.
10:20 132 How to design chemical patterns easily with an interactive editor
Karen T. Schomburg, schomburg@zbh.uni-hamburg.de, Lars Wetzer, Matthias Rarey, Center for Bioinformatics, University of Hamburg, Hamburg, Germany
Chemical patterns are descriptions of generic chemical structures. They are essential for methods like searches in molecule databases or filtering of datasets, which practicing chemists employ frequently. However, due to their background, the representations of patterns (as linear languages like SMARTS [1]) are optimized for efficient computational interpretation. Their regular expression-like constitution makes them hard to use and creates an impediment to working with many fundamental chemoinformatic methods. A graphical interface to chemical patterns similar to structure diagrams better matches the conventions of the chemistry community. It supports the understanding of single patterns as well as becoming familiar with the concept of chemical patterns. The SMARTSviewer provides a visualization concept following the graphic standards of structure diagrams, along with an interactive editor, allowing intuitive design of chemical patterns from scratch. No specific knowledge about the SMARTS language is needed to understand or design a chemical pattern.
[1] James, C. A., Weininger D., Daylight Theory Manual. Daylight Chemical Information Systems, Inc. of Aliso Viejo, CA, available at www.daylight.com
10:40 133 Lexichem TK 2.1.0
Edward O Cannon, ed.cannon@eyesopen.com, OpenEye Scientific Software, Santa Fe, NM 87508, United States
Lexichem is a chemical nomenclature toolkit created by OpenEye Scientific Software that provides a fast and reliable way of converting chemical names to chemical structures and back in over 10 different languages. This presentation aims to give an overview of the new developments introduced in Lexichem toolkit version 2.1.0. A new performance metric based on percentage round tripping of canonical isomeric SMILES is introduced. The advantages of this metric, and why it will be used as a benchmark in future software releases, will be discussed.
Using this new metric, we report significant increases (up to 39%) in performance from our previous release of Lexichem (v2.0.2) over four databases.
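One plausible reading of a percentage round-tripping metric is sketched below: each input SMILES is converted to a name and back, and the score is the share of inputs recovered unchanged. This is an assumption about the metric's general shape, not the Lexichem definition, and it does not use the Lexichem API; the function, the two lookup tables standing in for the converters, and the sample inputs are all hypothetical. The inputs are assumed to be canonical isomeric SMILES already, so plain string equality serves as the comparison.

```python
def round_trip_percentage(smiles_list, to_name, to_structure):
    """Percentage of SMILES recovered unchanged after structure -> name -> structure."""
    if not smiles_list:
        return 0.0
    survived = 0
    for smiles in smiles_list:
        name = to_name(smiles)
        if name is None:
            continue  # no name could be generated; counts as a failure
        if to_structure(name) == smiles:
            survived += 1
    return 100.0 * survived / len(smiles_list)


# Toy stand-ins for a naming toolkit (hypothetical, for illustration only).
NAMES = {"CCO": "ethanol", "CC(=O)O": "acetic acid", "c1ccccc1": "benzene"}
STRUCTURES = {"ethanol": "CCO", "acetic acid": "CC(=O)O"}  # "benzene" not parsed back

inputs = ["CCO", "CC(=O)O", "c1ccccc1", "C[C@H](N)C(=O)O"]
score = round_trip_percentage(inputs, NAMES.get, STRUCTURES.get)
print(f"{score:.1f}% round-tripped")
```

A metric of this shape penalizes failures in either direction, since a structure that cannot be named and a name that cannot be parsed back both count against the score.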
Drug Discovery Inside of a Ligand
Sponsored by COMP, Cosponsored by BIOL, CINF, and MEDI
Material Science
Sponsored by COMP, Cosponsored by CINF, PHYS, PMSE, and POLY
Molecular Mechanics Proteins are Just Plain Interesting
Sponsored by COMP, Cosponsored by BIOL, CINF, MEDI, and PHYS

THURSDAY AFTERNOON

Drug Discovery Talking About Ligands
Sponsored by COMP, Cosponsored by BIOL, CINF, and MEDI
Material Science
Sponsored by COMP, Cosponsored by CINF, PHYS, PMSE, and POLY
Molecular Mechanics Application of Our Hip Methodologies
Sponsored by COMP, Cosponsored by BIOL, CINF, MEDI, and PHYS
Molecular Mechanics Proteins: There is Nothing Plain and Simple About 'Em.
Sponsored by COMP, Cosponsored by BIOL, CINF, MEDI, and PHYS