Technical Program (abstracts)

ACS Chemical Information Division (CINF)
Fall, 2012 ACS National Meeting
Philadelphia, PA (August 19 - 23)

CINF Symposia

R. Bienstock, Program Chair

SUNDAY MORNING

Philadelphia Marriott Downtown
Room 302/303

When Chemists and Computers Collide: Putting Cheminformatics in the Hands of Medicinal Chemists Cosponsored by MEDI
Matthew Segall, Organizer, Presiding
8:30   Introductory Remarks.
8:35 1 Intersection of chemists, computers, and automation: A fully integrated approach to SAR generation
David M Parry1, dave@cyclofluidic.co.uk, Christopher N Selway1, Willem P Van Hoorn2. (1) Cyclofluidic Ltd, Welwyn Garden City, Herts AL7 3AX, United Kingdom, (2) Accelrys Ltd, 334 Cambridge Science Park, Cambridge CB4 0WN, United Kingdom
Medicinal chemistry involves an iterative process of design, synthesis, biological assay and analysis of molecules to feed into the next learning cycle. Cyclofluidic is developing an integrated microfluidic platform enabling the automated and rapid generation of SAR data in medicinal chemistry. A key element of this approach is the ability to close the loop, i.e. to use the biological data as it is generated in the design of the next iteration of synthesis and screening. A number of design methods have been developed, including simple potency maximisation and multi-parameter optimisation, to apply to different medicinal chemistry scenarios. An overview of the Cyclofluidic approach will be provided with a particular focus on the design methods, their validation and use on the platform.
9:25 2 Data in, data out - visualization of SAR data organized through an ELN
Anis Khimani, Anis.Khimani@PERKINELMER.COM, Philip Skinner, philip.skinner@perkinelmer.com, Megean Schoenberg, Phil McHale, Michael Swartz, Kate Blanchard. PerkinElmer Informatics, Cambridge, MA 02140, United States
The increasing footprint of Electronic Lab Notebooks in biopharmaceutical organizations has led to an evolving role for such systems. ELNs initially provided mostly document management (Word, Excel) capabilities along with specific stoichiometric tools for synthetic chemistry. As systems evolved however, there was an increasing need for the ELN to simultaneously capture both the structured assay data generated in an organization and the supporting experimental information previously captured. This supports an efficient workflow, but it also creates visualization challenges, as the biological data is stored within a notebook's experiment-based data hierarchy, whereas visualization is more project, target or chemical structure based. We will describe the addition of a structured data model into an ELN which allows for the facile capture and management of in vitro and in vivo data. We will describe how that data is combined with relevant chemical information and exposed for visualization and exploration of Structure Activity Relationships.
9:35 3 MPO for the masses: Practical application of multiparameter optimisation to guide compound design and selection
Edmund Champness, ed@optibrium.com, Matthew Segall, Chris Leeding, James Chisholm, Iskander Yusof. Optibrium Ltd., Cambridge, CB25 9TL, United Kingdom
A successful drug must have a balance of many physicochemical and biological properties; potency against a therapeutic target is not sufficient. The simultaneous optimisation of multiple factors is commonly described as 'multi-parameter optimisation' (MPO) and remains a major challenge for drug discovery. It is almost impossible to juggle many parameters in your mind at one time, while searching a potentially vast and complex 'chemical space' for an optimal compound. Data visualisation can help, but is not enough to easily draw conclusions, due to the complexity of the available data. Computational approaches have been developed to support drug discovery scientists to achieve true MPO and improve the efficiency and productivity of drug discovery. We will discuss approaches to MPO and show how it can be made accessible in an intuitive way to guide confident, objective decisions on the design and selection of high quality compounds throughout the drug discovery process.
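To illustrate the kind of scoring that MPO involves, the following minimal Python sketch combines per-property desirability values into a single score. It is not the method presented in the talk: the linear desirability function, the geometric mean, the property names and the ranges are all assumptions chosen for this example.

```python
from math import prod

def linear_desirability(value, low, high):
    """Return 0 at `low`, 1 at `high`, linear in between, clamped outside."""
    if high == low:
        raise ValueError("low and high must differ")
    fraction = (value - low) / (high - low)
    return max(0.0, min(1.0, fraction))

def mpo_score(properties, criteria):
    """Geometric mean of per-property desirabilities.

    `criteria` maps a property name to (low, high); pass low > high
    when smaller values are preferred.
    """
    scores = [linear_desirability(properties[name], low, high)
              for name, (low, high) in criteria.items()]
    return prod(scores) ** (1.0 / len(scores))

# Hypothetical criteria: higher pIC50 is better, lower logP is better.
criteria = {"pIC50": (5.0, 8.0), "logP": (5.0, 3.0)}
compound_a = {"pIC50": 8.0, "logP": 4.0}   # potent, acceptable logP
compound_b = {"pIC50": 9.0, "logP": 6.0}   # more potent, logP out of range

score_a = mpo_score(compound_a, criteria)
score_b = mpo_score(compound_b, criteria)
```

In this sketch compound_b scores zero despite its higher potency, because a geometric mean lets one unacceptable property veto the whole compound. That is one way to encode the point that potency alone is not sufficient.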
10:05   Intermission.
10:15 4 Computation tools for medicinal chemists: Increasing the dimensions of drug discovery
Robert Scoffin, rob@cresset-group.com, Cresset BioMolecular Discovery Ltd, BioPark Hertfordshire, Welwyn Garden City, Herts AL7 3AX, United Kingdom
There is widespread understanding and acceptance that 3D structure and related properties (shape & electrostatics) are key to understanding absolute and relative compound activities; however, there is a relative dearth of examples where 3D properties of molecules have been successfully rendered and handled in an environment amenable to widespread use in medicinal chemistry. We present examples of tools which have a proven track record in the hands of medicinal chemists, and which have been used to enhance compound design, strengthen IP positions and improve discovery project success rates.
10:45 5 Application of automated and validated virtual screening workflows: A hand-tool for medicinal chemists to generate and/or evaluate ideas
Ferenc Szalai, Márk Sándor, Zoltán Szalai, Gáspár Körtesi, Enikő Dorogi, Róbert Kiss, rkiss@mcule.com. mcule.com, Budapest, Budapest H-1096, Hungary
Efficient collaboration of medicinal and computational chemists is critical for successful drug discovery projects. Computational chemists can help to find the rationale behind the biological data and recognize structure-activity relationships. To make the hit identification, hit-to-lead and lead optimization phases more efficient it is critical that computational chemists build predictive models to prioritize compounds for purchase or synthesis. Data mining, model building and validation can be, however, very time consuming. Here we show how this process can be facilitated by an automated approach. Mcule provides automated and validated virtual screening workflows for a large number of targets that can be used to prioritize synthesis ideas and compounds for purchase during the optimization process. These workflows can be easily run by medicinal chemists less familiar with molecular modelling. Furthermore, these screening workflows can be applied to the up-to-date chemical supplier database of mcule. Top ranked compounds can serve as valuable chemical starting points for early phase projects and are ready to order at mcule.com.
11:05 6 Competitive data science: A tale of two web services
David C Thompson2, david.thompson@boehringer-ingelheim.com, Joerg Bentzien1, Ingo Muegge1, Ben Hamner3, ben.hamner@kaggle.com. (1) Medicinal Chemistry, Boehringer Ingelheim Pharmaceuticals, Inc., Ridgefield, Connecticut 06877, United States, (2) Public Affairs and Corporate Communications, Boehringer Ingelheim Pharmaceuticals, Inc., Ridgefield, Connecticut 06877, United States, (3) Kaggle, Inc., San Francisco, CA 94107, United States
On March 16th 2012 Boehringer Ingelheim, in partnership with data science company Kaggle, launched a crowd-sourcing competition to predict a biological endpoint. At the time of writing this abstract, less than a full week into the competition, there are 85 teams, comprised of 103 players who have made 327 entries; 29 of which are 'better' or more predictive than the best initial benchmark. During this presentation we will describe the provenance of this project, the complete and full structure of the underlying data set and competition, and some thoughts as to the broader utility of such 'gamification' approaches to the field of modeling in the pharmaceutical industry. Following completion of the competition, through use of our BIpredict platform, a MOE/web server based tool that provides in silico descriptor and model calculations to the scientist's desktop, we will describe how these models are subsequently exposed to the medicinal chemist.
11:25 7 Putting chemical informatics in the hands of prospective medicinal chemists: QSAR modeling of Plasmepsin II inhibition, an undergraduate project
Jeremy G Frey, j.g.frey@soton.ac.uk, Jonathon W Essex, Simon J Cioles. Department of Chemistry, University of Southampton, Southampton, United Kingdom
We report on the success of running a high-level undergraduate course introducing computational drug design as part of chemoinformatics. The students are given a series of lectures on chemical information, chemical informatics and some chemometric techniques, and are introduced to several QSAR modelling techniques. A significant part of their assessment is a project in which they build a QSAR model based on the molecules from K. Ersmark et al. (J. Med. Chem. 2005, 48, 6090-6106), which provides activity data from tests of a series of related molecules proposed as potential anti-malarial drugs that interact with the haemoglobin-degrading aspartic protease plasmepsin II (Plm II) as the target protein. The students are encouraged to access online chemical informatics resources and build a statistical model using the JMP software. This optional course has run for over 5 years and will soon become an integral part of the mainstream undergraduate course. The resulting projects (including the report, a presentation and a web site) have shown that the students develop very significant insight into the modelling process.
11:45 8 Better 2D reports for 3D decisions: Communicating with medicinal chemists
Tamsin E Mansley, tamsin@eyesopen.com, Krisztina Boda, Bob Tolbert. OpenEye Scientific Software, Inc., Santa Fe, NM 87508, United States
Effective project decision making from complex molecular modeling data can be challenging. 3D visualization applications go some of the way towards putting this information in the hands of medicinal chemists. However, chemists are more productive using 2D structural representations. Projecting 3D molecular properties onto 2D depictions opens up a novel way of converting information to knowledge for chemists. Reports made with the new GraphemeTK provide representations that allow visualization of complex 3D molecular properties and scores in a clear and coherent 2D format. The reports allow comparison of molecules in the context of their 3D properties and facilitate effective molecule selection and decision making. Examples will be presented highlighting the application of these reports for data analysis from virtual screening and lead optimization applications.

Philadelphia Marriott
Conference Room 307

Science and the Law: Analytical Data in Support of Regulation in Health, Food, and the Environment Cosponsored by AGFD, ANYL, CHAL, ENVR, MEDI, PROF, TOXI
William Town, Organizer, Presiding
8:30   Introductory Remarks.
8:40 9 Honey analysis by high-sensitivity cryo-13C-NMR
Istvan Pelczer, ipelczer@princeton.edu, Frick Chemistry Laboratory, Princeton, New Jersey 08544, United States
Honey is a very complex and biologically extremely valuable product of honeybees. It consists of hundreds of components, most of which are sugar derivatives. Detailed analysis of honey is a formidable task, especially because many of the components are unstable and/or do not behave well under chromatography-MS conditions. The quality of honey is an important safety and market issue, so reliable characterization of honey samples is essential. NMR spectroscopy has the unique advantage of being naturally quantitative and non-discriminative. Sample preparation is usually simple, while the measurement is robust and highly reproducible. Mixture analysis by NMR is often done using 1H-NMR; however, for complex biomaterials such as honey, 13C-NMR is much more suitable. Given the extended chemical shift range (up to 200 ppm) and the singlet nature of each carbon resonance, both dispersion and resolution are superior to those available in proton spectra. In addition, 13C chemical shifts are better correlated with structural features, making identification of components more feasible. Detection of natural-abundance 13C-NMR is of low sensitivity by nature, but recent developments in optimized cryoprobe technology make this approach truly competitive. In our laboratory, quantitative 13C-NMR of a typical honey sample can be run within an hour. If only qualitative analysis is necessary, the time required falls to ten minutes or less. Using the resolving power of 13C-NMR we can distinguish the sugar components in situ, without any separation or pre-treatment of the sample; the sample is studied in its natural condition. In addition, a large proportion of the additional components can be identified and analyzed quantitatively. We have been looking at various honey samples of different season and/or different origin using cryo-13C-NMR.
The analysis of the data may include statistical methods, as well as direct comparative component analysis using efficient prediction tools and database information. The highly rich, quantitative information provided by cryoprobe-assisted 13C-NMR spectroscopy makes it a prime analytical tool for detailed analysis, identification of origin, and quality control of food and food ingredients such as honey.
9:10 10 Ensuring that nutrition and health claims on foods and dietary supplements are justified and scientifically substantiated
David P Richardson, info@dprnutrition.com, School of Chemistry, Food and Pharmacy, University of Reading, Reading, Berkshire RG6 6UR, United Kingdom
Consumers should be able to make choices based on clear and accurate information and to have confidence in the scientific and regulatory processes used to support nutrition and health claims on foods and dietary supplements. Global regulatory developments relate to mandatory nutrition labelling, the characterisation of foods and food constituents with health benefits (functional foods), the additions of nutrients to foods, the setting of minimum levels and maximum safe levels of vitamins, minerals and other substances with nutritional and physiological effects, and the controls necessary to verify compliance with the relevant labelling requirements. In addition, biomarkers or risk factors used to reflect a physiological health benefit associated with a function or a reduction of risk of disease must be validated both biologically and analytically. The paper will review the scientific, analytical and regulatory issues and highlight the global challenges facing the scientific community, the food industry and enforcement authorities.
9:40   Intermission.
9:55 11 Hunting and gathering: Locating information on the cusp between science and legislation
Judith N. Currano, currano@pobox.upenn.edu, Chemistry Library, University of Pennsylvania, Philadelphia, PA 19104-6323, United States
This talk will take a case study approach to examine methods of finding information on the science and legislation dealing with food, drugs, and the environment. These issues tend to elicit a strong emotional response from scientists, legislators, and the general public, and each group frequently cites "hard facts" to bolster its assertions. We will demonstrate useful techniques to find the data behind the issues, presenting resources accessible to researchers in all areas, as well as the general public.
10:25 12 Contact lens materials and multipurpose solutions: Lessons learned from laboratory research
K. Scott Phillips1, Kenneth.Phillips@fda.hhs.gov, Victoria Hitchins2, Dinesh Patwardhan1. (1) Division of Chemistry and Materials Science, Office of Science and Engineering Laboratories, Center for Devices and Radiological Health, US Food and Drug Administration, Silver Spring, Maryland 20993, United States, (2) Division of Biology, Office of Science and Engineering Laboratories, Center for Devices and Radiological Health, US Food and Drug Administration, Silver Spring, Maryland 20993, United States
Since their introduction four decades ago, soft contact lenses have become an extremely popular medical device, comprising over 90% of the contact lens market today. Recent microbial keratitis outbreaks among soft contact lens wearers in the United States and around the world prompted regulatory science research at FDA to address this public health challenge. Multidisciplinary research was conducted to support updating the FDA lens grouping system and to study material-solution interactions. This talk will discuss our research efforts in the areas of materials chemistry and bioanalytical chemistry, the project's contribution to current regulatory science knowledge, and potential implications that the data has for public health. In particular, we will discuss 1) discoveries about the properties of silicone hydrogel contact lens materials, and 2) new insights into preservative uptake and depletion.
10:55 13 Steps towards the analytical standards required for science-based tobacco product regulation
Derek Mariner, derek_mariner@bat.com, Kevin McAdam, Christopher Proctor. Group R&D, British American Tobacco, Southampton, United Kingdom
The tobacco industry is becoming increasingly regulated since the enactment of the US Family Smoking Prevention and Tobacco Control Act (FSPTCA) in 2009 which gave the Food and Drug Administration jurisdiction over tobacco products, and the World Health Organization Framework Convention on Tobacco Control (FCTC), which came into effect in 2005, and to which more than 170 countries are Parties. This talk will examine the data about tobacco products and tobacco smoke toxicants that have been collected previously, as well as how analytical methods have been developed in this field. This will be contrasted with the data and analytical methods that urgently need to be developed today to ensure that regulatory decisions made regarding tobacco products are informed by a sound scientific base.

Philadelphia Marriott
Conference Room 306

Hunting for Hidden Treasures: Chemical Information in Patents and Other Documents Cosponsored by CHAL, SCHB
Wei Deng, Organizer, Presiding
8:30   Introductory Remarks.
8:35 14 Case studies in Markush searching: Using Markush structures in patents for chemical property description
Donald Walter, don.walter@thomsonreuters.com, Intellectual Property Services, Thomson Reuters, Alexandria, VA 22314, United States
Unlike scientific literature, which answers questions based on experimental evidence, patents claim solutions to problems based partly on experimentation and partly on predictions of what should work. Markush chemical structures are a prime example. They can predict and disclose astronomical numbers of structures which may have some application, although in practice only a few of the predicted structures may have the desired activity. This talk will summarize some new tools available for Markush searching, and show how the results of Markush searching can be used to identify the best leads in a patent.
9:00 15 Building pathways to the world's disclosed scientific research
Roger Schenck, rschenck@cas.org, Department of Content Planning, Chemical Abstracts Service, Columbus, Ohio 43202, United States
CAS, the world's authority for chemical information, continues to see growth in patent applications, especially from Asia. Underscoring the fact that nearly half of the new substance registrations in 2011 came from patents, in May of 2011 CAS reported that the 60 millionth CAS Registry Number® was assigned to a novel substance from a Chinese patent application. As more and more chemical information is generally being published in patents, this presentation will cover how CAS is keeping up with this rapid growth. A description of how CAS scientists extract chemical information, including the tools they use, how incomplete descriptions of the chemistry and inconsistencies are addressed, and quality assurance protocols will be covered. The processing of Markush representations will be described. The talk will end with a few examples of how customers access this content in the CAS databases.
9:25 16 Natural products Markush
Jayaraman Packirisamy, jayaram1976@hotmail.com, Yogitha Pathuri. Discovery Informatics, Sristi Biosciences Private Limited, Hyderabad, India
The chemical diversity of nature demands simpler representations for analysing its chemistry as well as its diversity. Steering discovery research or natural products research towards Decision Centric Research Intelligence (DCRI) is relatively simple when we establish a knowledge management system that is Markush-enabled. Chemistry research tools need ways to create variations in the way nature does. Mapping the complex biosynthetic pathways of secondary metabolites can be made simple with Markush representations. Markush structures allow us to offer the requisite variations for representing the complex biosynthetic pathways of nature. This implies a representative Markush flow that is both linear and lateral. Curation of Markush structures from published information is simple when the source offers them directly. In a research context, however, it is important to recognize individual structures from published sources and group them for Markush representation. We have developed a compound-class-based grouping of Markush structures, Natural Products Markush (NPM), and mapped it against biosynthetic pathways. We present NPM as a critical decision-making tool for screening and drug discovery.
9:50   Intermission.
10:00 17 Mission impossible? Computer aided extraction of generic chemical structures from patents: A critical review of the technologies applied and some results of the Theseus project "ChemProspector"
Josef Eiblmaier, je@infochem.de, Valentina Eigner-Pitto, Hans Kraut, Larisa Isenko, Heinz Saller, Peter Loew. InfoChem GmbH, Munich, Germany
The research project ChemProspector is based on a sophisticated interaction of several cutting edge technologies in the area of information extraction from documents. Chemical named entity extraction, chemical image recognition, work-up of ChemDraw files and relation mining interact in a concerted action to identify, extract and store Markush structures. This lecture will describe the single components utilized and will demonstrate a series of worked examples. A critical review of the results will be presented. ChemProspector was developed within the framework of THESEUS, currently Germany's largest IT research program which is funded by the German Ministry of Economics and Technology.
10:25 18 Chemical text mining for current awareness of pharmaceutical patents
Daniel M Lowe1, daniel@nextmovesoftware.com, Roger A Sayle1, Paul Hongxing Xie2, Sorel Muresan2. (1) NextMove Software, Cambridge, Cambridgeshire CB4 0EY, United Kingdom, (2) Discovery Sciences, AstraZeneca R&D, Molndal, Sweden
The increasing rate of pharmaceutical patent publication makes keeping current in medicinal chemistry ever more difficult for the practising research chemist. The USPTO alone publishes over 6000 applications each week. Following which of these are relevant to an on-going project in a timely manner is a challenging but critical task. This talk will describe a system for automatically downloading US grants and applications as they are published, extracting and mining relevant information, and storing the results in databases searched by web-based interfaces. Difficulties include handling the variety of file formats used, selecting the pharmaceutically relevant subset for analysis, indexing of structured data and the text mining of unstructured data such as chemicals, targets and diseases. Entities (chemical and Markush structures, R-groups and reactions) are extracted from text and images. A set of key “index” compounds are prioritized and used to cluster documents based on their chemical content.
10:50   Intermission.
11:05 19 Periscope system for encoding and searching chemical Markush structures
David A Cosgrove, david.cosgrove@astrazeneca.com, Jon Winter, Andrew G Leach, Andrew Poirrette, Keith M Green. AstraZeneca, Macclesfield, Cheshire SK10 4TG, United Kingdom
The encoding and searching of chemical Markush structures has received little attention in the literature of late. We have recently developed a language and associated software for the encoding and searching of Markush structures such as those found in chemical patents. We will describe the system and demonstrate its application to the extraction of structure-activity relationships from a chemical patent by way of a Free-Wilson-style analysis based on the Markush structure and the chemical structures and associated activities disclosed in an example patent. Ref: A System for Encoding and Searching Markush Structures, J. Chem. Inf. Mod. Submitted.

SUNDAY AFTERNOON

Philadelphia Marriott Downtown
Room 302/303

Cheminformatics and Drug Repurposing
Jose Medina-Franco, Rachelle Bienstock, Organizers
Jose Medina-Franco, Presiding
1:30   Introductory Remarks.
1:35 20 Drug repurposing and multitargeted drug design are the positive faces of failures in reductionist mechanistic drug design
Christopher A Lipinski, clipinski@meliordiscovery.com, Scientific Advisory Board, Melior Discovery, Waterford, CT 06385-4122, United States
Drug repurposing or drug repositioning is the term frequently applied to the discovery of new useful activity in an older clinically used drug. Drug discovery clinical success rates of ten percent and preclinical success rates of thirty percent lead to a questioning of every aspect of early drug discovery. Clinically used drugs are never selective for a single target and biological signaling mechanisms are robust and redundant. Most pathway knockouts are phenotypically silent. Hence a clinically effective drug is rare and is likely to have unexpected useful activity beyond that originally envisioned. Unmet medical needs among orphan and rare diseases suggest that drug repurposing from clinically used drugs could be a major solution. Although it is a bit of an exaggeration, there is a lot of truth in the saying that “we do not need to find new drugs; rather we need to find the patients who can benefit from existing drugs”.
2:00 21 Cheminformatic/bioinformatic mining of large corporate databases for drug repurposing
William Loging, will.loging@boehringer-ingelheim.com, Raul Rodriguez-Esteban, Jon Hill, Thomas Freeman. Computational Biology and Knowledge Management, Boehringer-Ingelheim, Ridgefield, CT 06480, United States
Frequent failures of experimental medicines in clinical trials call into question current abilities to predict drug effects in the human body. Therefore, the approach of drug repositioning is an important consideration for any life science organization. By using knowledge-driven systems in the form of large data stores and applying rational in silico experimental design, researchers have generated workflows that are capable of identifying novel uses for drugs that span the therapeutic pipeline and beyond. Both broadly accessible data, such as Medline and Chembank, and internal proprietary company data in the form of gene chip experiments, compound screening databases, and clinical trial information play an important role in the success of drug repositioning. By reviewing how current and past successes have been accomplished along with the data used, important stratagems emerge that can provide a wealth of ideas for novel workflows, as well as provide a guide for future discoveries.
2:25 22 Improving drug development by connecting medicinal chemistry with drug repositioning and modern machine learning methods
Iwona E Weidlich1, iweidlic@umbc.edu, Igor V Filippov2, Ian Thorpe1. (1) Department of Chemistry and Biochemistry, University of Maryland, Baltimore County, Baltimore, MD 21250, United States, (2) Chemical Biology Laboratory, Basic Science Program, SAIC-Frederick, Inc., National Institutes of Health, Frederick National Lab, Frederick, MD 21702, United States
Developing drug candidates from scratch has turned into a billion-dollar expense that is not delivering enough profitable products to market. Novel approaches which merge chemistry with biology and informatics contribute to the development of selective lifesaving drugs needed by patients. We implement machine learning classifiers for HTS Data Analysis, Screening and drug repurposing with high probability of selecting drug candidates eligible for Phase II of clinical study free from ADME/Tox related problems. We used small molecule bioactivity data for HCV RNA Polymerase to train and test QSAR models and apply these robust models for compound ranking and hit identification in drug repositioning techniques. Random Forest and kNN algorithms were used with Morgan fingerprints of 679 small molecules with curated IC50 values. After filtering various drug-like databases (DrugBank, MDL, NIAID NIH, ComGenex) compounds were selected and tested against HCV. We discuss the challenges in drug repositioning faced in academia, government and pharmaceutical industry.
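The kNN step mentioned in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the fingerprints are hand-made sets of on-bit indices rather than real Morgan fingerprints, the labels are invented, Tanimoto similarity is assumed as the distance measure, and no cheminformatics or machine learning library is used.

```python
from collections import Counter

def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)

def knn_predict(query_fp, training_set, k=3):
    """Majority label among the k training fingerprints most similar to the query."""
    ranked = sorted(training_set,
                    key=lambda item: tanimoto(query_fp, item[0]),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy data (hypothetical bit sets and labels, for illustration only).
training_set = [
    ({1, 2, 3, 4}, "active"),
    ({1, 2, 3, 9}, "active"),
    ({2, 3, 4, 5}, "active"),
    ({10, 11, 12}, "inactive"),
    ({10, 12, 13}, "inactive"),
]
prediction = knn_predict({1, 2, 3, 5}, training_set, k=3)
```

In practice the fingerprints would be generated from chemical structures and the model validated on held-out data, as the abstract describes; the sketch shows only the nearest-neighbour vote itself.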
2:50 23 Mining small-molecule screens to repurpose drugs
S. Joshua J Swamidass, swamidass@gmail.com, Department of Pathology and Immunology, Washington University in St Louis, St Louis, MO 63110, United States
Repurposing and repositioning drugs (discovering new uses for existing and experimental medicines) is an attractive strategy for rescuing stalled pharmaceutical projects, finding treatments for neglected diseases, and reducing the time, cost and risk of drug development. As this strategy emerged, academic researchers began performing high-throughput screens (HTS) of small molecules, the type of experiments once exclusively conducted in industry, and making the data from these screens available to all. Several methods can mine this data to inform repurposing and repositioning efforts. Despite these methods' limitations, it is hoped that they will accelerate the discovery of new uses for known drugs.
3:15 24 Diverse valid 3D-QSAR models of off-target effects from template CoMFA
Richard D Cramer, cramer@tripos.com, Department of Science, Tripos DE, St. Louis, MO 63144, United States
Although the likelihood that multiple valid QSAR models exist for most training sets is generally acknowledged, there are few examples. Template CoMFA newly offers a productive means of exploring such a probable diversity, conveniently performed by varying the template and informatively presented by the familiar contour displays of 3D-QSAR. These concepts will be exemplified by applying template CoMFA to various hERG and CYP training sets.
3:40   Intermission.
3:50 25 Drug design and repositioning using MLSD (multiple ligand simultaneous docking)
Chenglong Li, li.728@osu.edu, Department of Medicinal Chemistry and Biophysics, The Ohio State University, Columbus, OH 43210, United States
My lab has developed a novel MLSD (multiple ligand simultaneous docking) strategy to find the optimal fragment combinations binding to the most important protein target “hot spots,” followed by tethering to generate virtual template compounds. The fragments can be linked to generate novel leads, and we designed two novel lead inhibitors based on one of the lead templates. Existing drugs can be repositioned for the new targets via the new lead templates. Case studies will be presented.
4:15 26 Mining public domain data as a basis for drug repurposing
Antony J Williams1, williamsa@rsc.org, Sean Ekins2, Valery Tkachenko1. (1) Cheminformatics, Royal Society of Chemistry, Wake Forest, NC 27587, United States, (2) Collaborations in Chemistry, Fuquay Varina, NC 27526, United States
Online databases containing high throughput screening and other property data continue to proliferate in number. Many pharmaceutical chemists will have used databases such as PubChem, ChemSpider, DrugBank, BindingDB and many others. This work will report on the potential value of these databases for providing data to be used to repurpose drugs using cheminformatics-based approaches (e.g. docking, ligand-based machine learning methods). This work will also discuss the potentially related applications of the Open PHACTS project, a European Union Innovative Medicines Initiative project, that is utilizing semantic web based approaches to integrate large scale chemical and biological data in new ways. We will report on how compound and data quality should be taken into account when utilizing data from online databases and how their careful curation can provide high quality data that can be used to underpin the delivery of molecular models that can in turn identify new uses for old drugs.
4:40 27 Drug-drug relationship based on target information: Application to drug target identification
Dongsup Kim, kds@kaist.ac.kr, Department of Bio and Brain Engineering, KAIST, Daejeon, Republic of Korea
Drugs that bind to common targets likely exert similar activities. In this target-centric view, the inclusion of richer target information may better represent the relationships between drugs and their activities. On this basis, we expanded the “common binding rule” assumption of QSAR to create a new drug-drug relationship score (DRS). Our method uses various chemical features to encode drug target information into the drug-drug relationship information. Specifically, drug pairs were transformed into numerical vectors containing the basal drug properties and their differences. Machine learning techniques (such as data cleaning, dimension reduction, and ensemble classifiers) were then used to prioritize drug pairs bound to a common target. In other words, the estimation of the drug-drug relationship is restated as a large-scale classification problem, which provides the framework for using state-of-the-art machine learning techniques with thousands of chemical features for newly defining drug-drug relationships. Various aspects of the presented score were examined to determine its reliability and usefulness: the abundance of common domains for the predicted drug pairs, ca. 80% coverage for known targets, successful identification of unknown targets, and a meaningful correlation with another cutting-edge method for analyzing drug similarities. The most significant strength of our method is that the DRS can be used to describe phenotypic similarities, such as pharmacological effects.
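The pair encoding described above (basal drug properties plus their differences) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `pair_vector`, the use of absolute differences, and the toy descriptor values are all assumptions, and each drug is assumed to be already represented by a numeric descriptor vector.

```python
from typing import List, Sequence


def pair_vector(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Encode a drug pair as [descriptors of a, descriptors of b, |a - b|].

    Assumption: the abstract does not say how differences are computed;
    element-wise absolute differences are used here for illustration.
    """
    if len(a) != len(b):
        raise ValueError("both drugs need descriptor vectors of equal length")
    diffs = [abs(x - y) for x, y in zip(a, b)]
    return list(a) + list(b) + diffs


# Toy descriptors (hypothetical values, e.g. two scaled physicochemical properties).
drug_a = [1.0, 2.0]
drug_b = [3.0, 0.5]
features = pair_vector(drug_a, drug_b)
print(features)
```

Vectors built this way could be fed to any standard classifier that labels a pair as sharing a target or not. Note that this encoding depends on the order of the two drugs, so a real pipeline would have to fix an ordering or use a symmetric encoding.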
5:05 28 Application of computational target fishing approaches in drug repurposing
Mohamed D AbdulHameed, mabdulhameed@bhsai.org, Anders Wallqvist, Gregory J. Tawa. Computational Drug Design Group, DoD Biotechnology High Performance Computing Software Applications Institute, US Army Medical Research and Materiel Command, Frederick, MD 21702, United States
Structure-based and ligand-based approaches can be used to facilitate drug repurposing. We have carried out structure-based, target-focused repurposing screens against bacterial targets. However, the lack of available X-ray crystal structures for important drug targets such as G Protein Coupled Receptors (GPCRs) is a major limitation for the broad application of structure-based approaches. To address this issue, we have developed a ligand-based target profiling approach using the program Rapid Overlay of Chemical Structures (ROCS). Approved drug molecules known to interact with a target protein are used as target representatives. The target profile of an arbitrary query compound is generated by using the maximal shape and chemical overlap between the query molecule and target representatives. We will present a validation of this approach using the Directory of Useful Decoys (DUD). We will then show the application of this approach to drug repurposing and systems pharmacology.

 Philadelphia Marriott
Conference Room 307

Science and the Law: Analytical Data in Support of Regulation in Health, Food, and the Environment Cosponsored by AGFD, ANYL, CHAL, ENVR, MEDI, PROF, TOXI
William Town, Organizer, Presiding
1:00   Introductory Remarks.
1:05 29 Rapid screening methods for pharmaceutical surveillance
Lucinda Buhse, Lucinda.Buhse@fda.hhs.gov, Connie M Gryniewicz-Ruzicka, Jamie D Dunn, Sergey Arzhantsev, John A Spencer, Jason Rodriguez, Benjamin J Westenberger, John F Kauffman. US FDA, CDER, Division of Pharmaceutical Analysis, St. Louis, Missouri 63101, United States
Historically, FDA surveillance of pharmaceutical products and ingredients has involved sampling, sending to a laboratory, and waiting for results from extensive testing such as that in the United States Pharmacopeia. An ever increasing percentage of products and ingredients are now coming from overseas, potentially increasing consumer exposure to poor quality, counterfeit and adulterated pharmaceutical products. In response to this situation, the FDA has developed rapid and portable screening methods to assess the quality and safety of pharmaceutical products at ports of entry. This presentation will briefly describe the current spectroscopic methods being utilized by the FDA for field surveillance of pharmaceutical products, including Raman, near infrared (NIR), x-ray fluorescence (XRF) and ion mobility (IMS) spectrometries. Experiences with method development, chemometric data analysis, and field deployment will be discussed.
1:35 30 Analytical procedures and the regulation of new drug development
George Lunn, george.lunn@fda.hhs.gov, Office of New Drug Quality Assessment, Food and Drug Administration, Silver Spring, Maryland 20993, United States
Analytical procedures are critical to the assurance of the quality of new pharmaceuticals that are to be placed on the market. The information that should be submitted to the FDA is governed by the Food, Drug, and Cosmetic Act, Title 21 of the Code of Federal Regulations, and various guidances. This talk will focus on these requirements and recommendations. In addition, blinded examples of the type and amount of information supplied will be presented, and the FDA review process will be discussed.
2:05   Intermission.
2:20 31 Mapping the human toxome for new regulatory tools
Thomas Hartung, thartung@jhsph.edu, Paul Locke. Johns Hopkins University, School of Public Health, Environmental Health Sciences, Center for Alternatives to Animal Testing, Baltimore, Maryland 21205, United States
Today's mechanistic toxicology relies to a large extent on methodologies which substitute or complement traditional animal tests. The biotechnology and informatics revolution of the last decades has made such technologies broadly available and useful. In the US, especially the NAS vision report for a toxicology in the 21st century and its most recent adaptation by EPA for their toxicity testing strategy have initiated a debate on how to create a novel approach based on human cell cultures, lower species, high-throughput testing and modeling. A systematic mapping of the entirety of pathways of toxicity, the Human Toxome, has been started. The lecture summarizes the lessons learned from the development, validation and acceptance of alternative methods for the creation of a new approach for regulatory toxicology. Besides the technical development of new approaches, a case is made that we need conceptual steering, an objective assessment of current practices by evidence-based toxicology (modeled on Evidence-based Medicine), and implementation into legislation.
2:50 32 Cooperation between the US EPA and industry to develop an in vitro ocular hazard testing strategy
Rodger Curren, rcurren@iivs.org, Institute for In Vitro Sciences, Inc., Gaithersburg, Maryland 20878, United States
An example of how scientific data can be incorporated into regulatory policy is provided by recent EPA actions on hazard labeling of anti-microbial cleaning products (AMCP). Although most cleaning products are not EPA regulated, any making an “anti-microbial” claim are considered pesticides and must undergo an EPA registration process. This normally means that AMCPs must be tested in rabbits for eye irritation to provide data for hazard labeling. A suggestion from the EPA's Pesticide Program Dialog Committee to explore non-animal methods to provide the needed eye irritation information led to the formation of an industry consortium to develop a database comparing non-animal versus animal ocular data for AMCPs. EPA scientists were continually updated on the scientific progress of the project and eventually determined that a pilot program for the adoption of a non-animal testing strategy could be established. Success with the pilot program has now led to a permanent guideline.
3:20 33 Environmental databases: A trip down memory lane and new journeys in the 21st century
Frederick W Stoss, fstoss@buffalo.edu, Oscar A. Silverman Library, University at Buffalo--SUNY, Buffalo, NY 14260, United States
This presentation compares the “environmental” content of several STEM bibliographic databases (e.g., BIOSIS Previews; Compendex-Plus: Engineering Index, GeoBase, GEOREF; MEDLINE; SCOPUS and the Web of Science: Science Citation, Social Science Citation, and Arts & Humanities Citation Indexes). Various value-added “analyze” or “refine” functions are discussed for each database. Several subject-specific databases for environmental content are similarly analyzed and compared, and include: AGRICOLA; Cambridge Scientific Abstracts: ASFA 3: Aquatic Pollution and Environmental Quality, Ecology Abstracts, Environment Abstracts, Environmental Engineering Abstracts, Pollution Abstracts, Risk Abstracts, Sustainability Science Abstracts, Toxicology Abstracts, and Water Resources Abstracts; Environment Complete; ETDE World Energy Database; GreenFile; E&E Publications: E&E Daily, Greenwire, ClimateWire, Land Letter and E&ENews PM, Factiva and GREENR. The presentation also features several new environmental databases: EarthTrends, EnviroFacts (EPA), Environmental Fate Database (SRC), Global Change Master Directory, MapCruzin', National Science Digital Library, Right to Know Network and Scorecard.
3:50   Concluding Remarks.

 Philadelphia Marriott
Conference Room 306

Hunting for Hidden Treasures: Chemical Information in Patents and Other Documents Cosponsored by CHAL, SCHB
Wei Deng, Organizer, Presiding
1:30   Introductory Remarks.
1:35 34 Imago: Open-source toolkit for chemical structure image recognition
Rostislav Chutkov, rchutkov@ggasoftware.com, Michael Rybalkin, Victor Smolov, Kliton Andrea. GGA Software Services LLC, Saint Petersburg, Russian Federation
We present the open-source Imago toolkit, designed for automatic extraction and conversion of chemical structures from raster image formats into a molecular structure representation format used in cheminformatics. We focused on recognition of photographed or scanned images containing noise, various outlines, different spacing, non-straight lines, non-uniform lighting, etc. The recognition procedure is designed as a series of successive approximations: at each recognition step we try to extract as much useful information as possible and reconstruct the logical layout on the fly. To resolve ambiguities we use an optimization tree based on the distance metric between source and recognized elements.
2:00 35 Recent developments in the CLiDE tool for extraction of chemical structure data from patents and other documents
Aniko T Valko1, Aniko.Valko@keymodule.co.uk, Peter Johnson2, Vilmos A Valko1. (1) IT and software development, Keymodule Ltd., Leeds, West Yorkshire LS17 8JQ, United Kingdom, (2) School of Chemistry, University of Leeds, Leeds, West Yorkshire LS2 9JT, United Kingdom
Chemists routinely communicate information about structures and reactions in the form of 2D structure diagrams, which are easily understood by readers but are not directly accessible for processing by chemical information systems, which require a connection table format for chemical structures. CLiDE is an established optical chemical structure recognition (OCSR) tool that aims to address this problem. Recent improvements to the CLiDE system will be presented, including the way CLiDE processes different types of documents such as patents, journal articles and internal reports. New methods for tackling problematic scenarios originating from document quality degradation and difficult drawing features will be discussed, as will improvements in the chemical intelligence of CLiDE's structure checker and structure fixer modules. These improvements have a considerable effect on CLiDE's accuracy and speed. A detailed study of CLiDE's performance on some widely available datasets will be presented alongside that of some publicly available OCSR systems.
2:25   Intermission.
2:35 36 Advances in automatic chemical spelling correction
Roger A Sayle, roger@nextmovesoftware.com, Daniel M Lowe. NextMove Software, Cambridge, Cambridgeshire CB4 0EY, United Kingdom
With the impressive progress made in chemical name-to-structure software, the major cause of failing to identify chemical entities in documents is decreasingly the complexity of the molecular structure or nomenclature used, but instead the presence of OCR failures, human spelling errors, hyphenation and line breaking issues. This talk will present statistics on the prevalence of typographical issues found in patent documents from different patent offices, and recent progress on algorithms for handling the high error rates (sometimes ten or more character substitutions per IUPAC-like name) that are frequently encountered in practice.
  • Roger Sayle, Paul Hongxing Xie and Sorel Muresan, "Improved Chemical Text Mining of Patents with Infinite Dictionaries and Automatic Spelling Correction", Journal of Chemical Information and Modeling (JCIM), Vol 52, No. 1, pp. 51-62, January 2012.
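To make the notion of character substitutions per name concrete, the following sketch computes the classic Levenshtein edit distance and uses it to pick the closest entry from a small term list. This is a generic textbook technique shown for illustration only; it is not the algorithm presented in the talk, and the function names, the `max_distance` threshold and the toy dictionary are assumptions.

```python
from typing import Iterable, Optional


def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion from s
                            curr[j - 1] + 1,      # insertion into s
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def closest_term(word: str, dictionary: Iterable[str],
                 max_distance: int = 2) -> Optional[str]:
    """Return the nearest dictionary term, or None if nothing is close enough."""
    terms = list(dictionary)
    if not terms:
        return None
    best = min(terms, key=lambda term: edit_distance(word, term))
    return best if edit_distance(word, best) <= max_distance else None


# Hypothetical OCR error: "e" misread as "c".
print(closest_term("benzcne", ["benzene", "toluene", "phenol"]))
```

A fixed dictionary lookup like this scales poorly to systematic names, which is one reason the error rates quoted above call for more specialised approaches.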
3:00 37 Automated extraction of structure-activity relationships from chemistry patents
Lutz Weber, lutz.weber@ontochem.com, OntoChem GmbH, Halle (Saale), Sachsen-Anhalt 06120, Germany
We have developed a novel, comprehensive technology to automatically extract structure-activity relationships from chemistry patents. First, named entities are annotated: using large dictionaries and name-to-structure tools, chemical entities and compound classes from our chemical ontology are annotated. Similarly, diseases and biological, pharmaceutical and physiological effects are annotated. In a second step, potential anaphora are resolved, e.g. numbers or underdetermined entities are replaced by their more precise meaning. In a third step, sentences that contain relationships about chemical compounds and effects are analyzed for their syntax using automated tools that determine potential relationship types using a fine-grained, relation-specific relationship ontology. As a last step, the output of normalized relationship triples or n-tuples is generated. These results are then analyzed for their quality using statistical and other criteria to derive a validated SAR that could be used as input for databases or search engines.
3:25 38 From chaos to order: Collecting chemical and biologic information in the documentation space
Daniel Bonniot de Ruisselet, dbonniot@chemaxon.com, David Deng. ChemAxon, Hungary
Much chemical information is buried deeply and scattered chaotically within documents. The structures may take different forms: names (IUPAC, common, generic), strings (SMILES, InChI), images, numbers (CAS registry numbers, Enzyme EC numbers), or embedded objects. The documents may be proprietary or publicly available. They may also exist in various formats (PDF, images, PowerPoint slides, HTML, etc.). In this presentation, we demonstrate how ChemAxon goes beyond its Naming Technology and extracts as much chemical and biological information as possible from documents. In addition, location information is also returned to help users pinpoint the specific structure entity. The extraction is automated and integrated with other ChemAxon applications for indexing and searching. A public web service (chemicalize.org) has been set up with an interactive interface for chemical information visualization and extraction from documents.
3:50   Intermission.
4:00 39 Exploring and visualisation of chemistry in patents with Marvin and Instant JChem
Alexander G Klenner, alexander.garvin.klenner@scai.fraunhofer.de, Fraunhofer-Institute for Algorithms and Scientific Computing (SCAI), Fraunhofer Gesellschaft, Sankt Augustin, Hesse 53754, Germany
We present a grid-based solution for chemical named entity recognition (NER) in patent collections that are provided in PDF format. Our architecture identifies and extracts IUPAC and trivial names of chemical compounds and translates them into InChI keys that can subsequently be used to generate structures for each identified entity with Marvin. All structures are stamped into the original PDF as 'pop-up' chemicals together with hyperlinks to the corresponding ChemSpider and PubMed pages. A generated bookmark tree in the PDF gives access to all identified compounds. Additionally, all retrieved chemicals are stored in a ChemAxon JChem database together with a reference to the original patent. JChem enables structural search and filtering of the processed patent collection. The workflow is based on UIMA and can easily be adapted to incorporate different chemical NER tools. UNICORE is used to access grid resources for efficient parallelization of all processes.
4:25 40 SureChem: Integrating patent chemistry with public and private nonpatent research resources
Nicko Goncharoff, n.goncharoff@digital-science.com, SureChem (Digital Science), Macmillan Publishers Ltd, London, United Kingdom
Recent developments in text mining technology and cloud computing make it possible to reliably extract high value chemistry from full text patents and other documents in an automated fashion. However, this data becomes far more valuable when it can be linked to other resources available to the scientific community. SureChem is an automatically generated patent chemistry database that is directly linked to key public databases as well as proprietary publisher content. This presentation presents examples of data intersections resulting from these links and explores potential benefits for researchers.
4:50 41 Bringing structures to the disorder: Adding chemical intelligence to unstructured documents
Kate Blanchard, Kate.Blanchard@PERKINELMER.COM, Philip Skinner, philip.skinner@perkinelmer.com, Phil McHale, Megean Schoenberg, Rudy Potenzone, Scott Flicker, Joshua Wakefiled, Sean Greenhow, Robin Smith. PerkinElmer Informatics, Cambridge, MA 02140, United States
Information stored within organizations is dispersed unpredictably in many different systems and often within unstructured documents. In particular, chemical structures may be stored within native application files, embedded in documents or flattened as images such as in PDFs. These documents may then be stored in formal document management systems, or on file-shares or hard-drives across the organization. These documents were traditionally assumed to be un-searchable, despite holding valuable information locked up in patents, reports, conception and study documents. A central database was created to index chemical structures added into the various file-shares and systems across an organization. The database provided a central location and federated access to multiple repositories of unstructured documents. Addition of an optical structure recognition process incorporated structures hitherto lost in flattened image form. We will describe creation of this tool and application within a major global organization to enable structure-based searching of previously inaccessible sources.
5:15   Concluding Remarks.

SUNDAY EVENING

Sonesta Hotel Philadelphia,
Room Liberty A

CINF Scholarship for Scientific Excellence
Guenter Grethe, Organizer
6:30 - 8:30
  42 Torsion Fingerprint Deviation: A novel measure to compare small molecule conformations
Christin Schärfer1, schaerfer@zbh.uni-hamburg.de, Tanja Schulz-Gasch2, Matthias Rarey1, Wolfgang Guba2. (1) Center for Bioinformatics (ZBH), University of Hamburg, Hamburg, Germany, (2) F. Hoffmann-La Roche Ltd, Basel, Switzerland
Its objectivity, intuitive interpretation, and easy, automated calculation make the relative RMSD the measure of choice for comparing small molecule conformations. However, RMSD comparisons have some significant weaknesses: for example, when averaged over large databases of structurally diverse molecules, RMSD loses its intuitive interpretation because it strongly depends on the size of the molecule.
We have developed a novel measure to compare conformations of small molecules called Torsion Fingerprint Deviation (TFD). It compares Torsion Fingerprints from two conformations of a molecule, taking deviations in torsion angles for each acyclic bond and each ring system into account. To give deviations at topologically central bonds or rings a higher influence on the TFD than deviations at terminal bonds or rings, we added a Gaussian weighting scheme to the calculation.
We validated and compared the TFD to relative RMSD. Results show that TFD overcomes major limitations of RMSD while retaining its advantages.
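The weighting idea described above can be sketched as follows. This is only an illustration of Gaussian-weighted torsion deviations, not the published TFD definition: the normalisation by 180 degrees, the default `sigma`, the use of topological distance from a central bond as the Gaussian argument, and the function names are all assumptions, and ring systems are not handled.

```python
import math
from typing import Sequence


def angular_deviation(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees (0..180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def weighted_torsion_deviation(torsions_a: Sequence[float],
                               torsions_b: Sequence[float],
                               topo_distances: Sequence[float],
                               sigma: float = 2.0) -> float:
    """Gaussian-weighted mean of normalised torsion deviations.

    topo_distances[i] is the (assumed) bond count between torsion i and the
    topologically central bond; central torsions receive larger weights.
    """
    if not (len(torsions_a) == len(torsions_b) == len(topo_distances)):
        raise ValueError("all inputs must have the same length")
    if not torsions_a:
        raise ValueError("at least one torsion is required")
    weights = [math.exp(-(d ** 2) / (2.0 * sigma ** 2)) for d in topo_distances]
    deviations = [angular_deviation(a, b) / 180.0
                  for a, b in zip(torsions_a, torsions_b)]
    return sum(w * d for w, d in zip(weights, deviations)) / sum(weights)


# Same 90-degree change, once at the central bond and once at a terminal bond.
central = weighted_torsion_deviation([0.0, 0.0], [90.0, 0.0], [0, 3])
terminal = weighted_torsion_deviation([0.0, 0.0], [0.0, 90.0], [0, 3])
print(central, terminal)
```

Because the result is a weighted average of values between 0 and 1, it does not grow with the number of atoms, which is the kind of size independence that motivates a torsion-based measure over RMSD.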
  43 Similarity based virtual screening: Effect of the choice of similarity coefficient
Hua Xiang, huaxiang2010@gmail.com, John Holliday, Peter Willett. Information School, University of Sheffield, Sheffield, South Yorkshire S1 4DP, United Kingdom
Similarity searching is one of the most common methods for ligand-based virtual screening and any similarity measure that is to be used for similarity-based virtual screening (SBVS) has three principal components: the structure representation, the weighting scheme, and the similarity coefficient [1]. The many previous studies of SBVS that have been carried out have demonstrated that effective screening can be achieved using binary fingerprints and the Tanimoto coefficient. The work reported here compares the Tanimoto coefficient with other coefficients, and demonstrates that one of these, the cosine coefficient, exhibits a much greater degree of robustness in the face of variations in the nature of the fragment weighting scheme [2] that is being used. We also report a comparison of the effectiveness of 44 different similarity coefficients when used for SBVS with binary, unweighted fingerprints and compare these results with those obtained when the coefficients are used for QSAR.
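For reference, the two coefficients compared above have simple closed forms for binary, unweighted fingerprints. The sketch below represents each fingerprint as the set of its "on" bit positions; the function names and the example bit sets are illustrative assumptions, and the weighted variants studied in the work are not shown.

```python
import math
from typing import Set


def tanimoto(fp_a: Set[int], fp_b: Set[int]) -> float:
    """Tanimoto coefficient: c / (a + b - c) for binary fingerprints."""
    common = len(fp_a & fp_b)
    union = len(fp_a) + len(fp_b) - common
    return common / union if union else 0.0


def cosine(fp_a: Set[int], fp_b: Set[int]) -> float:
    """Cosine coefficient: c / sqrt(a * b) for binary fingerprints."""
    if not fp_a or not fp_b:
        return 0.0
    return len(fp_a & fp_b) / math.sqrt(len(fp_a) * len(fp_b))


# Hypothetical fingerprints given as sets of set-bit indices.
query = {1, 2, 3, 4}
candidate = {3, 4, 5}
print(tanimoto(query, candidate), cosine(query, candidate))
```

Here a and b are the numbers of bits set in each fingerprint and c is the number of bits they share. Ranking a database by either value against a query fingerprint gives a basic similarity search.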
  44 Predicting complex phase behaviour of lyotropic liquid crystals in crystallographic screens
Tu C. Le, tu.le@csiro.au, Xavier Mulet, Charlotte E. Conn, Frank R. Burden, David A. Winkler. Materials Science and Engineering, CSIRO, Clayton South MDC, Victoria 3122, Australia
Novel amphiphilic materials that form bicontinuous cubic phases are generating substantial interest due to a wide range of applications, one of which is to incorporate and support the growth of membrane protein crystals for x-ray structure analysis. However, the cubic phase may transition to other lipidic mesophase structures under the influence of the different components within the crystallization screen. Furthermore, the mesophases may evolve with time, a process that is poorly understood but critical for controlled crystal growth. Recent advances in high-throughput screening of lipid systems have allowed us to generate a large body of data on the influence of screen components on the cubic phase. However, it has been difficult to deconvolute individual effects in the multi-component system present during a crystallisation trial. We have therefore developed robust models that predict how the phase behaviour of lyotropic liquid crystals changes over time and under the influence of crystallization additives. A state-of-the-art machine learning method using Bayesian regularized neural networks was employed to generate models, and our work demonstrates that the complex phase behaviour of amphiphilic nanostructured nanoparticles can be captured with high accuracy. This approach also allowed us to determine which components were most relevant to the evolving phase behaviour of individual mesophases.
  45 Prioritization of docking poses in human serotonin and dopamine transporters by the use of common scaffold clustering
Amir Seddik1, Barbara Zdrazil1, barbara.zdrazil@univie.ac.at, Rene Weissensteiner1, Harald H Sitte2, Gerhard F Ecker1. (1) Department of Medicinal Chemistry, University of Vienna, Vienna, Austria A-1090, Austria, (2) Institute of Pharmacology, Medical University of Vienna, Vienna, Austria A-1090, Austria
Biomolecular docking is a frequently applied molecular modeling technique for predicting the preferred binding orientations of ligands in proteins [1]. In our research group we use experimental data guided docking approaches and common scaffold clustering for the unbiased prioritization of docking poses without relying on energetic scoring functions [2-4]. Recently, we have applied this workflow to amphetamine derivatives in homology models of the serotonin and dopamine transporters. The final aim of the study is the elucidation of the binding mode of mephedrone, an amphetamine analog with increasing reports of abuse. We found equivalent binding modes for both transport proteins: one in which the ligands' orientation was perpendicular to the membrane, and another in which it was parallel. Both poses might be part of a dynamic transition between two low-energy states during binding. Ongoing 3D-QSAR studies and mutational experiments will help in further prioritization of one of the poses.
[1] Lengauer T, Rarey M (1996) Computational methods for biomolecular docking. Curr. Opin. Struct. Biol. 6 (3): 402-406.
[2] Sarker S, Weissensteiner R, Steiner I, Sitte HH, Ecker GF, Freissmuth M, Sucic S (2010) The High-Affinity Binding Site for Tricyclic Antidepressants Resides in the Outer Vestibule of the Serotonin Transporter. Mol. Pharmacol. 78 (6): 1026-1035.
[3] Klepsch F, Chiba P, Ecker GF (2011) Exhaustive Sampling of Docking Poses Reveals Binding Hypotheses for Propafenone Type Inhibitors of P-Glycoprotein. PLoS Comput Biol 7(5): e1002036. doi:10.1371/journal.pcbi.1002036
[4] Richter L, de Graaf C, Sieghart W, Varagic Z, Mörzinger M, de Esch IJP, Ecker GF, Ernst M (2012) Diazepam-bound GABAA receptor models identify new benzodiazepine binding-site ligands. Nat. Chem. Biol. Epub ahead of print. doi:10.1038/nchembio.917

  46 Integrated Chemoinformatics approaches to virtual screening in the search of new lead compounds against Leishmania
Rodolpho C. Braga1, rcbraga@gmail.com, Luciano M. Lião2, José C. B. Bezerra3, Marina C. B. Vinaud3, Carolina H. Andrade1, carolina@farmacia.ufg.br. (1) Federal University of Goias - Brazil, (2) Instituto de Quimica, Universidade Federal de Goias, Goiania, Goias 74001-970, Brazil, (3) Instituto de Patologia Tropical e Saude Publica (IPTSP), Universidade Federal de Goias, Goiania, Goias 74605-050, Brazil
One potential new target for leishmaniasis chemotherapy is sterol 14-demethylase (CYP51), a cytochrome P450 enzyme involved in biosynthesis of membrane sterols. In search of inhibitors of CYP51 from Leishmania sp., we used chemoinformatics as a complement to the ligand- and structure-based virtual screening (VS) approaches. A database of over 800,000 drug-like molecules (ChemBridge) was filtered to remove undesirable compounds (FILTER), partial charges were calculated using the AM1-BCC method (QUACPAC), and an exhaustive, high-quality conformer database was assembled for screening (OMEGA). To identify and prioritize compounds with optimal properties, we applied chemometric methods such as cluster analysis, principal component analysis and partial least squares (PLS) to the different scoring functions (TanimotoCombo, Chemgauss4, Chemgauss3, Glide gscore, XP GScore, DrugScore eXtended, Chemical Gaussian Tanimoto), ADMET descriptors and 3D Pharmacophore Fingerprints. Finally, novel inhibitors are proposed.

MONDAY MORNING

Philadelphia Marriott Downtown
Room 302/303

Future of the History of Chemical Information Cosponsored by HIST
Andrea Twiss-Brooks, Leah Solla, Organizers, Presiding
9:00   Introductory Remarks.
9:05 47 Historical cantilevering
Leah R Solla1, leah.solla@cornell.edu, Peter F Rusch2. (1) Cornell University, United States, (2) Committee on Nomenclature, Terminology and Symbols, American Chemical Society, United States
Even deep participation in much of the transformation of chemical information storage, processing and access over the past thirty-five years can only lead to speculation on the next one-hundred years of development. Still, many trends are clear. These will be reviewed for their likelihood and impact.

9:20 48 Language and symbolism of chemistry - a historical perspective
William G Town, bill.town@kilmorie.com, Kilmorie Clarke Ltd, London, United Kingdom
The language and symbolism of chemistry are described from the earliest manifestations of chemistry in ancient times through to the end of the 19th century.

9:45 49 Chemical structures
Philip McHale, philmchale@comcast.net, PerkinElmer Informatics, United States
Representations of chemical structures, whether hand-drawn on a napkin, displayed on a screen or printed in a journal or patent, provide a lingua franca for chemists, and the language of chemical structures has evolved to keep pace with our increasing understanding of the nature of bonds and the spatial arrangements of atoms within molecules. Some early dialects such as linear formulae only conveyed partial information, and the apparently complete descriptions afforded by linear notations were reserved for the cognoscenti and spoken by very few practicing chemists. This talk will survey this evolution in handling structures and illustrate how parallel developments in structural representation, technology (graphics terminals), and informatics (connection tables) have made handling chemical structures a commonplace activity, and have increased the roles which structures can play in chemistry-related endeavours.

10:10   Intermission.
10:20 50 Evolution of computer based structures from WLN to InChI
Stephen Heller, srheller@nist.gov, NIST/CBRD, United States
Over the past 60+ years chemists have developed a number of approaches to represent chemical structures in a computer readable form. This presentation will provide a practical and opinionated overview of the many activities in the area that have taken place over this time period and explain why some have succeeded while others have failed, mostly for non-technical reasons.

10:45 51 History of chemical reactions information: Past, present, and future
Guenter Grethe, ggrethe@att.net, Unaffiliated, Alameda, CA 94502-7409, United States
The history of chemical information can be traced back as far as the 4th century, when Zosimos of Panopolis published probably the oldest known books on alchemy containing, for example, the experimental descriptions for the preparation of gold from metals. It was not until the 17th century that Robert Boyle, the father of modern chemistry, published the “Sceptical Chymist”, thereby starting the great age of chemistry, particularly in England, France and Germany. It was almost 200 years later that chemical societies in these countries started publishing proceedings and journals. The first abstracting and indexing source was the Pharmazeutische Zentralblatt published in 1830, later renamed Chemisches Zentralblatt. Though these sources contained information about the preparation of compounds and materials, information sources dedicated to chemical reactions started almost 100 years later with the publication of Houben-Weyl, Methods of Organic Chemistry, in 1909. It took another 80 years until electronic, structure searchable reaction databases became available. Since then, reaction databases have greatly increased in size. Today, modern applications, such as synthesis planning and classification, and improvements of interfaces provide chemists with many ways to use reaction data more efficiently. The presentation will give a historical overview, provide examples of today's use and take a look at the future of reaction information.

Presentation (pdf)

11:10   Panel Discussion.

 Philadelphia Marriott
Conference Room 307

Open Notebook Science/Open Chemistry/Electronic Lab Notebook
Philip McHale, Jean-Claude Bradley, Organizers
Jean-Claude Bradley, Presiding
8:30   Introductory Remarks.
8:35 52 Perspective on the market for Electronic Laboratory Notebook technology
Michael H Elliott, melliott@atriumresearch.com, Atrium Research & Consulting LLC, Wilton, CT 06897, United States
The market for ELN technology is robust, growing over 20% per year. With over 30 suppliers spanning the range from simple unstructured experiment data capture to workflow-based solutions with complex structured data models, the choice of a solution can be daunting. For example, what works well for synthetic chemistry may not be optimal for in vivo study support. The challenge is to not just understand today's needs but also the capabilities required for a future state. Through extensive analysis of ELN implementations across a wide range of industries over the last nine years, the author will present best practices and pragmatic considerations for product selection and implementation.
9:05 53 Successful selection and deployment of an ELN throughout a medical university
Cecilia Bjorkdahl, Cecilia.Bjorkdahl@ki.se, Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
This session presents a case study of how Karolinska Institutet Medical University selected, rapidly deployed and successfully implemented a collaborative Electronic Laboratory Notebook (ELN) throughout the university in partnership with 2 other Swedish universities. The talk will focus on the initial goals, challenges and best practices for selecting and deploying a single ELN to a large number of users, and the benefits associated with the deployment. Today more than 1000 researchers have adopted the ELN, with an expectation to reach 2000 users by the end of 2012 across all 3 universities.
9:35 54 Getting academic synthetic chemists to use ELNs: Why and how?
Richard J Whitby1, rjw1@soton.ac.uk, Bogdan Ibanescu1, Tim Dickens2, Brian Brooks2. (1) Chemistry, University of Southampton, Southampton, HANTS SO17 1BJ, United Kingdom, (2) Department of Chemistry, University of Cambridge, Cambridge, Cambs CB2 1EW, United Kingdom
The EPSRC funded Dial-a-Molecule Grand Challenge Network has the 40 year aim of making the delivery of any desired molecule as quick as ordering a stock chemical. Its roadmap identified the lack of complete reaction data as a key barrier to progress, and the widespread adoption of ELNs in academia a necessary enabling step. Currently only one U.K. chemistry department, Cambridge, has an established ELN, and its usage is modest. A survey of UK academics revealed the main barriers to adoption, and demonstrated a widespread willingness to move to the use of ELNs. In response a 16 university, 300 user 6 month pilot of a cloud-delivered ELN has been organised. This talk will present the background to the 'National ELN' initiative and the outcomes and learnings from the pilot. It will also describe the outcomes of a JISC funded study to establish what can be learnt from the Cambridge deployment.
10:00   Intermission.
10:10 55 Changing role of electronic lab notebooks: How “First To File” is shifting the landscape
Michael Swartz, Michael.Swartz@PERKINELMER.COM, Philip Skinner, philip.skinner@perkinelmer.com, Phil McHale, Alex Jewett, Kate Blanchard. PerkinElmer Informatics, Cambridge, MA 02140, United States
The adoption of electronic laboratory notebooks has become increasingly widespread since their introduction almost a decade ago. Early in their evolution, the decision to implement was heavily influenced by an organization's view of the legal landscape, and whether electronic signatures would prove sufficiently robust to withstand legal scrutiny. With the recent introduction of patent reform, moving to a worldwide system of “first to file”, the landscape is shifting. There is an increased emphasis on the ability of electronic lab notebooks to increase individual efficiencies, including shortening the time to file, in addition to continued intellectual property protection. Notebooks are increasingly providing collaboration and decision support and the boundaries between notebooks and other data systems are blurring. This presentation will describe how the changing environment is altering the way in which key organizations view the role of notebooks now and evolving for the future.
10:35 56 Leveraging the cloud for partnerships and collaboration
Frederic Bost, Frederic.Bost@scynexis.com, Scynexis, Research Triangle Park, NC 27709, United States
This session will present how a cloud based environment has been used to support collaboration and communication for scientific projects across academic and commercial partnerships. The session will focus on the secure and efficient exchange of information in a global community to enhance the collaborative capabilities of teams sharing science and project knowledge. The cloud removes many limitations of geographically-dispersed scientists offering instead a centralized and easily accessible environment for teams to communicate, gain insight and share knowledge. The session will also cover an industry case study showing how research communities working on solutions for neglected diseases have been enabled by the cloud.
11:00 57 Mobile chemistry apps participating in the open science revolution
Alex M Clark, aclark@molmatinf.com, Molecular Materials Informatics, Montreal, Quebec H3J2S1, Canada
Practicing open science is a relatively new idea that has been facilitated by universal access to internet technologies. It has been motivated by the popularity of social media, the evolution of open protocols, and the increasing demand for transparency and unrestricted flow of knowledge. Open science is a comfortable fit with technologies that are designed for creating and consuming information using open formats and open platforms. Such activities are increasingly being performed using mobile devices. This presentation will describe the state of the art with regard to creating chemical information and sharing it openly. Established apps such as the Mobile Molecular DataSheet (MMDS), MolSync, SAR Table, Reaction101, Yield101 and Open Drug Discovery Teams (ODDT) will be described, in the context of how they can be used to participate in the open science revolution.
11:25 58 Evaluating the quality and performance of automatic atom mapping algorithms
Daniel M Lowe, daniel@nextmovesoftware.com, Roger A Sayle. NextMove Software, Cambridge, Cambridgeshire CB4 0EY, United Kingdom
Automatic atom mapping is highly useful for a variety of applications including allowing more specific queries of reaction databases and normalizing reactions (for registration). The problem is difficult due to the computational cost of performing atom mapping comprehensively and the potential for multiple solutions to be found. We test a variety of atom mapping algorithms on reactions from a pharmaceutical company electronic lab notebook as well as reactions extracted from US patent applications. The quality of reaction mapping may be assessed using criteria such as the number of bonds broken in the proposed mapping. Hence the strengths and weaknesses of atom mapping algorithms may be quantified.
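To illustrate the bond-count criterion mentioned in this abstract, the following is a minimal sketch only, not the authors' method. It assumes that molecules are given as sets of bonds between atom indices, that bond orders and hydrogens are ignored, and that a candidate mapping is a dictionary from reactant atom index to product atom index. The function name `bonds_changed` is hypothetical.

```python
from typing import Dict, FrozenSet, Set, Tuple

Bond = FrozenSet[int]  # a bond as an unordered pair of atom indices


def bonds_changed(
    reactant_bonds: Set[Bond],
    product_bonds: Set[Bond],
    mapping: Dict[int, int],
) -> Tuple[int, int]:
    """Return (bonds broken, bonds formed) under a proposed atom mapping.

    `mapping` sends each reactant atom index to a product atom index.
    Bond orders are ignored in this simplified sketch.
    """
    # Express every reactant bond in product atom numbering.
    mapped = {frozenset(mapping[a] for a in bond) for bond in reactant_bonds}
    broken = len(mapped - product_bonds)  # present before, absent after
    formed = len(product_bonds - mapped)  # absent before, present after
    return broken, formed


# Toy example: reactant has one bond 0-1, product has one bond 1-2.
reactant = {frozenset({0, 1})}
product = {frozenset({1, 2})}

identity = {0: 0, 1: 1, 2: 2}
swapped = {0: 2, 1: 1, 2: 0}  # only plausible if atoms 0 and 2 are the same element

print(bonds_changed(reactant, product, identity))  # (1, 1)
print(bonds_changed(reactant, product, swapped))   # (0, 0)
```

Under this criterion the second mapping would score better because it implies fewer bond changes. A real evaluation would also need to account for bond orders and chemically equivalent atoms.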

MONDAY AFTERNOON

Philadelphia Marriott Downtown
Room 302/303

Future of the History of Chemical Information Cosponsored by HIST
Andrea Twiss-Brooks, Leah Solla, Organizers
Andrea Twiss-Brooks, Presiding
1:00 59 From “Index Chemicus” for chemical documentation to integrated tools for intelligent retrieval
Vijay Bhatia, vijay.bhatia@thomsonreuters.com, Thomson Reuters, United States
The presentation summarizes a 50 year journey of ISI, from a completely independent non-governmental private chemical documentation company to becoming a part of Thomson Reuters, an “intelligent information provider”. Documentation systems in the earlier part of this journey were directed by the “principle of completeness”. Due to exponential growth in chemistry and rapidly growing information, this principle of completeness has been challenged. In addition to this, the demarcation line that existed between chemistry and biology has now vanished and we either talk of chemical biology or biological chemistry. The overlap of chemistry and biology has shifted the principles of documentation to “utility criteria”, and this shift has changed existing retrieval tools and added new ones for both individuals and institutions.

Presentation (pdf)

1:20 60 Back to the future: CAS and the shape of chemical information to come
Roger Schenck, rschenck@cas.org, Chemical Abstracts Service, United States
CAS, the only organization in the world whose objective is to find, collect and organize all publicly-disclosed chemistry, has always been a leader in providing scientists access to chemical information. Originally relying on a group of globally-situated volunteer chemists, CAS now keeps pace with the explosion in newly disclosed chemistry with more than 500 scientists working at CAS' Columbus, Ohio headquarters, who are supported in turn by at least that same number of scientists working in locations around the world, from Japan, China and India, to Germany and elsewhere. Beginning with the inception of the CAS REGISTRY℠ in 1965, CAS has developed computer applications both for database-building efforts and service delivery. In 1995, with the introduction of SciFinder, CAS once again changed the way chemists do research, as the first to reach the end user chemist directly. Since then, CAS has leveraged rapid changes in technology, evolving sources of disclosed chemistry, as well as how scientists use information to fulfill its mission to provide the world's best digital research environment to search, retrieve, analyze and link chemical information. This presentation will describe how CAS has changed the chemical information world and close with some predictions about the future of chemical information.

Presentation (pdf)

1:40 61 How books shape what we know
Bruce V Lewenstein, b.lewenstein@cornell.edu, Chemical Heritage Foundation, United States and Cornell University, United States
Chemists and other scientists usually think of the peer-reviewed journal article as the sine-qua-non of science -- if it isn't in a peer-reviewed article, it isn't really science. But historically, books have played a critical role in establishing what counts as reliable knowledge about the natural world. Textbooks are particularly important, because they provide the overall structure for knowledge. This talk will examine a series of chemistry and other textbooks to show how particular authors and textbooks have shaped our understanding of chemistry. These historical findings have direct impact on how chemists use textbooks -- and the new digital materials that are supplementing and supplanting textbooks -- in the future.

Presentation (pdf)

2:00 62 Evolution of chemical information instruction
Adrienne W. Kozlowski, kozlowskia@ccsu.edu, Department of Chemistry, Central Connecticut State University, New Britain, CT 06053, United States
This talk will consider trends in textbooks and syllabi, print vs. online databases, the effect of SciFinder, the growth of molecular modeling programs, and the explosion of poster presentations.

Presentation (pdf)

2:20   Intermission.
2:30 63 Looking back, but not in anger: My view of history and future of chemical information
Engelbert Zass, zass@chem.ethz.ch, Department of Chemistry and Applied Biosciences, ETH Zurich, Zuerich, Switzerland
While many chemists do not know any more from personal experience how chemical information was retrieved from printed sources only, their content and many of their traditional organization, indexing rules, and data structures are still with us. Today, modern chemical information retrieval is almost completely integrated into the information and communication mainstream, using browsers on laptops, tablets, or smartphones, instead of the dedicated hard- and software required for searching in earlier times. These opposing aspects outline not only an obvious success story, but they provide also a blueprint for problems yet to be solved.

Presentation (pdf)

3:00 64 Data collection in the future: The views and concerns of a historian of chemistry
Jeffrey I. Seeman, jseeman@richmond.edu, Department of Chemistry, University of Richmond, Richmond, Virginia 21373, United States
The Symposium Organizer provided the following vision and request: “Regarding chemical information, where could the landscape of today lead and distill to in the future, based on what we've learned over 100+ years about chemistry, information and most importantly, the people involved in it all? I am looking for real working stories and experiences that lead to some depth insights. I'd also like to make the session interactive, to push the audience into thinking beyond where they are now and focus on deliverables for transferring skills as the landscape moves.” This talk will attempt to meet this request as viewed by a former practicing research chemist and now historian of chemistry and researcher who also studies responsible conduct of research (RCR) within the chemistry community.
3:20 65 Chemical information: From print to the Internet
Robert E. Buntrock, buntrock16@myfairpoint.net, Buntrock Associates, Orono, ME 04473, United States
Developments in and the evolution of chemical information in the last half of the 20th Century will be described from the perspective of a chemist and user. Resources, databases, systems, education, and people involved will be covered. Can the past assist in predicting the future?
Presentation (pdf)
3:40   Discussion.

 Philadelphia Marriott
Conference Room 307

Open Notebook Science/Open Chemistry/Electronic Lab Notebook
Jean-Claude Bradley, Philip McHale, Organizers
Jean-Claude Bradley, Presiding
1:15 66 Shining a light on chemical properties with Open Notebook Science and open strategies
Jean-Claude Bradley1, bradlejc@drexel.edu, Andrew SID Lang2. (1) Department of Chemistry, Drexel University, Philadelphia, PA 19104, United States, (2) Department of Mathematics, Oral Roberts University, Tulsa, OK 74171, United States
This presentation will illustrate how strategies for doing science with openness in mind can amplify the impact of research. One example will cover Open Notebook Science, the practice of making the laboratory notebook and all associated raw data available to the public in real time. Other examples will include the use of open chemical descriptors, open models and open algorithms. It will be demonstrated how simply uncovering existing empirical relationships between properties and descriptors allows for the automated generation of an Open Chemical Property Matrix from a collection of open data feeds.
1:45 67 Leveraging Open Notebook Science for solubility and melting point predictions for optimizing reactions and recrystallizations
Matthew J McBride1, mcbridemj@gmail.com, Jean-Claude Bradley1, William E Acree3, Andrew Lang2, Antony Williams4. (1) Department of Chemistry, Drexel University, Philadelphia, Pennsylvania 19104, United States, (2) Department of Mathematics, Oral Roberts University, United States, (3) Department of Chemistry, University of North Texas, United States, (4) Unaffiliated, United States
This research investigates the ability of the Abraham Model to accurately predict the solubility of organic compounds in organic solvents. The Abraham Model is useful because it uses experimentally measured solubilities to predict solubility in unmeasured solvents. In addition, an open melting point model will be described which allows temperature dependent solubility to be predicted. This information is useful for determining the best solvent to use when attempting to recrystallize a particular compound and more generally, provides valuable information for choosing a solvent for an organic reaction. This research was conducted using Open Notebook Science, which releases all experiments completed and results online to promote the sharing of information. A case will be made that open and real-time sharing of experimental results, whether successful or not, leads to more efficient and rapid scientific progress.
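As a rough illustration of the kind of calculation this abstract refers to, the sketch below applies the general Abraham linear free energy relationship, log10(Cs/Cw) = c + eE + sS + aA + bB + vV, where Cs and Cw are molar solubilities in the organic solvent and in water. All numeric values, class names and function names here are hypothetical placeholders chosen for easy arithmetic; they are not fitted coefficients or measured descriptors, and this is not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SoluteDescriptors:
    """Abraham solute descriptors (values used below are placeholders)."""
    E: float
    S: float
    A: float
    B: float
    V: float


@dataclass(frozen=True)
class SolventCoefficients:
    """Abraham solvent coefficients (values used below are placeholders)."""
    c: float
    e: float
    s: float
    a: float
    b: float
    v: float


def log_partition(solute: SoluteDescriptors, solvent: SolventCoefficients) -> float:
    """log10(Cs/Cw) = c + e*E + s*S + a*A + b*B + v*V."""
    return (
        solvent.c
        + solvent.e * solute.E
        + solvent.s * solute.S
        + solvent.a * solute.A
        + solvent.b * solute.B
        + solvent.v * solute.V
    )


def log_solubility_in_solvent(
    log_cw: float, solute: SoluteDescriptors, solvent: SolventCoefficients
) -> float:
    """Predict log10 molar solubility in the solvent from the aqueous value."""
    return log_cw + log_partition(solute, solvent)


# Hypothetical inputs, for illustration only.
solute = SoluteDescriptors(E=1.0, S=1.0, A=0.5, B=0.5, V=1.0)
solvent = SolventCoefficients(c=0.2, e=0.5, s=-1.0, a=-2.0, b=-3.0, v=4.0)

print(round(log_solubility_in_solvent(-2.0, solute, solvent), 3))  # -0.8
```

In practice the solute descriptors would be obtained by regression against measured solubilities in several solvents, which is what allows prediction for a solvent that has not been measured.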
2:05 68 Feeding and consuming data to support Open Notebook Science via the ChemSpider Platform
Antony J Williams1, williamsa@rsc.org, Jean-Claude Bradley2, Andrew S.I.D. Lang3, Valery Tkachenko1. (1) Cheminformatics, Royal Society of Chemistry, Wake Forest, NC 27587, United States, (2) Chemistry, Drexel University, Philadelphia, PA, United States, (3) Mathematics, Oral Roberts University, Tulsa, OK, United States
We are all benefiting from a shift towards openness fed by Open Source, Open Standards, Open Data and Open Access. Open Notebook Science is likely the scientific revolution of the near term. As more scientists become comfortable with the concepts of openly sharing their experiments and data, often in near real time, we are seeing a shift to significant increases in the availability of new data that does not have to be extracted from publications but is available as data feeds that can be delivered to the community. This presentation will provide an overview of how the ChemSpider database from the RSC supports Open Notebook Science using programmatic access to both data and services and how ChemSpider ingests data feeds to mesh together with our existing database of over 27 million chemical compounds.
2:30 69 LabTrove: Software for facilitating open notebook science
Jeremy G Frey, j.g.frey@soton.ac.uk, Department of Chemistry, University of Southampton, Southampton, United Kingdom
The LabTrove (http://www.labtrove.org) software, part of the Smart Research Framework software (http://mylabnotebook.ac.uk) and used as an electronic laboratory notebook (ELN) and collaboration system, will be described. The way the software can facilitate collaboration across different scales, e.g. between research student and supervisor, within local or geographically dispersed research groups and, in the limit, global collaboration supporting open science projects, will be illustrated, and issues raised particularly by the openness of the latter case will be highlighted and discussed. The important issue of the use and usefulness of metadata in the classification of notebook entries, and the way this impacts the re-use of data and information contained within the notebook system, will be discussed and exemplified.
2:55   Intermission.
3:10 70 Online CHEmical Modeling environment (OCHEM) is an innovative open collaborative platform for chemistry
Iurii Sushko1, Sergii Novotarskyi1, Robert Körner1, Ahmed Abdelaziz1,2, Wolfram Teetz1, Igor V. Tetko1,2, itetko@vcclab.org. (1) eADMET GmbH, Neuherberg, Germany, (2) Institute of Structural Biology, Helmholtz Zentrum Muenchen, Neuherberg, Germany
OCHEM (http://ochem.eu) provides unprecedented opportunities for open science and collaborative work in chemical sciences. It supports advanced tools to store, edit and share chemical information. The features include hidden and public data, automatic unit conversion, detection of duplicates, identification of the primary source of publication of a given piece of information, automatic verification of structures and names of molecules, and batch upload, editing and exporting of thousands of records simultaneously. The database is tightly connected with the data modeling framework, allowing development of models with hundreds of thousands of molecules and their application to screen new molecules. The recent introduction of data moderators and a peer-reviewing process has dramatically improved the quality of the data. The integration of OCHEM with an ELN can be performed through web services and automated workflows, such as KNIME or Pipeline Pilot. The use of OCHEM within Open Notebook Science projects will be presented. Indeed, OCHEM is itself such a project.
3:30 71 Opening up ELNs and Repositories to support formal publication
Simon J Coles1, s.j.coles@soton.ac.uk, Graham Tizzard1, Jeremy Frey1, Andrew Milsted1, Mark Edwards2, Romanus Onyeabo2, John Spencer3, Jan Kuras4. (1) Chemistry, University of Southampton, Southampton, Hampshire SO17 1BJ, United Kingdom, (2) Pharmaceutical, Chemical & Environmental Sciences, University of Greenwich, Chatham, Kent ME4 4TB, United Kingdom, (3) Department of Chemistry, University of Sussex, Brighton, Sussex BN1 9NQ, United Kingdom, (4) Chemistry Central, Chemistry Central Journal, London, London WC1X 8HB, United Kingdom
Supplementary Information supporting academic publications is not uniform across journals, is generally not 100% representative of the work undertaken, is not structured or comprehensive and therefore is often not fully considered in review. Our approach to solving this problem is to expose the Laboratory Notebook recordings of the researchers who conducted the work and related data held in information management systems and repositories and link to these from the article. The ELN plays a key role here by establishing authenticity, adding structure to the record and having the capacity to readily be made open. This paper presents a formal publication in Chemistry Central Journal, which is a collaborative piece of work between the UK National Crystallography Service and its users at the University of Greenwich, where the data supporting the article is not contained within it, but is openly exposed at source by an ELN and a crystal structure repository.
3:55 72 ChemWiki: Advancing undergraduate chemistry content, curation, and education with dynamic open-access textbooks
Delmar Larsen, dlarsen@ucdavis.edu, Chemistry, University of California, Davis, Davis, CA 95616, United States
We propose the development of the Dynamic Textbook Project (DTP) consisting of six corollary pseudo-independently operating and interconnected “STEMWikis” that focus on augmenting education in separate STEM (Science, Technology, Engineering, and Mathematics) fields. The central aim of the DTP is to develop and disseminate free, virtual, and customizable textbooks to substitute for current, commercial paper texts in multiple courses at post-secondary institutions across the nation. The ChemWiki (http://ChemWiki.ucdavis.edu) is the pilot STEMWiki developed to demonstrate efficacy of the DTP. The ChemWiki is multifaceted, highly flexible, and adaptable to any level or course in chemistry and uses established Wiki technology to facilitate the mass collaborative effort necessary to provide this flexibility. It currently has visitor traffic of 7.5 M visits and 11.25 M pageviews per year, with an estimated 512 hours of reading/writing occurring daily. When six STEMWikis are developed to this level, 45 M visitors are expected to access 67.5 M pages annually.
4:20 73 Negotiating trust in the communication of science by blog
Lawrence Souder, ls39@drexel.edu, Department of Culture and Communications, Drexel University, Philadelphia, PA 19104, United States
Advocates for open systems in science make claims for efficient collaboration and transparent communication. Although these characteristics are consistent with the traditional norms of science, the implementation of open systems has had mixed effects, particularly on the role of trust. This case study of the published correspondence in research journals suggests that when communication moves from traditional print systems to open on-line systems, two levels of trust arise, one at the discourse level and another at the metadiscourse level. The coincidence and conflation of discourse in these two registers both ameliorate and trouble trust in the communication of science.
4:45   CINF Business meeting.

 Hilton Garden Inn Philadelphia
Salon D

Hunting for Hidden Treasures: Chemical Information in Patents and Other Documents Cosponsored by CHAL, SCHB
Wei Deng, Organizer, Presiding
1:00   Introductory Remarks.
1:05 74 Challenges in chemical literature mining
Shashikala G, Anirban Mudi, anirban.m@molecularconnections.com, Lokanath Khamari, Jignesh Bhate. Molecular Connections, Bangalore, Karnataka 560004, India
Mined chemical information is used to validate research results, design new synthetic routes and file patents. However, finding the chemical information of interest from millions of chemistry articles and patents poses a huge problem. Several text mining approaches have been applied to mine the chemical information in literature, but none of them is fully accurate. Lack of a universal standard for chemical structure representation and chemical nomenclature is a significant challenge in mining chemical entities. Various problems in chemical information extraction have been addressed with partial success by some commercial and academic efforts through a multifaceted approach to recognize these diverse representations and nomenclature. In this presentation we review the challenges in chemical literature mining, and examine a combination of automated and manual approaches in extracting high quality chemical information. We also discuss how manually curated data can be used to improve automated chemical literature mining techniques.
1:30 75 Where the blue of the night meets the gold of the day: The challenges of collaboration between searcher and client to produce the perfect chemistry patent search
Robert A Stembridge, bob.stembridge@thomsonreuters.com, Thomson Reuters, London, United Kingdom
Chemical patent searching is as much art as science. It presents unique challenges in knowing how and where to draw the boundaries for a search which is efficient, cost-effective and fit for purpose. The perfect search lies somewhere at the interface between what is required by the client and what is achievable within time limits and cost constraints placed on the searcher. This presentation will review the challenges thrown up by different types of chemical patent searching and discuss how these are resolved through good communication and collaboration between searcher and client.
1:55 76 Numeric property searching in STN patent databases
Jim Brown, jim.brown@fiz-k.com, FIZ Karlsruhe, Townsend, DE 19734, United States
The Numeric property search feature in PCTFULL, AUPATFULL and CANPATFULL, offers exact value and range search options for over 30 physical and chemical properties in almost 400 units. Numeric values and their corresponding units are automatically extracted from the patent text, normalized and made available for searching. This session introduces this powerful search tool, and provides many practical search examples to demonstrate its use.
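The extract-normalize-search sequence described in this abstract can be sketched in a few lines. The example below is a simplified assumption-laden illustration, not the STN implementation: it handles only temperatures in three units, normalizes them to kelvin, and uses hypothetical function names (`extract_temperatures_kelvin`, `in_range`).

```python
import re
from typing import List

# Conversion of each supported unit to kelvin (the normalized unit here).
_TO_KELVIN = {
    "K": lambda v: v,
    "°C": lambda v: v + 273.15,
    "°F": lambda v: (v - 32.0) * 5.0 / 9.0 + 273.15,
}

# A number (optional sign and decimals) followed by one of the known units.
_TEMPERATURE = re.compile(r"(-?\d+(?:\.\d+)?)\s*(°C|°F|K)\b")


def extract_temperatures_kelvin(text: str) -> List[float]:
    """Find temperature mentions in free text and normalize them to kelvin."""
    return [
        _TO_KELVIN[unit](float(value))
        for value, unit in _TEMPERATURE.findall(text)
    ]


def in_range(values: List[float], low: float, high: float) -> List[float]:
    """Range search over normalized values (inclusive bounds)."""
    return [v for v in values if low <= v <= high]


text = "The mixture was heated to 80 °C and then cooled to 273.15 K."
temperatures = extract_temperatures_kelvin(text)
print(len(temperatures))                      # 2
print(len(in_range(temperatures, 300, 400)))  # 1
```

Normalizing before indexing is what lets a single range query match values that were originally reported in different units.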
2:20   Intermission.
2:30 77 Approaches for extraction and “digital chromatography” of chemical data - a perspective from the RSC
David Sharpe1, sharped@rsc.org, Colin Batchelor1, Valery Tkachenko2, Kenneth Karapetyan2, Antony J Williams2, Richard Kidd1. (1) Cheminformatics, Royal Society of Chemistry, Cambridge, Cambridgeshire CB4 0WF, United Kingdom, (2) Cheminformatics, Royal Society of Chemistry, Wake Forest, NC 27587, United States
The traditional perception of the publishing process has been that it culminates in a print article. The Royal Society of Chemistry (RSC) has for many years been acutely aware that there is a wealth of information contained in scientific communications that we publish and that its true value can only be unlocked by enabling the discovery of the data within them. This is challenging due to the variety of ways that scientists provide data, textually, graphically, and increasingly in supplementary information. This talk will outline how the RSC has applied innovative approaches, developed both internally and externally, to identifying important chemical data within the literature and provides tools to anyone using chemical data to analyse and improve its quality. Examples will include: Project Prospect, the Experimental Data Checker, our CIF data importer, ChemSpider and our structure validation and standardization service.
2:55 78 SCRIPDB: A portal for easy access to syntheses, chemicals, and reactions in patents
Abraham Heifets1,3, abe@cs.toronto.edu, Igor Jurisica1,2,3. (1) Department of Computer Science, University of Toronto, Toronto, Ontario M5G 1L7, Canada, (2) Department of Medical Biophysics, University of Toronto, Toronto, Ontario M5G 1L7, Canada, (3) Ontario Cancer Institute, Princess Margaret Hospital, Toronto, Ontario M5G 2M9, Canada
SCRIPDB is a publicly-accessible chemical structure database designed to provide metadata about bioactive molecules, including mechanism of action, disease class, homologous experimental series, structural alternatives, or the synthetic pathways used to produce molecules of interest. SCRIPDB describes over 10 million compounds found in over 100,000 patents granted since 2001. It provides the full original patent text, reactions, and relationships described within any individual patent, in addition to the molecular files common to structural databases. We discuss how this information is valuable for such applications as medical text mining, chemical image analysis, reaction extraction, and in silico pharmaceutical lead optimization. We present opportunities and challenges, as well as traps for the unwary, and assess the quality of data available in the patent literature.
3:20 79 CWM Global Search: The internet search engine for chemists and biologists
Guenter Grethe1, ggrethe@att.net, Hans-Juergen Himmler2, Alex Kos2. (1) Unaffiliated, Alameda, CA 94502-7409, United States, (2) AKos GmbH, Steinen, Germany
CWM Global Search allows scientists to solve chemical structure oriented scientific problems on the Internet by using federated searching over many excellent sources. Using only one interface, the user can retrieve information from many publicly available databases. These include, but are not limited to, databases about chemical suppliers, biological information, toxicity and safety data, and patents. At the time of this writing more than 70 sources can be searched, and the number is growing monthly. Since many scientists may not be familiar with the predefined profiles available for all these sources, CWM Global Search allows you to add your own. The use of the IUPAC supported InChIs (International Chemical Identifier) or InChIKeys as the unique molecular identifier allows structure searches in many sources that were until recently only searchable by text. We will discuss the many features of CWM Global Search in detail by providing examples.
3:45   Intermission.
3:55 80 Extracting chemical information in a corporate environment with JChem for SharePoint
Tamás Pelcz, tpelcz@chemaxon.com, Anna Gulyás-Forró. ChemAxon Ltd., Budapest, Hungary
Microsoft SharePoint is a widely used web-based collaboration platform, which enhances communication between working teams at different locations. Currently, most chemical document searching applications operate on public sources, e.g. journals and patents, but SharePoint provides the opportunity to index and search internal enterprise content, taking into account user permissions as well. ChemAxon's extensions provide structure editing, search and filtering, making SharePoint more attractive for pharma companies. In this talk we will demonstrate JChem for SharePoint's features, which facilitate the extraction of chemical data from corporate documents, and give some examples of how to manage and analyze search results.
4:20 81 Federated chemical searching across SharePoint and E-Notebook
Rudy Potenzone, rudy.potenzone@perkinelmer.com, Phil McHale, Megean Schoenberg, Alex Jewett, Kate Blanchard, David Gosalvez, Philip Skinner, philip.skinner@perkinelmer.com. PerkinElmer Informatics, Cambridge, MA 02140, United States
Chemical information is often dispersed around an organization in many disparate data systems and unstructured data records. Scientists typically record reaction-specific information within an Electronic Laboratory Notebook (ELN) but other chemical information is held within live structural files, both native and embedded in other documents. Such files are routinely stored within Document Management Systems. Each data source generally can only be searched via its built-in search interface, which for Document Management Systems does not usually include chemical intelligence, and requires each system to be searched individually.
We will describe the creation of a bridging technology between an electronic lab notebook and a document management system to allow simultaneous searching across both. We will show how access to both structured reaction data and unstructured document data from one source is used to collect all the relevant documents and data in one place to help guide scientific decisions and develop ideas.
4:45   Concluding Remarks.

MONDAY EVENING

Pennsylvania Convention Center
Hall D

Sci-Mix
R. Bienstock, Organizer
8:00 - 10:00 pm
  2 See previous listing
  5 See previous listing
  21 See previous listing
  38 See previous listing
  40 See previous listing
  41 See previous listing
  42 See previous listing
  44 See previous listing
  45 See previous listing
  58 See previous listing
  70 See previous listing
  81 See previous listing
  91 WITHDRAWN
  102 See later listing
  114 See later listing
  122 See later listing
  126 See later listing
  127 See later listing
  128 See later listing
  134 See later listing
  137 See later listing
  141 See later listing
  145 See later listing

TUESDAY MORNING

Philadelphia Marriott Downtown
Room 302/303

Herman Skolnik Award Symposium
Henry Rzepa, Peter Murray-Rust, Organizers, Presiding
8:30   Introductory Remarks.
8:35 82 Changing ways of sharing research in chemistry
Henry S Rzepa, rzepa@imperial.ac.uk, Chemistry, Imperial College London, London, UK SW7 2AZ, United Kingdom
From 1994 onwards, the Internet was seen as having increasingly influential potential for how chemistry might be handled, shared, stored and communicated, and for how it might affect the quality, reproducibility and re-use of both experimental observation and computational modelling for new scientific opportunities. Examples will be presented to illustrate from a personal viewpoint how the author carried out collaborative research in pre-Internet days, and how things have changed up to 2012. This will include a review of early attempts at electronic conferencing, examples of modern "datuments" as data-enriched interactive articles, the role of digital repositories (Dspace, Chempound, Figshare) and how environments such as blogs and Wikis can be used to promote collaborative new science.
8:55 83 Making the connection between molecular structure and spectroscopy: Jmol, JSpecView, and JCAMP-MOL
Robert M Hanson1, hansonr@stolaf.edu, Robert Lancashire2. (1) Department of Chemistry, St. Olaf College, Northfield, MN 55057, United States, (2) Department of Chemistry, University of the West Indies, Mona, Jamaica
Over the past decade, advances in online resources have dramatically increased interactive web-based access to both molecular structure and molecular spectroscopy. The two open-source applets, Jmol and JSpecView, have been at the forefront of this technology. But connecting the two has been difficult -- until now. In this presentation, the two principal developers of these applets will discuss the merger of Jmol and JSpecView that was effected in early 2012 and demonstrate the power of that connection and its potential. A proposal for a simple JCAMP file extension, JCAMP-MOL, which allows programs such as Jmol and JSpecView to read molecular structure data, spectroscopy data, and associated correlation data all from the same file (and the critical importance of doing that), will be discussed.
9:10 84 HTML5 enables new methods of distributing scientific information
Josef J Polak, joe@ichemlabs.com, Kevin J Theisen. iChemLabs, LLC, Piscataway, New Jersey 08854, United States
The ChemDoodle Web Components are an open-source cheminformatics and chemical graphics HTML5/JavaScript library. This library is currently used to create scientific applications that run on any web-capable device with no additional effort required on the part of the end user. These types of tools complement the dissemination of information, and allow users to quickly and easily view and interact with data. We investigate several new opportunities enabled by these technologies, with a focus on scientific publishing. Such HTML5 scientific tools decrease the costs faced by academia, government and industry as they use the web to further spread science.
9:25 85 Chem4Word: Semantic chemical authoring within Microsoft Word
Alex D Wade, awade@microsoft.com, Anthony J. G. Hey. Microsoft Research, Redmond, WA 98052, United States
The semantic enabling of scientific literature is accelerating. Much of this has concentrated on adding semantics to text, for instance using RDF and ontologies. The Chemistry Add-in for Word (aka Chem4Word) is a joint initiative between Microsoft Research and the University of Cambridge to extend semantic concepts to chemical structures and to allow these concepts to be captured during the authoring process. This open-source tool chemically enables Microsoft Word by allowing direct searching of structural repositories and insertion of structures directly into documents. Structures can be locally manipulated within Word and are stored in an accessible format (Chemical Markup Language, CML), allowing chemical information to be more easily mined from documents. Our intent is that Chem4Word will prove indispensable both in adding semantic value to new and existing bodies of chemical literature and to research institutions seeking to exploit their own expertise and knowledge to the full.
9:40 86 From Smart Tea to blog3: A story of users, laboratories, and the semantic web
Jeremy G Frey, j.g.frey@soton.ac.uk, Department of Chemistry, University of Southampton, Southampton, Hants SO17 1BJ, United Kingdom
This talk covers the highs and lows of using the semantic web to capture, characterise and disseminate chemical information in the context of processes. At the start of the e-science revolution the use of RDF (the language of the semantic web) along with schemas and ontologies seemed like an ideal way to capture and describe data in context, i.e. the process and the result of the process. However, we soon found that a semantic electronic notebook pushed the limits of the available description technology. Advances in understanding of the semantic web and the introduction of Web 2.0 technologies during the evolution of the e-science community allowed us to continue to develop the RDF view of chemical experimental descriptions. There are still many challenges to this way of approaching laboratory and computational notebooks, and this talk will highlight the state of the current systems.
9:55 87 Blogs and chemical communication
Steven M Bachrach, sbachrach@trinity.edu, Chemistry, Trinity University, San Antonio, TX 78212, United States
In a search for new means of communicating chemistry, the blog offers some interesting features. The short history of the chemistry blog, some representative examples, and how the author has utilized his blog (www.comporgchem.com/blog) will be discussed. Enhanced communication features inherent in blog technology make this medium suitable for many purposes, including commenting on the literature and even reporting new results.
10:10   Intermission.
10:20 88 Semantic pipelines to molecular properties
Egon Willighagen, egon.willighagen@gmail.com, Department of Bioinformatics - BiGCaT, Maastricht University, Maastricht, The Netherlands
In our quest to replace answers in molecular sciences by recipes to get answers, the semantic web technologies play the important role of giving meaning to numbers and characters. The Resource Description Framework (RDF) complements (rather than replaces) earlier work with eXtensible Markup Language (XML) applications by providing a clearer separation between syntax and meaning. This creates an environment where multiple serialization formats can be used, that grows and shrinks in complexity where needed, and that can, for example, be easily embedded in document formats like HTML. We here present recent work in the dissemination and prediction of molecular properties, where data is shared in RDF, read into statistical and life science software including Bioclipse and R, and where molecular properties are predicted. Samwald, M.; Jentzsch, A.; Bouton, C.; Kallesoe, C.; Willighagen, E.; Hajagos, J.; Marshall, M.; Prud'hommeaux, E.; Hassanzadeh, O.; Pichler, E.; Stephens, S. Journal of Cheminformatics 2011, 3, 19. Willighagen, E. L.; Jeliazkova, N.; Hardy, B.; Grafström, R. C. BMC Research Notes 2011, 4. Hastings, J.; Chepelev, L.; Willighagen, E.; Adams, N.; Steinbeck, C.; Dumontier, M. PLoS ONE 2011, 6, e25513+.
10:35 89 Using Semantic Web tools to improve chemical collaboration
Omer Casher1, omer@imaqa.com, Henry Rzepa2. (1) IMAQA, Stevenage, United Kingdom, (2) Chemistry, Imperial College London, London, United Kingdom
The online searching of electronic publications and virtual networking using social networks provide a novel approach for locating potential collaborations. By combining publication metadata with metadata from Web 2.0 resources, a scientist's Web profile could be broadened to include metadata about research activities, useful for locating new collaborators. A proof-of-concept will be presented featuring two Semantic Web tools: The first is SemanticEye, an exemplar which joins up chemical publications using two key metadata identifiers, molecules (InChI) and authors. The second is FOAF, the “friend-of-a-friend” Semantic Web vocabulary for social networking. A dynamic approach to generating FOAF profiles is demonstrated whereby SemanticEye is used to output a FOAF serialisation by querying it with SPARQL, the Semantic Web query language. FOAF information from other resources can then be readily aggregated with the SemanticEye FOAF to further enrich scientific profiles.
10:50 90 Towards publishing semantic descriptions of Electronic Laboratory Notebook records
Simon Coles1, s.j.coles@soton.ac.uk, Richard Whitby1, Jeremy Frey1, Colin Bell1, Aileen Day2. (1) Chemistry, University of Southampton, Southampton, Hampshire SO17 1BJ, United Kingdom, (2) ChemSpider, Royal Society of Chemistry, Cambridge, Cambridgeshire CB4 0WF, United Kingdom
The Chemistry Grand Challenge, Dial-a-Molecule (http://dialamolecule.chem.soton.ac.uk/site/), is based on the ability to mine all available chemical information to predict reaction outcomes and make compounds more efficiently. The sheer number of experiments conducted in laboratories, coupled with the current processes for publishing and the lack of a culture of publishing “negative” results, means it is not possible to make the outcomes of all experiments available. Our solution is to open up access to the laboratory records relating to experimental observations in order to discover records of interest and automatically process them. We propose a high-level semantic description of an ELN record that comprises a compact set of terms, including title, keywords, identifiers, contact, license, related items, contributors, content, source and dates. This approach is demonstrated by the schema being used for: 1) the LabTrove ELN to openly publish its records; 2) the IDBS e-Workbook to interact with RSC's ChemSpider database.
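To make the compact term set above concrete, here is a minimal Python sketch of a record carrying those ten terms and serialised to JSON. Only the term names come from the abstract; the `ELNRecord` class, the field types, the JSON serialisation and every example value are assumptions for illustration and do not reproduce the actual LabTrove or IDBS schema.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Dict, List


@dataclass
class ELNRecord:
    """High-level description of one notebook record (illustrative types)."""
    title: str
    keywords: List[str]
    identifiers: List[str]
    contact: str
    license: str
    content: str
    source: str
    dates: Dict[str, str]
    related_items: List[str] = field(default_factory=list)
    contributors: List[str] = field(default_factory=list)


def to_json(record: ELNRecord) -> str:
    """Serialise a record with stable key order so outputs are comparable."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


# Placeholder values only; none of these describe a real record.
example = ELNRecord(
    title="Example esterification attempt",
    keywords=["esterification", "negative result"],
    identifiers=["example-id-001"],
    contact="researcher@example.org",
    license="CC0",
    content="Free-text or structured body of the notebook entry.",
    source="https://example.org/eln/record/1",
    dates={"created": "2012-01-01"},
    contributors=["A. Researcher"],
)
```

`print(to_json(example))` emits one JSON object with the ten terms as keys, which is the kind of small, uniform description that a harvester could index without understanding the notebook's internal layout.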
11:05 91 WITHDRAWN
11:20 92 Open Source cocktail: Benefits of integrating Cheminformatics and statistical software
Rajarshi Guha, guhar@mail.nih.gov, Informatics, NIH NCATS, Rockville, MD 20850, United States
One of the key effects of Open Source software is the ability to freely mix and match code, libraries and applications to address problems in an efficient manner. In this talk I will discuss how the CDK, a Java library for cheminformatics, is being used in the R environment, a platform for statistical modeling, to provide a comprehensive cheminformatics modeling environment. I will discuss related packages that enhance core cheminformatics functionality, primarily by providing access to the public chemogenomic databases ChEMBL and PubChem. I will highlight how the use of freely available and liberally licensed software has enabled the development of these tools and how such tools lead to the development of a useful software ecosystem. I will also touch upon the issue of reproducibility of analytical workflows that is enabled by R and will finally discuss some of the problems and bottlenecks in projects that depend on multiple Open Source components.
11:35 93 Avogadro, open chemistry, and chemical semantics
Marcus D Hanwell, marcus.hanwell@kitware.com, Kyle Lutz. Department of Scientific Computing, Kitware, Inc., Clifton Park, New York 12065, United States
Avogadro is being rewritten and architected to put semantic chemical meaning at the center of its internal data structures in order to fully support data-centric workflows. Computational and experimental chemistry both suffer when semantic meaning is lost; through the use of expressive formats such as CML, along with lightweight data-exchange formats such as JSON, workflows that previously demanded manual intervention to retain semantic meaning can be used. Integration with projects like JUMBO and Open Babel when conversion is required, coupled with codes such as NWChem where direct support for CML is being added, allows for much richer storage, analysis, and indexing of data. As web-based data sources add more semantic structure to their data, Avogadro will take advantage of those resources.

TUESDAY AFTERNOON

Philadelphia Marriott Downtown
Salon H

Herman Skolnik Award Symposium
Henry Rzepa, Peter Murray-Rust, Organizers, Presiding
1:30 94 Can we build artificially intelligent chemists?
Peter Murray-Rust, pm286@cam.ac.uk, Department of Chemistry, University of Cambridge, Cambridge, United Kingdom
It is now possible to create fully semantic CML datuments enhanced with a range of dictionaries. We have now added mathematics, through MathML formalism, and with appropriate imperative semantics can create weakly intelligent documents. These documents can answer simple questions using heuristics (as in the parsing of natural language) or formally following algorithms through MathML. Traditional problems such as the evaluation of forcefields can be broken into individual components so programs can be written simply with mathematical and chemical editors. The presentation will demonstrate a range of weak intelligences currently possible with generic libraries. Chemistry and related subjects will be enhanced by moving from conventional publications and databases to Open Semantic Web resources.
1:50 95 Enriching the NWChem computational chemistry software with Chemical Markup Language semantics
Wibe A de Jong, wibe.dejong@pnnl.gov, William A Shelton, william.shelton@pnnl.gov. EMSL, Pacific Northwest National Laboratory, Richland, WA 99352, United States
Coupling data obtained from various complex experiments and simulations has the potential to deliver a new scientific knowledge and discovery base. Within the computational chemistry and materials community there are multiple efforts to make simulation data available to the broader community. Enabling researchers to use diverse data sources effectively requires information-rich data formats and well-defined interfaces. In this paper we describe our efforts to develop a semantically rich Chemical Markup Language (CML) compliant Extensible Markup Language (XML) data file and format for the widely used NWChem computational chemistry software. In addition, we will discuss the need for a comprehensive computational chemistry dictionary.
2:05 96 Data-rich chemistry inside Wikipedia and other wikis
Martin A Walker, walkerma@potsdam.edu, Department of Chemistry, State University of New York at Potsdam, Potsdam, New York 13676, United States
Chemical information inside Wikipedia has great value, but the site is designed as an encyclopedia rather than as a database. Despite this, chemists active on the site have designed the “Chembox” (in substance articles) so that the information is machine-readable. A validation effort has also made the data more reliable, and a bot “patrols” the chemboxes to ensure that validated content is not vandalized. Data subpages allow more information to be stored than is possible on Wikipedia article pages. Identifiers such as InChIs help users find Wikipedia substance information via structures. Another wiki, in RSC LearnChemistry, takes this approach further, using InChIs to improve learning by allowing students to answer questions by inputting structures.
2:20 97 Progress and directions towards semantic NMR data
Karl T. Mueller1,2, karl.mueller@pnnl.gov, Nancy M. Washton2, Nico Adams3. (1) Department of Chemistry, Penn State University, University Park, PA 16802, United States, (2) Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, WA 99352, United States, (3) Materials Science And Engineering, CSIRO, Clayton, VIC 3168, Australia
Nuclear magnetic resonance (NMR) spectroscopy is a powerful and widely used analytical method and chemical tool that exploits the behavior of nuclear “spins” within a large magnetic field and under the influence of radiofrequency magnetic pulses. Most, if not all, major universities and many national laboratories have NMR instrumentation comprising a variety of vintages from a handful of vendors. There is no accepted “data model” for NMR data, although recent attempts reveal promising directions. We are pursuing a semantic framework for NMR data, and will report on progress related to the construction of NMR dictionaries and data extraction/conversion tools to place NMR data within the semantic architecture of Chemical Markup Language (CML).
2:35   Intermission.
2:45 98 Natural language parsing for semantic science
Lezan Hawizy1, l.hawizy@digital-science.com, Daniel Lowe2, Hannah Barjat3, David Jessop4, Peter Murray-Rust5. (1) Macmillan Publishing, Digital Science, London, Greater London N1 9XW, United Kingdom, (2) Innovation Centre, NextMove Software Ltd, Cambridge, Cambridgeshire CB4 0EY, United Kingdom, (3) Department of Chemistry, University of Cambridge, Cambridge, Cambridgeshire CB1 1EW, United Kingdom, (4) Global Dawn, London, Greater London W1W 8BE, United Kingdom, (5) Wellcome Trust Genome Campus, European Bioinformatics Institute, Cambridge, Cambridgeshire CB10 1SD, United Kingdom
Scientific publications are a highly valuable resource for mining chemical information. The language used in this literature is highly formulaic and contains many domain-specific terminologies that provide primary results and metadata. Data mining these publications provides semantically rich information about routine experiment types such as chemical syntheses, figure captions, abstracts, and article metadata. ChemicalTagger uses tagging from OSCAR (a chemical entity recogniser) and other NLP sources as well as an ANTLR grammar to extract a shallow parse. It was developed to extract all the components of a reported synthesis (compounds, conditions, amounts, times, outcomes, etc.) and has extracted 420,000 atom-mapped reactions from US patents. ChemicalTagger is generic and has been applied to Atmospheric Chemistry abstracts and more recently to captions of figures such as spectra. ChemicalTagger is particularly valuable for quantities and scientific units and is useful in any physical science with a high need for extracting semantic numerical data.
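The extraction of quantities and units mentioned above can be illustrated with a deliberately simple Python sketch. It swaps in a plain regular expression for ChemicalTagger's actual OSCAR tagging and ANTLR grammar, so it shows only the flavour of the task; the unit list, the `extract_quantities` helper and the example sentence are assumptions made for this illustration.

```python
import re
from typing import List, Tuple

# A small, hand-picked unit list (an assumption; real text needs far more).
# Longer alternatives come first so "mmol" is not cut short by "mol".
QUANTITY = re.compile(r"(\d+(?:\.\d+)?)\s*(mmol|mol|mg|mL|min|g|h|°C)\b")


def extract_quantities(text: str) -> List[Tuple[float, str]]:
    """Return (value, unit) pairs in the order they appear in the text."""
    return [(float(value), unit) for value, unit in QUANTITY.findall(text)]


# Invented example sentence in the style of a reported synthesis.
SENTENCE = (
    "Benzoic acid (1.22 g, 10 mmol) was stirred in ethanol (20 mL) "
    "at 78 °C for 2 h."
)
```

Running `extract_quantities(SENTENCE)` yields the five value and unit pairs in reading order. A regular expression like this has no notion of which compound an amount belongs to, which is the gap that a grammar-based shallow parse is meant to close.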
3:00 99 Language, semantics, and chemistry: Why computers need to say what we mean
Robert C Glen, rcg28@cam.ac.uk, Department of Chemistry, University of Cambridge, Cambridge, Cambridgeshire CB2 1EW, United Kingdom
In e-science, 'Big Data' is capturing the imagination of researchers, funders and entrepreneurs, but data is only as useful as the metadata attached to it. How we store and analyse the data is to a large extent governed by our current intended uses, but in the future, it is often uncertain how the data will be used. Humans have developed languages that are semantically rich and cope with extrapolations from philosophically different concepts with ease. The work of Henry and Peter is moving our field of chemistry significantly in that direction, enabling linguistically rich queries to interrogate large complex databases in a manner that often allows discovery of new ideas or concepts. We will explore ideas of how semantic data can change our view of chemistry, its importance in experimental and computational science and give some examples.
3:15 100 Automated molecular data extraction using Open Babel and ChemSpotlight: The Semantic Web and Semantic Desktop
Geoffrey R Hutchison, geoffh@pitt.edu, Department of Chemistry, University of Pittsburgh, Pittsburgh, PA 15260, United States
In the last 10 years, the Open Babel project has become an open toolbox for chemistry, including widely-used tools for interconversion of molecular data, and a programming toolkit. The ChemSpotlight project leverages Open Babel to automatically extract and annotate a wide variety of chemical files with semantic data. It can be used to prepare a semantic desktop and semantic network database with little effort or extra software. In particular, we will show how it is used to filter and select from computational chemistry output files without creating a database or explicit annotated data store.
3:30 101 Crystallographic publishing in the semantic age
Brian McMahon, bm@iucr.org, International Union of Crystallography, Chester, United Kingdom
The International Union of Crystallography (IUCr) launched its website (with journal tables of contents and associated structural data files) in 1994, and its electronic journal publishing platform in 1999. From the outset these publishing activities have been driven by the vision of interactive semantic publications linking data and publications that Murray-Rust and Rzepa have pioneered in the field of chemistry [1, 2]. In turn, the Crystallographic Information File (CIF) released by the IUCr in 1991 as an information interchange standard [3] has informed the structural content of chemical markup language (CML). CIF was designed from the outset as an extensible standard, and now covers many areas of crystallography. It forms the basis for integrated data and publishing workflows linking laboratories, data repositories, publishers and databases, and has been an important factor in improving the quality of published crystal structures. [1] Murray-Rust, P. (1998). Acta Cryst. D54, 1065-1070. [2] Murray-Rust, P. & Rzepa, H. S. (1999). J. Chem. Inf. Comput. Sci. 39, 928-942. [3] Hall, S. R., Allen, F. H. & Brown, I. D. (1991). Acta Cryst. A47, 655-685.
3:45 102 Towards semantic materials informatics: Making materials data accessible and useful
Nico Adams, nico.adams@csiro.au, Murray Jensen. Materials Science and Engineering, CSIRO, Clayton, Victoria VIC 3168, Australia
Materials informatics requirements are substantially different from those of small molecule informatics: while structural representations of small molecules often contain enough information for the development of structure-property relationships (SPRs), this is frequently not the case for complex materials. Often an account of the provenance of a material must be added to its chemical/structural representation. Additionally, materials data is usually generated in “native vernaculars”: non-portable formats, which do not easily allow for data exchange. To make this data widely accessible, it must be converted to formats with standard syntax and semantics that are both human- and machine-comprehensible. The talk will discuss how the complete semantic web toolstack (from XML dialects to axiomatically rich ontological models in OWL) can be leveraged for the development of machine-comprehensible representations of materials and their associated data and how modern materials information systems can be established on the basis of these representations.
4:00   Intermission.
4:10 103 Exploring large chemical data sets: Interactive analysis and visualization
Kyle Lutz, kyle.lutz@kitware.com, Marcus D Hanwell. Department of Scientific Computing, Kitware, Inc., Clifton Park, New York 12065, United States
A new open-source application, ChemData, has been developed to facilitate the exploration and analysis of large chemical data sets. Its features include a variety of 2D plotting techniques, such as traditional scatter plots, parallel coordinates charts, and scatter plot matrices. Similarity relations between molecules can be explored using a range of graph-based visualization methods. Multiple querying and filtering functions allow users to locate molecular data relevant to their work. The application uses MongoDB as a semantic data store, focusing on cheminformatics and assessment of chemical properties such as QSAR data. Computational chemistry data is stored directly in the file store, and semantic data is extracted to facilitate search and analysis. Initial work is also in progress on using web-based visualization and analysis tools to interact with the data.
4:25 104 Chemical classification for the Semantic Web
Janna Hastings, hastings@ebi.ac.uk, Paula de Matos, Venkatesh Muthukrishnan, Marcus Ennis, Adriano Dekker, Steve Turner, Gareth Owen, Christoph Steinbeck. Cheminformatics and Metabolism, European Bioinformatics Institute, Cambridge, United Kingdom
Recent years have seen a welcome explosion in the availability of open data in the chemistry domain. With this information explosion, however, it becomes harder to retrieve relevant results from the available information and organise those results towards answering specific scientific questions. Classification is essential for effective browsing and visualisation and for discovering underlying chemical-biological mechanisms. The ChEBI chemical ontology provides a structure-based and an activity-based classification for chemicals in biological contexts. This chemical ontology has been applied to annotation of chemicals in biological contexts and for diverse tasks of chemical discovery including metabolic network gap prediction. However, ChEBI's growth is limited to the throughput of manual annotation. We will describe tools that we are developing to address this challenge that integrate cheminformatics and Semantic Web technology in order to extend the ChEBI chemical classification to a wide range of chemical data on the Semantic Web.
4:40 105 Use of InChI on the Internet
Stephen Heller, steve@hellers.com, CBRD, NIST, Gaithersburg, MD 20899-8320, United States
This presentation will discuss and stress the extensive use and dissemination of the InChI and InChIKey structure representations by and for the world-wide chemistry community, the chemical information community, and major publishers and disseminators of chemical and related scientific offerings in manuscripts and databases.
4:55 106 ChemSpider compound database as one of the pillars of a semantic web for chemistry
Valery Tkachenko1, tkachenkov@rsc.org, Antony J Williams1, Alexey Pshenichnov1, Kenneth Karapetyan1, Colin Batchelor2, Jonathan Steele2, Aileen Day2, David Sharpe2. (1) Cheminformatics, Royal Society of Chemistry, Wake Forest, NC 27587, United States, (2) Cheminformatics, Royal Society of Chemistry, Cambridge, Cambridgeshire CB4 0WF, United Kingdom
The ChemSpider compound database is a free online database provided to the community by the Royal Society of Chemistry. The database is an aggregator of information from online resources as well as a host of data extracted from RSC scientific articles. Over the past five years, over 26 million chemicals together with a diverse array of associated data have been deposited. The online database is open to community deposition, annotation and curation and, as a result, has expanded into a rich resource to contribute to a semantic web of chemistry. ChemSpider provides access to its data via web services and as RDF. ChemSpider provides the chemistry services supporting the Open PHACTS project, a semantic project serving the Life Sciences community to facilitate the linking of chemical-biology data and enable drug discovery. This presentation will provide an overview of our contributions to the semantic web.
5:10   Award Presentation.

WEDNESDAY MORNING

Philadelphia Marriott Downtown
Room 302/303

Global Opportunities in Chemical Information Cosponsored by PROF, SCHB
Jignesh Bhate, Tom Blackadar, Organizers
Rachelle Bienstock, Presiding
9:00   Introductory Remarks.
9:10 107 Translating IUPAC-like chemical nomenclature to and from simplified Chinese
Roger A Sayle, roger@nextmovesoftware.com, Daniel M Lowe. NextMove Software, Cambridge, CAMBS CB4 0EY, United Kingdom
More and more synthetic and medicinal chemistry is being done in China. China now files more patent applications each year than the United States. A Google search for benzoic acid retrieves more hits in Chinese than it does in English. Alas, the language barrier between Chinese university chemistry departments, chemical suppliers and contract research organizations on the one hand and western pharmaceutical companies on the other is increasingly becoming a challenge. One possible solution to some of these problems is the use of machine translation software for converting chemical names from one language to another. This talk will describe one such system that, amongst other uses, enables the chemical text mining of structures from Chinese patent office patents.
Roger Sayle, "Foreign Language Translation of Chemical Nomenclature by Computer", Journal of Chemical Information and Modeling (JCIM), Vol. 49, No. 3, pp. 519-530, February 2009
9:35 108 Challenges and rewards of starting a small global chemistry business in China
Thomas A Blackadar, tom@binocvision.com, Management, Binocular Vision, San Francisco, CA 94104, United States
With China the bright spot on an otherwise troubled global economy, small-business entrepreneurs are flocking to the Middle Kingdom. The reality is often complicated, however, by bureaucratic, cultural and business issues in a country not yet geared to small businesses. In the end, a solid long-term plan is essential for navigating the route to a successful business. We will discuss recent experiences from starting our Shanghai-based informatics company Binocular Vision, and we will examine some common pitfalls and strategies for overcoming them.
10:00 109 Multilingual WorldWideScience.org
Brian Hitson, hitsonb@osti.gov, Office of Scientific and Technical Information, U.S. Department of Energy, Oak Ridge, TN 37830, United States
Multilingual WorldWideScience.org, wearing our hats as the WWS.org operating agent, the U.S. DOE Office of Scientific and Technical Information (OSTI), and ICSTI member.
10:25   Intermission.
10:35 110 India - the drunken man's stupor: Eventually we reach...
Jignesh Bhate, jignesh@molecularconnections.com, Molecular Connections, Bengaluru Area, India
India powers the world's back office. On one end it has world class companies, driving chemical content and production for many chemical database companies and publishers. On the other, it has some really archaic policies and government regulations, which prohibit innovation & growth. It is a land of contrast. English is widely spoken, but the cultural nuances of different regions within India mean it is not always understood in the same context. We will provide an outsourcer's perspective on doing business in India and weigh its pros and cons vis-à-vis other outsourcing destinations like the Philippines and eastern Europe.
11:00 111 Challenges in sourcing and processing patent information from emerging markets
Andrew McFarlane, andrew.mcfarlane@thomsonreuters.com, IP Services, Thomson Reuters Inc., London, United Kingdom
The focus of information businesses is no longer confined to traditional markets; it is now a truly global playing field, as is apparent from the extensive growth of patents being filed in emerging markets, e.g. China and India. There are unique challenges in managing these documents: the sourcing, processing and searching of this information needs a different approach from the one historically adopted for traditional markets. The value proposition of these new information sources is also very different from that of information generated in established markets. The presentation will concentrate on the enormous growth of patent literature (including chemical patent literature) in these markets and the challenges faced in sourcing and processing this information.
11:25   Moderated Panel Discussion.

Philadelphia Marriott
Conference Room 307

Chemical Space: Challenges in Visualization and Mining
Jose Medina-Franco, Maciej Haranczyk, Organizers
Maciej Haranczyk, Presiding
8:00   Introductory Remarks.
8:05 112 TorsionAnalyzer: Interactive analysis and exploration of the conformational space
Christin Schärfer1, schaerfer@zbh.uni-hamburg.de, Tanja Schulz-Gasch2, Matthias Rarey1, Wolfgang Guba2. (1) Center for Bioinformatics (ZBH), University of Hamburg, Hamburg, Germany, (2) F. Hoffmann-La Roche Ltd, Basel, Switzerland
As the underlying conformational model has a major influence on the results of virtual screening applications, a closer insight into the conformational space of molecules is very important. In order to examine the torsion angle spaces of molecules we developed a new interactive software tool for conformation analysis called TorsionAnalyzer. The graphical tool analyzes the torsion angles of a conformation by using a predefined set of about 200 SMARTS patterns, each describing a torsion angle and its environment. For each pattern a list of allowed/usual angles is automatically derived from analyzing the torsion angle distribution in a user defined molecule set. The TorsionAnalyzer supports adding new patterns or modification of existing ones as well as the preparation and storage of different sets of patterns. Rotatable bonds of molecules are colored according to their classification into usual and unusual torsion angles using the list of allowed angles from the corresponding patterns.
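The classification step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the 30 degree tolerance and the example allowed angles are hypothetical, and neither the SMARTS matching nor the statistical derivation of allowed angles used by TorsionAnalyzer is reproduced here.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def torsion_angle(p0, p1, p2, p3):
    """Signed torsion angle in degrees defined by four 3D atom positions."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of the two planes
    y = _dot(b2, _cross(n1, n2))
    x = math.sqrt(_dot(b2, b2)) * _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def angular_difference(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)

def is_usual(angle, allowed_angles, tolerance=30.0):
    """True if the angle lies within `tolerance` of any allowed angle.

    The tolerance value is an assumption made for this example.
    """
    return any(angular_difference(angle, a) <= tolerance for a in allowed_angles)

# Hypothetical allowed angles for one pattern: anti and both gauche states.
allowed = [180.0, 60.0, -60.0]
angle = torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
print(round(abs(angle), 1), is_usual(angle, allowed))
```

A bond whose torsion fails `is_usual` would be the one highlighted as unusual in a viewer of this kind.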
8:25 113 Knowledge-based characterization of chemistry space and visualization
Barun Bhhatarai, bbhhatarai@med.miami.edu, Stephan Schurer. Center for Computational Science, University of Miami, Miami, Florida 33136, United States
Cheminformatics is often focused on managing and analyzing large amounts of data and their application to solve chemical problems. The meaningful extraction and curation of knowledge from existing, often un-integrated, data repositories requires better tools that address data management, analysis and, importantly, characterization and visualization of chemical space. We developed a new framework called SMARTNames to describe and formalize chemical information based on chemical-functional-groups (CFGs). In contrast to other approaches, our system integrates structure and function using a semantic knowledge representation of chemistry that captures chemists' insights to complex problems; yet it makes that knowledge accessible even to non-experts. We will present an analysis of several chemistry databases. To process the large datasets we developed a novel approach to compute and compare very large chemical libraries based on any combination of descriptors and to visualize unique and overlapping chemical spaces. We contrast this with classical approaches such as BCUTS.
8:45 114 Definition and visual exploration of the biologically relevant chemical space
Obdulia Rabal, Julen Oyarzabal, julenoyarzabal@unav.es. Small Molecule Discovery, Center for Applied Medical Research (CIMA) - University of Navarra, Pamplona, Navarra 31008, Spain
The definition and pragmatic implementation of the biologically relevant chemical space is critical in addressing navigation strategies in the overlapping regions where chemistry and therapeutically relevant targets reside, and therefore also key to performing an efficient drug discovery project. Here, we describe the development and implementation of a simple and robust method for representing the biologically relevant chemical space, independently of any reference space, and analyzing chemical structures accordingly. Underlying our method is the generation of a novel descriptor (LiRIf) that converts structural information into a one-dimensional string accounting for the plausible ligand-receptor interactions as well as for topological information. Capitalizing on ligand-receptor interactions as a descriptor enables the clustering, profiling and comparison of libraries of compounds from a chemical biology and medicinal chemistry perspective.
9:05 115 The chemical space mapplet: Interactive access to millions of molecules on your desktop
Jean-Louis Reymond, jean-louis.reymond@ioc.unibe.ch, Mahendra Awale, Lorenz C Blum, Julian Schwartz, Ruud van Deursen. Department of Chemistry and Biochemistry, University of Berne, Berne, Switzerland
The chemical space describes the ensemble of all organic molecules to be considered when searching for new drugs. How large is chemical space and what does it contain? To answer these questions, we report the Chemical Space Mapplet, an interactive desktop application for browsing visually through chemical space in analogy to the google-maps system for the earth. The application does not require prior knowledge of chemistry, and therefore opens chemical space to all, including non-specialists.
9:25 116 Quantification of the diversity of chemical libraries: The "Delimited Reference Chemical Subspaces" (DRCS) methodology
Luc Morin-Allory1,2, luc.morin-allory@univ-orleans.fr, Vincent Le Guilloux1,2, Lionel Colliandre1,2, Stephane Bourg3. (1) Institut de Chimie Organique et Analytique (ICOA), Université d’Orléans, Orleans, France, (2) UMR 7311, CNRS, Orleans, France, (3) Fédération de Recherche, “Physique et Chimie du Vivant” FR 2708, CNRS, Orleans, France
We present a new method based on the DRCS to quantify the diversity of chemical libraries. A set of 16 million commercial compounds has been gathered resulting in a database of 6.63 million standardized and unique molecules which have been used to create representative space. Using a robust PCA model the molecules are projected in a reduced 2D viewable space. Then the reduced space is delimited by a representative contour encompassing most of the molecules creating the DRCS. This allows a rapid and easy visual comparison of chemical libraries.
Moreover, the DRCS methodology is applied to compare the relative molecular diversity of chemical libraries using diversity indices which are independent of the size of the library. The delimitation of the chemical subspace enables the use of numerous mathematical methods to compute this diversity. This methodology is bundled in "Screening Assistant 2.0" a free and open-source JAVA software.
9:45 117 Interactive tools for navigating structure and property space
Lisa Peltason, lisa.peltason@roche.com, Daniel Stoffler. Department of Cheminformatics and Statistics, Pharma Research & Early Development Informatics, F. Hoffmann-La Roche Ltd., Basel, Switzerland
The question of how structural features of active molecules are associated with biological responses is central to any small-molecule drug discovery project. Understanding this relationship provides the basis for making rational decisions towards the prioritization and optimization of molecules. In recent years, novel techniques for visualizing and analyzing structure-activity relationships (SARs) have been developed in academia and industry. This talk will illustrate how such innovative approaches are put into practice at Roche. Aiming at creating an interactive SAR analysis toolkit, existing innovative visualization techniques have been selected, enhanced with user interaction capabilities, and integrated into Roche's assay data query and analysis platform. With capabilities for obtaining a quick overview of a compound data set as well as for scrutinizing selected series in more detail, the tool provides complementary views on a SAR landscape and helps to address questions that are relevant in different stages of medicinal chemistry research projects.
10:05   Intermission.
10:15 118 High-dimensional activity landscape representations
Dagmar Stumpfe, Jürgen Bajorath, bajorath@bit.uni-bonn.de. Life Science Informatics, B-IT, University of Bonn, Bonn, Germany
Integrating chemical and biological activity space is a principal requirement for the generation of activity landscape models. The activity landscape concept has been extended beyond single targets to design selectivity or multi-target landscapes. As long as activity against only a few targets is monitored, activity space representation is not a difficult task. However, the situation is more complicated if large numbers of targets are considered. For example, when compound collections are profiled against subsets of the kinome, often including hundreds of kinases, high-dimensional activity spaces are obtained that are difficult to de-convolute and represent in activity landscape models. In this case, further advanced design approaches must be investigated that depart from conventional activity landscape modeling. This contribution introduces new approaches to the design of high-dimensional activity landscape representations.
10:35 119 Exploring activity landscapes through molecular reference structures
David Marcus, dm565@cam.ac.uk, Hamse Y Mussa, Andreas Bender, Robert C Glen. Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Cambridge, United Kingdom
Rationalizing structure-activity landscapes is crucial for designing molecules with the desired activity at biological targets. Available experimental data suggest that these landscapes may consist of distinct regions some of which divert from the molecular similarity principle, making it difficult to obtain a coherent picture of how structural changes and changes in bioactivity are related. One way of exploring bioactivity landscapes is by measuring the structural and activity differences of the molecules in the dataset in a pairwise manner exhaustively - i.e., each molecule is considered as a reference molecule in turn. However, this scheme can overemphasize the relevance of a significant subset of molecules with average activity in datasets. We propose a new approach that explores activity landscapes, which employs only a few molecular reference structures. We describe how the proposed approach works and can be utilized, for example, to guide experimentalists in exploring activity landscapes by visual and quantitative techniques.
10:55 120 Hole filling and library optimization: Application to commercially available fragment libraries
Woody Sherman, woody.sherman@schrodinger.com, Yuling An, Steven L. Dixon. Schrodinger, New York, NY 10036, United States
Compound libraries comprise an integral component of drug discovery in the pharmaceutical and biotechnology industries. While in-house libraries often contain millions of molecules, this number pales in comparison to the accessible space of drug-like molecules. In this work, we present an automated method to fill holes in an existing library using compounds from an external source and apply it to commercially available fragment libraries. The method, called Canvas HF, uses distances computed from 2D chemical fingerprints and selects compounds that fill vacuous regions while not suffering from the problem of selecting only compounds at the edge of the chemical space. We show that the method is robust with respect to different databases and the number of requested compounds to retrieve. We also present an extension of the method where chemical properties can be considered simultaneously with the selection process to bias the compounds toward a desired property space without imposing hard property cutoffs. Overall, the method presented here offers an efficient and effective hole-filling strategy to augment compound libraries with compounds from external sources. The method does not have any fit parameters and therefore it should be applicable in most hole-filling applications.
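The idea of selecting external compounds by fingerprint distance can be illustrated with a much simpler stand-in. The sketch below is not Canvas HF: it is a plain greedy max-min picker over Tanimoto distances, which is exactly the kind of approach that tends to favor compounds at the edge of chemical space, the problem the abstract says Canvas HF avoids. Fingerprints are assumed to be sets of on-bit indices, and all names are hypothetical.

```python
def tanimoto_distance(fp_a, fp_b):
    """1 - Tanimoto similarity for fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(fp_a & fp_b) / union

def pick_fillers(library, candidates, n_pick):
    """Greedy max-min selection (a simple stand-in, not the Canvas HF method).

    At each step, pick the candidate whose nearest neighbor in the current
    library is farthest away, then add it to the library.
    `library` must contain at least one fingerprint.
    """
    library = list(library)
    remaining = list(candidates)
    picked = []
    for _ in range(min(n_pick, len(remaining))):
        best = max(
            remaining,
            key=lambda c: min(tanimoto_distance(c, m) for m in library),
        )
        picked.append(best)
        library.append(best)
        remaining.remove(best)
    return picked

in_house = [{1, 2, 3}]
external = [{1, 2, 4}, {7, 8, 9}, {1, 2, 3, 4}]
print(pick_fillers(in_house, external, 2))
```

Property biasing without hard cutoffs, as mentioned in the abstract, could be approximated by multiplying the distance score by a soft desirability weight, but that extension is not shown here.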
11:15 121 Visualizing chemical information
Krisztina Boda, krisztina@eyesopen.com, OpenEye Scientific Software, Santa Fe, New Mexico 87508, United States
The 2D structure diagrams can be considered to be the "natural language" of chemists, since the graphical representation allows molecules to be instantly conceivable. Historically, 2D representations have mainly been used to visualize the connection table of molecular graphs. However, projecting information derived from 3D into the 2D layout can open up a novel way to present information to chemists. The new depiction toolkit, called Grapheme, provides several representation schemes that allow visualization of complex molecular properties in a clear and coherent 2D format that is the most natural to chemists.
11:35 123 SmallWorld: Efficient maximum common subgraph searching of large databases
Roger A Sayle1, roger@nextmovesoftware.com, Jose C Batista2, Andrew Grant2. (1) NextMove Software, Cambridge, Cambridgeshire CB4 0EY, United Kingdom, (2) Discovery Sciences, AstraZeneca, Alderley Park, Cheshire SK10 4TG, United Kingdom
We report a novel chemical database search method based upon explicit representation of chemical space. A pre-computed index allows the exact size of the maximum common edge subgraph (MCES) between a query molecule and molecules in the index to be calculated rapidly. In practice, this allows the 100 nearest neighbors having the largest MCES to a query molecule to be determined in a few seconds even for target databases containing millions of molecules. This work builds upon the previous efforts of Wipke and Rogers in the late 1980s and of Messmer and Bunke in the 1990s, but takes advantage of the rapid advances in parallel processing power and storage technology now available to researchers. Data will be presented on the size of the index/chemical universe as a function of heavy atom count and number of represented molecules.
11:55 122 Evaluation of data quality in currently available compound libraries
Ferenc Szalai, Márk Sándor, Gáspár Körtesi, Enikő Dorogi, Zoltán Szalai, Róbert Kiss, rkiss@mcule.com. mcule.com, Budapest, Budapest H-1096, Hungary
About 5-10 years ago, the amount of freely accessible chemical and biological data was one of the bottlenecks of efficient chemoinformatic model building. This situation has changed significantly during the last few years, as an enormous amount of data is now freely available, e.g. in the ChEMBL database. Similarly, compound libraries of purchasable compounds applicable for virtual screening were rather rare in the last century, but several providers made serious efforts to collect the accessible chemical space into a single database. The amount of data does not seem to be an issue anymore. Data quality of these resources is, however, very diverse. In this presentation we attempt to compare the quality of currently available large chemical libraries, focusing particularly on data for purchasable compound catalogs. We also demonstrate how data quality can affect the efficiency of virtual screening.

WEDNESDAY AFTERNOON

Philadelphia Marriott Downtown
Room 302/303

Cheminformatics Opportunities in Personalized Medicine and Chemogenomics
Christoph Steinbeck, David Wild, Organizers, Presiding
1:00   Introductory Remarks.
1:05 124 What works well together? Predicting synergistic combinations
Rajarshi Guha, guhar@mail.nih.gov, Lesley Mathews, Don Liu, Paul Shinn, Marc Ferrer, Craig Thomas. NIH NCATS, Rockville, MD 06040, United States
Many disease treatments make use of a single therapeutic agent. But such approaches can lead to unwanted side-effects and resistance. In contrast, combination therapies offer significant advantages in terms of reducing side effects and avoiding resistance and have been successfully employed in diseases such as cancer, AIDS and malaria. We have recently developed a high throughput approach to performing combination screens that can rapidly identify synergistic and antagonistic combinations. To enhance this pipeline we have investigated the use of computational methods to prioritize combinations that are predicted to be synergistic. The approach makes use of chemical structure and biological (targets and pathways) information and ranks pairs of compounds in terms of whether they will exhibit better activity in combination than on their own. We investigated multiple predictive modeling approaches as well as making use of network covariates from the underlying pathways. Our results indicate that prediction of synergies can be a useful tool in combination screening but that the predictions can suffer from missing mechanistic information.
1:25 125 Data-mining of pharmacogenetic data to predict and improve drug safety
John P Overington, jpo@ebi.ac.uk, Computational Chemical Biology, EMBL-EBI, Hinxton, Cambs CB10 1SD, United Kingdom
Adverse Drug Reactions (ADRs) are a significant health and economic problem, and no drug will ever have a completely safe and perfectly tolerated profile. Data integration and analysis techniques can now be applied to the current pharmacopeia, and to compounds in historical and current clinical development, with the aim of identifying patterns or rules useful in drug discovery, development and clinical practice. The talk will address the challenges of assembling useful data from typically unstructured sources, and then some approaches to address and predict differential response and safety.
1:45   Intermission.
2:00 126 Chemogenomic approach for QSAR modeling of inhibition activity against five major cytochrome P450 isoforms
Sergii Novotarskyi1, Igor V. Tetko1,2, itetko@vcclab.org. (1) eADMET GmbH, Neuherberg, Germany, (2) Institute of Structural Biology, Helmholtz Zentrum Muenchen, Neuherberg, Germany
CYP enzymes metabolize over 75% of currently marketed drugs. Of these reactions over 90% are facilitated by CYP1A2, CYP2C9, CYP2C19, CYP2D6 and CYP3A4. Accurate prediction of inhibition activity of molecules against CYP enzymes is particularly important in the field of personalized medicine. Chemogenomic descriptors obtained from docked protein-ligand complexes were used in this work. The quality of the descriptors was benchmarked in QSAR modeling of HTS data for human CYP450 inhibition. The training and validation sets for the benchmarked models were obtained from PubChem BioAssay database. The models achieved 82 - 87% of correctly classified compounds on the validated training set and 65-75% of correctly classified instances on the test sets. The applicability domain allowed achieving the accuracy of 90% of correctly classified instances on the subset of 20% most confident predictions of the test sets (models and datasets are publicly available at http://ochem.eu).
2:20 127 Activity landscape modeling of PPAR ligands with dual-activity difference maps
Oscar Méndez-Lucio1, oscarmen@comunidad.unam.mx, Jaime Pérez-Villanueva2, Rafael Castillo1, José L Medina-Franco3. (1) Departamento de Farmacia, Universidad Nacional Autónoma de México, Mexico, DF 04510, Mexico, (2) Departamento de Sistemas Biológicos, División de Ciencias Biológicas y de la Salud, UAM-X, Mexico, DF 04960, Mexico, (3) Torrey Pines Institute of Molecular Studies, 11350 SW Village Parkway, Port St. Lucie, FL 34987, United States
Activation of PPAR subtypes offers a promising strategy for the treatment of diabetes mellitus and some of its risk factors. Herein, we report a systematic description of the SAR of 168 compounds screened against the three PPAR subtypes using the principles of activity landscape modeling. We employed consensus dual-activity difference maps recently reported. The analysis is based on pairwise relationships of potency difference and structure-similarity, which was calculated from the combination of four different 2D and 3D fingerprints. Dual-activity difference maps uncovered regions in the landscape with similar SAR as well as regions with inverse SAR. Analysis of pairs of compounds with high structure similarity revealed the presence of single-, dual-, and 'pan-receptor' activity cliffs. Single-, dual-, and pan-receptor scaffold hops are also discussed. The analysis of the chemical structures of selected data points suggests specific structural features that are helpful for the design of new PPAR agonists.
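The pairwise analysis behind a dual-activity difference map can be sketched as below. This is a minimal illustration, not the authors' implementation: the thresholds (`sim_cut`, `diff_cut`), the region labels and the toy potency values are assumptions, and a single precomputed similarity stands in for the consensus of four 2D and 3D fingerprints mentioned in the abstract.

```python
from itertools import combinations

def dad_pairs(potencies, similarity, sim_cut=0.65, diff_cut=1.0):
    """Classify every compound pair by its potency differences for two targets.

    potencies:  dict name -> (potency_target_A, potency_target_B), log units
    similarity: function (name_i, name_j) -> structure similarity in [0, 1]
    The cutoffs are illustrative assumptions.
    """
    rows = []
    for i, j in combinations(potencies, 2):
        d_a = potencies[i][0] - potencies[j][0]
        d_b = potencies[i][1] - potencies[j][1]
        big_a, big_b = abs(d_a) >= diff_cut, abs(d_b) >= diff_cut
        if big_a and big_b:
            # Same sign: the change helps or hurts both targets alike.
            region = "similar SAR" if d_a * d_b > 0 else "inverse SAR"
        elif big_a or big_b:
            region = "single-target change"
        else:
            region = "no change"
        sim = similarity(i, j)
        cliff = sim >= sim_cut and region != "no change"
        rows.append({"pair": (i, j), "dA": d_a, "dB": d_b,
                     "similarity": sim, "region": region, "cliff": cliff})
    return rows

# Toy data: hypothetical potencies for two receptor subtypes.
potencies = {"a": (8.0, 6.0), "b": (6.5, 7.5), "c": (8.1, 6.1)}
sims = {frozenset("ab"): 0.8, frozenset("ac"): 0.9, frozenset("bc"): 0.3}
for row in dad_pairs(potencies, lambda i, j: sims[frozenset((i, j))]):
    print(row["pair"], row["region"], row["cliff"])
```

Plotting `dA` against `dB` for all pairs, colored by similarity, gives the map itself; highly similar pairs far from the origin correspond to the activity cliffs discussed in the abstract.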
2:40 128 Combining HTS in vitro assays with in silico descriptors for liver toxicity modeling
Ahmed M Abdelaziz, contact@amaziz.com, Igor V Tetko. Institute of Structural Biology, HelmholtzZentrum Muenchen, Neuherberg-Munich, Bavaria 85764, Germany
There is an increasing amount of in vitro screening data available from experimental assays. This embodies a new dimension of experimental knowledge that can help in predicting chemical toxicity along with traditional in silico descriptors. The iPRIOR platform http://toxcast.ochem.eu was developed based on the Online Modeling Environment to build quantitative activity relationship models for in vivo toxicity from ToxRefDb exploiting HTS in vitro responses from ToxCast phase I data. The platform consists of two integrated subsystems: the database of ToxCast experimental measurements and the modeling framework. It incorporates in silico descriptor packages from both commercial and academic domains which were utilized to develop the models. We show that hybrid models, which incorporate both in vitro parameters and in silico descriptors, provided higher accuracy for prediction of liver toxicity compared to the separate use of individual descriptors. The in vitro parameters also expand the applicability domain of models.

Philadelphia Marriott
Conference Room 307

Informatics Approaches to Materials Design
Maciej Haranczyk, Ian Bruno, Organizers
Ian Bruno, Maciej Haranczyk, Presiding
1:30   Introductory remarks.
1:35 129 Data mining and crystal structure informatics in pharmaceutical drug development
Magali Hickey1, magali.hickey@alkermes.com, Peter Wood2, Mark Oliveira1, Heather Clarke3, Michael Zaworotko3, Orn Almarsson1. (1) Alkermes, Inc., Waltham, MA 02451, United States, (2) Cambridge Crystallographic Data Centre, Cambridge, United Kingdom, (3) University of South Florida, Tampa, Florida 33620, United States
Crystallography and crystal structure mining have become recognized as powerful tools in pharmaceutical R&D. Progress in the past two decades is evident in drug discovery and drug product development. Structural analysis of small molecules (molecular weight
2:00 130 Elaboration of weak intermolecular forces involving terminal alkynes
Eric Bosch, ericbosch@missouristate.edu, Department of Chemistry, Missouri State University, Springfield, Missouri 65897, United States
I will describe the elaboration of supramolecular synthons involving weak intermolecular interactions of terminal alkynes. The first test of these interactions involves mining of structural databases for non-bonded distances and angles that support the central hypothesis. In our experience the positive examples found in the database generally involved single molecules containing both functional groups with favorable relative orientations. The next level in the examination of these weak interactions involves the deliberate synthesis of molecules that contain both functional groups in order to examine the reliability of the interaction. A higher level test of the versatility of these interactions as tools in crystal engineering involves incorporating them in different molecules and examining their ability to direct co-crystallization.
2:25 131 Role of CSD-derived structural informatics approaches in solid form design strategy
Neil Feeder, feeder@ccdc.cam.ac.uk, Peter T A Galek, Elna Pidcock, Peter A Wood. The Cambridge Crystallographic Data Centre, Cambridge, United Kingdom
Uncontrolled crystal form polymorphism can have a critical impact on pharmaceutical drug product robustness, exemplified by Norvir [1] and Neupro [2]. The Norvir example illustrates how such polymorphism can be driven by a stronger set of hydrogen bonds in the stable form. At the CCDC we are developing structural informatics approaches to solid form design, including the Logit Hydrogen-Bonding Propensity method which would have predicted the likely existence of the more stable polymorph of ritonavir (Norvir)[3]. Software including such methodologies is being developed under the guidance of the Crystal Form Consortium (CFC); a partnership between the CCDC and eleven global pharmaceutical/agrochemical companies. Here we will describe the potential application of these methodologies to minimise risk in solid form design.
[1] J. Bauer et al, Pharm. Res., 2001, 18, 859-866.
[2] S. Cajigal, Neurology Today, 2008, 8, 1 & 8.
[3] P. T. A. Galek et al, CrystEngComm, 2009, 11, 2634-2639.
2:50   Intermission.
3:05 132 Informatics method to bridge gap between experimental results and simulation for carbon nanotube reinforced composites
Tammie L Borders1, tammie.l.borders@gmail.com, Alexandre F Fonseca2, Hengji Zhang3, KJ Cho3,4. (1) Lockheed Martin, United States, (2) Department of Physics, UNESP - Sao Paulo State University, Bauru, SP 17033-360, Brazil, (3) Department of Physics, University of Texas at Dallas, Richardson, TX 75080, United States, (4) Department of Materials Science and Engineering, University of Texas at Dallas, Richardson, TX 75080, United States
Real-world mechanical property improvement from carbon nanotube (CNT) reinforced composites has been inconsistent with predictions, regardless of type and volume fraction of CNTs used. To bridge the gap between prediction and experiment, we have developed descriptors and quantitative structure property relationships (QSPRs) for simple CNT systems and refined these descriptors and models on experimental data from more complex CNT systems. Specifically, we will present a QSPR model that captures load transfer improvement and wall stiffness decrement due to inter-wall cross-links. The model and descriptors were built from small-diameter, short, double-walled CNTs whereas the experimental data is acquired from large-diameter, long, multi-walled CNTs. It will be shown that complex systems can be accurately represented through combining results from simple systems, reducing simulation time and optimizing simulation reuse. By refining descriptors with experimental data, we can overcome the disagreements between prediction and experiment.
3:30 133 Creating a chemical informatics approach to accelerating materials simulations
Michael P. Krein1, michael.krein@lmco.com, Gregory S. Ho1, gregory.s.ho@lmco.com, Jason J. Poleski2, Richard R. Barto1. (1) Lockheed Martin Advanced Technology Laboratories, Cherry Hill, NJ 08002, United States, (2) Lockheed Martin Mission Systems and Sensors, Moorestown, NJ 08057, United States
Ostwald ripening in catalyst nanoparticles has been shown to be responsible for the early termination of growth of carbon nanotubes, a major barrier for the realization of high-performance nanocomposites. Ostwald ripening occurs on a timescale of minutes, and existing atomistic computational models of Ostwald ripening are limited to much shorter timescales. In addition, parameters needed for an accurate simulation (e.g., atomic migration barrier height, surface interaction energies) are not easily accessible. We have created a materials informatics framework that automatically maps the parameter space of kinetic Monte Carlo (kMC) simulations of Ostwald ripening in iron catalyst particles. In this framework, descriptors that encapsulate simulation results are automatically created, Quantitative Structure Activity Relationship (QSAR) models are built, and new simulations based on previous QSAR results are launched and evaluated. We show that our framework can bridge the timescale gap and thus allow for accelerated design of materials.
3:55 134 Descriptors and search tools for porous materials
Richard L Martin, richardluismartin@lbl.gov, Maciej Haranczyk, mharanczyk@lbl.gov. Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
Modern computational techniques such as molecular simulations and electronic structure calculations allow for accurate prediction of the guest molecule-related properties of an individual porous material. However, the chemical search space of porous materials is vast, with over 2.5 million materials hypothesized in the class of zeolites alone. How to select which materials to examine with these techniques is an open problem, requiring novel methods for the comparison of materials. Traditionally, porous materials have been compared through visual inspection of their structures, infeasible for a very large number of materials, or with one-dimensional pore descriptors such as free sphere diameter, which provide only a very narrow view of a material, encoded as a single number. In this presentation we demonstrate novel descriptors for porous materials - influenced by concepts in cheminformatics - and their application to the comparison and searching of large datasets of porous materials.
4:20 135 Multicomponent organic solar cells: High performance multi-objective searches
Geoffrey R Hutchison, geoffh@pitt.edu, Department of Chemistry, University of Pittsburgh, Pittsburgh, PA 15260, United States
Our group pioneered the use of high-throughput genetic algorithm searches for organic photovoltaic materials. We have improved on our previous work using an enlarged molecular search space, and an effort to find multi-component cells with optimally matched properties. We will outline our efforts to use multi-objective searches to generate lead compounds for high-efficiency single-junction and tandem devices.

THURSDAY MORNING

Philadelphia Marriott
Franklin Hall 6

General Papers Chemical Databases, Drug Discovery, and Chemical Structure Representation
Rachelle Bienstock, Organizer, Presiding
8:30 136 Relative drug likelihood: Going beyond "drug-likeness"
Matthew M Segall, matt.segall@optibrium.com, Iskander Yusof. Optibrium Ltd, Cambridge, ... CB25 9TL, United Kingdom
Many approaches have been used to characterise compounds as 'drug-like' or not based on the similarity of simple properties of a compound, e.g. molecular weight, to those of known drugs. However, having a 'similar' property to known drugs does not necessarily mean that a compound is more likely to become a drug. We propose an extension to 'drug likeness' approaches, based on an assertion that a desirable value of a property is one that increases the probability of identifying a drug. Using Bayesian approaches we can estimate the relative likelihood of a compound being a drug by comparing the distributions of properties for drugs and non-drugs. We will demonstrate that this offers improved performance for the identification of drugs and provides insights into which characteristics provide the greatest discrimination between successful drugs and unsuccessful drug discovery compounds.
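The comparison of property distributions for drugs and non-drugs can be illustrated with a simple binned likelihood ratio. This is a sketch under stated assumptions rather than the method presented in the talk: the bin edges, the add-one smoothing, the single property and the toy molecular weights are all hypothetical choices.

```python
import bisect

def binned_likelihood_ratio(drug_values, nondrug_values, edges):
    """Return a function mapping a property value to P(bin | drug) / P(bin | non-drug).

    `edges` are ascending bin boundaries; values below the first edge and
    above the last edge fall into open-ended bins. Add-one smoothing keeps
    empty bins from producing zero or infinite ratios.
    """
    n_bins = len(edges) + 1

    def bin_index(value):
        return bisect.bisect_right(edges, value)

    def smoothed_histogram(values):
        counts = [1] * n_bins          # add-one (Laplace) smoothing
        for v in values:
            counts[bin_index(v)] += 1
        total = sum(counts)
        return [c / total for c in counts]

    p_drug = smoothed_histogram(drug_values)
    p_nondrug = smoothed_histogram(nondrug_values)
    return lambda value: p_drug[bin_index(value)] / p_nondrug[bin_index(value)]

# Toy molecular weights, invented for the example.
ratio = binned_likelihood_ratio(
    drug_values=[250, 350, 400, 450],
    nondrug_values=[550, 600, 650, 400],
    edges=[300, 500],
)
print(ratio(400), ratio(600))
```

A ratio above 1 marks a property value that is more common among drugs than non-drugs in the reference sets; combining several properties would require an independence assumption or a joint model, neither of which is addressed here.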
8:50 137 Statistical significance of 3D molecular similarity scores
Sunghwan Kim, kimsungh@ncbi.nlm.nih.gov, Evan E Bolton, Stephen H Bryant. National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, United States
Although the use of 3-D similarity techniques in the analysis of biological data and virtual screening is pervasive, little has been known about the statistical meaning of similarity scores from these methods. To address this issue, the similarity value distribution curves for randomly selected compounds were generated using six different 3-D similarity types utilized by PubChem analysis tools. An attempt was also made to explore the question of whether it was possible to realize a statistically meaningful 3-D similarity value separation between reputed biological assay actives and inactives. In addition, the complementarity between PubChem's 2-D and 3-D similarity methods was investigated. This work is a critical step to create a statistical framework to build upon and will help to develop search and analysis tools that exploit 3-D molecular similarity.
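One common way to attach statistical meaning to a similarity score is to compare it with scores from randomly selected compound pairs. The sketch below computes an empirical p-value against such a background. The background here is synthetic (drawn from a beta distribution chosen arbitrarily) and the function name is hypothetical; it does not reproduce the PubChem similarity types or the authors' analysis.

```python
import random


def empirical_p_value(score, background):
    """Fraction of background scores at least as large as `score`.

    The +1 in numerator and denominator keeps the estimate above zero
    for scores beyond everything seen in the background sample.
    """
    n_at_least = sum(1 for s in background if s >= score)
    return (n_at_least + 1) / (len(background) + 1)


# Synthetic stand-in for similarity scores of random compound pairs.
rng = random.Random(0)
background = [rng.betavariate(2, 5) for _ in range(10_000)]

for query_score in (0.2, 0.5, 0.8):
    print(query_score, empirical_p_value(query_score, background))
```

A small p-value means that few random pairs score as high as the query pair, so the observed similarity is unlikely to arise by chance under this background.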
9:10   Intermission.
9:15 138 Delivering an online service for validating and standardizing chemical structure files using the ChemSpider platform
David Sharpe2, sharped@rsc.org, Antony Williams1, Valery Tkachenko1, Colin Batchelor2, Kenneth Karapetyan1. (1) Cheminformatics, Royal Society of Chemistry, Wake Forest, NC 27587, United States, (2) Cheminformatics, Royal Society of Chemistry, Cambridge, United Kingdom
The production of valid chemical structure representations which are appropriate for deposition into chemical structure databases and for inclusion in scientific publications requires adoption of a set of pre-processing filters and standardization procedures. As part of our ongoing effort to improve the quality of data for deposition into the RSC ChemSpider database, to provide a means to validate and prepare data for publication, and to provide a valuable service to the chemistry community, we have delivered an online service. This website provides access to an intuitive user interface for the upload of chemical compounds in various formats, pre-processing and standardization relative to a defined set of standards, and validation checking of the chemicals according to a number of rules including hypervalency, absence of stereochemistry and charge balance. This presentation will report on the development of this validation and standardization service.
9:35 139 How can the International Chemical Identifier (InChI) be extended to non-trivial chemicals?
Valery Tkachenko1, tkachenkov@rsc.org, Antony J Williams1, Yulia Borodina2, Frank Switzer2, Tyler Peryea2, Larry Callahan2. (1) Cheminformatics, Royal Society of Chemistry, Wake Forest, NC 27587, United States, (2) FDA, Silver Spring, MD 20993, United States
In recent years there has been a dramatic increase in the number of databases of chemical substances that have become available, especially online and in the public domain. While many of these databases contain small molecules that can be explicitly defined using molecular connection tables and InChIs, many of them also contain chemicals of biological interest such as synthetic polymers, polypeptides, polynucleotides, etc. A critical capability of any database is a unique identifier which allows for the de-duplication of entries, and InChI has become increasingly popular for this purpose. However, despite many impending developments for InChI (polymer InChIs, Reaction InChIs, etc.), support for biological chemistry using a standard approach remains a challenge. This presentation will analyze an approach to address this problem.
9:55 140 Serving up and consuming community content for chemists using wikis
Antony J Williams1, williamsa@rsc.org, Alexey Pshenichnov1, Valery Tkachenko1, Aileen Day2, Sean Ekins3. (1) Cheminformatics, Royal Society of Chemistry, Wake Forest, NC 27587, United States, (2) Cheminformatics, Royal Society of Chemistry, Cambridge, Cambridgeshire CB4 0WF, United Kingdom, (3) Collaborations in Chemistry, Fuquay Varina, NC 27526, United States
Wikipedia has become the world's most famous encyclopedia, using as its platform the MediaWiki open source software. The software is supported not only by the MediaWiki foundation but by a community of developers who build widgets and add-ons to extend the capabilities. This presentation will review how MediaWiki has been used as a container for a number of resources of value to chemists, specifically SciMobileApps, SciDBs and ScientistsDB holding content regarding mobile scientific apps, scientific databases and scientists. We will also review how chemistry content within Wikipedia has been used to enhance the content underlying the RSC ChemSpider database and how the platform supports an educational environment for chemistry students.
10:15   Intermission.
10:20 141 Use of an institutional repository to promote chemical sciences collections at the University of Florida
Donna T. Wrublewski1, dtwrublewski@ufl.edu, Lois J. Widmer2, Laurie N. Taylor2, Dina Benson2. (1) Marston Science Library, University of Florida, Gainesville, Florida 32611, United States, (2) Digital Library Center, University of Florida, Gainesville, Florida 32611, United States
This talk highlights two collaborative projects undertaken by the Chemical Engineering Subject Librarian, the University of Florida Digital Library Center (DLC), and the Chemical Engineering Department. In both cases, the Department reached out to the Subject Librarian, who identified the resources of the DLC as being able to meet the need for access and preservation for the collections. The first project created an online archive for an international conference hosted at the University for which proceedings were only distributed digitally to attendees. The Subject Librarian disambiguated author-provided keyword metadata, resulting in a more cohesive search experience for users. The second resulted in back issues of the journal Chemical Engineering Education being made freely available. These projects serve both to promote University and Department scholarship and to foster closer library-department ties. The project timelines and challenges encountered will be discussed, and workflow recommendations for similar projects will be presented.
10:40 142 Linked data and the globally harmonized health and safety (GHS) system
Jeremy G Frey, j.g.frey@soton.ac.uk, Mark I Borkum, Simon J Coles, David Kinnison, Francesco Cuda. Department of Chemistry, University of Southampton, Southampton, Hants SO17 1BJ, United Kingdom
We have converted the CLP Regulation (the UK implementation of the EU implementation of the UN legislation) into an RDF graph. This allows us to create and augment systems such as the COSHH form generator. In general this can make the legislation accessible to machines, facilitating an inversion of control, moving accountability from the safety officer to the individual. The power of this approach is currently limited by the difficulties in linking the chemical identifiers used in the GHS tables with other chemical identifiers in a machine resolvable manner and we discuss how this may be improved. We discuss how each record in the GHS is essentially a reification that links one or more substances to one or more labelling and classification entities and how these relationships are best visualized.
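The reified-record idea described in this abstract can be pictured with a small set of subject-predicate-object triples. The sketch below uses plain Python tuples rather than an RDF library. The namespace, the predicate names and the example record are all hypothetical and illustrative; they are not the vocabulary or data used by the authors.

```python
# Hypothetical namespace and vocabulary, invented for this sketch.
EX = "http://example.org/ghs/"

# One record node links a substance to its classification and labelling
# entities, instead of attaching them to the substance directly.
triples = {
    (EX + "record/1", EX + "hasSubstance", EX + "substance/acetone"),
    (EX + "record/1", EX + "hasHazardClass", EX + "class/FlammableLiquid2"),
    (EX + "record/1", EX + "hasHazardStatement", EX + "statement/H225"),
    (EX + "record/1", EX + "hasHazardStatement", EX + "statement/H319"),
}


def match(s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def hazard_statements_for(substance):
    """Follow substance -> record -> hazard statement links."""
    records = [s for s, _, _ in match(p=EX + "hasSubstance", o=substance)]
    return sorted(
        o for r in records
        for _, _, o in match(s=r, p=EX + "hasHazardStatement")
    )


print(hazard_statements_for(EX + "substance/acetone"))
```

Because the record is its own node, several substances or several labelling entities can be attached to the same record without duplicating triples.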

THURSDAY AFTERNOON

Philadelphia Marriott
Franklin Hall 6

Legal, Patent, and Digital Rights Management in Publishing Cosponsored by PROF
Judith Currano, Charles Huber, Organizers
Judith Currano, Presiding
1:00   Introductory Remarks.
1:10 143 How to find references that inherently anticipate pharmaceutical patents
David Gange, DavidGange@Altimatia.com, Altimatia Patent Research, Pennington, NJ 08534, United States
Recent decisions by the Court of Appeals for the Federal Circuit have broadened the scope of inherent anticipation and opened a new line of attack for people seeking to invalidate pharmaceutical patents. In this talk, I'll briefly discuss the relevant Federal Circuit decisions and provide examples of successful inherent anticipation searches. Search strategies to uncover inherent anticipation references will also be discussed.
1:30 144 Digital rights drain? Implications for library services
Leah Solla, leah.solla@cornell.edu, Cornell University, Ithaca, NY 14853, United States
Digital content is a hot commodity and academic research is caught in a perfect storm as the market, technology and the insatiable desire to be online converge on the traditional book genre. Will regulating use of content through technological means really increase innovation in feature development? Will libraries be able to negotiate licenses to offer content to users in usable form and continue to serve such research community expectations as interlibrary loan and preservation? Will researchers be able to intellectually engage with a critical mass of books? This talk will explore the juggling act of academic libraries as we journey into the storm of DRM-encrusted digital content.
1:50 145 Digital rights management and e-books: Perspectives from a research library
Tara Cataldo, Donna T. Wrublewski, dtwrublewski@ufl.edu. Marston Science Library, University of Florida, Gainesville, Florida 32611, United States
The adoption of e-books theoretically brings several benefits: easier patron access and library cost and space savings are just a few. However, Digital Rights Management (DRM) implementation, although “necessary” from a publisher standpoint, needs to be streamlined and simplified in order to provide the most ease of use. This talk will discuss some issues that have arisen at the University of Florida Marston Science Library with regard to e-books, where there are currently 31 different e-book platforms available. Most issues have involved explaining procedures to patrons, including checkout, download and printing restrictions. These policies can vary widely among publishers, which makes them very confusing for patrons. Some potential avenues for both patron and librarian education will be presented. From a collection development standpoint, DRM potentially thwarts regional collaborative collection building that is becoming a priority as library budgets continue to erode. Examples and recommendations in this area will also be discussed.
2:10 146 Right to retain: The problems of preserving digital content
Ian Bogus1, Judith N. Currano2, currano@pobox.upenn.edu. (1) Van Pelt-Dietrich Library Center, University of Pennsylvania, Philadelphia, PA 19104, United States, (2) Chemistry Library, University of Pennsylvania, Philadelphia, PA 19104-6323, United States
Digital content has made research easier and quicker, and, as the technology improves, we will be able to do even more with our data. University libraries have seen a mixed blessing: information has never been more easily provided to patrons, but retaining access to it for the long term has never been more difficult. Technological, social, and legal issues complicate the preservation of digital content. Analog media had straightforward freedoms: the rights to trade, sell, backup, and share the original. Digital media is much more difficult, with problems stemming from complicated licensing agreements, copyright restrictions, and digital rights management (DRM). In some cases, "buyers" do not actually own the content that they “purchase,” restricting what they can do to preserve it. In other cases, the laws themselves keep individuals and institutions from taking advantage of legal rights, such as the Digital Millennium Copyright Act's criminalization of bypassing DRM for backup and preservation purposes.
2:30 147 Finding an alternative to restrictive digital rights management: The Momentum Press approach
David Parker, david.parker@businessexpertpress.com, Adam Chesler. Business Expert Press, New York City, NY 10017, United States
The ease with which material can be shared via the Internet has created new levels of unease for publishers, who worry that with the push of a single button, their copyrighted books could end up freely accessible to millions. But is the solution really the implementation of draconian digital rights management restrictions, which can create impediments for authorized users, especially in institutional environments? For Momentum Press, the answer is “no:” the user (and buyer) experience is paramount and while risks exist, a new publisher has the luxury of adopting unorthodox methods to face these challenges. MP has opted for an approach that ensures broadest possible use amongst authorized users, and experience is proving that accessibility isn't a problem, it's a solution.
2:50   Roundtable Discussion.
3:50   Concluding Remarks.