Technical Program with Abstracts

ACS Chemical Information Division (CINF)
248th ACS National Meeting, Fall 2014
San Francisco, CA (August 10-14, 2014)

CINF Symposia

E. Bolstad, Program Chair

[Created Fri Jul 25 2014, Subject to Change]

Sunday, August 10, 2014

Nature's Second Act: Revisiting Natural Products - AM Session

Palace Hotel
Room: Marina

Roger Schenck, Organizer
Roger Schenck, Presiding
9:40 am - 12:00 pm
9:40 Introductory Remarks
9:45 1 Applying Royal Society of Chemistry cheminformatics skills to support the PharmaSEA project

Antony J Williams1, tony27587@gmail.com, Valery Tkachenko1, Alexey Pshenichnov1, Ken Karapetyan1, David Sharpe2, Colin Batchelor2. (1) Cheminformatics, Royal Society of Chemistry, Wake Forest, North Carolina 27587, United States, (2) eScience, Royal Society of Chemistry, Cambridge, United Kingdom

The collaborative project PharmaSea brings European researchers to some of the deepest, coldest and hottest places on the planet. Scientists from the UK, Belgium, Norway, Spain, Ireland, Germany, Italy, Switzerland and Denmark are working together to collect and screen samples of mud and sediment from huge, previously untapped oceanic trenches. The large-scale, four-year project is backed by almost 10 million euros of funding and brings together 24 partners from industry, academia and non-profit organisations in 13 countries. The PharmaSea project focuses on biodiscovery research and the development and commercialisation of new bioactive compounds from marine organisms, including deep-sea sponges and bacteria, to evaluate their potential as novel drug leads or as ingredients for nutrition or cosmetic applications. The Royal Society of Chemistry is responsible for developing a number of capabilities to support the PharmaSea project, including a chemical registration system for new compounds, dereplication technologies to assist in the identification of new compounds, and search techniques for mass spectrometrists within the project. This presentation will provide an overview of the project and of our progress in contributing chemical information technologies to support the effort.

10:15 2 From publishing to recognition – indexing literature for natural products

David Evans1, david.evans@reedelsevier.ch, Pieder Caduff1, Juergen Swienty-Busch2. (1) Reed Elsevier Properties SA, Neuchâtel, Switzerland, (2) Elsevier Information Systems GmbH, Frankfurt, Germany

Natural products are an inspiring source for medicinal chemists in their ever-challenging journey of developing new and more efficient drugs. Finding reliable information about natural products quickly, including their isolation from biological sources, their total synthesis and their bioactive properties, is becoming more and more important in a highly competitive industry and environment. We will explain the Reaxys content excerption rules for this exciting scientific field and demonstrate use cases showing how the information can be retrieved and applied in medicinal chemistry information workflows.

10:45 Intermission
11:00 3 Hazardous substances data bank: A tool for natural product information and research

Shannon M. Jordan, shannon.jordan1@nih.gov, National Institutes of Health, National Library of Medicine, Bethesda, MD 20894, United States

The National Library of Medicine (NLM) Hazardous Substances Data Bank (HSDB) is a database that contains a wealth of information on many types of chemicals, including natural products. Increasingly, professionals and the general public seek and use information about natural products for various purposes. In response to this demand, the HSDB development team has increased the number of natural product records within the database and updated existing records. Natural product records in HSDB include, but are not limited to, the following information: chemical structure, human health effects, animal toxicity, pharmacology, metabolism and pharmacokinetics, environmental fate and exposure, safety and handling, manufacturing, use, laboratory methods, and more. The data extraction team, along with a Scientific Review Panel (SRP), uses dozens of sources to build, update, and peer-review HSDB records on a four-month cycle. As new articles on natural products are published, HSDB serves as an information resource that captures both historical and emerging science. NLM will continue to develop and market HSDB as a tool for researchers and the general public covering all types of natural products, from phytochemicals to venoms and toxins.

11:30 4 Real structures for real natural products − really getting them right and getting them faster

Patrick Wheeler1, pwheeler@yahoo.com, Antony Williams3, Mikhail Elyashberg2, Rostislav Pol2, Arvin Moser1. (1) Advanced Chemistry Development, Toronto, Ontario M5C 1B5, Canada, (2) Advanced Chemistry Development, Moscow, Russian Federation, (3) Royal Society of Chemistry, London, United Kingdom

Structure determination for natural products has been revolutionized by advances in NMR technology and the application of innovative experimental techniques. Notably, it is possible to obtain structures from small amounts of material that are not amenable to single-crystal X-ray diffraction. Still, the interpretation of these data can be arduous, requires great expertise, and is error-prone. Of course, other techniques are used to confirm structures as well: synthetic reproduction of natural products has a long tradition of success in the elucidation of molecules containing intricate elements, including multiple stereocenters. However, despite rigorous analysis by qualified chemists, these methods still sometimes arrive at erroneous results (1-5).
Astute application of modern technology can speed the rate at which structures are solved, while also vastly reducing errors that result either from synthetic methods or from unassisted analysis of instrumental data. Computer Assisted Structure Elucidation (CASE) has developed over the past decades to relieve the burden of work in proving correct structures. In this presentation, we will discuss how CASE is used to objectively analyze complex sets of NMR data in order to test structural hypotheses, conduct de novo structure elucidation, and query large databases of known structures for matches of already identified natural products.

Sunday, August 10, 2014

Hunting for Hidden Treasures: Chemistry Text Mining in Patents and Other Documents - AM Session

Palace Hotel
Room: Presidio

Wei Deng, Organizer
Wei Deng, Presiding
8:40 am - 12:00 pm
8:40 Introductory Remarks
8:50 5 When your language is science: Abstracting, classifying, and indexing patents in the Derwent World Patents Index

Donald Walter, don.walter@thomsonreuters.com, IP Solutions, Thomson Reuters, Alexandria, VA 22314, United States

The Derwent World Patents Index® (DWPI℠) collects patents from 50 countries in 30 languages (not all of them using the Roman alphabet) and creates from them high-quality English abstracts, classifications and indexes. This talk will outline how we deal with them, including human and human-assisted translation by language and technical experts; how we organize the information into fielded abstracts using clear editorial guidelines to provide consistent records; and how we correct errors in the information sent to us by the patent offices for use in DWPI.

9:20 6 Chemistry and reactions from non-US patents

Daniel M Lowe, daniel@nextmovesoftware.com, Roger A Sayle. NextMove Software, Cambridge, Cambridgeshire CB4 0EY, United Kingdom

All US patents from 1976 onwards are freely available in computer-readable formats, providing a large corpus for chemical text mining. Other patent offices are increasingly offering their back-catalogues as XML as well, allowing chemical text mining to be performed in the same way as for recent US patents. We investigate how much chemistry is found in non-US patents (compared to US patents) and, where the same chemistry appears in publications from multiple patent offices, how long the delays between these publications were. We show that non-US patents can be text mined for a large number of chemical reactions and analyse the overlap with reactions from US patents. Finally, we use all the extracted chemical reactions to explore whether models for predicting reaction yield can be built from features such as the reaction type and its reaction conditions (as text mined from the patent text).

9:50 7 Teach Document-to-Structure to be trilingual: Extract, display, and search chemical information within English, Chinese, and Japanese patents

David Deng, ddeng@chemaxon.com, Daniel Bonniot. ChemAxon, Cambridge, MA 02142, United States

By expanding Naming, a reliable chemical name-to-structure technology, ChemAxon has developed a suite of chemistry text mining tools. The core application is Document-to-Structure, which can extract chemical information from patent and other documents. Document-to-Structure includes numerous functions to overcome the challenges in patent mining:

  1. OCR technology for non-text (image-based) patent documents. A correction algorithm identifies OCR errors and corrects the names before converting them to structures.
  2. Easy integration with different image-to-structure software to extract structure images.
  3. In addition to English, Asian language support for Chinese and Japanese patent mining.
  4. Annotation of a document with a single mouse click: a new document is created with the chemical information annotated, and hovering the mouse over a chemical name displays its structure.
  5. With ChemAxon's chemistry search function, the extracted structure information can be searched, which makes identifying a compound in a patent document much faster and easier.

This presentation will demonstrate various text mining applications, including extracting structures from chemical patents using Document-to-Structure; searching the patent structure database with Document-to-Database; and interactively displaying chemical information in patent documents with Document Annotation.

10:20 Intermission
10:35 8 Chemically aware text mining platform

David Milward, David.Milward@linguamatics.com, Andrew Hinton, andrew.hinton@linguamatics.com. Linguamatics Limited, 324 Cambridge Science Park, Milton Road, Cambridge CB4 0WG, United Kingdom

The greater availability of patent content in recent years has led to several chemically aware search systems providing unprecedented access to chemical information in patents. In this paper we will describe a chemically aware text mining system and show how it can provide finer-grained access to patent information in both English and Chinese, addressing the challenges of patent searchers and medicinal chemists. This work resulted from a research partnership between Linguamatics and ChemAxon, leading to close integration of chemical name-to-structure conversion, structure drawing, and substructure and similarity searching into the Linguamatics I2E text mining platform. We will show progress in the automatic extraction of structure-activity relationships (SAR) from patents, including understanding of data found within tables and connection of information originating from different parts of the document. We will also discuss Markush structure recognition, including linking of chemical scaffolds and R-group information within particular claims.

11:05 9 CHEMDNER task: Automatic recognition of chemical entities in text

Martin Krallinger1, Obdulia Rabal2, Florian Leitner1, Julen Oyarzabal2, julenoyarzabal@unav.es, Alfonso Valencia1. (1) Structural Biology and Computational Biology, Spanish National Cancer Research Centre (CNIO), Madrid, Spain, (2) Small Molecule Discovery Platform, Center for Applied Medical Research (CIMA) - University of Navarra, Pamplona, Spain

There is increasing interest, in both academia and industry, in facilitating more efficient access to information on chemical compounds and drugs (chemical entities) described in repositories of unstructured data, including scientific articles, patents and health agency reports. A crucial step toward this goal is the ability to automatically identify mentions of chemical compounds within text, as well as to index whole documents with the compounds described in them. The recognition of chemical entities is also crucial for subsequent text processing strategies, such as the detection of drug-protein interactions, adverse effects of chemical compounds and their associations with toxicological endpoints, or the extraction of pathway and metabolic reaction relations. Despite its importance, only a very limited number of publicly accessible chemical compound recognition systems have been released, even though a considerable number of methods and strategies to recognize chemicals in text have been proposed. The main bottlenecks currently encountered in implementing such systems and comparing their performance are (a) the lack of suitable training/test data, (b) the intrinsic difficulty of defining annotation guidelines for what actually constitutes a chemical compound or drug, (c) heterogeneity in the scope and textual data sources used, and (d) the limited evaluation efforts carried out so far. The CHEMDNER task of BioCreative IV was proposed to address these issues, and a total of 27 teams submitted results. Teams were provided with the manual annotations of 7,000 abstracts to implement and train their systems, and then had to return predictions for the 3,000 test set abstracts within a short period of time.
When the automated results were compared directly against the manually labeled Gold Standard annotations, the best team reached an F-score (the harmonic mean of precision and recall) of 87% on the Chemical Entity Mention (CEM) task (http://www.biocreative.org/tasks/biocreative-iv/chemdner).
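The F-score used in the CEM evaluation can be illustrated with a short Python sketch. It assumes, purely for illustration, that each chemical mention is identified by a hypothetical (document id, start offset, end offset) tuple and that a prediction counts only if it matches a gold-standard span exactly; it does not reproduce the official CHEMDNER evaluation tooling.

```python
from typing import Set, Tuple

# Hypothetical mention key: (document id, start character offset, end character offset).
Mention = Tuple[str, int, int]


def evaluate_mentions(predicted: Set[Mention], gold: Set[Mention]) -> Tuple[float, float, float]:
    """Return (precision, recall, F-score) under exact span matching."""
    true_positives = len(predicted & gold)  # predicted spans that exactly match a gold span
    # precision = TP / (TP + FP); TP + FP is simply the number of predictions
    precision = true_positives / len(predicted) if predicted else 0.0
    # recall = TP / (TP + FN); TP + FN is the number of gold-standard mentions
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # balanced F-score: harmonic mean of precision and recall
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Toy data, not taken from the CHEMDNER corpus.
gold = {("doc1", 0, 7), ("doc1", 20, 29), ("doc2", 5, 12), ("doc2", 40, 48)}
predicted = {("doc1", 0, 7), ("doc1", 20, 29), ("doc2", 5, 11)}
p, r, f = evaluate_mentions(predicted, gold)
print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

In this toy example two of the three predictions match exactly, so precision is 2/3, recall is 2/4 and F is 4/7 (about 0.57). The near-miss span ("doc2", 5, 11) counts as a false positive, and the gold span it narrowly misses counts as a false negative; a relaxed (overlap-based) matching scheme would score it differently.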

11:35 10 Structuring the unstructured: Creating knowledge through visual analytics and the use of Tibco Spotfire with Attivio for text analytics of scientific patents

Philip J Skinner1, philip.skinner@perkinelmer.com, Joshua A Bishop1, Josh.Bishop@PERKINELMER.COM, Alexandia Vamvakidou1, Megean Schoenberg1, Sameer Nori2, Matt Connon2. (1) PerkinElmer, Waltham, Massachusetts 02451, United States, (2) Attivio, Newton, MA 02466, United States

The growing adoption of visual analytics tools such as Tibco Spotfire has revolutionized the way that scientists interpret structured data, such as SAR analyses conducted by medicinal chemists. There remains, however, a wealth of valuable insight locked within unstructured content sources such as patents and scientific literature. These text-based sources do not fit neatly into the traditional model of organizing information as database records and require other techniques to derive insight from them.
Text analytics technologies such as those developed by Attivio, in combination with visual analytics tools, can uncover new trends and patterns within unstructured content sources. We will outline relevant use cases where such technologies were applied to develop the research strategies of scientific organizations by incorporating analysis of research adjacencies, key opinion leaders, and geographical and historical trends in research.

11:55 Concluding Remarks

Sunday, August 10, 2014

Computational Methods and the Development/Production of Biologics and Biosimilars - AM Session

Palace Hotel
Room: California Parlor

Rachelle Bienstock, Organizer
Rachelle Bienstock, Presiding
8:30 am - 9:35 am
8:30 Introductory Remarks
8:35 11 Classification, representation, and analysis of cyclic peptides and peptide-like analogs

Roger A Sayle, roger@nextmovesoftware.com, Daniel M Lowe, Noel M O'Boyle. NextMove Software, Cambridge, CAMBS CB4 0EY, United Kingdom

Wikipedia defines a cyclic peptide as a polypeptide chain wherein the amino terminus and the carboxyl terminus, amino terminus and sidechain, carboxyl terminus and sidechain or sidechain and sidechain are linked to form a ring. The awkwardness of this definition reflects the multitude of ways that peptide-like sequences can cross-link to form macrocycles. Together with the great diversity of non-standard amino acid monomers, the myriad topologies and architectures available to cyclic peptides enable the chemical diversity that has resulted in their prevalence amongst natural products and synthetic small molecule libraries. In this presentation, we consider some of the informatics challenges of recognizing and representing homodetic and heterodetic peptides, stapled peptides, disulfide bridge patterns and other polycyclic peptides. Amongst the complications are that covalently cross-linked sidechains may have multiple possible (degenerate) primary sequences, requiring the selection of a preferred canonical form during biological registration. This talk will present examples and statistics drawn from the PubChem and ChEMBL databases.
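The need for a preferred canonical form can be illustrated in its simplest case. For a head-to-tail cyclic peptide with no side-chain cross-links, every rotation of the residue list describes the same molecule, so a registration system must pick one representative. The Python sketch below is an illustration only, not NextMove Software's method: it assumes residues are given as hypothetical string codes and picks the lexicographically smallest rotation, and it deliberately ignores the harder degeneracies arising from side-chain cross-links and disulfide bridge patterns.

```python
from typing import Sequence, Tuple


def canonical_head_to_tail(residues: Sequence[str]) -> Tuple[str, ...]:
    """Return the lexicographically smallest rotation of a head-to-tail cyclic sequence.

    Only rotations are considered: the backbone runs N->C, so reading the ring in the
    opposite direction gives a different molecule (a retro-peptide), not the same one.
    """
    ring = tuple(residues)
    if not ring:
        return ring
    # Enumerate all n rotations (O(n^2) overall, acceptable for peptide-sized rings).
    rotations = (ring[i:] + ring[:i] for i in range(len(ring)))
    return min(rotations)


# Two different starting points on the same ring give the same canonical form.
print(canonical_head_to_tail(["Phe", "Gly", "Pro"]))
print(canonical_head_to_tail(["Pro", "Phe", "Gly"]))
```

For long sequences, a linear-time least-rotation algorithm such as Booth's could replace the brute-force enumeration; non-standard monomers only need a consistent string code for the ordering to be well defined.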

8:55 12 Non-covalent interactions in protein-ligand interactions: Applications of halogen bonds and carbon bonds in designing PTSD drugs

Suman Sirimulla, suman.sirimulla@nau.edu, Chemistry & Biochemistry, Northern Arizona University, Flagstaff, AZ 86011, United States

Currently, there is growing attention to non-covalent interactions such as halogen bonds and carbon bonds in protein-ligand interactions. Data mining of halogen bonds and carbon bonds in the Protein Data Bank was performed, and the statistical analysis of these results will be presented. Successful applications of these non-covalent interactions will be illustrated in designing drugs for Post-Traumatic Stress Disorder (PTSD).

9:15 13 BCL::Conf: A knowledge-based ligand flexibility algorithm and its application in computational drug discovery, including the online drug design game Foldit

Sandeepkumar K Kothiwale1, sandeepkumar.k.kothiwale@vanderbilt.edu, Jens Meiler1,2, Will Lowe1. (1) Chemistry, Vanderbilt University, Nashville, Tennessee 37235, United States, (2) Pharmacology, Vanderbilt University, Nashville, Tennessee 37235, United States

The three-dimensional conformation a small molecule adopts is critical for its binding to a target protein. Therefore, rapid and accurate prediction of the conformational space of a small molecule is critical for both structure- and ligand-based drug discovery algorithms, such as docking and quantitative structure-activity relationships, respectively. Here we have derived a database of small molecule fragments frequently sampled in experimental structures within the Cambridge Structural Database (CSD) and the Protein Data Bank (PDB). Likely conformations of these fragments are stored as rotamers, in analogy to the amino acid side chain rotamer libraries used for rapid sampling of protein conformational space, an approach that will allow integration of BCL::Conf into computational biology programs such as Rosetta or Scwrl. A conformational ensemble for a small molecule can now be generated by recombining fragment rotamers with a Monte Carlo search strategy. BCL::Conf was benchmarked against other conformer generation methods, including MOE, Confab, and Frog2, on its ability to recover native-like protein-bound conformations of small molecules, the diversity of its conformational ensembles, and its sampling rate. BCL::Conf recovers 97% of molecules within a root mean square deviation (RMSD) of 2 Å from the native conformation. The rapid rotamer sampling approach allows integration into the Rosetta suite for macromolecular modeling and into the drug design module of the online protein folding game Foldit. Conformer generation using multiple threads allows easy integration into high-throughput docking experiments using RosettaLigand.
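The 97% recovery figure rests on the RMSD between a generated conformer and the native, protein-bound conformation. The Python sketch below shows how such a recovery rate could be computed; it is not the BCL::Conf benchmark code. It assumes each conformer's coordinates (in Å, atoms in matching order) have already been optimally superimposed on the native pose, for example with the Kabsch algorithm, which is not shown, and it counts a molecule as recovered if any conformer in its ensemble lies within the cutoff. The helper `perturbed` and the toy coordinates are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]  # x, y, z in angstroms


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """RMSD between two pre-superimposed conformers with atoms in the same order."""
    if len(a) != len(b) or not a:
        raise ValueError("conformers must have the same, non-zero number of atoms")
    squared = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                  for (xa, ya, za), (xb, yb, zb) in zip(a, b))
    return math.sqrt(squared / len(a))


def is_recovered(native: Sequence[Coord], ensemble: Sequence[Sequence[Coord]],
                 cutoff: float = 2.0) -> bool:
    """True if at least one conformer in the ensemble is within `cutoff` angstroms RMSD."""
    return any(rmsd(native, conformer) <= cutoff for conformer in ensemble)


def recovery_rate(cases: Sequence[Tuple[Sequence[Coord], Sequence[Sequence[Coord]]]],
                  cutoff: float = 2.0) -> float:
    """Fraction of (native, ensemble) pairs whose ensemble recovers the native pose."""
    if not cases:
        return 0.0
    return sum(is_recovered(native, ens, cutoff) for native, ens in cases) / len(cases)


def perturbed(coords: Sequence[Coord], amplitude: float) -> List[Coord]:
    """Hypothetical toy helper: displace atoms alternately +/- amplitude along y."""
    return [(x, y + (amplitude if i % 2 == 0 else -amplitude), z)
            for i, (x, y, z) in enumerate(coords)]


native = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
near, far = perturbed(native, 1.0), perturbed(native, 3.0)
print(rmsd(native, near), rmsd(native, far))
print(recovery_rate([(native, [far, near]), (native, [far])]))
```

Here the first molecule's ensemble contains a conformer at 1.0 Å RMSD and is recovered, while the second contains only a 3.0 Å conformer and is not, giving a recovery rate of 0.5. Real benchmarks typically also account for molecular symmetry and often restrict the comparison to heavy atoms; both are omitted here.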

Sunday, August 10, 2014

Science and the Law: How the Communication of Science Influences Science-Based Policy Development in the Environment, Food, Health, and Transport Sectors - PM Session

Palace Hotel
Room: Marina
Cosponsored by AGFD

William Town, Organizer
William Town, Presiding
1:15 pm - 5:25 pm
1:15 Introductory Remarks
1:20 14 Coming out from under the cloud of “Climategate”: Are scientists effectively communicating with the public on climate change?

Frederick W Stoss, fstoss@buffalo.edu, University at Buffalo--SUNY, Buffalo, New York 14260, United States

Communicating with the public is not a traditional role of scientists, but it is increasingly important in today's virtually connected world. It is incumbent that the public understand the basic scientific principles of the environments in which they live, work, learn, and play. Climate change emerged in the 21st century as one of the most complex and controversial environmental problems. Enhanced communication by scientists and their proxies is a response to the theft of emails and documents from the University of East Anglia in November 2009, which were strategically “leaked” days before the Copenhagen conference of the United Nations Framework Convention on Climate Change. Climate change deniers attacked researchers, and conservative talk-radio shows spread allegations of fraud, withholding and manipulation of data, and suppression of publications. “Climategate” neutered the UN negotiations and, more lastingly, created public confusion and mistrust about the scientific consensus on climate change. The strategies, resources, and tools these groups use to challenge scientists' communication with the public are discussed, along with various STEM-based information, communication and education resources. Scientific organizations challenged scientists to evaluate their roles as communicators, setting the stage for more proactive discussion of how their research contributes to the more informed decision-making needed to transform our thinking from living in a greenhouse-gas-constrained world to a world not constrained by greenhouse gases.

1:50 15 Carbon accounting for indirect land use change (ILUC) in biofuels policy: The co-evolution of science and policy

Hanna Breetz, hbreetz@berkeley.edu, University of California, Berkeley, Berkeley, California 94720, United States

Over the last several years, policy-makers in the US, EU, and California have been grappling with how to account for land use change emissions in biofuel regulations. One of the great challenges is that the modeling of land use change emissions has rapidly evolved since initial estimates were published by Searchinger et al. (2008). This paper explores how the science and policy have co-evolved during this period. In particular, it traces and compares how the process of policy-making responded to emerging models and data, with a particular focus on how the science was portrayed by public discourse and interest group advocacy. The policies include the U.S. Renewable Fuel Standard (RFS), California Low Carbon Fuel Standard (LCFS), and the E.U. Fuel Quality Directive (FQD) and Renewable Energy Directive (RED).

2:20 16 Communicating the risk of nicotine delivery products

Jim Solyst, jim.solyst@smna.com, Swedish Match North America, Severna Park, Maryland 21146, United States

The rapid increase in the use of electronic cigarettes by smokers of tobacco cigarettes has highlighted the risk perception and communication challenge facing the US Food and Drug Administration (FDA) in characterizing nicotine delivery products. Tobacco smokers believe that by switching to electronic cigarettes they are reducing their risk level, which is likely true, but the scientific evidence is only now being collected. FDA is a public health and science-based agency and cannot communicate the risk reduction potential of electronic cigarettes until it has sufficient evidence, regardless of the product's intuitive risk reduction potential.
The 2009 Tobacco Control Act gives the FDA Center for Tobacco Products authority to regulate tobacco products, including electronic cigarettes. Section 911 of the Act (Modified Risk Tobacco Product, MRTP) provides a science-based process by which a company can apply for and receive an MRTP order. If a product can be demonstrated to reduce harm to the individual and benefit the overall public health, then it may be characterized as modified risk and FDA may communicate that information to the public.
There are products, Swedish snus for example, for which there is a great deal of human health evidence and which may at some time be granted an MRTP order; but there is no such human health evidence for electronic cigarettes, because the product has only recently been introduced to the market. So what does FDA say to the tobacco smoker who is considering switching to electronic products? The best advice is not to use nicotine products at all, but does FDA have an obligation to inform smokers of the obvious benefits of switching, even if the evidence is not complete?

2:50 Intermission
3:00 17 PEPFAR - a US Government program that is helping to keep millions alive around the world

George Lunn, george.lunn@fda.hhs.gov, Food and Drug Administration, Silver Spring, Maryland 20993, United States

The President's Emergency Plan for AIDS Relief (PEPFAR) was announced by President George W. Bush in 2003 with the aim of preventing infections, treating infected people, and caring for infected individuals and orphans in resource-limited countries. In a unique arrangement, low-cost manufacturers submit New Drug Applications or Abbreviated New Drug Applications to the FDA for antiretroviral drugs to treat AIDS, and these applications are reviewed to the same standards as applications for products destined for the US market. To expedite the preparation and submission of these applications, the FDA has reached out to manufacturers, distributors, and other interested parties. By the beginning of 2014, FDA had taken action on 168 applications, and 6.7 million people worldwide are being treated with these antiretroviral drugs.

3:30 18 Does science or communications have greater influence in formulating policy? A UK perspective

Tamora Langley, Tlangley@webershandwick.com, Healthcare Public Affairs, Weber Shandwick, United Kingdom

The dynamics and principles of the scientific environment are starkly at odds with those of the political environment in which policies are made or broken. While a scientific approach is rational, evidence-based and formed through consensus of experts, the political environment is emotional, adversarial and driven by communications. Although decision-makers aspire to evidence-based policy-making, in contested areas the side with the most effective communications often seems to 'win'. The recent economic crisis in the UK and across mainland Europe has necessitated drastic cuts in public spending, impressing on officials the need to make savings and squeeze more value out of public resources. In the UK, the government marked out the health budget as one of only two areas of public spending to be shielded from the cuts. Still, relatively flat health spending has been outstripped by rising demand for services, so any new policies requiring additional resources remain, in theory, unaffordable. Where new health policies have been introduced, such as the Cancer Drugs Fund (CDF), they have been driven not so much by scientific developments as by public opinion and political decision-making. Similarly, attempted policy changes driven by or expressed in terms of economic or rational imperatives (such as attempts to reconfigure health services, or to change the statutory regulation of dispensing medicines) have failed in the face of patient and professional campaigns. To conclude that those who shout loudest will always win is to oversimplify. Besides, in some policy debates, patient and professional advocacy groups are divided. Even if policies are pushed through by noisy campaigns, they can be reversed or stalled by the public officials who 'outlive' their political masters and realise that the policies are in practice unworkable or inefficient. The answer? Begin with the science, but recognise that the communication of science is just as critical.

4:00 19 Consumer communication of nutrition science and impact on public health

David P Richardson, info@dprnutrition.com, School of Chemistry, Food and Pharmacy, University of Reading, Reading, Berkshire RG6 6UR, United Kingdom

Dietary interventions for vulnerable groups such as the elderly, women of childbearing age, children and adolescents can help reduce the risk of suboptimal intakes and deficiencies of micronutrients, control healthcare costs, and promote the health and quality of life of people globally. Examples include the communication of the scientific evidence for (a) the use of folic acid/folate to reduce the risk of neural tube defects, (b) the reduction in prevalence of iron-deficiency anaemia, (c) the relationship of calcium and vitamin D to bone health and reduced risk of osteoporosis, and (d) the modulation of the age-related decline in most organ functions and reduction in the development and/or progression of many chronic diseases. The paper will highlight the need for evidence-based healthcare and communication policies, including the use of nutrition and health claims on food products to raise awareness of the role of diet in health.

4:30 20 Communicating controversial science: The case of tobacco harm reduction and the ethics of blanket censorship

Sarah Cooney, sarah_cooney@bat.com, Christopher Proctor. British American Tobacco, Southampton, United Kingdom

It has long been accepted that cigarette smoking causes serious disease and death, and public policy has focused on reducing tobacco use. In the US, the Food & Drug Administration (FDA) has had regulatory jurisdiction over tobacco products since 2009 and is committed to an evidence-based approach for regulatory decision making anchored by sound science. In an effort to generate much more data about tobacco science, the FDA has established an interagency partnership with the National Institutes of Health (NIH), which is making available billions of research dollars to study priority questions about tobacco science to inform FDA regulations. This new funding should attract many new researchers, creating a larger and more diverse, transparent and results-orientated tobacco science community. The FDA has set an example in acknowledging tobacco manufacturers both as an important stakeholder and as a potential source of valuable scientific expertise. Perhaps as a result, there is a general increase in scientific publications resulting from research undertaken by tobacco industry scientists. Additionally, most tobacco manufacturers are even more committed to developing products substantially less risky than cigarettes, and the science to evaluate the potential of such products to promote harm reduction. At the same time there is an increase in the number of scientific journals introducing blanket bans on publishing science from tobacco manufacturers, with the British Medical Journal being a recent example. This paper looks at the ethical dilemmas surrounding scientific censorship and the role of peer review in protecting scientific integrity.

5:00 Panel Discussion
5:20 Concluding Remarks

Sunday, August 10, 2014

Nature's Second Act: Revisiting Natural Products - PM Session

Palace Hotel
Room: Presidio

Roger Schenck, Organizer
Roger Schenck, Presiding
1:00 pm - 5:20 pm
1:00 Introductory Remarks
1:05 21 Evaluation of genus species coverage in Chemical Abstracts

Matthew J McBride, mmcbride@cas.org, Science IP, Chemical Abstracts Service, Columbus, OH 43202, United States

Comprehensive retrieval of natural products reported in the public literature (patents, journal articles and other sources) requires thorough indexing of biological organisms by both common name and Genus species. CAS Registry and Chemical Abstracts (available in SciFinder and STN) provide the most complete publicly disclosed coverage of substance and literature information; however, little has been reported on their organism coverage, or on the impact of such indexing on the substances derived from these organisms. This session presents an overview of Genus species coverage in Chemical Abstracts, focusing on a case study of plant species in Chemical Abstracts and on examples of how natural products and secondary metabolites are indexed in the database.

1:35 22 Natural products information resources and the role of Dictionary of Natural Products

Fiona M Macdonald, fiona.macdonald@informa.com, John Buckingham, Steve Walford. Taylor & Francis, Boca Raton, FL 33487, United States

Originally envisaged as a spin-off from the Dictionary of Organic Compounds, the Dictionary of Natural Products (DNP) was first published in 1991 and contained 79,000 compounds. Updated continuously since then, it now contains 260,000 compounds and is published biannually on DVD and online. Its role in natural product research will be reviewed, and plans for future enhancements revealed.

2:05 23 Garlic and other alliums: The lore and the science

Eric Block, eblock@albany.edu, Department of Chemistry, University at Albany, SUNY, Albany, New York 12222, United States

It has been written: "Cultivation of leek, onion and garlic is as old as the history of the human race, and as extensive as civilization itself. References to these plants in the Bible and the Koran reflect their importance to ancient civilizations both as flavorful foods and as healing herbs" (1). Clearly, adequately reporting on both the newest and the oldest knowledge about these healing herbs and flavorful foods, including their natural products chemistry, requires more than a routine search of scientific databases. In the course of writing a monograph on "Allium science", the author used rare book collections in major libraries and botanical gardens to view old herbals and botanical monographs, spoke to botanists, visited archeological sites to see historical depictions of plants, and even found useful material in used book stores. Visits to farms in the U.S. and abroad, to overseas spice markets, and to manufacturers and processors of alliums all proved helpful in better understanding agricultural aspects of these plants, while visits to museums, onion-domed churches and even theaters revealed the role alliums play in culture and the arts. Even cookbooks can be a useful source of information when considering the chemistry that occurs in the kitchen with common vegetables, which could be described as "chemistry in a salad bowl." When writing about plants that have been known for centuries, researchers should find it very helpful, as well as enjoyable, to use non-traditional sources and locations and to view the plants in cultivation and in the wild.
Reference: (1) Eric Block, Garlic and Other Alliums: The Lore and the Science, Royal Society of Chemistry, Cambridge, 2009 (hardback), 2010 (paperback).

2:35 24 Rediscovering macrocyclic natural products as drug leads

Roger Schenck, rschenck@cas.org, Marketing, Chemical Abstracts Service, Columbus, Ohio 43202, United States

While natural products are being rediscovered as drug leads, macrocyclic natural products have been largely ignored as therapeutic leads. Containing rings of 12 or more atoms, these macrocyclic natural products generally violate the rule of five yet still exhibit favorable drug-like physicochemical and pharmacokinetic properties. This talk will focus on recent examples from the CAS databases and conclude with a study of synthetic pathways for making these valuable targets in the lab.
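To make the two criteria concrete, here is a small sketch that counts Lipinski rule-of-five violations (MW > 500 Da, logP > 5, more than 5 H-bond donors, more than 10 H-bond acceptors) and flags a macrocycle as any ring of 12 or more atoms. It assumes the properties have already been computed by some cheminformatics toolkit; the `Props` container and the example values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Props:
    mol_weight: float       # molecular weight in Da
    clogp: float            # calculated octanol/water logP
    h_bond_donors: int
    h_bond_acceptors: int
    ring_sizes: List[int]   # number of atoms in each ring


def ro5_violations(p: Props) -> int:
    """Count how many of Lipinski's four rule-of-five limits are exceeded."""
    return sum([
        p.mol_weight > 500,
        p.clogp > 5,
        p.h_bond_donors > 5,
        p.h_bond_acceptors > 10,
    ])


def is_macrocycle(p: Props, min_ring: int = 12) -> bool:
    """True if any ring contains at least `min_ring` atoms."""
    return any(size >= min_ring for size in p.ring_sizes)


# Hypothetical macrocycle: heavy and acceptor-rich, with a 14-membered ring.
example = Props(mol_weight=850.0, clogp=3.2, h_bond_donors=4,
                h_bond_acceptors=14, ring_sizes=[14, 6])
```

For `example`, two limits are exceeded (molecular weight and acceptor count) and the 14-membered ring marks it as a macrocycle.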

3:05 Intermission
3:20 25 MarinLit: Database and essential tools for the marine natural products community

Serin Dabb1, dabbs@rsc.org, John Blunt2, Murray Munro2. (1) Royal Society of Chemistry, Cambridge, United Kingdom, (2) University of Canterbury, Christchurch, New Zealand

MarinLit is the premier database for marine natural products research. It contains comprehensive bibliographic information for published articles related to all aspects of marine natural products. The unusual genesis of this database, originally designed in the 1980s as an in-house system to fulfil the needs of the University of Canterbury Marine Group, led to features that are unique or seldom found in other chemistry databases. In addition to complete bibliographic details, these features include taxonomy, ecology, biogeography and keywords and, for compounds, structures, trivial names, syntheses, biosyntheses, bioactivities, and NMR and UV data. Beyond easy access to these indexed data for individual compounds, MarinLit is a very powerful dereplication tool, in two respects. The first is biogeography: the availability of collection site and depth data allows spatial answers to be given for any search profile. The second is structural: all 23,500 compounds in the database have been analysed by a unique algorithm that populates 44 individually searchable fields with the number of each structural feature that can be readily recognised by 1H-NMR spectroscopy. This presentation will describe the unique functionality and search capabilities of the database and its editorial process, and demonstrate how it can be used to aid dereplication in a research environment.
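The feature-count idea can be illustrated with a short sketch: each compound stores how many of each 1H-NMR-recognisable feature it contains, and a query built from an unknown's spectrum keeps only compounds whose counts match. The feature names, compounds and tolerance parameter below are hypothetical stand-ins, not MarinLit's actual 44 fields or its algorithm.

```python
# Hypothetical feature-count table standing in for the searchable NMR fields.
LIBRARY = {
    "compound A": {"CH3_singlet": 2, "CH3_doublet": 1, "OCH3": 1, "aldehyde_H": 0},
    "compound B": {"CH3_singlet": 3, "CH3_doublet": 0, "OCH3": 0, "aldehyde_H": 1},
    "compound C": {"CH3_singlet": 2, "CH3_doublet": 1, "OCH3": 0, "aldehyde_H": 0},
}


def dereplicate(query, library, tolerance=0):
    """Return compounds whose count for every queried feature is within tolerance.

    Features not mentioned in the query are ignored, so a partial
    interpretation of a spectrum can still narrow the candidate list.
    """
    hits = []
    for name, counts in library.items():
        if all(abs(counts.get(feature, 0) - n) <= tolerance
               for feature, n in query.items()):
            hits.append(name)
    return sorted(hits)
```

Querying two methyl singlets and one methyl doublet leaves compounds A and C; adding one methoxy group narrows the list to compound A alone.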

3:50 26 RÖMPP natural products: An online encyclopedia

Guido F. Herrmann, guido.herrmann@thieme.de, Manfred Köhl, Klaus Köberlein, Ute Rohlf. Georg Thieme Verlag, Stuttgart, Germany

The chemical encyclopedia “RÖMPP” was founded by Dr. Hermann Römpp in 1947. About 250 authors have contributed to the work in recent years, and today RÖMPP contains more than 63,000 entries and 14,000 structural formulas. RÖMPP has been published online (https://roempp.thieme.de/) for more than a decade and is now updated monthly.
The coverage of natural products and secondary metabolites has always been a central topic for RÖMPP. The first German print edition, “Naturstoffe”, was published in 1997 with contributions from more than 40 authors, and was followed by an English edition in 2001. This edition covers about 6,000 relevant natural compounds, including 2,200 formulas. Today, about 8% of the content of the online version of RÖMPP covers natural products.
Our talk will highlight:

  • how RÖMPP covers natural products (secondary and, to a lesser extent, primary metabolites), including additional information such as genus/species and their geographic location, analytical methods, biological and physical properties, and indices of Latin species names;
  • how the Editorial Board and authors have handled content curation over the years and how they decide upon updates and new entries;
  • how researchers are using RÖMPP and the Natural Products sections;
  • how we communicate with our users and improve the content of RÖMPP;
  • how we cooperate with our Advisory Board (Dr. Sabine Angel, BASF; Dr. Andreas Barth, FIZ Karlsruhe; and Dr. Engelbert Zass, ETH Zürich);
  • how we have developed the graphical user interface and the underlying technologies.

4:20 27 ChEMBL - linking chemistry and biology to enable mapping onto molecular pathways

Louisa J Bellis, ljbellis@ebi.ac.uk, Anna Gaulton, Anne Hersey, A Patricia Bento, Jon Chambers, Mark Davies, Felix Kruger, Yvonne Light, Nathan Dedman, Shaun McGlinchey, Michal Nowotka, George Papadatos, Rita Santos, John P Overington. ChEMBL Group, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire CB10 1SD, United Kingdom

ChEMBL is an open access, large-scale bioactivity database (https://www.ebi.ac.uk/chembl) containing over 11 million bioactivity data points and 1.6 million compounds, primarily curated from the scientific literature. Systems biology plays a central role in drug discovery by integrating chemical and biological processes. Understanding a drug's mode of action, from its target to its effect on a molecular pathway, is crucial, so combining systems biology with cheminformatics is necessary. ChEMBL extracts data and displays clear links between chemical compounds, their bioactivity endpoints and their associated protein targets; such links are essential if a database is to be used successfully for drug discovery. Historically, a major stumbling block for such bioinformatics databases has been the number of different activity types and units published, which makes it difficult to compare results from different papers. ChEMBL therefore applies its own standardization to derive a single activity value, pChEMBL, that unites these disparate activity types and values into a uniform set that can be compared across multiple sources. pChEMBL is defined as -log10 of the molar IC50, XC50, EC50, AC50, Ki, Kd or Potency. This talk will show how ChEMBL can be used for mapping onto molecular pathways in order to understand the modulated nodes and the chemical tools available. We will present examples from our research in which we have developed informatics approaches to automatically annotate pathways with ChEMBL data. A second use case for systems biology is the assembly of thematic views of ChEMBL, for example for ADME systems biology.
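As a worked illustration of the definition above, the sketch below converts an activity value in a given concentration unit to -log10(molar value). It is a simplification, not ChEMBL's pipeline: it assumes the value has already been standardized, handles only the unit strings listed in `UNIT_EXPONENT`, and ignores other checks a real loader would apply (for example, requiring an exact "=" relation and valid data). The two-decimal rounding is also an assumption.

```python
import math

# Decimal exponent e such that value_in_unit * 10**-e gives a molar value.
UNIT_EXPONENT = {"M": 0, "mM": 3, "uM": 6, "nM": 9, "pM": 12}
ELIGIBLE_TYPES = {"IC50", "XC50", "EC50", "AC50", "Ki", "Kd", "Potency"}


def pchembl(activity_type, value, unit):
    """Return -log10(molar value) rounded to 2 dp, or None if not eligible."""
    if activity_type not in ELIGIBLE_TYPES or unit not in UNIT_EXPONENT or value <= 0:
        return None
    # -log10(value * 10**-e) == e - log10(value); this avoids taking the log
    # of a very small floating-point number.
    return round(UNIT_EXPONENT[unit] - math.log10(value), 2)
```

For example, an IC50 of 10 nM gives 8.0, a Ki of 50 nM gives 7.3, and a percent-inhibition value is rejected because it is not a concentration.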

4:40 28 Bifenthrin treatment against subterranean termite damage to structural woodwork in a semi-arid tropical urban system

Sammaiah Chintha, sammaiah_ch@yahoo.com, Department of Zoology, Kakatiya University, Warangal, Andhra Pradesh 506009, India

Subterranean termites cause damage to structural wood and other articles in many types of houses; their control was investigated in a semi-arid tropical urban system. Ten species of termites were recorded within urban houses, of which six, viz. Coptotermes ceylonicus Holmgren (Rhinotermitidae), Odontotermes brunneus (Hagen), Odontotermes redemanni (Wasmann), Odontotermes wallonensis (Wasmann), Odontotermes bellahunisensis Holmgren and Holmgren, and Microtermes obesi Holmgren (Termitidae), were found damaging woodwork such as door frames and panels, window sashes, joists and rafters. The termite damage was controlled with bifenthrin (a pyrethroid insecticide) by chemical treatment of the site at foundation level, by treatment of timber at the time of construction, or by drilling holes and treating 105 buildings. The results of the control measures carried out over the last three years will be discussed.

5:00 29 Bioisosteres in accessible chemistry space

Tim Cheeseright, tim@cresset-group.com, Mark Mackey, Rae Lawrence, Martin Slater. Cresset, Cambridge, Cambs SG8 0SS, United Kingdom

Searching for bioisosteric replacements is a valuable part of a medicinal chemist's toolbox. A bioisosteric core replacement can solve an ADMET or IP issue and move development into a new lead series, while bioisosteric replacements for leaf groups enable fine-tuning of molecular properties without affecting the fundamental activity. A fundamental limitation of most current bioisosteric replacement search tools is that their suggestions may not be synthesisable, and existing computational synthesisability assessments tend to perform poorly in this context. We present an alternative approach in which the chemist defines the accessible synthetic space around the core of the lead molecule. The search for bioisosteres is confined to this space, so that all results are known to be synthesisable from accessible reagents. To do this, multiple fragment databases are created from the reagent sets and classified according to the synthons present and the desired chemistries. Despite the very large number of chemical reactions in the modern medicinal chemistry toolset, we have found that a very limited number of synthetic transforms is needed to fully represent this space. By focusing on the structural transformation rather than the chemical reaction, many different chemistries can be summarized in a small set of rules. We implement these rules in a special-purpose chemical transformation language, ATPAT, which combines a molecular regular expression syntax that is simpler, more extensible and more powerful than SMARTS with a set of simple transformation procedures that are applied automatically on a successful match. The ATPAT engine allows new chemical transformation rules to be generated easily. The result is an integrated system that allows chemists to easily process their available reagents into a list of potential molecules to make.
Packaging this system as a KNIME or Pipeline Pilot component allows automation and simple integration into existing cheminformatics systems.
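The idea of confining enumeration to reagents the chemist actually has can be sketched in a few lines. This is a toy stand-in, not ATPAT: the transform is reduced to a single attachment point `[*:1]` on a core SMILES string, reagents are pre-classified by a hypothetical synthon label, and each fragment is written so that its first atom forms the new bond, which is what makes plain string substitution yield a valid SMILES here.

```python
# Hypothetical reagent set: (name, synthon class, fragment SMILES whose first
# atom is the one that forms the new bond to the core).
REAGENTS = [
    ("morpholine", "sec_amine", "N1CCOCC1"),
    ("dimethylamine", "sec_amine", "N(C)C"),
    ("cyclopropylamine", "prim_amine", "NC1CC1"),
    ("benzoic acid", "carboxylic_acid", "OC(=O)c1ccccc1"),
]

# Hypothetical structural rule: an acyl attachment point accepts amine
# synthons (i.e. amide formation summarised as a structural transform).
CORE = "c1ccc(cc1)C(=O)[*:1]"
ALLOWED_SYNTHONS = {"sec_amine", "prim_amine"}


def enumerate_products(core, allowed_synthons, reagents, site="[*:1]"):
    """Attach every compatible reagent fragment at the single core site."""
    products = {}
    for name, synthon, fragment in reagents:
        if synthon in allowed_synthons:
            products[name] = core.replace(site, fragment)
    return products
```

Only the three amines produce candidates; benzoic acid is skipped because its synthon class does not fit the rule, so every suggestion corresponds to a reagent on hand.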

Sunday, August 10, 2014

The Impact of the IUPAC InChI on Finding and Linking Information on Chemicals - PM Session

Palace Hotel
Room: California Parlor
Cosponsored by CHED, COMP

Stephen Heller, Organizers
Stephen Heller, Presiding
1:30 pm - 4:35 pm
1:30 30 InChI project

Stephen Heller, steve@hellers.com, BMD, NIST, Gaithersburg, MD 20899-8362, United States

This presentation will provide the background for the InChI symposium presentations.

1:40 31 Moving the standard ever onwards: The role of the InChI Trust in supporting and developing the InChI

David Evans, david.evans@reedelsevier.ch, Reed Elsevier Properties SA, Neuchâtel, Switzerland and InChI Trust, United Kingdom

Since its inception by IUPAC over 10 years ago, with support from NIST, the InChI standard has been widely used in the publishing, database and life sciences industries and has secured a strong foothold in academic research and the information community. The InChI Trust, a UK-based not-for-profit organization, was founded in 2009 to support the continued development of the InChI. The vision of the InChI Trust is to provide a freely available, open source structure representation algorithm to link and find information on defined chemical structures, and to provide standards, apps and other tools that facilitate its use. In this presentation we will discuss the current status of the InChI projects, show how the InChI Trust and IUPAC can continue to support the development of the InChI standard, and explain how the community can help in this work.

2:00 32 How the InChI identifier is used to underpin our online chemistry databases at the Royal Society of Chemistry

Antony J. Williams1, williamsa@rsc.org, Valery Tkachenko1, Karen Karapetyan1, Alexey Pshenichnov1, Colin Batchelor2. (1) US Cheminformatics, Royal Society of Chemistry, Wake Forest, NC 27587, United States, (2) eScience, Royal Society of Chemistry, Cambridge, United Kingdom

The Royal Society of Chemistry hosts a growing collection of online chemistry content, and the InChI identifier is an important component underpinning much of our work. It enables the integration of chemical compounds with our archive of scientific publications, the delivery of a reaction database containing millions of reactions, and a chemical validation and standardization platform developed to help improve the quality of structural representations on the internet. The InChI has been a fundamental part of each of these projects and has been pivotal in our support of international projects such as the Open PHACTS semantic web project, which integrates chemistry and biology data, and the PharmaSea project, which is focused on identifying novel chemical components from the ocean with the intention of finding new antibiotics. This presentation will provide an overview of the importance of InChI in the development of many of our eScience platforms and of how we have used it to provide integration across hundreds of websites and chemistry databases across the web. We will also discuss how we are now expanding our efforts to develop a platform encompassing Open Source Drug Discovery and support for data management for neglected diseases.
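A minimal sketch of identifier-based integration, assuming each source already stores a Standard InChIKey per record: grouping on the key links an article, a reaction and a spectrum that describe the same compound without any name matching. The source names and record IDs are hypothetical; the two keys are the Standard InChIKeys of water and L-alanine.

```python
from collections import defaultdict

# Hypothetical records from different collections, each with a Standard InChIKey.
SOURCES = {
    "articles": [("art-17", "XLYOFNOQVPJJNP-UHFFFAOYSA-N")],
    "reactions": [("rxn-204", "XLYOFNOQVPJJNP-UHFFFAOYSA-N"),
                  ("rxn-311", "QNAYBMKLOCPYGJ-REOHCLBHSA-N")],
    "spectra": [("spec-9", "QNAYBMKLOCPYGJ-REOHCLBHSA-N")],
}


def link_by_inchikey(sources):
    """Group (source, record_id) pairs that share the same InChIKey."""
    linked = defaultdict(list)
    for source, records in sources.items():
        for record_id, key in records:
            linked[key].append((source, record_id))
    return dict(linked)
```

Because the key is computed from the structure, any collection that can generate it can join this index, which is what makes it practical as a cross-site link.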

2:35 Intermission
2:50 33 Data linking in PubChem using InChI

Evan Bolton, bolton@ncbi.nlm.nih.gov, National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894, United States

PubChem is an open archive of chemical substances and their biological activities. It is a sizeable resource with many tens of millions of records provided by the chemical biology community. The IUPAC InChI and InChIKey can play a key role in accessing information in PubChem when using a chemical structure. This talk will provide an overview of the many ways one can use InChI and InChIKey with the PubChem resource. Particular emphasis will be placed on their role in linked data approaches.
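One common way InChIKeys support looser linking is by comparing only the first block, which hashes the connectivity layer, so stereoisomers share it. The sketch below parses a Standard InChIKey into its parts (14-character skeleton hash, 8-character hash of the remaining layers, standard flag, version letter, protonation character). It is a format check only, with no call to PubChem; L-alanine and D-alanine keys are used as the stereoisomer example.

```python
import re

# Standard InChIKey layout: 14 letters, hyphen, 8 letters + S/N flag + version
# letter, hyphen, one protonation letter.
INCHIKEY_RE = re.compile(r"^([A-Z]{14})-([A-Z]{8})([SN])([A-Z])-([A-Z])$")


def parse_inchikey(key):
    """Split an InChIKey into its blocks; raise ValueError if malformed."""
    m = INCHIKEY_RE.match(key)
    if m is None:
        raise ValueError(f"not an InChIKey: {key!r}")
    skeleton, other_layers, flag, version, protonation = m.groups()
    return {
        "skeleton": skeleton,              # hash of the connectivity layer
        "other_layers": other_layers,      # hash of stereo, isotopic, etc. layers
        "standard": flag == "S",
        "version": version,
        "protonation": protonation,
    }


def same_skeleton(key_a, key_b):
    """True if two keys share a connectivity block (e.g. stereoisomers)."""
    return parse_inchikey(key_a)["skeleton"] == parse_inchikey(key_b)["skeleton"]
```

`same_skeleton("QNAYBMKLOCPYGJ-REOHCLBHSA-N", "QNAYBMKLOCPYGJ-UWTATZPHSA-N")` is true for the two alanine enantiomers, while an exact-key lookup would keep them apart.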

3:25 34 QR InChI codes

Jeremy G Frey, j.g.frey@soton.ac.uk, Andrew J Milsted, Simon J Coles. Chemistry, University of Southampton, Southampton, United Kingdom

The combination of QR codes with embedded logos and InChI offers many opportunities for labelling and for providing machine- and human-readable information about materials and samples. We have developed a web-based system that allows providers to create these InChI QR codes with a link to resources about a sample. This link is mediated via an InChI site and can be presented via smartphone apps. As well as enabling resource and stock control, this type of label provides a very flexible way to present information to the emergency services. The service options considered will be summarised and an example of the service demonstrated.

4:00 35 International chemical identifier for reactions (RInChI)

Guenter Grethe1, ggrethe@att.net, Jonathan M Goodman2, Chad Allen2. (1) Unaffiliated, Alameda, CA 94502-7409, United States, (2) Department of Chemistry, University of Cambridge, Cambridge, United Kingdom

Open-access software for creating a unique, text-based identifier for reactions (RInChI) was developed at Cambridge University based on the IUPAC International Chemical Identifier (InChI) standard. RInChIs describe the substances participating in a reaction (reactants, products, reagents and solvents) by their respective InChIs, and the structure of a RInChI is analogous to that of an InChI. In addition to generating RInChIs from widely used Rxnfiles and RDfiles, the software also generates long- and short-form hashed representations, the RInChIKeys. Furthermore, the software supports reversible conversion between CT-files and RInChIs, searching for specific substances and their specific roles in reactions, and the analysis of databases. All these functions are available through web-based tools, and an easy-to-use, freely accessible website is available at http://www-rinchi.ch.cam.ac.uk/ . We will discuss details of the program and the status of the RInChI project.

Sunday, August 10, 2014

CINF Scholarships for Scientific Excellence - EVE Session

Palace Hotel
Room: Ralston

Guenter Grethe, Organizers
, Presiding
6:30 pm - 8:30 pm

36 Toward quantitative structure-activity relationship (QSAR) models for nanoparticles

Katarzyna Odziomek1,2, kjodziomek@lbl.gov, Daniela Ushizima2, Tomasz Puzyn1, Maciej Haranczyk2. (1) Faculty of Chemistry, Laboratory of Environmental Chemometrics, University of Gdansk, Gdansk, Poland, (2) Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States

For decades, combined chemometric and computational tools have successfully helped scientists predict the properties of new chemical compounds from their molecular structure. Quantitative structure-activity relationship (QSAR) methods, for example, use linear and non-linear combinations of molecular descriptors to predict the physical-chemical and/or biological properties of a molecule. QSAR approaches require well-defined structures of the molecules considered. Similar approaches are envisioned for predicting relevant properties of nanoparticles (NPs), which are a common form of chemicals used in cosmetics, pharmaceuticals and even food products. Owing to their nature, NPs exhibit characteristics different from those of bulk materials. Nanoparticle samples are typically non-uniform and present various shapes and sizes, which give rise to their properties, so conventional QSAR techniques cannot be applied directly. Our goal is to develop new approaches that will facilitate QSAR-type modeling of nanoparticles, and our strategy is to incorporate new nanoparticle descriptors into existing, proven QSAR methods. We used scanning electron microscopy (SEM) images to obtain valuable visual information on the morphology and topography of nanoparticles. Using computer vision algorithms, we analyzed the SEM images and obtained descriptors of regions of interest (i.e., potential NPs) such as shape, size, surface area and roughness. These descriptors can be used to group and classify the nanoparticles. Next, we plan to build statistical models correlating nano- and microscale parameters with physical-chemical and biological characteristics. We demonstrate applications of our methodology by investigating hydroxyapatite-based bionanomaterials. Hydroxyapatite (hydroxylapatite, bioapatite, HAp), Ca10(PO4)6(OH)2, is a naturally occurring calcium phosphate mineral with a wide range of biomedical applications, such as bone implants.
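To show what simple shape and size descriptors of a region of interest look like, here is a sketch that takes a binary mask for one already-segmented particle (1 = particle pixel) and returns its area, a pixel-edge perimeter, and circularity 4πA/P². This is an illustrative choice of descriptors, not the authors' computer vision pipeline; note that the pixel-edge perimeter overestimates the length of curved boundaries, so circularity is biased low for round particles.

```python
import math


def region_descriptors(mask):
    """Area, perimeter and circularity of the foreground (1) pixels in a 2D mask.

    Assumes the mask contains a single segmented particle.
    """
    rows, cols = len(mask), len(mask[0])
    area = 0
    perimeter = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            area += 1
            # Count each pixel side that borders background or the image edge.
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if not (0 <= rr < rows and 0 <= cc < cols) or not mask[rr][cc]:
                    perimeter += 1
    circularity = 4 * math.pi * area / perimeter ** 2 if perimeter else 0.0
    return {"area": area, "perimeter": perimeter, "circularity": circularity}
```

A 4 × 4 square particle gives area 16, perimeter 16 and circularity π/4 (about 0.785); more elongated or ragged particles score lower, which is the kind of difference such descriptors can use to group particles.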


37 Targeting androgen receptor DNA-binding domain using structure-based methods to overcome resistance

Huifang Li, janelhf@gmail.com, Fuqiang Ban, Kush Dalal, Eric Leblanc, Paul S. Rennie, Artem Cherkasov. Vancouver Prostate Centre, University of British Columbia, Vancouver, B.C. V6H 3Z6, Canada

The human androgen receptor (AR) is considered a master regulator in the development and progression of prostate cancer (PCa). As resistance to current antiandrogens remains a major challenge in the treatment of advanced PCa, there is a continuing need to pursue new anti-AR therapeutic avenues. In this study, we identified a plausible binding site on the DNA-binding domain (DBD) of the AR, and small-molecule inhibitors through initial screening against this site. By exploring the chemical space around a moderately active initial hit compound, we identified an analogue with 10-fold improved activity, and preliminary structure-activity relationship (SAR) studies on this chemical class yielded a lead compound with potency equal to that of the current drug enzalutamide. Site-directed mutagenesis demonstrates that the developed inhibitors do interact with the proposed binding site on the AR DBD, suggesting a novel mechanism of action fundamentally different from conventional targeting of the AR through its ligand-binding domain (LBD). Furthermore, the inhibitors effectively inhibit the growth of enzalutamide-resistant cells and block the transcription of constitutively active AR splice variants, which lack the entire LBD and play a critical role in the development of resistance to conventional antiandrogens. This study provides initial proof of principle for selectively targeting the AR DBD, which may be a novel and viable approach for the treatment of advanced and resistant PCa.


38 Carbon bond: A noncovalent interaction

Chelsea Traina, cdt64@nau.edu, Erik M Chavez, Erin Carter, Suman Sirimulla. Chemistry & Biochemistry, Northern Arizona University, Flagstaff, AZ 86011, United States

Nonbonding interactions between atoms are crucial to stabilizing and directing molecular formation in solutions and solids. Interactions such as hydrogen and halogen bonding have been studied for these stabilizing and directing effects. Another interaction, the carbon bond, has similar capabilities and could prove essential in many molecular interactions, particularly in drug design. A quantitative evaluation of the carbon bond interaction was performed using the Cambridge Structural Database (CSD) and the Protein Data Bank (PDB). The search criteria were based on van der Waals radii distances and the exclusion of weak hydrogen bonds, which allowed us to measure the interactions between the carbon atom and the nucleophile. The molecular configurations searched in the CSD were F/O/Cl-C…N/O/S/Cl/F. The O-C…O species yielded the greatest number of hits (39,066), whereas the F-C…S species generated only 125 hits. Statistical analyses of the results are presented.
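A distance criterion of this kind can be sketched as follows: a C…X contact counts when the interatomic distance is shorter than the sum of the two van der Waals radii. The sketch uses Bondi's radii and coordinates in ångströms; it does not reproduce the abstract's exclusion of weak hydrogen bonds or any angular cutoff the authors may have used, and `contact_ratio` is an added convenience (values below 1 indicate a contact).

```python
import math

# Bondi van der Waals radii in ångströms.
VDW_RADIUS = {"C": 1.70, "N": 1.55, "O": 1.52, "F": 1.47, "S": 1.80, "Cl": 1.75}


def contact_ratio(c_xyz, x_xyz, acceptor):
    """Distance C...X divided by the sum of the two van der Waals radii."""
    return math.dist(c_xyz, x_xyz) / (VDW_RADIUS["C"] + VDW_RADIUS[acceptor])


def is_carbon_bond_contact(c_xyz, x_xyz, acceptor):
    """True if the C...X distance is below the sum of the vdW radii."""
    return contact_ratio(c_xyz, x_xyz, acceptor) < 1.0
```

For a C…O pair the cutoff is 1.70 + 1.52 = 3.22 Å, so a separation of 3.0 Å qualifies and 3.5 Å does not.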


39 Random walk-based prediction of novel drug-target interactions

Abhik Seal, abseal@indiana.edu, Yong Yeol Ahn, David J Wild. School of Informatics and Computing, Indiana University, Bloomington, Indiana 47408, United States

Predicting novel drug-target associations is important not only for developing new drugs but also for understanding how drugs work and what their modes of action are. As more data about drugs, targets and their interactions become available, computational approaches are becoming more viable for drug-target association discovery. In this paper, we apply the Random Walk with Restart (RWR) method on a heterogeneous network of drugs and targets to predict novel drug-target associations. From DrugBank, we construct heterogeneous drug-target networks using four types of chemical fingerprints, sequence similarity and interaction profiles. We find that our method produces reliable predictions across the different chemical fingerprint types. We use ChEMBL, an external dataset with 2,763 associations, to evaluate the performance of our approach, and find that it correctly predicts nearly 45% of the interactions that are present only in the ChEMBL dataset. We also verify several associations between drugs and modes of action, such as a strong association between hair loss and the cardiovascular drug simvastatin. Finally, the associations between 110 popular drugs and 3,519 targets are analyzed as a case study. In summary, we demonstrate the effectiveness and promise of RWR on heterogeneous networks for identifying novel drug-target interactions.
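The core RWR iteration is p ← (1 − r)·W·p + r·e, where W is the column-normalised adjacency matrix, e puts all restart mass on the seed node, and r is the restart probability. The pure-Python sketch below implements that update on a tiny hypothetical network (two drugs, two targets, made-up edge weights); the authors' actual network construction from fingerprints, sequence similarity and interaction profiles is not reproduced, and the restart value 0.3 is an assumption.

```python
def random_walk_with_restart(adj, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Steady-state RWR scores for every node, starting from node `seed`."""
    n = len(adj)
    # Column-normalise so each column holds transition probabilities.
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    W = [[adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
         for i in range(n)]
    e = [1.0 if i == seed else 0.0 for i in range(n)]
    p = e[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(W[i][j] * p[j] for j in range(n)) + restart * e[i]
               for i in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Hypothetical network: nodes 0-1 are drugs, 2-3 are targets.
# drug0-drug1 similarity 0.8; known interactions drug0-target2 and drug1-target3.
EXAMPLE_ADJ = [
    [0.0, 0.8, 1.0, 0.0],
    [0.8, 0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
]
```

Seeding at drug 0 gives target 3 a non-zero score through the similar drug 1, even though no drug0-target3 edge exists; ranking such targets by score is how RWR proposes new associations.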


40 Pred-hERG: A novel web-accessible computational tool for predicting cardiac toxicity of drug candidates

Vinícius M Alves1, viniciusm.alves@gmail.com, Rodolpho C Braga1, rodolphobraga@yahoo.com, Meryck B Silva1, Eugene Muratov2, Denis Fourches2, Alexander Tropsha2, Carolina H Andrade1, carolina@ufg.br. (1) Laboratory for Molecular Modeling and Drug Design, Faculty of Pharmacy, Federal University of Goiás, Goiania, Goias 74605170, Brazil, (2) Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, University of North Carolina, Chapel Hill, North Carolina 27599, United States

Several non-cardiovascular drugs have been withdrawn from the market because of the critical side effect of inhibiting human ether-à-go-go-related gene (hERG) K+ channels, which may lead to cardiac arrhythmia and death. hERG safety testing is therefore an indispensable process required by the US FDA, and there is considerable interest in developing computational tools to filter out potential hERG blockers in the early stages of drug discovery. In this work, we describe the development of a new tool for the rapid identification of compounds that are potentially cardiotoxic through hERG inhibition. We compiled the largest publicly available dataset of hERG binding, containing 11,958 compounds from the ChEMBL database; after curation, 4,980 compounds remained for modeling. Several types of QSAR models were developed and validated according to the OECD principles. Accuracies in discriminating blockers from non-blockers ranged from 0.83 to 0.93 on the external set. Model interpretation revealed several SAR rules that can guide the structural optimization of some hERG blockers into non-blockers. Virtual screening of the WDI chemical library using selected QSAR models identified 4,945 compounds as potential hERG blockers. The developed models can reliably identify blockers and non-blockers and could be useful to the scientific community. A freely accessible web server has been developed that allows users to identify putative hERG blockers and non-blockers in chemical libraries of interest (http://labmol.farmacia.ufg.br/predherg).
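The abstract does not say which accuracy measure was reported, so the sketch below computes several common ones from external-set predictions: plain accuracy, sensitivity, specificity, and the correct classification rate (CCR, the mean of sensitivity and specificity), which is less flattering than raw accuracy when blockers and non-blockers are unbalanced. Labels use 1 for blocker and 0 for non-blocker; the example labels are hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity and CCR for binary labels (1 = blocker)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ccr": (sensitivity + specificity) / 2,
    }


# Hypothetical external set: 4 blockers and 6 non-blockers.
Y_TRUE = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
Y_PRED = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
```

On this toy set, accuracy is 0.80 while CCR is about 0.79, because one of the four blockers is missed.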


41 Mining chemical space for novel molecules: A graphical tool for working with fragment spaces

Florian Lauck, lauck@zbh.uni-hamburg.de, Matthias Rarey. Universität Hamburg - Center for Bioinformatics, Hamburg, Germany

Mining chemical space for novel molecules with desirable properties is difficult. The common strategy is to limit the search space so that only molecules with suitable physicochemical and topological properties need to be considered. This requires methods and data structures for efficiently modeling the chemical subspace, as well as user-friendly tools to access this functionality. Here, we present a graphical user interface for creating, manipulating and searching large chemical spaces. As a model we use a fragment space, i.e., a combinatorial chemical space consisting of molecular fragments and connection rules. Each fragment has at least one reaction site that corresponds to an open valence. These sites are modeled as artificial atoms with a defined type, called link atoms, and the connection rules determine which link atoms are compatible. When two fragments are connected, the link atoms are removed and a bond is introduced in accordance with the connection rule. A number of algorithms and strategies for working with fragment spaces have been developed in the past but were available only as separate command-line tools. Our new software combines these tools into one user-friendly application that can visualize the contents of a fragment space (fragments and connection rules) as well as search results (novel molecules). For generating fragment spaces, we incorporated an automated approach that uses a set of molecules and cut rules, i.e., SMARTS patterns that define where a molecule should be cleaved and link atoms introduced. For retrieving structurally similar molecules we provide two query-based search methods, based on molecular similarity (reduced graph descriptors) and substructure search (SMARTS pattern matching). In addition, a new approach for constraint-based enumeration complements these algorithms, allowing searches based on physicochemical properties rather than structural similarity.
Finally, specialized fragment spaces can be created by filtering fragments on numerous properties.
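The fragment-space model described above can be sketched with plain data structures: fragments carry typed link atoms, and connection rules are unordered pairs of compatible link-atom types. The sketch lists every allowed two-fragment join; fragment names and link types are hypothetical, joins of a fragment with itself are skipped for brevity, and actual bond formation (removing link atoms and writing the product structure) is left out.

```python
# Hypothetical fragment space: each fragment lists the types of its link atoms.
FRAGMENTS = {
    "F1": ["L1"],          # e.g. an acyl fragment
    "F2": ["L2"],          # e.g. an amine fragment
    "F3": ["L2", "L3"],    # a bifunctional fragment
    "F4": ["L4"],
}

# Connection rules: unordered pairs of link-atom types that may be joined.
RULES = {frozenset({"L1", "L2"}), frozenset({"L3", "L4"})}


def compatible(link_a, link_b):
    """True if a connection rule allows joining these two link-atom types."""
    return frozenset({link_a, link_b}) in RULES


def two_fragment_joins(fragments):
    """List (fragment, link, fragment, link) joins permitted by the rules."""
    joins = []
    names = sorted(fragments)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            for link_a in fragments[a]:
                for link_b in fragments[b]:
                    if compatible(link_a, link_b):
                        joins.append((a, link_a, b, link_b))
    return joins
```

Of the six fragment pairs, only three joins are permitted, which illustrates how connection rules keep a combinatorial space far smaller than all possible pairings.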


42 BCL::EvoGen: An evolutionary algorithm for focused library design

Alexander R Geanes, alexander.r.geanes@vanderbilt.edu, Edward W Lowe, Jens Meiler. Department of Chemistry, Vanderbilt University, Nashville, TN 37235, United States

In recent years, virtual high-throughput screening (vHTS) techniques have been successfully applied in the drug discovery process. In many cases, these techniques are used to prioritize subsets of chemical libraries for acquisition and testing in physical screens. However, for computer-aided drug design (CADD), it is advantageous to have algorithms capable of designing new chemical entities for a specific biological target. An evolutionary algorithm was implemented as part of the BCL::ChemInfo suite within the Biochemistry Library (BCL), a C++ library developed at Vanderbilt University, to iteratively generate chemical species with high predicted biological activity for focused library design in hit-to-lead optimization. Quantitative structure-activity relationship models based on machine learning techniques were used to predict the biological activity of compounds in each generation. The compounds with the highest predicted activity, together with a smaller number of lower-activity species, were subjected to combination, crossover and mutation to form the next generation. The algorithm terminated when a set percentage of compounds in a single generation achieved a predicted activity above a predetermined cutoff or, if that criterion could not be satisfied, after a preset number of generations. The method was benchmarked on a previously published set of 9 datasets designed for the validation of novel CADD methods. Each dataset was compiled from publicly available HTS data in PubChem and contains at least 150 active compounds, and together the datasets span a range of protein targets including GPCRs, ion channels and enzymes. Here we present the results of this focused library design application, BCL::EvoGen.
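The generation loop described above (rank by predicted activity, keep the best plus a few lower scorers as parents, apply crossover and mutation, stop on an activity-fraction criterion or a generation limit) can be sketched generically. This is not BCL::EvoGen: molecules are replaced by bit strings and the QSAR model by a hypothetical `predicted_activity` function that scores the fraction of set bits, and all parameter values are assumptions chosen only for illustration.

```python
import random


def predicted_activity(genome):
    # Hypothetical stand-in for a QSAR model: fraction of favourable features.
    return sum(genome) / len(genome)


def evolve(pop_size=40, length=20, n_elite=8, n_lucky=4, mutation_rate=0.02,
           activity_cutoff=0.9, fraction_needed=0.5, max_generations=200, seed=0):
    """Return (generation reached, population ranked by predicted activity)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for generation in range(1, max_generations + 1):
        ranked = sorted(population, key=predicted_activity, reverse=True)
        # Criterion 1: enough of this generation is predicted to be active.
        n_active = sum(predicted_activity(g) >= activity_cutoff for g in ranked)
        if n_active / pop_size >= fraction_needed:
            return generation, ranked
        # Parents: the top scorers plus a few lower scorers to keep diversity.
        parents = ranked[:n_elite] + rng.sample(ranked[n_elite:], n_lucky)
        children = [g[:] for g in ranked[:n_elite]]  # carry the elite over
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, length)
            child = a[:cut] + b[cut:]  # single-point crossover
            child = [1 - bit if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        population = children
    # Criterion 2: the generation limit was reached.
    return max_generations, sorted(population, key=predicted_activity, reverse=True)
```

Keeping a few lower-scoring parents trades some speed for diversity, which guards against the population collapsing onto a single chemotype too early.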

Monday, August 11, 2014

Global Challenges in the Communication of Scientific Research - AM Session

Palace Hotel
Room: Marina

David Martinsen, Norah Xiao, Organizers
David Martinsen, Norah Xiao, Presiding
8:05 am - 11:40 am
8:05 Introductory Remarks
8:10 43 Utility-based analysis of solar energy technologies

Collin Perry, collinperry@my.unt.edu, William Justin Youngblood. Department of Chemistry, University of North Texas, Denton, Texas 76203, United States

We explore the use of the von Neumann-Morgenstern utility theorem to compare the overall utility of different solar energy technologies. As part of this approach, we seek to define the variables that can be manipulated to maximize the utility of photovoltaic research with regard to economic development, protection of the natural environment, advancement of basic and applied science, and the quality of life of human populations.
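For reference, the theorem invoked above can be stated compactly. This is the standard textbook form, not the authors' specific model, and the abstract does not say how outcomes x_i and probabilities p_i would be defined for solar technologies. Suppose preferences over lotteries satisfy completeness, transitivity, continuity and independence. Then there is a utility function u over outcomes such that lotteries are ranked by expected utility, where lottery L yields outcome x_i with probability p_i. In addition, u is unique up to a positive affine transformation u'(x) = a u(x) + b with a > 0.

```latex
U(L) = \sum_{i=1}^{n} p_i \, u(x_i), \qquad L \succeq M \iff U(L) \ge U(M)
```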

8:40 44 New IUPAC organic nomenclature: From bottle label to update of databases

Andrey Yerin, erin@acdlabs.ru, Advanced Chemistry Development, Inc. (ACD/Labs), Toronto, Ontario M5C 1B5, Canada

The International Union of Pure and Applied Chemistry (IUPAC) recently published new recommendations on the nomenclature of organic chemistry (IUPAC Blue Book 2013). The most important part of these recommendations is the concept of the “Preferred IUPAC Name” (PIN). A PIN is established by a hierarchical order of criteria that allows derivation of a unique systematic name intended for registrations, patents, regulations and other official purposes. Together with the PIN concept, the new IUPAC recommendations introduce several fundamental changes in naming procedures that will change the systematic names of many classes of organic structures. Although strict adherence to IUPAC recommendations is not mandatory, naming conventions are expected to change gradually, which will affect chemical publications and databases. These changes make organic nomenclature stricter but at the same time more difficult to memorize, thus increasing the role of automatic naming tools. The presentation will illustrate the most important changes in naming principles with examples from the corresponding classes of chemicals. The overall impact of applying the new nomenclature rules will be estimated for large compound libraries. ACD/Labs naming tools will generate systematic names under both the previous and the new naming principles, and the two sets of names will be compared.

9:10 45 New strategies to engage more of the world with scientific app development and content deployment

Steven M Muskal, smuskal@eidogen-sertanty.com, Eidogen-Sertanty, Inc., Oceanside, CA 92056, United States

Over 70% of the world's population, i.e., 5 billion people, will routinely use mobile devices in the next few years. In this age of immediate access and connectivity, an unprecedented “global conversation” and capability has arisen, creating many new opportunities for content capture, manipulation, and dissemination. Unfortunately, the number of people developing cloud-based mobile applications is far too small to meet the global need. To this end, we have been working with Accelrys on a multi-year project to extend their popular Pipeline Pilot framework so that it interfaces directly with hand-held devices. The extension will leverage mobile capabilities including image capture, upload and annotation, geo-location tagging, audio and video capture, and other forms of content capture and manipulation. Earlier this year, we released the "ScienceCloud Tasks" mobile app (freely available worldwide in the Apple App Store), and we plan to develop an Android version. By better interfacing mobile devices with server-based, cloud-deployed pipelining technologies, we expect to dramatically simplify mobile-cloud app development and deployment in general. We also hope to open up new possibilities in workflows (i.e., "tasks") associated with publication assembly, submission, editing, and distribution.

9:40 46 Dealing with the complex challenge of managing diverse chemistry data online

Antony J Williams, williamsa@rsc.org, Valery Tkachenko, Ken Karapetyan, Alexey Pshenichnov. US Cheminformatics, Royal Society of Chemistry, Wake Forest, NC 27587, United States

The Royal Society of Chemistry has provided access to data associated with millions of chemical compounds via our ChemSpider database for over 5 years. During this period the richness and complexity of the data have continued to expand dramatically. The original vision of providing an integrated hub for structure-centric data has been delivered to hundreds of thousands of users across the world. We now intend to expand our reach to more diverse chemistry-related data, including compounds, reactions and analytical data, to name just a few data types. To that end, we are implementing a new architecture to build a Chemistry Data Repository. The data repository will manage the challenges of associated metadata and the various required levels of security (private, shared and public), and will expose the data as appropriate using semantic web technologies. Ultimately this platform will host all chemicals, reactions and analytical data contained within RSC publications, specifically their supplementary information. This presentation will report on how our efforts to manage chemistry-related data have impacted chemists and projects across the world. It will also review our specific contributions to natural products projects for collaborators in Brazil and China, to the Open Source Drug Discovery project in India, and to our collaborations with scientists in Russia.

10:10 Intermission
10:20 47 Amplifying the role of collaboration globally for neglected and commercial disease drug discovery

Barry A. Bunin, bbunin@collaborativedrug.com, Charlie Weatherall, charlie@collaborativedrug.com. Management, Collaborative Drug Discovery (CDD), Burlingame, CA 94010, United States

Collaborative innovation is uniquely able to realize the economics of well-integrated specialization required for drug discovery. Particularly in neglected infectious disease areas lacking a profit motive, better collaborative tools are fundamentally important to catalyze faster progress. Layering unique collaborative capabilities upon requisite drug discovery database functionality unlocks and amplifies synergy between biologists and chemists. Researchers need tools that meet individual needs for robust, intuitive registration and bioactivity analyses. The same tools must also facilitate collaboration through secure data partitioning, communication, and group engagement. Recent results shared publicly for tuberculosis, malaria, and kinetoplastids amply demonstrate that these bold suppositions are true and general. So does an unusual collaboration among hundreds of undergraduate students developing novel compounds for neglected disease applications. Because collaborative technology is “therapeutic area agnostic”, it has generally proven equally applicable to commercial applications. Representative commercial case studies include broad consortia such as the NIH Neuroscience Blueprint collaboration among drug discovery companies, CROs, and seven leading academic biology laboratories. This collaboration is part of a 5-year government contract to advance new CNS drugs into the clinic. They also include more focused examples following the lean, venture-funded model, such as the collaboration among Acetylon Pharmaceuticals, Harvard and a Chinese CRO to bring a selective HDAC inhibitor into the clinic. Finally, by spanning the continuum of private, collaborative and public modes, researchers globally can now seamlessly collaborate across the pre-competitive and competitive landscape.

10:50 48 Why there needs to be open data for ultrarare and rare disease drug discovery

Sean Ekins1, ekinssean@yahoo.com, Alex M Clark2, Jill Wood3, Lori Sames4, Allison Moore5. (1) Collaborations in Chemistry, Fuquay Varina, NC 27526, United States, (2) Molecular Materials Informatics, Montreal, Quebec H3J 2S1, Canada, (3) Jonah's Just Begun, Brooklyn, NY 11215, United States, (4) Hannah's Hope Fund, Rexford, NY 12148, United States, (5) Hereditary Neuropathy Foundation, New York, NY 10016, United States

Individual parents and patients are increasingly doing more to fund, discover and develop treatments for the rare and ultra-rare diseases that afflict their children, themselves or their friends. In driving the science, they are performing many roles equivalent to those of professional scientists. Through their own efforts and those of the collaborative networks they have developed, they may be in a position to disrupt drug discovery. What is missing, however, is access to scientific data and publications. This can be illustrated using three different ultra-rare disease parent/patient advocate groups and the diseases for which they are developing treatments. Each group encountered difficulties in accessing information. One possible proposal is an open data commons or database dedicated to rare and ultra-rare diseases, which could encompass chemistry and biology data. We will describe how we are using the mobile app Open Drug Discovery Teams (ODDT) to catalyze such efforts. Challenges will likely come from industry, which sees this as a lucrative field. Our aim is to discuss the pros and cons of open data for ultra-rare and rare disease drug discovery. We will also suggest mechanisms for making it happen to everyone's benefit.

11:20 49 Supporting the exploding dimensions of the chemical sciences via global networking

Valery Tkachenko1, TkachenkoV@rsc.org, Antony Williams1, Sergey Vatsadze2. (1) Royal Society of Chemistry, Wake Forest, NC 27587, United States, (2) Department of Chemistry, Moscow State University, Russia

The Royal Society of Chemistry is building a comprehensive federated platform for chemical informatics in a Big Research Data world. The resulting platform blends social, informatics and knowledge components. The platform itself produces new dimensions in the chemical sciences to support activities such as Open Innovation and data sharing. It would remain isolated and insular unless societies, industry and universities establish a broad collaboration in a federated and open way. In this presentation we will talk about one of these efforts, between RSC and Moscow State University, to facilitate the development, population and use of a global networking platform for the chemical sciences.

Monday, August 11, 2014

Hunting for Hidden Treasures: Chemistry Text Mining in Patents and Other Documents - AM Session

Palace Hotel
Room: Presidio

Wei Deng, Organizers
Wei Deng, Presiding
9:05 am - 12:00 pm
9:05 Introductory Remarks
9:10 50 Recent enhancements in the accuracy of CLiDE tool for extracting chemical structure data from patents and other documents

Aniko T Valko1, Aniko.Valko@keymodule.co.uk, Peter Johnson2. (1) R&D, Keymodule Ltd., Leeds, United Kingdom, (2) School of Chemistry, University of Leeds, Leeds, United Kingdom

We present an enhanced version of CLiDE, which is a long-term project aimed at detecting chemical structure diagrams rendered in images and converting these diagrams into chemical connection tables. The enhancement was achieved by introducing a feedback mechanism into CLiDE's interpretation process. This mechanism makes use of a series of domain- and spatial-specific rules for identifying drawing features that convey a complex or an ambiguous meaning. Once such a feature is found, CLiDE automatically corrects the structural information being compiled and passed through subsequent interpretation steps. This enhancement has a considerable effect on CLiDE's accuracy in reconstructing chemical structures and auto-detecting interpretation errors. A detailed study of CLiDE's performance on a large validation corpus will be presented. The validation corpus will include benchmark sets created by other projects and a set of non-Markush structures collected from patent documents.

9:40 51 Structure Clipper: An interactive tool for extracting chemical structures from patents

Christopher E Kibbey, christopher.kibbey@pfizer.com, Jacqueline L Klug-McLeod. Worldwide Medicinal Chemistry, Pfizer Worldwide Research and Development, Groton, Connecticut 06340, United States

Medicinal chemists rely on patent intelligence at three distinct junctures of a research project: inception of an idea, ongoing competitive intelligence during project development, and preparation of the patent application. Knowledge of the competitive landscape gives medicinal chemists insight into competitors' strategies for improving potency and decreasing ADME liabilities. While comprehensive databases of chemical structures obtained from patents are available, medicinal chemists generally focus their analysis on a few patents at a time. In addition to identifying “key” compounds within a competitor's patent, medicinal chemists want the bioassay results, synthetic routes, yields and analytical characterization related to these compounds. Medicinal chemists spend considerable effort locating chemical structures within a patent, and tools that facilitate compound annotation and traceability are highly desired.
Structure Clipper is an interactive tool that automatically identifies and extracts chemical structures from electronic documents, such as patents and journal articles. It combines image processing, image-to-structure (OSRA) conversion, optical character recognition (OCR), chemical name annotation, spelling correction, and chemical name-to-structure conversion. Each chemical structure generated by Structure Clipper is tagged with the page number of the original chemical text or image and its rectangular coordinates on that page. In addition, chemical structures are assigned the identifier used in the source document (e.g., example number, synthetic step, intermediate). Structure Clipper provides a direct link between chemical structures and their origin within the source document. Extracted structures may be manually annotated with related information, such as bioassay results, synthetic yield, and spectroscopic characterization. Lastly, Structure Clipper provides an interface for semi-automated enumeration of structures from Markush tables spanning multiple pages in the source document.

10:10 52 Computer-assisted Markush structures curation from patent documents

David Deng, ddeng@chemaxon.com, Arpad Figyelmesi. ChemAxon, Cambridge, Massachusetts 02142, United States

Markush structures (or generic structures) are widely used in chemical patents and combinatorial chemistry to define large chemical spaces. ChemAxon provides powerful functionality for Markush structure drawing, Markush search, overlap analysis and visualization. This functionality is widely available in our products, including Marvin, JChem Base, JChem Cartridge and Instant JChem. In this presentation, the latest development of the new Patent Curation Tool will be introduced. The curation tool provides an intuitive interface that lets the user read patents and curate the Markush structures side by side. It uses ChemAxon's chemistry text mining functionality to extract R-group definitions from patent documents, and allows easy bulk import of these R-group fragments. The tool takes full account of various Markush complexities (e.g., nested R-groups, multiple R-group attachment points). Once the Markush structure is curated, the user may also enumerate it. ChemAxon also provides powerful search capability, allowing structure search in the Markush chemical space. The latest improvement also allows overlap analysis between two Markush structures and hit result analysis. Both features will also be demonstrated in this presentation.
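Enumerating a simple Markush structure amounts to taking the Cartesian product of the allowed substituents for each R-group. The sketch below uses a hypothetical scaffold written as a SMILES string with named `{R1}`-style slots, and plain-SMILES substituents. It uses only the Python standard library and does not call any ChemAxon API. It also ignores nested R-groups and multiple attachment points, which the Patent Curation Tool is described as handling.

```python
from itertools import product
from typing import Dict, List


def enumerate_markush(template: str, r_groups: Dict[str, List[str]]) -> List[str]:
    """Expand a scaffold with named R-group slots into every concrete combination."""
    names = list(r_groups)
    enumerated = []
    # One choice per R-group; product() yields every combination in order.
    for choice in product(*(r_groups[name] for name in names)):
        enumerated.append(template.format(**dict(zip(names, choice))))
    return enumerated


scaffold = "{R1}c1ccc({R2})cc1"  # hypothetical para-disubstituted benzene scaffold
groups = {"R1": ["C", "O"], "R2": ["Cl", "F"]}
for smiles in enumerate_markush(scaffold, groups):
    print(smiles)
# Cc1ccc(Cl)cc1
# Cc1ccc(F)cc1
# Oc1ccc(Cl)cc1
# Oc1ccc(F)cc1
```

The number of enumerated structures is the product of the R-group list lengths, so real Markush spaces quickly become too large to enumerate exhaustively. This is one reason to search the Markush space directly.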

10:40 Intermission
10:55 53 Use of reverse text-mining to establish whether indexing and classification of chemical patents is still necessary

Robert A Stembridge, bob.stembridge@thomsonreuters.com, IP & Science, Thomson Reuters, London, United Kingdom

Given the success of chemical structure name recognition techniques in identifying chemical entities within patent documents, is there still a place for indexing and classification of chemical patents? (This success holds at least for specific entities; textual Markush structure recognition still remains a challenge.) This presentation will examine the question by "reverse text-mining", i.e., by using text-mining techniques to analyze data sets retrieved with indexing and classification. The aim is to assess the comprehensiveness (recall) and precision of those data sets.
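The comprehensiveness and precision questions above reduce to two standard set ratios. The sketch below assumes a hypothetical setup in which the documents confirmed by text mining serve as the reference set. The abstract does not specify the actual evaluation protocol, and the patent numbers are placeholders.

```python
from typing import Set, Tuple


def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """precision = |retrieved & relevant| / |retrieved|; recall = |retrieved & relevant| / |relevant|."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical patent numbers: retrieved via indexing/classification vs. confirmed by text mining.
retrieved_by_index = {"P1", "P2", "P3", "P4"}
confirmed_relevant = {"P2", "P3", "P4", "P5", "P6"}
p, r = precision_recall(retrieved_by_index, confirmed_relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.75 recall=0.60
```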

11:25 54 Extraction of chemical reactions from full text documents: From n-tuples of value attribute pairs toward the automated construction of reaction databases

Lutz Weber, lutz.weber@ontochem.com, Matthias Irmer, Claudia Bobach. IT Solutions, OntoChem GmbH, Halle (Saale), Sachsen-Anhalt 06120, Germany

Computer-aided extraction of chemical reactions from natural language text documents is a highly complex problem. For example, chemicals may be mentioned as starting materials, catalysts, supports, solvents, side products or products, and each of these roles must be captured. Further, additional information such as temperature, yield or other quantitative or qualitative data increases the value of the information on a particular chemical reaction. We will present a semantic text mining system that extracts information on chemical reactions from scientific publications. The system combines chemical ontologies of chemical compounds, chemical classes and substituents with a chemical reaction relationship model. In a first step, the system identifies all named entities, such as chemical named entities. It then classifies chemical names as specific compounds, chemical compound classes, substituent lists or general reaction-related named entities. In addition, relevant units, numeric or qualitative values and terms such as “excellent yield of >90%” are identified. In a second step, terms are combined into complex, nested named entities, which makes it possible to assign specific roles to the identified chemical named entities using syntax-based rules. Reaction-specific syntax rules are implemented to understand and classify chemical reactions, from metabolic reactions to the range of well-known named chemical reactions.

As a result, the presented text mining system can scan millions of text documents in a few days and extract chemical reaction information in CML or other chemistry-aware data file formats for populating chemical reaction databases. As a case study we will present the automated generation of a metabolic database for phytochemicals.
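The abstract's example, “excellent yield of >90%”, shows what the value-attribute layer must capture. The regular expression below is a deliberately narrow, hypothetical stand-in for that step. It recognizes only phrasing of the form "<quality> yield of <comparator><number>%", and the list of qualitative words is an assumption. The OntoChem system, by contrast, is described as using ontologies and syntax rules rather than a single pattern.

```python
import re
from typing import List, Tuple

# Hypothetical pattern: optional qualitative word, "yield(s)", optional "of",
# optional comparator, a number and a percent sign.
YIELD_PATTERN = re.compile(
    r"(?P<quality>excellent|good|moderate|poor|quantitative)?\s*"
    r"yields?\s+(?:of\s+)?(?P<op>[<>]=?|~)?\s*(?P<value>\d+(?:\.\d+)?)\s*%",
    re.IGNORECASE,
)


def extract_yields(text: str) -> List[Tuple[str, str, float]]:
    """Return (quality, comparator, value) triples; '=' is used when no comparator is given."""
    results = []
    for m in YIELD_PATTERN.finditer(text):
        quality = (m.group("quality") or "").lower()
        op = m.group("op") or "="
        results.append((quality, op, float(m.group("value"))))
    return results


print(extract_yields("The product was obtained in excellent yield of >90% after 2 h."))
# [('excellent', '>', 90.0)]
```

A common phrasing such as "in 85% yield" would need an additional pattern. Handling such variations is where rule- and ontology-based approaches earn their complexity.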

11:55 Concluding Remarks

Monday, August 11, 2014

The Impact of the IUPAC InChI on Finding and Linking Information on Chemicals - AM Session

Palace Hotel
Room: California Parlor
Cosponsored by CHED, COMP

Stephen Heller, Organizers
Stephen Heller, Presiding
8:30 am - 12:00 pm
8:30 55 InChIs are not just for small molecules

Keith T Taylor, keith.taylor@laderaconsultancy.com, Ladera Consultancy LLC, Sparks, Nevada 89436, United States

The success of the InChI for small, drug-like molecules has driven the need to extend its coverage. Progress on enhancements to support enhanced stereochemistry, organometallics, biologics, polymers, and mixtures will be described.

9:05 56 NCI/CADD Group's InChI usage and analysis of tautomerism for InChI V2

Marc C Nicklaus, mn1@helix.nih.gov, CADD Group, CBL, CCR, National Cancer Institute, NIH, Frederick, MD 21702, United States

We present a brief overview of the current status of InChI and InChIKey usage in several free web services of the NCI/CADD Group at http://cactus.nci.nih.gov. The overview also covers the underlying very large database of small molecules aggregated from screening sample collections and other sources. We also present a status update on our efforts to analyze, and if needed modify, the current rules by which the InChI algorithm handles tautomers. The goal is to fulfill InChI's design aim of being a tautomer-invariant identifier, with a view to possibly revamping the handling of tautomerism in version 2 of InChI.

9:40 57 Intersecting crystallographic databases using InChI

Ian J Bruno1, bruno@ccdc.cam.ac.uk, Tjelvar SG Olsson1, Sanchayita Sen2, Gary M Battle2, Jose M Dana2, Sameer Velankar2. (1) The Cambridge Crystallographic Data Centre, Cambridge, United Kingdom, (2) Protein Data Bank in Europe, EMBL-European Bioinformatics Institute (EMBL-EBI), Cambridge, United Kingdom

Knowledge derived from small-molecule crystal structure data has a role to play in helping protein crystallographers refine models of ligands crystallised with biological macromolecules. The Cambridge Crystallographic Data Centre (CCDC) and the Protein Data Bank in Europe (PDBe) are thus engaged in a collaborative project that aims to make coordinates of structures from the Cambridge Structural Database (CSD) available in the worldwide Protein Data Bank (wwPDB) chemical component dictionaries. This requires us to identify molecules in the CSD that match ligands in the Protein Data Bank (PDB), a task for which InChI is ideally suited. This presentation will describe how we have been able to take advantage of InChI to intersect these two resources of 3D structural information and will discuss the challenges encountered in reliably generating InChIs for structures in the CSD where the chemistry is diverse and crystallographic artefacts abound. This work is funded by a grant from the BBSRC, UK, Reference BB/K016970/1.
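Setting aside the difficulties the abstract highlights, the core intersection is a dictionary lookup on InChI strings. In this sketch the entry IDs (`CSD_A`, `LIG1`, and so on) are hypothetical placeholders, not real CSD refcodes or PDB component codes. The InChI strings are the standard InChIs for benzene, ethanol, water and methane. Exact string matching assumes that both sides were generated with the same InChI options. Crystallographic artefacts can undermine exactly this assumption.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical entry IDs paired with standard InChI strings.
csd_entries = {
    "CSD_A": "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H",  # benzene
    "CSD_B": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",   # ethanol
    "CSD_C": "InChI=1S/H2O/h1H2",                   # water
}
pdb_ligands = {
    "LIG1": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",    # ethanol
    "LIG2": "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H",   # benzene
    "LIG3": "InChI=1S/CH4/h1H4",                    # methane (no CSD match here)
}


def intersect_by_inchi(a: Dict[str, str], b: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each ID in `b` to every ID in `a` that has an identical InChI string."""
    index: Dict[str, List[str]] = defaultdict(list)
    for entry_id, inchi in a.items():
        index[inchi].append(entry_id)
    # The membership test avoids creating empty defaultdict entries for unmatched ligands.
    return {lig: index[inchi] for lig, inchi in b.items() if inchi in index}


print(intersect_by_inchi(csd_entries, pdb_ligands))
# {'LIG1': ['CSD_B'], 'LIG2': ['CSD_A']}
```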

10:15 Intermission
10:30 58 Data formats for elementary gas phase kinetics: Unique representations of reactions

Donald R Burgess1, dburgess@nist.gov, Jeffrey A Manion1, Carrigan J Hayes2. (1) Chemical and Biochemical Reference Data Division, National Institute of Standards and Technology, Gaithersburg, MD 20899, United States, (2) Department of Chemistry, Otterbein University, Westerville, OH 43081, United States

A method of extending the IUPAC International Chemical Identifier (InChI) to describe and identify elementary reactions in a standard computer-readable notation was developed. Denoted InChI-ER, the method is based on the existing InChI formalism, with certain refinements including a more complete identification of molecular entities. Using this base notation, an identifier for elementary reactions at the molecular level can be created by adding layers in a conceptually similar and extensible manner. Two of the layers describe the atoms involved in the transition state and the connectivity changes that occur during the reaction. Additional layers classify the reactions on the basis of the connectivity changes. These layers provide chemical information useful in organizing and searching kinetic data sets found in databases or used in detailed kinetic modeling. Important aspects of the method are that the proposed layers are optional and do not interfere with existing InChI specifications. The layers also retain extensibility should further refinements be desired in the future. The method's utility for organizing data is illustrated by its implementation for a widely used combustion mechanism.

11:05 59 International Chemical Identifier (InChI) at Wiley: Strengths and limitations

Graeme E. Whitley1, gwhitley@wiley.com, Bernd Berger2, bberger@wiley.com. (1) Global Research, Research Innovations, Wiley, Hoboken, NJ 07030, United States, (2) Global Technology Solutions, Wiley-VCH, Weinheim, Baden-Württemberg 69469, Germany

Wiley's adoption of the InChI as a chemical identifier is pervasive across its publications and databases. We will provide use-case examples that highlight the standard's core strengths and fundamental limitations. Areas to be covered will include compound identity matching, stereochemistry, and organometallics.

11:40 60 InChIKeys as chemical entity ids to enable in-context text indexing and to identify engine-ranked chemically similar documents

Stephen Boyer, sboyer@us.ibm.com, Tom Griffin, Cassidy Kelly, Eric Louie, Jacques Labrie, Scott Spangler, Ying Chen, Ru Fang, Su Yan. IBM Almaden Research Center, San Jose, CA 95120, United States

Chemical name annotators that find and standardize chemical names in scientific papers and patent text documents usually generate tables of SMILES or InChIs. To make the most of identified compounds for text analytics, we convert the found compound names into InChIKeys. We then insert each InChIKey value into the original document text beside the compound name, along with the literal word "inchikey" as an entity type marker. The augmented documents are then indexed as full text using Solr/Lucene. The result is a search service with useful capabilities such as co-occurrence analysis. For example, it can find all cases of any text synonym of aspirin within 10 words of "cancer". It can also find cases of any chemical compound (any entity marker "inchikey") within 10 words of "produced" AND within 10 words of "yield". This indexing technique also supports finding chemically similar documents. The selected compounds' InChIKeys are entered together as a set of query "words", and the engine returns a ranked list of the documents whose compounds overlap most with the target's. Full engine capabilities such as Boolean filter terms remain available to refine the set with full engine performance. The indexed augmented text also facilitates other data mining capabilities, such as batch searches for large numbers of molecules over millions of documents. Our team is developing a comprehensive chemical pedia containing structure representations and attributes for millions of molecules derived from patents and other sources such as MEDLINE abstracts. Ongoing work includes simultaneous entity insertion for gene, drug and disease annotation types, enabling rich entity/text combination queries with no search engine modifications needed.
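The augmentation step and a word-window co-occurrence test can be sketched without a search engine. This is an illustrative assumption, not the IBM pipeline. The synonym table and sample sentence are hypothetical, and the key shown is the standard InChIKey of aspirin. The `within` function is a simplified stand-in for Solr/Lucene proximity queries and does not reproduce their query syntax or slop semantics.

```python
import re
from typing import Dict, List

ASPIRIN_KEY = "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"  # standard InChIKey of aspirin


def augment(text: str, name_to_key: Dict[str, str]) -> str:
    """Insert 'inchikey <KEY>' immediately after each recognized compound name."""
    for name, key in name_to_key.items():
        pattern = re.compile(rf"\b{re.escape(name)}\b", re.IGNORECASE)
        text = pattern.sub(lambda m, k=key: f"{m.group(0)} inchikey {k}", text)
    return text


def tokenize(text: str) -> List[str]:
    """Lowercased word tokens; hyphens are kept so each InChIKey stays a single token."""
    return re.findall(r"[a-z0-9-]+", text.lower())


def within(tokens: List[str], a: str, b: str, window: int) -> bool:
    """True if some occurrence of token `a` lies within `window` tokens of token `b`."""
    a, b = a.lower(), b.lower()
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)


# Two synonyms map to the same key, so either spelling is found by one key query.
synonyms = {"aspirin": ASPIRIN_KEY, "acetylsalicylic acid": ASPIRIN_KEY}
text = "Aspirin lowered risk in this cancer cohort. Acetylsalicylic acid was also tested."
tokens = tokenize(augment(text, synonyms))
print(within(tokens, ASPIRIN_KEY, "cancer", 10))  # True
print(within(tokens, "inchikey", "cancer", 10))   # True: any compound near "cancer"
```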

Monday, August 11, 2014

Global Challenges in the Communication of Scientific Research - PM Session

Palace Hotel
Room: Marina

David Martinsen, Norah Xiao, Organizers
David Martinsen, Norah Xiao, Presiding
1:15 pm - 5:25 pm
1:15 Introductory Remarks
1:20 61 Building BRICK by BRICK sometimes works: How ACS Editors' participation in ACS on Campus has brought publishing best practices to thousands of authors in BRICKS countries, an overview of challenges and successes

S. Sara Rouhi1, s_rouhi@acs.org, Kirk S. Schanze3, schanze-office@ami.acs.org, Prashant V. Kamat2, pkamat@nd.edu. (1) Library Relations, ACS Publications, Washington, DC 20036, United States, (2) Department of Chemistry & Biochemistry and Radiation Laboratory, University of Notre Dame, Notre Dame, IN 46556, United States, (3) Department of Chemistry, University of Florida, Gainesville, FL 32611, United States

The ACS on Campus program, spearheaded by ACS Publications in 2010 and now an ACS-wide program, was developed to help graduate students, post-docs, and faculty address the challenges of a career in the sciences, principal among those: getting published in top journals. In this presentation, ACS on Campus team members and ACS Editors will share their experiences of teaching publishing best practices to the biggest source of science research today, the BRICKS countries. ACS on Campus team lead Sara Rouhi will outline the various publishing modules offered to students: Getting Started Writing a Manuscript, What is Peer-Review?, Technical Writing for Non-native Speakers, and Copyright and Ethics in Scholarly Communication. ACS Editors-in-Chief Prashant Kamat and Kirk Schanze will share their experiences delivering these best practices to students around the world during their participation in ACS on Campus. Drs. Kamat and Schanze have attended over 15 ACS on Campus events, from Beijing to Calcutta to Rio de Janeiro. Sara Rouhi created the ACS on Campus program in 2010 and currently manages the international arm of the program.

2:00 62 Article-impact assessment in the age of open copyright and social networking

Frederick F Fenter, frederick.fenter@frontiersin.org, Costanza Zucca. Frontiers Media, Lausanne, Vaud 1015, Switzerland

Two trends in scientific publishing are the increased use of social networks and the dissemination of articles under open public copyright licenses. In this presentation, we will discuss how these trends will affect the impact metrics, both traditional and new, used to indicate the inherent scientific merit of an article.

2:30 63 Chemistry journals in China and chemical papers from China: What is the future?

Xiaowen Zhu, zhuxiaowen74@126.com, University & Higher Education Press, Tianjin, Tianjin, China

There are more than 5,000 academic journals in China, of which only about 60 cover chemistry or chemical engineering. From 2009 to 2013, more than 160,000 SCI-indexed chemistry articles came from China, yet only 20 Chinese chemistry journals are indexed by SCI. Most chemical papers from China are published in journals outside China. Making domestic journals more international and more influential is a national strategy, and the government provided substantial financial support in 2012 and 2013. English-language journals receiving this support included Chinese Chem. Lett., Chinese J. Chem., Sci. China Chem., Chinese J. Polym. Sci., Chinese J. Chem. Eng., and Front. Chem. Sci. Eng. The main approaches at present are to build international editorial boards and use international reviewers. Journals are also seeking the best papers from scientists in China and other countries and collaborating with international publishers.

3:00 64 Globalization of scholarly publishing: Meeting the needs of international researchers

Amy Beisel, amy.beisel@researchsquare.com, Keith Collier, Ben Mudrak. Research Square, LLC, Durham, NC 27701, United States

Science is truly an international endeavor, with significant investments in research being made around the globe. The fundamentals of research may be universal, but language and lack of familiarity with scholarly publishing conventions represent considerable barriers to the broad dissemination of research findings for many scientists. Publishers are receiving an increasing proportion of their submissions from researchers who are not native English speakers. What can be done to simplify the submission process for these authors while still ensuring that they are sending high-quality and well-matched manuscripts? At Research Square, we assist researchers worldwide in their efforts to disseminate research results. Previously, we surveyed a sample of our customers to determine the biggest challenges they faced when submitting manuscripts to English-language journals. Specific suggestions we have heard from international researchers include their desire for assistance with certain aspects of the publishing process and an appreciation for clearly defined journal policies. Using these survey results and additional insight gained by our ongoing conversations with international investigators, we can provide a greater context for how the current trends in scholarly publishing affect researchers around the world. In some cases, simple changes to the publication processes can lead to better, long-lasting relationships with authors. Overall, by meeting the needs of international scientists, we enable the creation of an efficient publication process driven by a diverse community, which increases the pace of scientific discovery itself.

3:30 Intermission
3:45 65 Beyond open access

Martin Hicks, mhicks@beilstein-institut.de, Beilstein-Institut, Frankfurt, Germany

Open Access has now become established within scientific publishing. The advantages of making scientific discoveries and research results freely available to the global scientific community are self-evident. Authors benefit from retaining full copyright and being able to archive the final version of their own articles without restriction. The Beilstein Open Access journals engage the scientific community worldwide by removing price barriers for authors and readers. But what does the scientific community really want when it comes to publishing? The serials crisis is still ongoing, peer review is under strain, and more and more papers are being submitted and published. Plagiarism is a problem, data reproducibility and integrity are at times questionable, and big data is looming on the horizon. It is essential to ensure that data published by researchers are truly usable by all members of the global scientific community. The Beilstein-Institut supports and coordinates two international projects aimed at ensuring data integrity in enzyme and glycan data reporting.

4:15 66 Supporting and facilitating the publication of chemical science research: A global view

Daping Zhang, ZhangD@rsc.org, Royal Society of Chemistry, United States

The Royal Society of Chemistry is the fastest growing chemical society publisher in the world. Over the last five years its published output has increased fivefold, from approximately 5000 articles in 2008 to approximately 25000 articles in 2013. As is well known in the publishing community, the majority of submissions now come from countries with a rapidly expanding research base, such as China, India and Brazil. Countries in regions such as the Middle East and Africa are not yet producing the same level of output but are nonetheless making rapid progress. To support this growing trend and to provide chemists from these countries with the tools and assistance to publish in English-language journals, the Royal Society of Chemistry has in recent years significantly expanded the reach and scale of its international development and publishing innovation activities: establishing local editorial teams (in China, India, Japan and the US); forging co-publishing partnerships with sister chemical societies; organizing publishing internships, student clubs and “how to publish” training workshops; hosting international conferences; and investing in capacity-building initiatives. This presentation will provide an overview of some of the Royal Society of Chemistry's recent initiatives in international co-operation, capacity building and publishing innovation, including the RSC Frontiers journals in China, activities and partnerships established in India and Japan, and the principles and aims of the Pan-African Chemistry Network.

4:45 67 Enabling international collaboration using the Eureka Research Workbench

Stuart Chalk1, schalk@unf.edu, Robert Belford2, Phuc Tran2, Thanit Pewnim3. (1) Department of Chemistry, University of North Florida, Jacksonville, FL 32224, United States, (2) Department of Chemistry, University of Arkansas at Little Rock, Little Rock, AR 72204, United States, (3) Department of Chemistry, Silpakorn University, Nakhon Pathom Province 73000, Thailand

The Eureka Research Workbench (http://eureka.sourceforge.net), currently under development, is an online environment for capturing the scientific process. This presentation discusses the initial use of Eureka for international collaborative research on eco-toxicological investigations of the generation of estrogen mimics in the early aquatic food chain (the biochemical conversion of nonylphenol ethoxylate to nonylphenol by the microalga Chlorella vulgaris). Specimens are collected in Thailand, with initial extraction and sample preparation at Silpakorn University, where data is entered directly into Eureka. Samples are then sent to analytical laboratories at UALR and UNF for separation and spectroscopic analysis, and all data is also uploaded to Eureka. Because Eureka stores both the data and the metadata in one place, it enables collaborative research across three distinct geographic regions and time zones. A discussion of the process and of future plans for refining the system will be included.

5:05 68 Combatting chemophobia: Speaking science to distrust and engaging with empathy, online, and face-to-face

Leigh K Boerner2, Ljkboerner@gmail.com, Raychelle Burks6,8, rmburks@gmail.com, Matthew Hartings4, hartings@american.edu, Chad Jones5, chemist.jones@gmail.com, Kevin Shanks7, forensictoxguy@gmail.com, Janet D Stemwedel1, dr.freeride@gmail.com, Brandi VanAlphen3, branvanchemist@gmail.com. (1) Department of Philosophy, San Jose State University, San Jose, CA 95192-0096, United States, (2) Unaffiliated, United States, (3) Unaffiliated, United States, (4) Department of Chemistry, American University, Washington, DC 20016, United States, (5) Department of Chemistry and Biochemistry, Brigham Young University, Provo, UT 84602, United States, (6) Center of Nanohybrid Functional Materials, University of Nebraska, Lincoln, NE 68508, United States, (7) Department of Forensic Toxicology, AIT Laboratories, Indianapolis, IN, United States, (8) Department of Chemistry, Doane College, Crete, NE 68333, United States

Chemists should serve the broader public, including by better educating people about the nature and uses of chemistry. This requires communicating with different segments of the public and engaging in different modes of communication. Whether talking to the media or to people online, speaking on behalf of professional organizations like ACS or speaking to our neighbors, our attempts to communicate shape people's understanding of chemistry and of the kind of people chemists are. Participants in this panel discussion will describe their chemistry outreach experiences, engaging with groups whose opinions on chemistry range from curiosity and misunderstanding to concern and mistrust. Panelists will describe what has made their own attempts to communicate successful and the lessons learned from less successful attempts. We examine a diverse array of outreach efforts to build a repertoire of strategies for engaging with different communities. Leigh Krietsch Boerner writes about the science of consumer products and works to address chemophobia arising from product marketing and rampant misinformation on the internet. Brandi VanAlphen examines the impact of industry-supported messages on the public's understanding of chemistry and of chemists, and discusses how corporations can convey a better understanding of science while addressing the public's distrust of their motives. Matthew Hartings uses food and cooking to engage large groups of non-scientists with chemistry in the classroom and in public lectures, and will discuss how his outreach has been informed by the science of communicating science. Chad Jones explores audiovisual outreach (podcasts and videos) as a way to make chemistry exciting and accurate. Raychelle Burks is a blogger and activities coordinator who uses the intersection of chemistry and pop culture as an outreach tool.
Kevin Shanks is a forensic toxicologist and drug chemist whose activities include community and media outreach through social media and blogging about toxicology, drug laws, and other things forensic.

Monday, August 11, 2014

Hunting for Hidden Treasures: Chemistry Text Mining in Patents and Other Documents - PM Session

Palace Hotel
Room: Presidio

Wei Deng, Organizers
Wei Deng, Presiding
1:30 pm - 4:55 pm
1:30 Introductory Remarks
1:35 69 ChemInfoCloud: Open-source, cloud-compatible chemical text-mining tools for harvesting large-scale medical literature

Muthukumarasamy Karthikeyan, karthincl@gmail.com, Digital Information Resource Centre, CSIR-National Chemical Laboratory, Pune, Maharashtra 411008, India

Text mining involves recognizing useful patterns in the wealth of information latent in unstructured text and deducing explicit relationships among data entities using data mining tools. Harvesting chemical data from textual information is a challenging task. Text mining of the biomedical literature is essential for building biological networks that connect genes, proteins, drugs, therapeutic categories, side effects, etc., related to diseases of interest. We present an approach for chemically significant text mining of the biomedical literature, focused mostly on non-obvious hidden relationships, and apply it to build biological networks from the literature on human diseases such as tuberculosis and malaria. The methods, tools and data used to build these networks in a distributed computing environment, previously used for the ChemXtreme and ChemStar applications, will be discussed. The architecture of ChemInfoCloud, the open-source tool developed for this purpose, will be presented.
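One common way to turn text into such a network is sentence-level co-occurrence of dictionary-matched entities. The sketch below illustrates that general pattern only; it is not the ChemInfoCloud implementation, and the lexicon, example sentences and function names are hypothetical. A real pipeline would use proper named-entity recognition and a distributed back end.

```python
import itertools
import re
from collections import Counter

# Hypothetical toy lexicon: lower-cased surface form -> (entity type, canonical name).
LEXICON = {
    "isoniazid": ("drug", "isoniazid"),
    "rifampicin": ("drug", "rifampicin"),
    "inha": ("protein", "InhA"),
    "katg": ("gene", "katG"),
    "tuberculosis": ("disease", "tuberculosis"),
}


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tag_entities(sentence):
    """Dictionary lookup on lower-cased tokens; returns a set of (type, name) tuples."""
    tokens = re.findall(r"[A-Za-z0-9-]+", sentence.lower())
    return {LEXICON[t] for t in tokens if t in LEXICON}


def cooccurrence_network(text):
    """Edge weight = number of sentences in which both entities are mentioned."""
    edges = Counter()
    for sentence in split_sentences(text):
        names = sorted({name for _, name in tag_entities(sentence)})
        for a, b in itertools.combinations(names, 2):
            edges[(a, b)] += 1
    return edges


text = (
    "Isoniazid targets InhA in tuberculosis. "
    "Mutations in katG confer isoniazid resistance. "
    "Rifampicin is used against tuberculosis."
)
network = cooccurrence_network(text)
for (a, b), weight in sorted(network.items()):
    print(a, "--", b, weight)
```

Edges that recur across many abstracts gain weight; indirect links (for example, katG to tuberculosis through isoniazid) are the kind of "not so obvious" relationships a network view can surface.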

2:05 70 Knowledge mining by structure search

Jinbo Lee, rhotchandani@scilligence.com, Scilligence, Burlington, MA 01803, United States

With the prevalence of cross-organization collaborations, R&D reorganizations, and company mergers and acquisitions, knowledge can easily be lost in large piles of unstructured data such as PowerPoint, Word, Excel and PDF files. Through a case study, this presentation shows how Scilligence's informatics tools make knowledge mining and preservation possible through structure searching.

2:35 71 Toward extracting analytical science metrics from the RSC archives

Stuart Chalk1, schalk@unf.edu, Antony Williams2, Valery Tkachenko2, Colin Batchelor3. (1) Department of Chemistry, University of North Florida, Jacksonville, FL 32224, United States, (2) Royal Society of Chemistry, Wake Forest, NC 27587, United States, (3) eScience, Royal Society of Chemistry, Cambridge, Cambridgeshire CB4 0WF, United Kingdom

The Royal Society of Chemistry has an archive of over 300,000 articles containing rich chemistry data in the form of chemicals, reactions, property data and analytical spectra. In this work, we propose the development of standards for organization, representation, and annotation of analytical science information, based around the definition of a Chemical Analysis Metadata Platform (ChAMP). This involves development of specifications for metadata collection, ontologies to link the metadata semantically, and automated extraction of the metadata from the RSC archives. Integration with synergistic activities at RSC, such as the Analytical Abstracts database and the Chemistry Methodology Ontology (CMO), will be explored. This presentation will provide an overview of the vision and scope of this project and initial developments.
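The ChAMP specifications themselves are not described here. As a hypothetical illustration of what automated extraction of analytical metadata from article text can look like, the sketch below uses a small controlled vocabulary and a regular expression to pull out technique names and detection limits. The vocabulary, the `AnalyticalMetadata` record and its fields are assumptions for illustration, not part of ChAMP or CMO.

```python
import re
from dataclasses import dataclass, field

# Hypothetical controlled vocabulary; a real system would draw terms from an ontology.
TECHNIQUES = ["HPLC", "GC-MS", "ICP-MS", "LC-MS/MS", "UV-Vis"]

# Toy pattern: a number and a concentration unit shortly after "limit of detection"/"LOD".
LOD_PATTERN = re.compile(
    r"(?:(?i:limit of detection)|LOD)\D{0,20}?(\d+(?:\.\d+)?)\s*([mnu]?g/m?L|[mnu]?M)\b"
)


@dataclass
class AnalyticalMetadata:
    techniques: list = field(default_factory=list)
    detection_limits: list = field(default_factory=list)  # (value, unit) pairs


def extract_metadata(text):
    """Pull technique names and detection limits out of free text."""
    meta = AnalyticalMetadata()
    for name in TECHNIQUES:
        # Require the technique name not to be embedded in a longer token.
        if re.search(r"(?<![\w-])" + re.escape(name) + r"(?![\w-])", text):
            meta.techniques.append(name)
    for match in LOD_PATTERN.finditer(text):
        meta.detection_limits.append((float(match.group(1)), match.group(2)))
    return meta


sentence = ("Nonylphenol was determined by HPLC with fluorescence detection; "
            "the limit of detection was 0.5 ng/mL.")
print(extract_metadata(sentence))
```

Pattern matching of this kind is brittle; linking the extracted strings to ontology terms is what would make the metadata semantically useful.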

3:05 Intermission
3:20 72 SureChEMBL: An open system for exploration of patent chemistry space

Michal M. Nowotka, mnowotka@ebi.ac.uk, John P. Overington, Mark Davies. ChEMBL, The European Bioinformatics Institute, Cambridge, Cambridgeshire CB10 1SD, United Kingdom

In December 2013, EMBL-EBI acquired the SureChem chemical patent extraction system (rebranded SureChEMBL) from Digital Science Ltd. As a result of the acquisition, the research community now has free and open access to regularly updated chemical data extracted from the patent literature, which previously would have been accessible only through paywalled systems. SureChEMBL joins ChEMBL as a resource linking bioactive molecule structures to targets and clinical utility, and is based at EMBL-EBI alongside resources such as UniProt, Ensembl and the 1000 Genomes Project. The presentation will cover how existing tools and methods developed within our group to extract chemical and biological data from the scientific literature will be used to enhance the SureChEMBL data extraction process. It will address image segmentation, extraction and classification techniques and provide an overview of text extraction methods. Finally, future plans for extending the functionality of the SureChEMBL platform will be discussed. These include improved integration with other resources such as Europe PubMed Central, extraction of new entity types (e.g., targets, diseases, cell lines and assays), and improved API and KNIME support.

3:50 73 Computer analysis of the scientific literature

Scott Spangler1, Olivier Lichtarge1, Meena Nagarajan1, MeenaNagarajan@us.ibm.com, Angela Dawn Wilkins1, Lawrence Allen Donehower2, Curtis Reid Pickering2, Sam Julian Regenbogen2, Benjamin Judson Bachman2, Ioana Roxana Stanoi1, Ying Chen1, Jacques J Labrie1, Linda Kato1, Maria Elisa Terron2, Anbu Karani Adikesavan2, Stephen K Boyer1. (1) Watson, IBM, San Jose, CA 95120, United States, (2) College of Medicine, Baylor College of Medicine, Houston, TX, United States

We have extended our earlier work on capturing and analyzing chemical and biological entities to programmatically identify important relationships between these entities. The programs further extract and model complex N-point associations between entities using natural language processing techniques, establishing a framework that allows multiple analytics applications to scale over the extracted data and metadata. Two reasoning frameworks that have proven to generate useful, verifiable hypotheses in p53 biology will be presented.

4:20 74 Using the BRAIN, biorelations and intelligence network, for knowledge discovery

Albert Mons1, albert.mons@euretos.com, Barend Mons3, Aram Krol1, Arie Baak1, Antony Williams2, Valery Tkachenko2. (1) Euretos, Delft, The Netherlands, (2) Cheminformatics, Royal Society of Chemistry, Wake Forest, North Carolina 27587, United States, (3) Netherlands Bioinformatics Centre, Nijmegen, The Netherlands

In life sciences and biomedical research, public data is essential to the knowledge discovery process. Getting useful results from public data sources, however, is very challenging: there is too much relevant data, and the information is stored in too many disconnected sources. Manually searching all these sources is time consuming, the results are at best incomplete, and making the connections between the results found is simply impossible. This situation is one of the biggest frustrations of life scientists today. For over 10 years, academic data researchers have been trying to address these issues. Great progress has been made with manually curated data, and there is clear proof that manually extracting these explicit data from sources is highly valuable. However, it has also been shown that manual curation does not scale, so many scientific discoveries remain hidden in the data, waiting to be 'discovered'. This presentation will report on Euretos' Bio Relations and Intelligence Network (BRAIN[Ξ]), which allows high-quality in silico knowledge discovery. The platform brings together a wide range of scientifically high-value data sources, harmonised in an interoperable format and semantically identified in terms of the concepts they represent. The result of this process is a comprehensive network of concepts and their relations derived from all the underlying data sources. BRAIN[Ξ] thus provides a single, accessible view of all underlying data sources, covering both manually curated relations and computer-inferred implicit relations that have either escaped manual extraction or have never been made explicit. This allows 'hidden gems' to be unearthed at an unprecedented scale.
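Inferring implicit relations from a concept network can be illustrated with the classic Swanson-style "ABC" pattern: if A relates to B and B relates to C, but A and C are never directly linked, then A to C is a candidate hidden relation. The sketch below shows that generic technique on toy triples; it is not Euretos' BRAIN algorithm, and the triples and scoring (number of shared intermediates) are illustrative assumptions.

```python
from collections import defaultdict


def build_graph(triples):
    """Undirected concept graph from (subject, relation, object) triples."""
    graph = defaultdict(set)
    for subject, _relation, obj in triples:
        graph[subject].add(obj)
        graph[obj].add(subject)
    return graph


def implicit_relations(graph, source):
    """Concepts reachable from `source` only via an intermediate concept,
    ranked by how many distinct intermediates connect them."""
    direct = graph.get(source, set())
    scores = defaultdict(int)
    for intermediate in direct:
        for candidate in graph[intermediate]:
            if candidate != source and candidate not in direct:
                scores[candidate] += 1
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy triples echoing Swanson's fish oil / Raynaud's example.
triples = [
    ("fish oil", "reduces", "blood viscosity"),
    ("blood viscosity", "elevated in", "Raynaud's disease"),
    ("fish oil", "inhibits", "platelet aggregation"),
    ("platelet aggregation", "elevated in", "Raynaud's disease"),
]
graph = build_graph(triples)
print(implicit_relations(graph, "fish oil"))
```

Counting shared intermediates is the simplest possible score; production systems typically weight relations by type, provenance and concept frequency.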

4:50 Concluding Remarks

Monday, August 11, 2014

Sci-Mix - EVE Session

Moscone Center, North Bldg.
Room: Hall D

Erin Bolstad, Organizers
, Presiding
8:00 pm - 10:00 pm

3 Hazardous substances data bank: A tool for natural product information and research

Shannon M. Jordan, shannon.jordan1@nih.gov, National Institutes of Health, National Library of Medicine, Bethesda, MD 20894, United States

The National Library of Medicine (NLM) Hazardous Substances Data Bank (HSDB) is a database that contains a wealth of information on many types of chemicals, including natural products. Increasingly, professionals and the general public seek and use information about natural products for various purposes. In response to this demand, the HSDB development team has increased the number of natural product records in the database and updated existing records. Natural product records in HSDB include, but are not limited to, the following information: chemical structure, human health effects, animal toxicity, pharmacology, metabolism and pharmacokinetics, environmental fate and exposure, safety and handling, manufacturing, use, and laboratory methods. The data extraction team, along with a Scientific Review Panel (SRP), uses dozens of sources to build, update and peer-review HSDB records on a four-month cycle. As new articles on natural products are published, HSDB serves as an information resource that captures historical and emerging science. NLM will continue to develop and market HSDB as a tool for researchers and the general public covering all types of natural products, from phytochemicals to venoms and toxins.


4 Real structures for real natural products − really getting them right and getting them faster

Patrick Wheeler1, pwheeler@yahoo.com, Antony Williams3, Mikhail Elyashberg2, Rostislav Pol2, Arvin Moser1. (1) Advanced Chemistry Development, Toronto, Ontario M5C 1B5, Canada, (2) Advanced Chemistry Development, Moscow, Russian Federation, (3) Royal Society of Chemistry, London, United Kingdom

Structure determination for natural products has been revolutionized by advances in NMR technology and the application of innovative experimental techniques. Notably, it is possible to obtain structures from small amounts of material that are not amenable to single-crystal X-ray diffraction. Still, interpreting these data can be arduous, requires great expertise and is error-prone. Other techniques are, of course, also used to confirm structures: synthetic reproduction of natural products has a long record of success in elucidating molecules with intricate elements, including multiple stereocenters. However, despite rigorous analysis by qualified chemists, these methods still sometimes arrive at erroneous results [1-5].
Astute application of modern technology can speed the rate at which structures are solved while vastly reducing errors that arise either from synthetic methods or from unassisted analysis of instrumental data. Computer-Assisted Structure Elucidation (CASE) has developed over the past decades to relieve the burden of proving structures correct. In this presentation, we will discuss how CASE is used to objectively analyze complex sets of NMR data in order to test structural hypotheses, conduct de novo structure elucidation, and query large databases of known structures for matches to already identified natural products.
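Testing structural hypotheses against NMR data often comes down to comparing experimental chemical shifts with shifts predicted for each candidate structure and ranking candidates by the deviation. The sketch below shows that ranking step in its crudest form: it pairs shifts in sorted order instead of performing a real atom-by-atom assignment, and the shift values are hypothetical. It is not the algorithm of any particular CASE package.

```python
def mean_abs_deviation(experimental, predicted):
    """Mean |delta| in ppm after pairing shifts in sorted order (a crude assignment)."""
    if len(experimental) != len(predicted):
        raise ValueError("shift lists must cover the same number of carbons")
    pairs = zip(sorted(experimental), sorted(predicted))
    return sum(abs(e - p) for e, p in pairs) / len(experimental)


def rank_candidates(experimental, candidates):
    """Order candidate structures from best (smallest deviation) to worst."""
    scored = sorted(
        (mean_abs_deviation(experimental, shifts), name)
        for name, shifts in candidates.items()
    )
    return [(name, round(dev, 2)) for dev, name in scored]


# Hypothetical 13C shifts (ppm) for an unknown and two candidate structures.
experimental = [20.0, 35.0, 128.0, 170.0]
candidates = {
    "candidate A": [21.0, 34.0, 129.0, 169.0],
    "candidate B": [25.0, 40.0, 120.0, 160.0],
}
print(rank_candidates(experimental, candidates))
```

A large deviation for every candidate is itself informative: it suggests the correct structure is not among the hypotheses, which is where de novo elucidation takes over.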


13 BCL::Conf: A knowledge-based ligand flexibility algorithm and its application in computational drug discovery and in the online game Foldit

Sandeepkumar K Kothiwale1, sandeepkumar.k.kothiwale@vanderbilt.edu, Jens Meiler1,2, Will Lowe1. (1) Chemistry, Vanderbilt University, Nashville, Tennessee 37235, United States, (2) Pharmacology, Vanderbilt University, Nashville, Tennessee 37235, United States

The three-dimensional conformation a small molecule adopts is critical for its binding to a target protein. Rapid and accurate prediction of the conformational space of a small molecule is therefore critical for both structure-based and ligand-based drug discovery algorithms, such as docking and quantitative structure-activity relationships, respectively. Here we have derived a database of small-molecule fragments frequently sampled in experimental structures in the Cambridge Structural Database (CSD) and the Protein Data Bank (PDB). Likely conformations of these fragments are stored as rotamers, in analogy to the amino acid side-chain rotamer libraries used for rapid sampling of protein conformational space, an approach that will allow BCL::Conf to be integrated into computational biology programs such as Rosetta or SCWRL. A conformational ensemble for a small molecule can then be generated by recombining fragment rotamers with a Monte Carlo search strategy. BCL::Conf was benchmarked against other conformer generators, including MOE, Confab and Frog2, on its ability to recover the native-like protein-bound conformation of small molecules, the diversity of its conformational ensembles, and its sampling rate. BCL::Conf recovers a conformation within 2 Å root mean square deviation (RMSD) of the native conformation for 97% of molecules. The rapid rotamer sampling approach allows integration into the Rosetta macromolecular modeling suite and into the drug design module of the online protein folding game Foldit. Multithreaded conformer generation allows easy integration into high-throughput docking experiments using RosettaLigand.
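Recombining discrete rotamers with a Monte Carlo search can be sketched as a Metropolis loop in which each move replaces one torsion's rotamer with another drawn in proportion to its observed frequency. This is a generic illustration under stated assumptions: the rotamer library is a toy, the `toy_energy` score is hypothetical (BCL::Conf's actual scoring is not described here), and frequency-weighted proposals make it a search heuristic rather than an exact Boltzmann sampler.

```python
import math
import random


def sample_conformer(rotamer_library, energy, steps=1000, temperature=1.0, seed=0):
    """Metropolis Monte Carlo search over discrete torsion rotamers.

    rotamer_library: one list per rotatable bond of (angle_deg, frequency) pairs.
    energy: callable scoring a list of angles (lower is better); hypothetical here.
    Returns the lowest-energy combination found and its energy.
    """
    rng = random.Random(seed)
    # Start every bond in its most frequent rotamer.
    current = [max(lib, key=lambda rot: rot[1])[0] for lib in rotamer_library]
    current_e = energy(current)
    best, best_e = list(current), current_e
    for _ in range(steps):
        bond = rng.randrange(len(rotamer_library))
        angles, freqs = zip(*rotamer_library[bond])
        trial = list(current)
        # Knowledge-based move: draw a new rotamer in proportion to its frequency.
        trial[bond] = rng.choices(angles, weights=freqs)[0]
        trial_e = energy(trial)
        delta = trial_e - current_e
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_e = trial, trial_e
            if current_e < best_e:
                best, best_e = list(current), current_e
    return best, best_e


def toy_energy(angles):
    """Hypothetical score that prefers 60 degrees on bond 0 and 300 on bond 1."""
    return abs(angles[0] - 60) + abs(angles[1] - 300)


rotamers = [(60, 0.3), (180, 0.5), (300, 0.2)]
library = [rotamers, rotamers]
print(sample_conformer(library, toy_energy))
```

Restricting moves to library rotamers is what keeps the search fast: the sampler never wastes steps on torsion angles that are rarely observed in experimental structures.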


27 ChEMBL - linking chemistry and biology to enable mapping onto molecular pathways

Louisa J Bellis, ljbellis@ebi.ac.uk, Anna Gaulton, Anne Hersey, A Patricia Bento, Jon Chambers, Mark Davies, Felix Kruger, Yvonne Light, Nathan Dedman, Shaun McGlinchey, Michal Nowotka, George Papadatos, Rita Santos, John P Overington. ChEMBL Group, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire CB10 1SD, United Kingdom

ChEMBL is an open access, large-scale bioactivity database (https://www.ebi.ac.uk/chembl) containing over 11 million bioactivity data points and 1.6 million compounds, curated primarily from the scientific literature. Systems biology plays a central role in drug discovery by integrating chemical and biological processes. Understanding a drug's mode of action, from its target to its effect on a molecular pathway, is crucial, so combining systems biology with cheminformatics is necessary. ChEMBL is a database that extracts data and displays clear links between chemical compounds, their bioactivity endpoints and the associated protein targets. Showing these links is essential if a database is to be used successfully for drug discovery. Historically, a major stumbling block for such bioinformatic databases has been the number of different activity types and units published, which makes it difficult to compare results from different papers. ChEMBL uses its own standardized activity type, pChEMBL, to unite these disparate activity types and values into a uniform set that can be compared across multiple sources. pChEMBL is defined as −log10 of the molar IC50, XC50, EC50, AC50, Ki, Kd or Potency value. This presentation will show how ChEMBL can be mapped onto molecular pathways in order to understand the modulated nodes and the chemical tools available. We will present examples from our research in which we have developed informatics approaches to annotate pathways automatically with ChEMBL data. A second use case for systems biology is the assembly of thematic views of ChEMBL, for example for ADME systems biology.
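The pChEMBL definition is a simple unit conversion followed by a negative base-10 logarithm, so 10 nM becomes 8 and 1 µM becomes 6. The sketch below shows only that arithmetic; ChEMBL applies additional curation rules before assigning a pChEMBL value to a record, which this example omits, and the unit table and function name are illustrative.

```python
import math

# Factors converting common concentration units to mol/L ("uM" stands for micromolar).
TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}


def pchembl(value, unit="nM"):
    """Return -log10 of an IC50/EC50/Ki/Kd-type value expressed in molar units."""
    if value <= 0:
        raise ValueError("activity value must be positive")
    return -math.log10(value * TO_MOLAR[unit])


for value, unit in [(10, "nM"), (1, "uM"), (250, "nM")]:
    print(f"{value} {unit} -> pChEMBL {pchembl(value, unit):.2f}")
```

Because the scale is logarithmic, a difference of one pChEMBL unit corresponds to a tenfold difference in potency, which makes values from different assays and units directly comparable.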


36 Toward quantitative structure-activity relationship (QSAR) models for nanoparticles

Katarzyna Odziomek1,2, kjodziomek@lbl.gov, Daniela Ushizima2, Tomasz Puzyn1, Maciej Haranczyk2. (1) Faculty of Chemistry, Laboratory of Environmental Chemometrics, University of Gdansk, Gdansk, Poland, (2) Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States

For decades, combined chemometric and computational tools have successfully helped scientists predict the properties of new chemical compounds from their molecular structure. Quantitative structure-activity relationship (QSAR) methods, for example, which use linear and non-linear combinations of molecular descriptors, can be used to predict the physicochemical and/or biological properties of a molecule. QSAR approaches require well-defined structures of the molecules considered. Similar approaches are envisioned for predicting relevant properties of nanoparticles (NPs), a common form of chemicals used in cosmetics, pharmaceuticals and even food products. Because of their nature, NPs exhibit different characteristics from bulk materials. Nanoparticle samples are typically non-uniform, with various shapes and sizes that give rise to their properties, so using conventional QSAR techniques is not possible. Our goal is to develop new approaches that facilitate QSAR-type modeling for nanoparticles. Our strategy is to incorporate new nanoparticle descriptors into existing, proven QSAR methods. We used scanning electron microscopy (SEM) images to obtain visual information on the morphology and topography of nanoparticles. Using computer vision algorithms, we analyzed the SEM images and obtained descriptors of regions of interest (i.e., potential NPs), such as shape, size, surface area and roughness. These descriptors can be used to group and classify the nanoparticles. Next, we plan to build statistical models correlating nano- and microscale parameters with physicochemical and biological characteristics. We demonstrate applications of our methodology by investigating hydroxyapatite-based bionanomaterials. Hydroxyapatite (hydroxylapatite, bioapatite, HAp), Ca₁₀(PO₄)₆(OH)₂, is a naturally occurring calcium phosphate mineral with a wide range of biomedical applications, such as bone implants.
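Once an SEM image has been segmented into a binary mask, simple shape descriptors can be computed per region. The sketch below assumes segmentation is already done (the mask is a toy) and computes area, pixel-edge perimeter and circularity (4πA/P²) for each 4-connected region. It illustrates the kind of descriptor described above, not the authors' computer-vision pipeline; the pixel-edge perimeter overestimates curved boundaries, so circularity values are only comparable within one method.

```python
import math


def label_regions(mask):
    """4-connected component labelling of a binary image given as rows of 0/1."""
    rows, cols = len(mask), len(mask[0])
    seen = set()
    regions = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and (r, c) not in seen:
                stack, region = [(r, c)], []
                seen.add((r, c))
                while stack:  # iterative flood fill
                    y, x = stack.pop()
                    region.append((y, x))
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and (ny, nx) not in seen:
                            seen.add((ny, nx))
                            stack.append((ny, nx))
                regions.append(region)
    return regions


def shape_descriptors(region):
    """Area (pixels), perimeter (exposed pixel edges) and circularity 4*pi*A/P^2."""
    pixels = set(region)
    area = len(pixels)
    perimeter = sum(
        (y + dy, x + dx) not in pixels
        for y, x in pixels
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1))
    )
    return {"area": area, "perimeter": perimeter,
            "circularity": 4 * math.pi * area / perimeter ** 2}


mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 1],
]
for region in label_regions(mask):
    print(shape_descriptors(region))
```

Descriptors like these, scaled by the image's pixel size, can then be fed into a conventional QSAR workflow alongside any molecular descriptors.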


37 Targeting androgen receptor DNA-binding domain using structure-based methods to overcome resistance

Huifang Li, janelhf@gmail.com, Fuqiang Ban, Kush Dalal, Eric Leblanc, Paul S. Rennie, Artem Cherkasov. Vancouver Prostate Centre, University of British Columbia, Vancouver, B.C. V6H 3Z6, Canada

The human androgen receptor (AR) is considered a master regulator in the development and progression of prostate cancer (PCa). As resistance to current antiandrogens remains a major challenge in the treatment of advanced PCa, there is a continuing need to pursue new anti-AR therapeutic avenues. In this study, we identified a plausible binding site on the DNA-binding domain (DBD) of the AR and found small-molecule inhibitors through initial screening against this site. By exploring the chemical space around a moderately active initial hit compound, we identified an analogue with 10-fold improved activity, and preliminary structure-activity relationship (SAR) studies on this chemical class yielded a lead compound as potent as the current drug enzalutamide. Site-directed mutagenesis demonstrates that the developed inhibitors interact with the proposed binding site on the AR DBD, suggesting a novel mechanism of action fundamentally different from conventional targeting of the AR through its ligand-binding domain (LBD). Furthermore, the inhibitors effectively suppress the growth of enzalutamide-resistant cells and block the transcription of constitutively active AR splice variants, which lack the entire LBD and play a critical role in the development of resistance to conventional antiandrogens. The current study provides initial proof of principle for selectively targeting the AR DBD, which may be a novel and viable approach for the treatment of advanced and resistant PCa.


39 Random walk-based prediction of novel drug-target interactions

Abhik Seal, abseal@indiana.edu, Yong Yeol Ahn, David J Wild. School of Informatics and Computing, Indiana University, Bloomington, Indiana 47408, United States

Predicting novel drug–target associations is important not only for developing new drugs but also for understanding how drugs work and what their modes of action are. As more data about drugs, targets and their interactions become available, computational approaches are becoming increasingly viable for drug–target association discovery. In this work, we apply the Random Walk with Restart (RWR) method to a heterogeneous network of drugs and targets to predict novel drug–target associations. Using DrugBank, we construct heterogeneous drug–target networks from four types of chemical fingerprints, sequence similarity and interaction profiles. We find that our method produces reliable predictions across the different chemical fingerprint types. We use ChEMBL as an external dataset of 2,763 associations to evaluate the performance of our approach and find that it correctly predicts nearly 45% of the interactions present only in the ChEMBL dataset. We also verify several associations between drugs and modes of action, such as a strong association between hair loss and the cardiovascular drug simvastatin. Finally, the associations between 110 popular drugs and 3,519 targets are analyzed as a case study. In summary, we demonstrate the effectiveness and promise of RWR on heterogeneous networks for identifying novel drug–target interactions.
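Random Walk with Restart repeatedly propagates probability along weighted edges and, at each step, sends a fixed fraction of the walker back to the query node: p ← r·p₀ + (1 − r)·W·p, where W is the column-normalized adjacency matrix. Targets with high steady-state probability that are not yet known interactors become candidate predictions. The sketch below runs RWR on a single merged weighted graph; the node names, edge weights and restart probability r = 0.3 are hypothetical, and the authors' heterogeneous-network construction and normalization may differ.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """RWR on an undirected weighted graph.

    adjacency: dict node -> dict neighbour -> weight (symmetric).
    seeds: nodes where the walker restarts (e.g. a query drug).
    Returns a dict of steady-state visiting probabilities.
    """
    nodes = list(adjacency)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    out_weight = {n: sum(adjacency[n].values()) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            if out_weight[n] == 0:
                continue  # isolated node: its probability mass is dropped
            for m, w in adjacency[n].items():
                nxt[m] += (1 - restart) * p[n] * w / out_weight[n]
        converged = sum(abs(nxt[n] - p[n]) for n in nodes) < tol
        p = nxt
        if converged:
            break
    return p


def add_edge(graph, a, b, weight):
    graph.setdefault(a, {})[b] = weight
    graph.setdefault(b, {})[a] = weight


graph = {}
add_edge(graph, "drug1", "drug2", 0.8)      # chemical similarity
add_edge(graph, "drug1", "target1", 1.0)    # known interaction
add_edge(graph, "drug2", "target2", 1.0)    # known interaction
add_edge(graph, "target2", "target3", 0.5)  # sequence similarity
scores = random_walk_with_restart(graph, seeds={"drug1"})
candidates = {n: s for n, s in scores.items() if n.startswith("target") and n != "target1"}
print(sorted(candidates.items(), key=lambda kv: -kv[1]))
```

Here target2 outranks target3 because it is reached through a chemically similar drug, while target3 is only reachable through an additional sequence-similarity hop.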


40 Pred-hERG: A novel web-accessible computational tool for predicting cardiac toxicity of drug candidates

Vinícius M Alves1, viniciusm.alves@gmail.com, Rodolpho C Braga1, rodolphobraga@yahoo.com, Meryck B Silva1, Eugene Muratov2, Denis Fourches2, Alexander Tropsha2, Carolina H Andrade1, carolina@ufg.br. (1) Laboratory for Molecular Modeling and Drug Design, Faculty of Pharmacy, Federal University of Goiás, Goiania, Goias 74605170, Brazil, (2) Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, University of North Carolina, Chapel Hill, North Carolina 27599, United States

Several non-cardiovascular drugs have been withdrawn from the market because of a critical side effect: inhibition of human ether-à-go-go-related gene (hERG) K+ channels, which may lead to cardiac arrhythmia and death. hERG safety testing is therefore an indispensable process required by the US FDA. There is considerable interest in developing computational tools to filter out potential hERG blockers in the early stages of drug discovery. In this work, we describe the development of a new tool for the rapid identification of compounds that are potentially cardiotoxic through hERG inhibition. We compiled the largest publicly available dataset of hERG binding, containing 11,958 compounds from the ChEMBL database; after curation, 4,980 compounds remained for modeling. Several types of QSAR models were developed and validated according to the OECD principles. Classification accuracies for discriminating blockers from non-blockers ranged from 0.83 to 0.93 on the external set. Model interpretation revealed several SAR rules that can guide the structural optimization of some hERG blockers into non-blockers. Virtual screening of the WDI chemical library using selected QSAR models identified 4,945 compounds as potential hERG blockers. The developed models can reliably identify blockers and non-blockers and could be useful to the scientific community. A freely accessible web server allows users to identify putative hERG blockers and non-blockers in chemical libraries of interest (http://labmol.farmacia.ufg.br/predherg).
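The abstract does not say which accuracy measure was reported for the external set. As an illustration, the sketch below computes the usual binary classification statistics from true and predicted labels, including the correct classification rate (CCR, the mean of sensitivity and specificity), which is often preferred for imbalanced QSAR datasets. The labels are toy values, and treating the reported figures as any one of these metrics would be an assumption.

```python
def classification_metrics(y_true, y_pred):
    """Binary metrics with 1 = hERG blocker, 0 = non-blocker."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    sensitivity = tp / (tp + fn)  # fraction of blockers caught
    specificity = tn / (tn + fp)  # fraction of non-blockers correctly passed
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ccr": (sensitivity + specificity) / 2,  # balanced accuracy
    }


# Toy external set: 4 blockers, 6 non-blockers.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
print(classification_metrics(y_true, y_pred))
```

For a safety filter, sensitivity matters most: a missed blocker is costlier than a false alarm that triggers an extra assay.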


41 Mining chemical space for novel molecules: A graphical tool for working with fragment spaces

Florian Lauck, lauck@zbh.uni-hamburg.de, Matthias Rarey. Universität Hamburg - Center for Bioinformatics, Hamburg, Germany

Mining chemical space for novel molecules with desirable properties is difficult. The common strategy is to limit the search space so that only molecules with suitable physicochemical and topological properties need to be considered. This requires methods and data structures for efficiently modeling this chemical subspace, as well as user-friendly tools that give access to this functionality. Here, we present a graphical user interface for creating, manipulating and searching large chemical spaces. As a model we use a fragment space, i.e., a combinatorial chemical space consisting of molecular fragments and connection rules. Each fragment has at least one reaction site corresponding to an open valence. These sites are modeled as artificial atoms of a defined type, called link atoms, and the connection rules determine which link atoms are compatible. When two fragments are connected, the link atoms are removed and a bond is introduced in accordance with the connection rule. A number of algorithms and strategies for working with fragment spaces have been developed in the past but were available only as separate command-line tools. Our new software combines these tools into one user-friendly application that can visualize the contents of a fragment space (fragments and connection rules) as well as search results (novel molecules). For generating fragment spaces, we incorporated an automated approach that uses a set of molecules and cut rules, i.e., SMARTS patterns that define where a molecule should be cleaved and link atoms introduced. For retrieving structurally similar molecules, we provide two query-based search methods: molecular similarity (reduced graph descriptors) and substructure search (SMARTS pattern matching). In addition, a new approach for constraint-based enumeration complements these algorithms, allowing searches based on physicochemical properties rather than structural similarity.
Finally, specialized fragment spaces can be created by filtering fragments via numerous properties.
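The link-atom bookkeeping described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' software: each fragment is reduced to a name plus a tuple of link-atom types, the type labels (L1, L2, L3) and the rule set are hypothetical, and no real molecular graph is built. Joining two fragments consumes one compatible pair of link atoms; any remaining link atoms stay open for further growth.

```python
from dataclasses import dataclass
from itertools import combinations_with_replacement

# Hypothetical connection rules: each rule is the (unordered) pair of link-atom
# types that may be joined. frozenset({"L3"}) means L3 may bond to L3.
RULES = {frozenset({"L1", "L2"}), frozenset({"L3"})}


@dataclass(frozen=True)
class Fragment:
    name: str
    link_types: tuple  # one entry per link atom (open valence)


def compatible(a: str, b: str) -> bool:
    return frozenset({a, b}) in RULES


def connect(f: Fragment, i: int, g: Fragment, j: int) -> Fragment:
    """Bond link atom i of f to link atom j of g; both link atoms are removed."""
    if not compatible(f.link_types[i], g.link_types[j]):
        raise ValueError("link atoms are not compatible under RULES")
    remaining = f.link_types[:i] + f.link_types[i + 1:] + g.link_types[:j] + g.link_types[j + 1:]
    return Fragment(f"{f.name}-{g.name}", remaining)


def two_fragment_products(fragments):
    """Every product formed by joining two fragments through one compatible link pair."""
    products = []
    for a, b in combinations_with_replacement(range(len(fragments)), 2):
        f, g = fragments[a], fragments[b]
        for i, ti in enumerate(f.link_types):
            for j, tj in enumerate(g.link_types):
                if compatible(ti, tj):
                    products.append(connect(f, i, g, j))
    return products


space = [Fragment("A", ("L1",)), Fragment("B", ("L2",)),
         Fragment("C", ("L3",)), Fragment("D", ("L1", "L3"))]
products = two_fragment_products(space)
complete = [p.name for p in products if not p.link_types]  # no open valences left
print([p.name for p in products])  # ['A-B', 'B-D', 'C-C', 'C-D', 'D-D']
print(complete)                    # ['A-B', 'C-C']
```

Products that still carry link atoms (B-D, C-D, D-D here) are exactly the partial molecules a fragment-space search would keep extending.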


42 BCL::EvoGen: An evolutionary algorithm for focused library design

Alexander R Geanes, alexander.r.geanes@vanderbilt.edu, Edward W Lowe, Jens Meiler. Department of Chemistry, Vanderbilt University, Nashville, TN 37235, United States

In recent years, virtual high-throughput screening (vHTS) techniques have been successfully applied in the drug discovery process. In many cases, these vHTS techniques are used to prioritize subsets of chemical libraries for acquisition and testing in physical screens. However, for computer-aided drug design (CADD), it is advantageous to have algorithms capable of designing new chemical entities for a specific biological target. An evolutionary algorithm was implemented as part of the BCL::ChemInfo suite within the BioChemistry Library (BCL), a C++ library developed at Vanderbilt University, to iteratively generate chemical species with high predicted biological activity for focused library design in hit-to-lead optimization. Quantitative structure-activity relationship (QSAR) models based on machine learning techniques were used to predict the biological activity of the compounds in each generation. The compounds with the highest predicted activity, together with a smaller number of lower-activity species, were subjected to combination, crossover, and mutation to form the next generation. The run terminated when a set percentage of the compounds in a single generation achieved a predicted activity above a predetermined cutoff or, if that criterion could not be satisfied, after a preset number of generations. The method was benchmarked on a previously published collection of 9 datasets designed for the validation of novel CADD methods. Each dataset was compiled from publicly available HTS data in PubChem and contains at least 150 active compounds. The datasets span a range of protein targets, including GPCRs, ion channels, and enzymes. Here we present the results of this focused library design application, BCL::EvoGen.
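The generation loop and the two termination criteria described above can be illustrated with a toy sketch. It is not BCL::EvoGen and does not use the BCL API: "compounds" are hypothetical bit strings of desirable features, a made-up scoring function stands in for the QSAR model, only crossover and mutation are shown (no separate combination operator), and all parameter names and values are assumptions.

```python
import random


def evolve(predict, genome_len=16, pop_size=40, n_top=10, n_low=4,
           cutoff=0.9, frac_needed=0.5, max_gens=200, mut_rate=0.05, seed=0):
    """Toy evolutionary loop; `predict` stands in for a QSAR activity model.

    Returns (generations_run, population sorted by predicted activity).
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(genome_len)] for _ in range(pop_size)]
    for gen in range(max_gens):
        ranked = sorted(pop, key=predict, reverse=True)
        # Criterion 1: a given fraction of this generation beats the activity cutoff.
        if sum(predict(x) >= cutoff for x in ranked) / pop_size >= frac_needed:
            return gen, ranked
        # Parents: the top-scoring candidates plus a few lower-scoring ones for diversity.
        parents = ranked[:n_top] + rng.sample(ranked[n_top:], n_low)
        children = []
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, genome_len)  # single-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - bit if rng.random() < mut_rate else bit for bit in child]  # mutation
            children.append(child)
        pop = children
    # Criterion 2: the preset number of generations was reached.
    return max_gens, sorted(pop, key=predict, reverse=True)


def toy_activity(bits):
    """Hypothetical 'predicted activity': fraction of desirable features present."""
    return sum(bits) / len(bits)


gens, ranked = evolve(toy_activity)
print(gens, toy_activity(ranked[0]))
```

Keeping a few lower-scoring parents (`n_low`) mirrors the abstract's point about retaining some lower-activity species, which helps preserve diversity instead of collapsing onto the current best scaffold.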


60 InChIKeys as chemical entity ids to enable in-context text indexing and to identify engine-ranked chemically similar documents

Stephen Boyer, sboyer@us.ibm.com, Tom Griffin, Cassidy Kelly, Eric Louie, Jacques Labrie, Scott Spangler, Ying Chen, Ru Fang, Su Yan. IBM Almaden Research Center, San Jose, CA 95120, United States

Chemical name annotators that find and standardize chemical names in scientific papers and patent documents usually generate tables of SMILES or InChIs. To make the most of the identified compounds for text analytics, we convert the compound names found into InChIKeys and insert each InChIKey into the original document text beside the compound name, together with the literal word "inchikey" as an entity-type marker. The augmented documents are then indexed as full text using Solr/Lucene, resulting in a search service with useful capabilities such as co-occurrence analysis, e.g., finding all cases of any text synonym of aspirin within 10 words of "cancer", or finding cases of any chemical compound (any "inchikey" entity marker) within 10 words of "produced" AND within 10 words of "yield". This indexing technique also supports finding chemically similar documents: the InChIKeys of the selected compounds are entered together as a set of query "words", returning an engine-ranked list of the documents whose compounds overlap most with the query set. Boolean filter terms and other full search-engine capabilities remain available for refining the result set, at full engine performance. The indexed augmented text also facilitates other data mining capabilities, such as batch searches for large numbers of molecules over millions of documents. Our team is developing a comprehensive chemical encyclopedia containing structure representations and attributes for millions of molecules derived from patents and other sources such as Medline abstracts. Ongoing work includes simultaneous entity insertion for gene, drug, and disease annotation types, enabling rich entity/text combination queries with no search engine modifications needed.
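A minimal in-memory sketch of the augmentation and proximity idea follows. It is an illustration under stated assumptions, not the IBM pipeline and not the Solr/Lucene API: the name-to-key table is a hypothetical stand-in for a real name annotator, "within N words" is measured in tokens (so the inserted marker tokens count toward the distance here), and document ranking is a simple count of shared InChIKeys. In a real Lucene index, the field analyzer may need to be configured so that a hyphenated InChIKey is kept as a single token; that configuration is not shown.

```python
import re

# Hypothetical output of a chemical name annotator: synonym -> InChIKey.
NAME_TO_INCHIKEY = {
    "aspirin": "BSYNRYMUTXBXSQ-UHFFFAOYSA-N",
    "acetylsalicylic acid": "BSYNRYMUTXBXSQ-UHFFFAOYSA-N",
}


def augment(text):
    """Insert 'inchikey <KEY>' after each recognized name (entity marker + id)."""
    for name in sorted(NAME_TO_INCHIKEY, key=len, reverse=True):  # longest names first
        key = NAME_TO_INCHIKEY[name]
        text = re.sub(rf"\b({re.escape(name)})\b", rf"\1 inchikey {key}", text,
                      flags=re.IGNORECASE)
    return text


def tokens(text):
    # InChIKeys contain hyphens, so hyphens are kept inside tokens.
    return re.findall(r"[a-z0-9-]+", text.lower())


def near(text, term_a, term_b, window=10):
    """True if term_a and term_b occur within `window` tokens of each other."""
    toks = tokens(text)
    pos_a = [i for i, t in enumerate(toks) if t == term_a.lower()]
    pos_b = [i for i, t in enumerate(toks) if t == term_b.lower()]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)


def rank_by_overlap(query_keys, docs):
    """Indices of docs ordered by how many query InChIKeys they contain."""
    query = {k.lower() for k in query_keys}
    scores = [(len(query & set(tokens(d))), i) for i, d in enumerate(docs)]
    return [i for s, i in sorted(scores, key=lambda x: (-x[0], x[1])) if s > 0]


doc = augment("Low-dose acetylsalicylic acid was studied in colon cancer patients.")
key = NAME_TO_INCHIKEY["aspirin"]
print(doc)
print(near(doc, key, "cancer"))                             # True
print(rank_by_overlap([key], ["No compounds here.", doc]))  # [1]
```

Because every synonym maps to the same key, a single query on the key finds "aspirin" and "acetylsalicylic acid" alike, which is the point of using the InChIKey as the indexed entity id.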


75 Benchmark study: Structural similarity search methods for identifying read-across analogs

Jie Shen, jshen@rifm.org, Research Institute for Fragrance Materials, Inc., Woodcliff Lake, New Jersey 07677, United States

As alternative approaches to filling data gaps in chemical safety assessment, in silico methods such as read-across and quantitative structure-activity relationships (QSAR) have attracted much attention. Such approaches have been used in chemical dossiers submitted to ECHA (European Chemicals Agency) under REACH (Registration, Evaluation, Authorisation and Restriction of Chemicals). Searching for and identifying suitable structural analogs of the chemical of interest is a prerequisite for read-across. Although several methodologies and programs are available for analog searching, there are few practical strategies or guidance for conducting such a search. This work compared several popular analog search programs using 3 different fragrance materials (as indicated in Fig. 1) reported in a published paper by Blackburn et al. [1] As shown in Fig. 1, 8 different analog search methods from 5 different programs were applied to each material. The top 100 analogs with a Tanimoto similarity [2] greater than 0.5 were compared with the suitable analogs reported by Blackburn et al. [1] As shown in Fig. 1, Pipeline Pilot with the FCFP4 fingerprint outperformed the other methods in all 3 cases. Most of the methods had higher enrichment rates for lauryl alcohol and lower recovery rates for 2,6-dimethyl-4-heptanone. Based on these results and our experience, we primarily use Pipeline Pilot (FCFP4/1024 bits) to identify suitable analogs.

Fig. 1. The enrichment rate of suitable analogs identified for 3 fragrance materials using 8 different search methods from 5 different programs. The top 100 analogs with a Tanimoto score greater than 0.5 were compared with the analogs reported by Blackburn et al. [1]
[1] Blackburn, K.; Bjerke, D.; Daston, G.; Felter, S.; Mahony, C.; Naciff, J.; Robison, S.; Wu, S. Case studies to test: A framework for using structural, reactivity, metabolic and physicochemical similarity to evaluate the suitability of analogs for SAR-based toxicological assessments. Regulatory Toxicology and Pharmacology 2011, 60 (1), 120-135.
[2] Rogers, D. J.; Tanimoto, T. T. A Computer Program for Classifying Plants. Science 1960, 132 (3434), 1115-1118.
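The search protocol in this abstract (keep analogs with Tanimoto similarity above 0.5, take the top 100, compare against an expert-selected reference set) can be sketched as below. The fingerprints are made-up sets of "on" bits rather than FCFP4 fingerprints from Pipeline Pilot. The abstract does not define "enrichment rate" or "recovery rate", so the two definitions in the code (hit fraction among retrieved analogs, and fraction of reference analogs retrieved) are assumptions.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient on sets of 'on' fingerprint bits: |A & B| / |A | B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def analog_search(query: set, library: dict, k: int = 100, threshold: float = 0.5):
    """Names of at most k library entries with similarity > threshold, best first."""
    hits = [(tanimoto(query, fp), name) for name, fp in library.items()]
    hits = [h for h in hits if h[0] > threshold]
    hits.sort(key=lambda h: (-h[0], h[1]))
    return [name for _, name in hits[:k]]


def enrichment_and_recovery(retrieved, reference):
    """Assumed definitions: enrichment = hit fraction among retrieved analogs,
    recovery = fraction of reference (expert-selected) analogs that were retrieved."""
    found = set(retrieved) & set(reference)
    enrichment = len(found) / len(retrieved) if retrieved else 0.0
    recovery = len(found) / len(reference) if reference else 0.0
    return enrichment, recovery


query = {1, 2, 3, 4}
library = {"a": {1, 2, 3, 4}, "b": {1, 2, 3, 5}, "c": {1, 9}, "d": {1, 2, 3}}
retrieved = analog_search(query, library)
print(retrieved)                                       # ['a', 'd', 'b']
print(enrichment_and_recovery(retrieved, ["d", "c"]))  # (0.3333333333333333, 0.5)
```

Separating the two measures makes the trade-off visible: a stricter threshold tends to raise the hit fraction among retrieved analogs while lowering recovery of the reference set.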


76 Activity cliffs on chemotype-based activity landscapes

Jaime Pérez-Villanueva1, jpvillanueva@correo.xoc.uam.mx, José L. Medina-Franco2, Oscar Méndez-Lucio3, Olivia Soria-Arteche1. (1) Departamento de Sistemas Biológicos, Universidad Autónoma Metropolitana - Xochimilco, Mexico, DF 04960, Mexico, (2) Department of Research, Mayo Clinic, Scottsdale, Arizona 85259, United States, (3) Department of Chemistry, University of Cambridge, Cambridge, Cambridgeshire CB2 1EW, United Kingdom

Activity landscape modeling methods are useful tools for analyzing structure-activity relationships (SAR). Among them, structure-activity similarity (SAS) maps are important representations that aid the SAR characterization of molecular databases. SAS maps are constructed by plotting pairwise structural similarity against the absolute value of the activity difference (or activity similarity). These maps can be employed to detect activity cliffs (pairs of compounds in which small structural changes lead to large activity differences) and activity cliff generators (molecules that form more activity cliffs than other molecules in the dataset). In addition, compound pairs in smooth regions of the SAR and scaffold hops are easily identified in these maps. Herein we describe the activity cliff analysis of three molecular databases using SAS maps that integrate chemotype information. The results reveal different tendencies in the local landscapes of subsets of molecules classified into the most frequent chemotypes. Quantitative characterization of the activity cliffs formed by compound pairs in each chemotype class was also carried out using the Activity Cliff Enrichment Factor (ACEF), a new index proposed in this work. Finally, chemotype classes enriched in activity cliffs are discussed and analyzed using the activity cliff generator concept. This approach helps to analyze the behavior of each chemotype in a given activity landscape and to identify scaffolds with a high probability of forming activity cliffs.
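The SAS-map logic can be illustrated with a short sketch: each compound pair becomes one point (similarity, absolute activity difference), the point is assigned to one of four regions, and compounds taking part in many cliffs are flagged as candidate cliff generators. The similarity and activity thresholds, and all data values, are hypothetical. The ACEF index proposed in the talk is not reproduced because the abstract does not define it.

```python
from collections import Counter
from itertools import combinations


def sas_points(activity, sim):
    """One SAS-map point per compound pair: (i, j, similarity, |activity difference|)."""
    for i, j in combinations(sorted(activity), 2):
        yield i, j, sim(i, j), abs(activity[i] - activity[j])


def sas_region(s, d, s_cut=0.7, d_cut=2.0):
    """Assign a pair to one of the four SAS-map regions (thresholds are hypothetical)."""
    if s >= s_cut:
        return "activity cliff" if d >= d_cut else "smooth SAR"
    return "nondescript" if d >= d_cut else "scaffold hop"


def cliff_counts(activity, sim, **cuts):
    """How many activity cliffs each compound takes part in; high counts flag generators."""
    counts = Counter()
    for i, j, s, d in sas_points(activity, sim):
        if sas_region(s, d, **cuts) == "activity cliff":
            counts[i] += 1
            counts[j] += 1
    return counts


# Made-up potencies (pIC50) and pairwise similarities for illustration only.
activity = {"m1": 9.0, "m2": 5.0, "m3": 5.3, "m4": 5.1}
SIM = {frozenset(p): s for p, s in [(("m1", "m2"), 0.80), (("m1", "m3"), 0.75),
                                    (("m1", "m4"), 0.90), (("m2", "m3"), 0.85),
                                    (("m2", "m4"), 0.80), (("m3", "m4"), 0.30)]}
counts = cliff_counts(activity, lambda i, j: SIM[frozenset((i, j))])
print(counts.most_common(1))  # [('m1', 3)]
```

Grouping the same counts by chemotype label instead of by compound would give the per-chemotype view the abstract describes.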


77 Uncovering activity cliff-forming compounds using SALI values

José L Medina-Franco1, MedinaFranco.Jose@mayo.edu, Kai Fan Cheng2, Mingzhu He2, Yousef Al-Abed2, Nathalie Meurice1. (1) Department of Research, Mayo Clinic, Scottsdale, Arizona 85259, United States, (2) Center for Molecular Innovation, Feinstein Institute for Medical Research, Manhasset, New York 11030, United States

Activity cliffs have a significant impact on a number of tasks relevant to medicinal chemistry and chemoinformatics, such as lead optimization, the development of methods to predict biological activity, and the selection of queries for similarity searching. Therefore, identifying the compounds most highly associated with activity cliffs in biological data sets, i.e., 'activity cliff generators', is of prime significance [1]. The identification of activity cliff generators based on Structure-Activity Similarity (SAS) maps and frequency counts has been reported [1]. Herein we propose two complementary approaches to identify activity cliff generators based on the distribution of Structure-Activity Landscape Index (SALI) values [2]. To illustrate the approach, we discuss the SAR and the activity cliff generators identified in screening data sets from our effort to identify inhibitors of macrophage migration inhibitory factor (MIF).
[1] Méndez-Lucio, O. et al. Mol. Inf. 2012, 31, 837.
[2] Guha, R.; Van Drie, J. H. J. Chem. Inf. Model. 2008, 48, 646.
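For reference, SALI for a compound pair is the absolute activity difference divided by one minus the pair's structural similarity, so large values mark similar compounds with very different activities. The sketch below computes SALI for all pairs and counts how often each compound appears among the highest-SALI pairs. That counting rule, the top-fraction parameter, the handling of similarity equal to 1, and the data are assumptions for illustration; they are not necessarily either of the two approaches proposed in the talk.

```python
import math
from collections import Counter
from itertools import combinations


def sali(a_i, a_j, sim_ij):
    """Structure-Activity Landscape Index: |A_i - A_j| / (1 - sim(i, j))."""
    if sim_ij >= 1.0:  # identical representations: the ratio is undefined
        return math.inf if a_i != a_j else 0.0
    return abs(a_i - a_j) / (1.0 - sim_ij)


def high_sali_counts(activity, sim, top_fraction=0.1):
    """Count how often each compound appears in the top fraction of SALI pairs."""
    values = sorted(((sali(activity[i], activity[j], sim(i, j)), i, j)
                     for i, j in combinations(sorted(activity), 2)), reverse=True)
    n_keep = max(1, math.ceil(top_fraction * len(values)))
    counts = Counter()
    for _, i, j in values[:n_keep]:
        counts[i] += 1
        counts[j] += 1
    return counts


# Made-up potencies and pairwise similarities for illustration only.
activity = {"c1": 7.0, "c2": 5.0, "c3": 5.5, "c4": 4.0}
SIM = {frozenset(p): s for p, s in [(("c1", "c2"), 0.9), (("c1", "c3"), 0.5),
                                    (("c1", "c4"), 0.8), (("c2", "c3"), 0.6),
                                    (("c2", "c4"), 0.7), (("c3", "c4"), 0.2)]}
counts = high_sali_counts(activity, lambda i, j: SIM[frozenset((i, j))], top_fraction=0.3)
print(counts.most_common(1))  # [('c1', 2)]
```

Working from the SALI distribution avoids fixing separate similarity and activity thresholds as an SAS map does; only the fraction of top pairs has to be chosen.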


78 Unified modeling language schema for scientific and technical data and information

Donald R Burgess, dburgess@nist.gov, Chemical Sciences Division, National Institute of Standards and Technology, Gaithersburg, MD 20899, United States

The overall goal of this project was to develop a prototype of a web-accessible, online, editable database for atmospheric and combustion chemistry, consistent with the existing workflows and formats used by national evaluation panels. The chemical informatics and IT tasks were to modernize legacy implementations while maintaining and codifying well-established workflows that meet the scientific and technical needs of experts. Many of the databases are legacy print databases produced by IUPAC, NASA/JPL, and other committees and published in journal articles and reports. One goal is to move toward a dynamic, regularly updated, evaluated electronic database instead of the multistep manual process that exists now. In our presentation, we will provide an overview of the needs and requirements and present a Scientific and Technical Data and Information Unified Modeling Language (SDTI-UML) schema that we have developed, which provides a framework for efficient collection, rapid dissemination, identification and indexing, and facile discovery of data and information. The UML schema extends down from root Libraries, which contain wrapped Folder or Document objects; these in turn contain different classes of Content and Primitive Content (documents, blobs, tables, etc.), each tagged with descriptive metadata for indexing and relational associations. The schema also allows chemical and bibliographic information to be included with each object.
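A rough Python rendering of the containment hierarchy named in this abstract (Library, Folder, Document, Content, Primitive Content, each carrying metadata) is sketched below to make the structure concrete. It is not the SDTI-UML schema itself: the class names follow the abstract, but every field name, the search helper, and the example records are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Metadata:
    """Descriptive tags used for indexing; field names are hypothetical."""
    keywords: List[str] = field(default_factory=list)
    chemical_ids: List[str] = field(default_factory=list)  # e.g. names or InChIKeys
    citation: Optional[str] = None                          # bibliographic information


@dataclass
class PrimitiveContent:
    kind: str  # "document", "blob", "table", ...
    data: bytes = b""
    meta: Metadata = field(default_factory=Metadata)


@dataclass
class Content:
    name: str
    parts: List[PrimitiveContent] = field(default_factory=list)
    meta: Metadata = field(default_factory=Metadata)


@dataclass
class Document:
    title: str
    contents: List[Content] = field(default_factory=list)
    meta: Metadata = field(default_factory=Metadata)


@dataclass
class Folder:
    name: str
    items: List[Union["Folder", Document]] = field(default_factory=list)
    meta: Metadata = field(default_factory=Metadata)


@dataclass
class Library:
    """Root of the hierarchy."""
    name: str
    items: List[Union[Folder, Document]] = field(default_factory=list)


def find_by_chemical(node, chem_id):
    """Depth-first search for Documents tagged (directly or via Content) with chem_id."""
    if isinstance(node, Document):
        tagged = chem_id in node.meta.chemical_ids or any(
            chem_id in c.meta.chemical_ids for c in node.contents)
        return [node.title] if tagged else []
    hits = []
    for child in node.items:
        hits.extend(find_by_chemical(child, chem_id))
    return hits


lib = Library("Kinetics", [Folder("OH reactions", [
    Document("OH + CH4", meta=Metadata(chemical_ids=["methane"],
                                       citation="hypothetical evaluation report")),
    Document("OH + O3", contents=[Content("rate table", [PrimitiveContent("table")],
                                          Metadata(chemical_ids=["ozone"]))]),
])])
print(find_by_chemical(lib, "ozone"))  # ['OH + O3']
```

Attaching the same metadata type at every level is what lets a single traversal find a record whether the chemical tag sits on the Document or on one of its Content objects.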


79 How to communicate effectively about chemicals with non-chemists

Chad A. Jones1, chemist.jones@gmail.com, Janet D. Stemwedel2, Raychelle Burks3,4, Brandi VanAlphen6. (1) Department of Chemistry and Biochemistry, Brigham Young University, Provo, UT 84602, United States, (2) Department of Philosophy, San Jose State University, San Jose, CA 95112, United States, (3) Department of Chemistry, Doane College, Crete, NE 68333, United States, (4) Center of Nanohybrid Functional Materials, University of Nebraska - Lincoln, Lincoln, NE 68508, United States, (6) Unaffiliated, United States

The fear of chemicals and the chemical industry is widespread among the general public and the media. How can chemists best remedy this? What techniques are effective in making scientists influential among non-scientists? Using factual information to counter scientifically incorrect statements is satisfying for scientists. However, this 'deficit model' has been shown to be a poor means of actually changing public opinion, whether broadly or at the individual level. Chemists should shift away from the "facts and lecture" deficit-model approach toward a conversational approach that seeks to listen to and understand their audience. Knowing the audience's cultural context and building trusting relationships have been shown to be more influential techniques. For chemists, sharing details about their personal backgrounds and training may help establish that chemists are not only knowledgeable but also trustworthy and empathetic about public concerns. Through these methods, chemists can change minds, speak more persuasively about chemicals, counter misinformation, and share their love of chemistry.


80 Activity landscape modeling of AKT/PKB inhibitors

Pedro J. Trejo-Soto1, piter_jo@comunidad.unam.mx, Rodrigo Aguayo-Ortiz1, Oscar Méndez-Lucio2, Alicia Hernández-Campos1, Rafael Castillo1. (1) Pharmacy Department, Universidad Nacional Autónoma de México, Mexico City, Mexico, (2) Department of Chemistry, Unilever Centre for Molecular Science Informatics, University of Cambridge, Lensfield Road, Cambridge, United Kingdom

The AKT kinase, also known as protein kinase B (PKB), is a serine-threonine kinase that phosphorylates other proteins and molecules that act as second messengers, which are involved in important cellular processes such as cell proliferation and growth, signaling, and the absorption and metabolism of glucose. AKT is frequently amplified and overexpressed in human cancer cells; therefore, it is an attractive target in anticancer drug development. There are three known subtypes, AKT-1, AKT-2, and AKT-3, each associated with different types of cancer. In particular, AKT-2 is amplified in pancreatic, breast, and ovarian tumors, and AKT-3 is overexpressed in breast and prostate cancers, whereas the presence of AKT-1 in tumors is less common. Small molecules targeting the ATP-binding site have been reported; however, it is important to study the factors that determine the selectivity of these inhibitors for each AKT subtype, as well as the factors that favor multitarget activity. Herein, we present a systematic characterization of the SAR of 50 compounds screened against the three AKT subtypes. The activity landscape across the AKT subtypes was characterized using dual and triple activity difference (DAD/TAD) maps. We also employed SAS maps to systematically identify and analyze the activity cliff generators present in the dataset. Analysis of compound pairs with high structural similarity revealed single-, dual-, and pan-receptor activity cliffs. Moreover, single-target and dual-target activity cliff generators for AKT-1/AKT-2, AKT-2/AKT-3, and AKT-1/AKT-3 were identified. In addition, docking studies were performed on the compounds classified as cliff generators to analyze the factors involved in ligand-enzyme interactions. The analysis of the chemical structures in this work points to specific structural features that are helpful for the design of new AKT inhibitors.
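The multi-target pair analysis can be sketched as follows. For each compound pair, the signed potency difference is computed on every target (with two targets these are the pair's coordinates in a DAD map); structurally similar pairs are then labeled by how many targets show a large change. The similarity and potency-difference cutoffs, the label wording, and all numbers are hypothetical and are not the authors' data or thresholds.

```python
from itertools import combinations


def activity_differences(p_act, i, j):
    """Signed potency differences (compound i minus compound j) on each target.

    For two targets, these two numbers are the pair's coordinates in a DAD map.
    """
    return {t: p_act[i][t] - p_act[j][t] for t in p_act[i]}


def cliff_label(deltas, sim, sim_cut=0.7, delta_cut=1.0):
    """Label a similar pair by how many targets show a large potency change."""
    if sim < sim_cut:
        return None
    n_large = sum(abs(d) >= delta_cut for d in deltas.values())
    if n_large == 0:
        return None
    if n_large == len(deltas) and n_large > 2:
        return "pan-target cliff"
    return {1: "single-target cliff", 2: "dual-target cliff"}.get(n_large, f"{n_large}-target cliff")


def classify_pairs(p_act, sim, **cuts):
    return {(i, j): cliff_label(activity_differences(p_act, i, j), sim(i, j), **cuts)
            for i, j in combinations(sorted(p_act), 2)}


p_act = {  # made-up pIC50 values against the three subtypes
    "A": {"AKT1": 7.0, "AKT2": 6.0, "AKT3": 6.5},
    "B": {"AKT1": 5.5, "AKT2": 5.9, "AKT3": 6.4},
    "C": {"AKT1": 5.8, "AKT2": 4.5, "AKT3": 5.0},
}
SIM = {("A", "B"): 0.85, ("A", "C"): 0.80, ("B", "C"): 0.75}
labels = classify_pairs(p_act, lambda i, j: SIM[(i, j)])
for pair, label in labels.items():
    print(pair, label)
```

Keeping the differences signed, rather than taking absolute values immediately, preserves the DAD-map distinction between pairs whose potency shifts in the same direction on both targets and pairs that shift in opposite directions.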


81 Pursuit of activity enrichment in phenotypic screening using chemotype-driven approaches

Joachim L Petit, petit.joachim@mayo.edu, José L Medina-Franco, Nathalie Meurice. Department of Research, Mayo Clinic, Scottsdale, AZ, United States

At the biology-chemistry interface, phenotypic and functional screening campaigns aim to identify small molecules that modulate biological pathways and/or processes relevant to disease, without target knowledge or bias. The resulting activity landscapes challenge the classical concept of structure-activity relationships. Activity classes are primarily defined by bioprofile similarity to the bioactivities of reference controls, and active classes include groups of hits that affect biological pathways similarly but are not necessarily similar in structure. In this context, the transition from actives in biological space to confirmed hits and early leads in chemical space requires a careful investigation of chemotype organization across activity classes. To advance drug discovery programs, the value of chemotype-based analysis in driving hit enrichment through iterative screening and assay workflows has to be reassessed. Herein we discuss the significance of chemotype-based approaches in driving activity enrichment in iterative phenotypic screening. To illustrate the approach, we discuss chemotypic classes in light of bioprofile similarities and activity enrichment identified in in-house screening data sets from our effort to identify novel therapeutics for multiple myeloma.


116 DNA methyltransferase Dnmt1: Regulation and novel drug-design strategies

Zeljko M Svedruzic1, zsvedruz@biol.pmf.hr, Patrik Nikolic2. (1) Faculty of Medicine, Biomedical Technology, University of Rijeka, Rijeka, Croatia, (2) Faculty of Medicine, Biomedical Technology, University of Rijeka, Campus Trsat, Radmile Matejčić 2, Rm 823, Rijeka, Croatia

DNA methyltransferase Dnmt1 is the key enzyme in the functional organization of the human genome. For almost 30 years, Dnmt1 has been targeted unsuccessfully in a range of drug-design efforts. Most of those failures can be attributed to an inadequate understanding of the diverse mechanisms that regulate the enzyme's activity. Dnmt1 is a large enzyme with multiple flexible domains and phosphorylation sites. Dnmt1 can functionally interact with about 40 different molecules involved in DNA repair, chromatin organization, or RNA-directed DNA methylation. The interacting molecules include proteins, poly(ADP-ribose), and methylated and unmethylated single-stranded and double-stranded non-coding RNA molecules. We are developing inhibitors and activators of Dnmt1 using a combination of enzyme assays, computational structural analysis, and numerical evaluation of enzyme activity. Knowledge of the dynamic, reversible processes in the active site of Dnmt1 was used to create novel mechanism-based inhibitors that do not cause DNA damage. Molecular docking and QM/MM MD simulations were used to optimize the inhibitors' geometry, flexibility, interaction with the solvent in the active site, and, finally, formation of a reversible covalent adduct with the enzyme. Different strategies for optimizing the PK/PD properties have been suggested. A dynamic active-site loop has been identified as part of the lock mechanism that controls allosteric regulation and processivity. Enzymatic assays and numerical methods have been developed to study processivity on the DNA substrate and the affinity for binding at the allosteric site. The presented approach allows screening for specific allosteric activators and inhibitors.


124 Scilligence's ELN for research laboratories and academia

Rajeev Hotchandani, rhotchandani@scilligence.com, Scilligence, Burlington, MA 01803, United States

The dawn of the digital age and innovative companies like Google have made information much easier to access in our daily lives. However, many research laboratories and academic groups lag behind in this trend. Although adoption has started in some academic research groups and organic chemistry labs, many of these institutions still use paper notebooks to document their everyday work, which makes knowledge preservation and sharing much more difficult. Through case studies, this presentation will show how Scilligence informatics can make knowledge preservation, IP protection, and team collaboration more efficient in your laboratory environment.


131 Multicriteria drug discovery: Using CoMFA models to drive target specificity

Lei Wang, lei.wang@certara.com, Brian Masek, Fabian Boes, Bernd Wendt, Stephan Nagy. Certara, St. Louis, MO 63101, United States

A successful drug candidate must not only meet ADME, physical-property, and safety requirements but often must also achieve a selectivity profile against related targets. The approach presented here combines a de novo design method with a ligand-based scoring function built from three different 3D QSAR models, one per target, to generate selective inhibitors. A training set of 50 molecules with activities for Factor Xa, trypsin, and thrombin was compiled from the literature and public sources. Three different Topomer CoMFA models were built from this training set. By combining the predicted activity profile with different weights and a penalty score, we created a scoring function that can drive the invention of new, selective compounds.
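One way a weighted, penalized multi-target score could look is sketched below. The abstract does not give the functional form, so the linear weighting, the per-off-target penalty, the weights, and the cutoff are all hypothetical. The candidate dictionaries simply stand in for predictions that would come from the three Topomer CoMFA models.

```python
def selectivity_score(pred, weights, off_targets, penalty_cut=6.0, penalty=2.0):
    """Weighted sum of predicted pActivities minus a fixed penalty for each
    off-target prediction above penalty_cut (all parameters hypothetical)."""
    score = sum(w * pred[t] for t, w in weights.items())
    score -= penalty * sum(1 for t in off_targets if pred[t] > penalty_cut)
    return score


weights = {"FXa": 1.0, "trypsin": -0.5, "thrombin": -0.5}  # reward FXa, discourage the others
off_targets = ("trypsin", "thrombin")
candidates = {  # made-up predicted pActivities
    "X": {"FXa": 8.0, "trypsin": 5.0, "thrombin": 5.5},
    "Y": {"FXa": 8.5, "trypsin": 7.0, "thrombin": 5.0},
}
scores = {name: selectivity_score(p, weights, off_targets) for name, p in candidates.items()}
print(scores)                                        # {'X': 2.75, 'Y': 0.5}
print(sorted(scores, key=scores.get, reverse=True))  # ['X', 'Y']
```

Here Y is predicted to be slightly more potent on Factor Xa, but its predicted trypsin activity crosses the penalty cutoff, so the more selective candidate X ranks first.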

Tuesday, August 12, 2014

The Herman Skolnik Award Symposium - AM Session
The Publication Chain Revisited: Old Links and New Connections

Palace Hotel
Room: Presidio
Andrea Twiss-Brooks, Organizers
Andrea Twiss-Brooks, Presiding
9:15 am - 12:00 pm
9:15 Introductory Remarks
9:20 82 Evolution and transformation of journals in a digital environment

Grace Baysinger, graceb@leland.stanford.edu, Swain Chemistry and Chemical Engineering Library, Stanford University, Stanford, CA 94305, United States

During the past two decades, STM journals have migrated from print to online. Manuscript submission systems, author tools, publication cycles, peer review, IP rights and subscription management, content delivery, user services, discovery tools, and preservation methods have all undergone rapid transformation. This presentation will review changes that have occurred in journals and expected trends for them in the near-term future.

9:50 83 What do the readers think? The role of the end user in the evolution of publications

Andrea Twiss-Brooks, atbrooks@uchicago.edu, John Crerar Library, University of Chicago, Chicago, IL 60637, United States

Research publications are ultimately intended to be read by scientists. What do these scientists want and need in a publication? How do they gather their information and decide what to read? When and how much do they read? User studies of various kinds have been done to try to find answers to these questions. The results of these studies are used by libraries to make decisions about collections and services, and by publishers and vendors to create better products related to scientific publication. A brief overview of selected user studies and methods will be presented, including a description of and preliminary results from a recent study that included University of Chicago students.

10:20 84 Context and connection: Visualizing library data

Martin Brändle, mpbraendle@gmail.com, Unaffiliated, Switzerland

Visualization leverages our ability to find patterns and connections inside the complex data networks that exist in library catalogs, document and data repositories. This talk aims to present an overview of some of the initiatives libraries have taken to visualize and let users explore their data, and discusses open source tools and requirements.

10:50 Intermission
11:00 85 Predicted future of science and scientific publishing within the framework of a changing world

René Deplanque, rene@deplanque.de, Technical University Berlin, Germany

Abstract text not available.

11:30 86 Chemical Publications revisited

Guido Herrmann, Guido.Herrmann@thieme.de, Georg Thieme Verlag KG, Stuttgart, Germany

Thieme has been a chemistry publisher since 1909. The basic categories and formats (i.e., journals, reference works, encyclopedias, monographs, and textbooks) have proven very robust and stable for more than a hundred years. In contrast, the publication formats (digital vs. print), user expectations, production processes, and distribution channels have changed significantly over the last decade. Similarly, the basic role of a publisher has remained the same, but the actual operations and production processes have changed drastically. The talk will look into these processes and highlight one important and distinct aspect of chemical publishing: the strong ties between the full-text information and the information embedded in chemical structures and reactions, an area in which significant progress has been made over the last few years.

Tuesday, August 12, 2014

The Herman Skolnik Award Symposium - PM Session
The Publication Chain Revisited: Old Links and New Connections

Palace Hotel
Room: Presidio
Andrea Twiss-Brooks, Organizers
Andrea Twiss-Brooks, Presiding
2:00 pm - 5:05 pm
2:00 87 CAS databases: Reflecting the changes in how chemical discoveries have been disclosed

Matthew J. Toussant, mtoussant@cas.org, Chemical Abstracts Service, United States

Chemical Abstracts Service (CAS) has always been a leader in providing scientists with access to chemical information. This presentation will illustrate how CAS has adapted to the phenomenal growth in chemical information being published today. Beginning in 1907 with a worldwide group of abstractors indexing the journal literature, CAS later extended its reach into the patent world. Having added other sources of chemical information, such as dissertations, meeting abstracts, and conference proceedings, CAS now draws on more than 1,000 ahead-of-print journals, valuable web sources, commercial chemical suppliers, and regulatory inventories to create its databases. Starting with the CAS REGISTRY in 1965, CAS has leveraged rapid changes in technology and evolving sources of disclosed chemistry to fulfill its mission to provide the world's best digital research environment to search, retrieve, analyze, and link chemical information. This presentation will cover some ways in which CAS has kept pace with the worldwide growth in disclosed chemistry and will close with some predictions about the future of chemical information.

2:30 88 InChI & the publication and information chain

Stephen Heller, steve@hellers.com, BMD, NIST, Gaithersburg, MD 20899-8362, United States

The International Chemical Identifier (InChI) chemical structure standard is starting to play a major role in changing the links and connections to chemical information. In the past few years, for the first time, any journal article can freely include an InChI, and that InChI is the same one that any other article, database, chemical catalog, and so on can also carry for the same structure. This enables links and connections to the vast amounts of chemical information and data available to support basic and applied research and development in the chemical, biological, and medical sciences. This presentation will describe the past, present, and future of InChI as the key to linking information.

3:00 89 From virtual communities to Web 2.0, social media, and beyond: A publishing perspective

Wendy Warr, wendy@warr.com, Wendy Warr & Associates, Cheshire, United Kingdom

Since 1992 the World Wide Web has radically changed the way that we live and work. For most of us, living without the Internet would be like living without water or electricity. As research has become increasingly collaborative, possibilities for communication and collaboration on the Web have also increased. Virtual communities in science such as EiVillage and BioMedNet began to spring up in the 1990s. The earliest virtual community in chemistry, ChemWeb.com, was announced in August 1996 and launched in April 1997. At its peak it offered access to books, journals and databases, and member-generated content, together with discussion groups, virtual conferences, and a chemistry preprint server. Eventually its owners started to lose interest; if only they had known what would follow with the advent of Web 2.0 in 2004. The world has since moved on from “e-everything” (e-mail, e-journals, e-commerce) to mobile technologies and “i-everything”, and experimentation in publishing has moved on too. An example is ScienceOpen, an open access scholarly publisher with a new network-based approach and the mantra “access, network, organize, publish”. It was launched at the end of 2013 with more than one million freely accessible papers in multiple disciplines. It offers authors tools to collect feedback in one place, manage draft versions, and share files, making it easy to collaborate on a paper. Its scientific network forms the basis for public, post-publication peer review. ChemWeb.com was ahead of its time. Is the chemistry world now ready for new ventures such as ScienceOpen? In the world of open access, open data, and open science, what might happen next?

3:30 Intermission
3:40 90 Digital transformation - the long and winding road

David Evans1, david.evans@REEDELSEVIER.CH, Pieder Caduff2, Jürgen Swienty-Busch3. (1) Reed Elsevier Properties SA, Neuchâtel, Switzerland, (2) Reed Elsevier Properties SA, Switzerland, (3) Elsevier Information Systems GmbH, Germany

Abstract text not available.

4:10 91 Of a landmark total synthesis yet unpublished in full experimental detail – vitamin B12

Engelbert Zass, zass@chem.ethz.ch, Laboratory of Organic Chemistry, ETH Zürich, Zürich, Switzerland

While publication habits have changed since then, for total syntheses in the last century one could reasonably expect publication in journals with full experimental details. Even then, however, there were important exceptions, the most remarkable being the total synthesis of vitamin B12, considered a landmark in organic synthesis, which involved the research groups of R.B. Woodward and A. Eschenmoser and about 100 participating chemists. While almost all of the ETH work was done by Ph.D. students whose theses were fully published in print at the time (and later also electronically), the Harvard part was done exclusively by postdoctoral fellows whose detailed reports are (as usual) not publicly available. An attempt to publish the full details, beyond the published lectures and summaries of the syntheses, was started at ETH in 1980 but had to be abandoned in 1986. A progress report will be presented, and the reasons for resuming this publication project more than 40 years after completion of the syntheses, and how to do so, will be discussed.

4:55 Award Presentation

Wednesday, August 13, 2014

It Takes Two To Tango: Chemistry Librarians Partnering with Publishers and Researchers To Advance the Chemical Sciences - AM Session
Symposium in Honor of Dana Roth

Palace Hotel
Room: Marina
Ted Baldwin, Judith Currano, Organizers
Judith Currano, Ted Baldwin, Presiding
8:35 am - 12:00 pm
8:35 Introductory Remarks
8:40 92 Chemical information ecosystems at research universities: Users, libraries, and information providers

Grace Baysinger, graceb@stanford.edu, Swain Chemistry and Chemical Engineering Library, Stanford University, Stanford, CA 94305-5081, United States

Librarians work closely with users to help meet their research and instructional information needs. Librarians also work in close partnership with information providers. Discussions may cover the full life cycle of a product: development, interface design, licensing and pricing, usage data, training and support, and marketing. Venues for discussions may be one-on-one meetings, advisory groups, surveys, email exchanges, webinars, etc. This talk will highlight the value of these partnerships and the value that they ultimately bring to users.

9:00 93 “What happened to my library?” revisited

Susanne J Redalje, curie@u.washington.edu, University Libraries, University of Washington, Seattle, Seattle, Washington 98195, United States

A session at the 2010 ACS Spring National Conference, held in San Francisco, focused on the impact of the trend toward closing branch libraries. The University of Washington Chemistry Library, along with several other UW science branches, was among those that closed. Collections were merged into a much larger library covering many disciplines, which no longer provides a clear location for users, physically or virtually. It is now 5 years later. The library world and technology continue to change, providing new options and new challenges. This paper will examine the continuing impact of these closures on users and address the changes in outreach required both by the closures and by the ever-changing expectations of users. A variety of methods have been used to date, including posters in the reference area on science topics, intended both to highlight the chosen topics and to say clearly 'we are here'. Focus groups and surveys have been used to determine user needs, many of them focusing on graduate students, who were particularly hard hit by the changes. Communication continues to be a main issue, complicated by a major increase in interdisciplinary research groups and centers.

9:20 94 Products, policies, pricing: How collaboration with libraries has shaped the evolution of ACS Publications 3Ps

S. Sara Rouhi, s_rouhi@acs.org, Library Relations, ACS Publications, Washington, DC 20009, United States

ACS Publications' collaboration with libraries is ongoing and evolving. This presentation will outline the evolution of the ACS Librarian Advisory Board (now the ACS Academic Roundtable); our collaborations with libraries to develop ACS on Campus; our ongoing library summit strategy; and the role of librarian input in refining ACS Value-Based pricing and package options and in providing insight into shared challenges, such as Open Access, Data Management, Library Systems, etc. This presentation will feature representatives from the ACS Publications sales and library relations teams as well as librarian partners we've had the privilege of collaborating with over the years. Dana Roth has played a crucial role with his insight and frank feedback on all our initiatives. We look forward to participating in this well-deserved recognition!

9:40 95 New dance steps: Emerging areas connecting us to our researchers

Ted W. Baldwin, BALDWITW@UCMAIL.UC.EDU, Science and Engineering Libraries, University of Cincinnati, Cincinnati, OH 45221, United States

The University of Cincinnati's Science and Engineering libraries are moving ahead in a number of exciting and strategic ways, in partnership with researchers in chemistry and related sciences, as well as with others across the campus. This talk will highlight our progress in these new directions, and how we are establishing new relationships with our researchers. For physical spaces, including collections and technology, changes are underway to meet emerging research needs and to increase the library's perceived value. New types of staffing, including a science informationist, increase our capacity for specialized consultation and instruction directed at the workflows of chemistry researchers. Finally, our participation in a multi-institutional shared development process for a new digital repository (based on the Hydra framework) is opening up new channels for dialogue with researchers.

10:00 Intermission
10:15 96 Academic libraries and CAS: A match made in heaven

Roger Schenck, rschenck@cas.org, Marketing, Chemical Abstracts Service, Columbus, Ohio 43202, United States

As a division of the ACS, CAS holds fostering academic research as core to its mission. In the endeavor to meet this goal, CAS has built and sustained long-lasting, beneficial relationships with academic libraries, and academic librarians, around the world in its efforts to provide the most current, comprehensive and complete chemical and related information for scientific discovery. Going well beyond simply using CAS's products and services, from printed Chemical Abstracts™ to CA on CD™ to STN® to SciFinder®, academic libraries and their librarians and education professionals collaborate closely with CAS to develop and enhance CAS services that more efficiently enable academic research and scientific discoveries. From helping CAS promote the FUTURE LEADERS in Chemistry initiative, to training students and faculty, to spreading the word on important chemistry developments in CAS products, academic librarians have been an invaluable resource for CAS. In this presentation, we will explore the many ways in which CAS's successful collaboration with academic libraries has helped to ensure that CAS products and services remain timely and relevant.

10:35 99 Evolving library services in the ever-changing world of chemical information: From printed to electronic to networked

Donna T. Wrublewski, dtwrublewski@library.caltech.edu, George S. Porter, Joy Painter, Kristin Buxton, Lindsay B. Cleary. Caltech Library, California Institute of Technology, Pasadena, CA 91125, United States

Access to chemical information has evolved dramatically over the past 50 years, from printed to electronic to networked. The challenge of channeling and focusing this torrent of information to meet the specific needs of researchers has grown exponentially over time, and has required librarians to become beta testers, instructors, and negotiators. Dana Roth's contributions to chemical information, and to the field of librarianship as a whole, cannot be overestimated. This talk will highlight key contributions that Roth has made to the relationships between the Caltech Library and publishers, and between the Library and the campus research community. These relationships continue to flourish and expand, and examples of past efforts and current projects from across the Caltech Library will be discussed.

10:55 98 Navigating chemistry requirements for data management and electronic notebooks: A case study

Leah R McEwen1, lrm1@cornell.edu, Antony J Williams2, Valery Tkachenko2, Jeremy G Frey3, Simon J Coles3, Aileen E Day2, Cerys Willoughby3, William R Dichtel1. (1) Cornell University, Ithaca, NY 14853, United States, (2) Royal Society of Chemistry, United Kingdom, (3) University of Southampton, United Kingdom

Research data management is an ever-present and growing concern for academic chemistry laboratories, libraries, scholarly societies and publishers. Coordinating data management requirements with electronic notebook platform development is an excellent opportunity to consider the process of articulation and resolution among disparate perspectives in depth. When the project group is separated by over three thousand miles and includes an academic team developing an open source ELN, a cheminformatics team developing spectral handling capability and an organic chemist with over a dozen students who will ultimately utilize the platform, onsite management and coordination of the needs, training and information flow is critical. This presentation will report observations from a case study involving developers, user scientists and the role of an onsite librarian in coordinating and managing stakeholder objectives.

11:15 97 "Dancing" lessons: Teaching non-chemist librarians to communicate with chemists

Judith N. Currano, currano@pobox.upenn.edu, Chemistry Library, University of Pennsylvania, Philadelphia, PA 19104-6323, United States

Establishing relationships with the scientists who use one's facility is key to providing chemical information services to researchers; however, many new chemistry librarians and library liaisons have little or no formal training in the discipline that they support. Since the late 1990s, the Chemistry Division of the Special Libraries Association (SLA) has provided support and training to inexperienced chemistry librarians through continuing education course offerings. This presentation describes the history and evolution of the Chemistry for the Non-Chemist Librarian course and gives glimpses into ways that research chemists can work with their non-chemist librarians to increase their comfort with the language of chemistry.

11:35 Discussion
11:55 Concluding Remarks

Wednesday, August 13, 2014

Inspiring the Next Generation To Pursue Computational Chemistry and Cheminformatics - AM Session

Palace Hotel
Room: Presidio

Antony Williams, Leah McEwen, Organizers
Antony Williams, Leah McEwen, Presiding
8:20 am - 11:05 am
8:20 Introductory Remarks
8:25 100 Examples of how to inspire the next generation to pursue computational chemistry/cheminformatics

Sean Ekins1, ekinssean@yahoo.com, Alex M Clark2. (1) Collaborations in Chemistry, Fuquay Varina, NC 27526, United States, (2) Molecular Materials Informatics, Montreal, Quebec H3J 2S1, Canada

If we are to have a future generation of scientists who focus on computational chemistry or cheminformatics, we need to consider what we can do to promote the topic. Many people arrive in, or are drawn to, the field from outside, with backgrounds in chemistry, biology, physics, information technology or other disciplines. What can we do to bring others into the field so that we have a diverse ecosystem? This in turn may provide us with new ideas, technologies and approaches. If we are to prevent stagnation, we need to consider what methods we can use to attract new students. Several examples will be provided of ways in which we can do more to inspire the next generation to pursue computational chemistry.

8:50 101 Implementing an interactive cheminformatics course for the acceleration of graduate chemical research

O. Maduka Ogba, ogbao@onid.oregonstate.edu, Paul H.-Y. Cheong. Department of Chemistry, Oregon State University, Corvallis, Oregon 97331, United States

We have established a cheminformatics course at Oregon State University. This project-based, individualized course was designed to equip chemistry researchers with programming and informatics tools to automate and accelerate their research projects. The first third of the course took a top-down approach, teaching students the basics of the structures and algorithms needed to mine large amounts of chemical data. The second third focused on learning the Python programming language interactively through hands-on sessions in which participants generated and refined scripts with the instructor and their colleagues. The last third took a bottom-up approach in which students planned, designed, and coded Python software to automate and accelerate their own research. The lectures, assignments, and topics were tailored to each student's research interests based on three anonymous evaluations by the students. This individualized learning experience was key to the success of the course, and the projects completed in it are currently being used actively to expedite scholarly research.

9:15 102 Leveraging history to develop community sourced best practices in chemical information systems development: An interactive online teaching module

Leah McEwen1, lrm1@cornell.edu, Evan Smith2. (1) Cornell University, United States, (2) Princeton University, United States

As digital research tools and cheminformatics methods attain ever-greater prominence in 21st-century chemistry, chemical information literacy has become a pressing challenge. The chemistry research community has engaged in more than a century of capturing, representing, organizing and re-using chemical principles, data and expert interpretation through IUPAC and other national and international initiatives. The history of chemical information and machine documentation is an enormous untapped resource for better understanding current challenges in the design of chemical information systems and chemistry educational programming. The authors are developing interactive online teaching modules based on changes over time in systematic nomenclature and machine representation. These tools will help chemistry students unlock the power of information regimes as scientific instruments and engage directly in mapping historically informed best practices in chemical information notation.

9:40 Intermission
9:55 103 Who knew I would get here from there: How I became the ChemConnector

Antony J. Williams, tony27587@gmail.com, Cheminformatics, Royal Society of Chemistry, Wake Forest, North Carolina 27587, United States

As the ChemConnector and one of the people responsible for creating ChemSpider, I have become well known in the chemical information world. I get to connect with some of the brightest minds in the domain, and have participated in meetings with Microsoft, Google, various government institutions and some of the top chemical and pharmaceutical companies. While my path to this point at times feels like a random walk, and I certainly never set a trajectory to become a chemical information specialist, being a spectroscopist by training, I look back upon my career with a lot of satisfaction. We are now at a time when information is becoming increasingly important in supporting and guiding the discovery process. At the same time, the opportunity to communicate your science to the masses is upon us. I will discuss how I became the ChemConnector (my Twitter handle) and why I believe entering the field of cheminformatics at this time holds incredible potential for influence.

10:20 104 Chemical Informatics Project >inspires >chemistry majors

Stuart Chalk, schalk@unf.edu, Department of Chemistry, University of North Florida, Jacksonville, FL 32224, United States

In Fall 2013, students in the Chemical Information Science course at the University of North Florida were given a final project: to generate, as a group, a database of chemical information that would be useful to chemistry majors. Students were required to find a dataset, download and import it into Excel, clean up the data, and organize and annotate the data and metadata using an Excel template prior to ingestion into a MySQL database. The presentation will describe the technical approach to implementing this project and review its potential to open the eyes of chemistry majors to this semantic world.
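The clean-up and ingestion step described above can be sketched roughly as follows. This is a minimal illustration only, assuming the cleaned spreadsheet has been exported as CSV; Python's built-in sqlite3 module stands in for MySQL, and the column names (compound, property, value, unit, source) are hypothetical, not the course's actual template.

```python
import csv
import io
import sqlite3

# Hypothetical column layout; the course's real Excel template is not described.
COLUMNS = ("compound", "property", "value", "unit", "source")


def clean_rows(csv_text):
    """Yield cleaned records: trim whitespace, drop incomplete or non-numeric rows."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = {key: (row.get(key) or "").strip() for key in COLUMNS}
        if not record["compound"] or not record["value"]:
            continue  # skip rows missing the essentials
        try:
            record["value"] = float(record["value"])
        except ValueError:
            continue  # skip rows whose value is not numeric (e.g. "n/a")
        yield record


def ingest(csv_text, conn):
    """Create the table if needed, insert the cleaned records, return how many were stored."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS data ("
        "id INTEGER PRIMARY KEY, compound TEXT NOT NULL, property TEXT, "
        "value REAL NOT NULL, unit TEXT, source TEXT)"
    )
    rows = [tuple(r[k] for k in COLUMNS) for r in clean_rows(csv_text)]
    # Parameterized inserts avoid quoting problems with chemical names
    conn.executemany(
        "INSERT INTO data (compound, property, value, unit, source) VALUES (?, ?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    sample = (  # illustrative rows, one clean and two that should be rejected
        "compound,property,value,unit,source\n"
        " benzene ,melting point, 5.5 ,degC,handbook\n"
        "toluene,melting point,n/a,degC,handbook\n"
        ",boiling point,80.1,degC,handbook\n"
    )
    conn = sqlite3.connect(":memory:")
    print(ingest(sample, conn))
    print(conn.execute("SELECT compound, value FROM data").fetchall())
```

Keeping validation in a separate generator mirrors the project's order of operations: data are cleaned and checked before anything reaches the database.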

10:40 105 Enabling cheminformatic exploration at the academic level: Enterprise tools, academic availability

Erin Bolstad, erinbolstad@gmail.com, ChemAxon, Budapest, Hungary

Wednesday, August 13, 2014

The IUPAC Solubility Data Series: 100 Volumes of Solubility Data Online - PM Session

Palace Hotel
Room: Marina
Cosponsored by ANYL, HIST

David Martinsen, M. Clara Magalhães, Organizers
David Martinsen, M. Clara Magalhães, Presiding
1:30 pm - 5:10 pm
1:30 Introductory Remarks: IUPAC Projects
1:40 106 Objectives of the Solubility Data Series

Mark Salomon, Mark.Salomon@maxpowerinc.com, MaxPower, Inc., Harleysville, PA 19438, United States

Many scientists rely on handbooks to find solubility data for studies in which solubility is not their prime objective. Some handbook tables give very brief evaluations, but normally no indication of the reliability of the tabulated solubility data. Many tables simply list solubilities without comment on reliability, precision and/or accuracy, and do not give the source of the tabulated data. Moreover, handbooks generally list a single solubility value without error limits, without a literature citation, and without mention of other, often more precise or accurate, data that were not considered. The IUPAC Solubility Data Project (SDP) addresses these problems by producing the Solubility Data Series (SDS), in which all solubility data published to date for a specific solute/solvent system, from all available literature sources, are succinctly compiled together with information on experimental methods and errors. Where sufficient literature data exist, contributors to the SDS provide critical evaluations that compare the literature data and classify them: data from poorly conceived experimental methods and incorrect solubility values are rejected, the best available experimental values are recommended, and data are classified as tentative when literature comparisons cannot be made. At the time of this presentation, Volume 100 of the SDS will have been published; future volumes and the future plans of the SDP will be discussed.

2:05 107 NIST Standard Reference Data and the Solubility Data Series

Allan H. Harvey1, allan.harvey@nist.gov, Donald R. Burgess2. (1) Applied Chemicals and Materials Division, National Institute of Standards and Technology, Boulder, CO 80305, United States, (2) Chemical Sciences Division, National Institute of Standards and Technology, Gaithersburg, MD 20899, United States

Beginning with Volume 66 in 1998, the IUPAC Solubility Data Series became the IUPAC-NIST Solubility Data Series, no longer published as monographs but instead as articles in the Journal of Physical and Chemical Reference Data. We describe this fruitful cooperation from the perspective of the journal and in the context of NIST's mission to promote the dissemination of “reference data” for use by science and industry. We will also share some thoughts about how the data compiled by the SDS and published in JPCRD might be made more accessible and useful in the future. An effort is described to make the contents of the pre-1998 volumes available on the Web in a format that will be easily searchable by researchers.

2:30 108 REST API for the IUPAC Solubility Data Series: A "Skunkworks" project

Stuart Chalk, schalk@unf.edu, Department of Chemistry, University of North Florida, Jacksonville, FL 32224, United States

The National Institute of Standards and Technology (NIST) publishes a large amount of scientific data across the disciplines. As science increasingly moves online and scientists eagerly seek quality data, it makes sense for the data published by NIST to be made available in a more web-enabled format. This presentation focuses on a proof-of-concept project to scrape data and metadata from pages of the IUPAC Solubility Data Series (http://srdata.nist.gov/solubility/) and make them available via a REST API on the author's website. Search and browse functionality is provided via the interface, and data points and datasets are published at unique REST URLs for referencing. Finally, multiple export options (HTML, XML, JSON, JSON-LD) are available to allow both human and software use of the data.
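The idea of giving each data point a unique REST URL and several export formats can be sketched as below. This is a minimal illustration, not the project's API: the base URL, field names, and the JSON-LD vocabulary are hypothetical, and the HTML export is omitted for brevity.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass

BASE_URL = "https://example.org/sds"  # hypothetical host, not the author's site


@dataclass
class SolubilityPoint:
    system_id: str        # hypothetical identifier for a solute/solvent system
    point_id: int         # index of the point within that system
    solute: str
    solvent: str
    temperature_K: float
    mole_fraction: float

    @property
    def url(self) -> str:
        # One stable URL per data point, so it can be cited directly
        return f"{BASE_URL}/systems/{self.system_id}/points/{self.point_id}"


def to_json(point: SolubilityPoint) -> str:
    return json.dumps({"url": point.url, **asdict(point)}, sort_keys=True)


def to_jsonld(point: SolubilityPoint) -> str:
    # @vocab maps each plain key to a term in a (hypothetical) vocabulary;
    # @id makes the REST URL the identifier of this node
    doc = {
        "@context": {"@vocab": f"{BASE_URL}/vocab#"},
        "@id": point.url,
        "@type": "SolubilityDataPoint",
        "solute": point.solute,
        "solvent": point.solvent,
        "temperature_K": point.temperature_K,
        "mole_fraction": point.mole_fraction,
    }
    return json.dumps(doc, indent=2)


def to_xml(point: SolubilityPoint) -> str:
    root = ET.Element("datapoint", url=point.url)
    for key, value in asdict(point).items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Dispatch table standing in for content negotiation or a ?format= parameter
EXPORTERS = {"json": to_json, "jsonld": to_jsonld, "xml": to_xml}


def render(point: SolubilityPoint, fmt: str) -> str:
    return EXPORTERS[fmt](point)
```

For example, `render(SolubilityPoint("sys-1", 3, "solute A", "water", 298.15, 1.0e-5), "jsonld")` returns a JSON-LD document whose `@id` is that point's URL, so software can both retrieve and cite it.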

2:55 109 Database on ionic liquids solubilities in molecular solvents: Progress and prospects

Zdeněk Wagner1, Johan Jacquemin2, Magdalena Bendová1, bendova@icpf.cas.cz. (1) E. Hala Laboratory of Separation Processes, Institute of Chemical Process Fundamentals AS CR, v. v. i., Prague 6, Czech Republic, (2) School of Chemistry and Chemical Engineering, Queen's University Belfast, Belfast, United Kingdom

Since ionic liquids are classified as novel designer and green solvents, an evaluation of their mutual solubilities with traditionally used solvents is of particular interest in the design and optimization of numerous industrial processes, such as separations, as well as in environmental remediation. Because the growing number of published datasets varies significantly in quality and robustness, a critical evaluation of these data is urgently needed for the field to move into practical application. IUPAC Project #2011-065-3-500, Database on liquid-liquid equilibria of binary mixtures of ionic liquids and molecular compounds, aims at compiling available literature data on solubility and liquid-phase equilibria in binary ionic liquid + molecular solvent systems, critically reviewing the apparatus used to measure the solubility data, and evaluating the data with respect to the experimental techniques used and the purity of the ionic liquid and solvent measured. Critical evaluation of the compiled data will be carried out using a robust gnostic method as well as UNIFAC model parameters, to provide a statistical analysis with a view to eventually presenting a set of recommended values. Data will be stored in a web-based interface that may readily be made public later for general use. In the present contribution, a report on the progress of the project will be given, along with examples of how the database is created and administered. The individual stages of data addition and of their approval by the database administrators will be explained. An example of a critical evaluation of a selected dataset will also be given to justify the choice of methods based on mathematical gnostics for this task.

3:20 Intermission
3:30 110 Critical evaluation of stability constant data by IUPAC

Glenn Hefter, G.Hefter@murdoch.edu.au, Chemistry Department, Murdoch University, Murdoch, WA 6150, Australia

Stability (formation) constants of metal-ion complexes with inorganic and organic ligands in aqueous solution are employed widely for modelling chemical speciation in areas as diverse as medicine, engineering, process control, extractive metallurgy, environmental management and so on. The available data are both extensive and widely dispersed across the scientific literature, which creates difficulties for potential users of such data. Fortunately, there has been a long-standing tradition of serious attempts at compiling stability constant and related thermodynamic data, starting with the efforts of Sillén and Martell for the UK Chemical Society and more recently in the form of electronic data bases. Unfortunately, most of these compilations do not attempt to distinguish reported results according to their reliability, that is, they are not “critical”. This is a major disadvantage for the non-expert user of such data. This situation was recognised a long time ago by key researchers in the field who set up a group within IUPAC in the 1970s to oversee production of critical evaluations of stability constant and related thermodynamic data. Having gone through a number of transformations, this work continues as part of the activities of the IUPAC Sub-Committee on Solubility and Equilibrium data. This talk will provide an overview of the many contributions that have been made by the IUPAC group to this important task.
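For readers outside the field, the quantities being compiled and evaluated can be stated compactly. The block below uses the standard textbook conventions (not details taken from the talk) for a metal ion M and ligand L, with charges omitted and concentrations written in place of activities. The stepwise constants $K_n$, the overall constants $\beta_n$, and the fraction $\alpha_n$ of metal present as $\mathrm{ML}_n$ at a given free-ligand concentration show how tabulated constants feed directly into speciation modelling:

```latex
\begin{aligned}
\mathrm{ML}_{n-1} + \mathrm{L} &\rightleftharpoons \mathrm{ML}_n,
  & K_n &= \frac{[\mathrm{ML}_n]}{[\mathrm{ML}_{n-1}][\mathrm{L}]} \\
\mathrm{M} + n\,\mathrm{L} &\rightleftharpoons \mathrm{ML}_n,
  & \beta_n &= \frac{[\mathrm{ML}_n]}{[\mathrm{M}][\mathrm{L}]^n} = \prod_{j=1}^{n} K_j,
  \qquad \log\beta_n = \sum_{j=1}^{n} \log K_j \\
\alpha_n &= \frac{\beta_n[\mathrm{L}]^n}{\sum_{j=0}^{N} \beta_j[\mathrm{L}]^j},
  & \beta_0 &\equiv 1
\end{aligned}
```

Because $\alpha_n$ depends on every $\beta_j$ in the set, one unreliable constant distorts the whole predicted speciation, which is why critical (rather than merely compiled) values matter to non-expert users.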

3:55 111 Models to evaluate experimental solubility data for crystalline nonelectrolyte solutes in organic mono-solvents and solvent mixtures

William E. Acree, Jr., acree@unt.edu, Chemistry, University of North Texas, Denton, Texas 76203, United States

Methods used in the critical evaluation of published solubility data for crystalline nonelectrolytes dissolved in neat organic mono-solvents and organic solvent mixtures depend to a large extent on the quantity and type of experimental data to be evaluated. Where independent replicate measurements exist, one can compute the mean value and standard deviation and see how the individual replicate measurements differ from each other. If replicate measurements are not available, one can obtain an indication of the internal consistency of a data set by using solution models. The presentation will discuss the applicability of the following models: the Combined Nearly Ideal Binary Solvent (NIBS)/Redlich-Kister model for evaluating measured solubility data in binary solvent mixtures; the Abraham general solvation parameter model for evaluating solubility data of a given solute dissolved in a series of organic solvents; and the Apelblat and Buchowski λh models for evaluating how the measured solubility data vary with temperature. In addition, the predictive aspects of the Combined NIBS/Redlich-Kister and Abraham general solvation models will be illustrated. The Combined NIBS/Redlich-Kister model allows one to predict solute solubility in ternary and higher-order multicomponent solvent mixtures using the equation coefficients calculated from the measured solubility data for the solute dissolved in the contributing sub-binary solvent mixtures. The Abraham general solvation parameter model enables one to predict solute solubility in additional organic solvents from solute descriptors calculated from the measured solubility of the solute in selected organic mono-solvents.
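Two of the models named above lend themselves to a short numerical sketch. It uses the forms common in the solubility literature: the modified Apelblat equation, ln x = A + B/T + C ln T, and the Combined NIBS/Redlich-Kister equation, ln x_A = x_B0 ln(x_A)_B + x_C0 ln(x_A)_C + x_B0 x_C0 Σ S_i (x_B0 − x_C0)^i. Here x_B0 and x_C0 are the solute-free solvent mole fractions, and (x_A)_B and (x_A)_C are the solute's solubilities in the neat solvents. Fitting by NumPy linear least squares is our choice for illustration, not necessarily the speaker's procedure. The Buchowski λh and Abraham models are not shown.

```python
import numpy as np


def fit_apelblat(T, x):
    """Fit the modified Apelblat equation ln x = A + B/T + C ln T.

    T: temperatures in K; x: mole-fraction solubilities.
    Returns (A, B, C) and the back-calculated solubilities; large
    measured-minus-calculated deviations flag points worth a closer look.
    """
    T = np.asarray(T, dtype=float)
    ln_x = np.log(np.asarray(x, dtype=float))
    design = np.column_stack([np.ones_like(T), 1.0 / T, np.log(T)])
    coeffs, *_ = np.linalg.lstsq(design, ln_x, rcond=None)
    return coeffs, np.exp(design @ coeffs)


def nibs_rk_ln_x(xB0, lnxA_B, lnxA_C, S):
    """Combined NIBS/Redlich-Kister estimate of ln x_A in a binary solvent B + C.

    xB0: solute-free mole fraction of solvent B (so xC0 = 1 - xB0);
    lnxA_B, lnxA_C: ln of the solute's mole-fraction solubility in neat B and neat C;
    S: Redlich-Kister coefficients S_0, S_1, ...
    """
    xB0 = np.asarray(xB0, dtype=float)
    xC0 = 1.0 - xB0
    rk_sum = sum(s * (xB0 - xC0) ** i for i, s in enumerate(S))
    return xB0 * lnxA_B + xC0 * lnxA_C + xB0 * xC0 * rk_sum


def fit_nibs_rk(xB0, x_meas, lnxA_B, lnxA_C, n_terms=3):
    """Least-squares S_0..S_{n_terms-1} from measured solubilities in B + C mixtures."""
    xB0 = np.asarray(xB0, dtype=float)
    xC0 = 1.0 - xB0
    # Subtract the ideal (NIBS) part; what remains is linear in the S_i.
    excess = np.log(np.asarray(x_meas, dtype=float)) - (xB0 * lnxA_B + xC0 * lnxA_C)
    design = np.column_stack([xB0 * xC0 * (xB0 - xC0) ** i for i in range(n_terms)])
    S, *_ = np.linalg.lstsq(design, excess, rcond=None)
    return S
```

Both fits are linear in their unknown coefficients, so ordinary least squares suffices. The fitted S_i from each sub-binary mixture are the inputs the abstract mentions for predicting solubility in ternary and higher-order solvent mixtures.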

4:20 112 Thermodynamics of electrolyte solubility in mixed solvents: Silver halides

Earle Waghorne, earle.waghorne@ucd.ie, UCD School of Chemistry and Chemical Biology, University College Dublin, Dublin, Ireland

Values of the solubility products, $K_{s0}$, the enthalpies of solution, $\Delta_{sol}H$, and the overall formation constants, $\beta_i$, of the $\mathrm{AgX}_i^{(i-1)-}$ silver halide complexes are reviewed for the silver halides in three non-aqueous solvents (methanol, acetonitrile and dimethylsulfoxide) and in their aqueous mixtures. The solvent systems provide examples of three types of mixed aqueous solvent system: aqueous alcohol mixtures, and aqueous mixtures with dipolar aprotic solvents that are either weakly or strongly basic. As is clear from the figures, the solubilities depend on both the concentration and the nature of the organic co-solvent. It can also be seen that variations in the solution enthalpies do not simply mirror those in the solubility products. The data are discussed in terms of the solvent–solute and solvent–solvent interactions present in the systems.
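The solubility product and the complex formation constants together determine the total solubility, and the link can be sketched numerically. This is a minimal illustration under stated assumptions. It takes ideal behaviour (concentrations in place of activities) and pure AgX dissolving with no added halide. It counts only Ag+, X− and the AgX_i^(i−1)− complexes, with β_i = [AgX_i]/([Ag+][X−]^i). Because dissolution releases equal amounts of silver and halide, [Ag+] + Σ[AgX_i] = [X−] + Σ i[AgX_i], which is solved for [X−] by bisection. Any constants passed in are illustrative, not values from the talk.

```python
import math


def silver_halide_solubility(ks0, betas, iterations=200):
    """Total solubility of AgX in a pure solvent, allowing AgX_i^{(i-1)-} complexes.

    ks0   : solubility product, Ks0 = [Ag+][X-]
    betas : overall formation constants [beta_1, beta_2, ...],
            beta_i = [AgX_i^{(i-1)-}] / ([Ag+][X-]^i)
    Ideal-solution assumption: concentrations stand in for activities.
    Returns (S, [Ag+], [X-]), where S is the total dissolved silver.
    """
    def balance(x):
        # Silver and halide balance rearranged to zero, with [Ag+] = Ks0/[X-]:
        # [Ag+] - [X-] + sum_i (1 - i) * beta_i * [Ag+] * [X-]^i
        ag = ks0 / x
        return ag - x + sum((1 - i) * b * ag * x**i for i, b in enumerate(betas, start=1))

    # balance() decreases strictly with [X-], so a single bracketed root exists.
    hi = 2.0 * math.sqrt(ks0)      # balance(hi) < 0 here
    lo = hi
    while balance(lo) <= 0:
        lo /= 2.0                  # step down until the balance turns positive
    for _ in range(iterations):    # plain bisection
        mid = 0.5 * (lo + hi)
        if balance(mid) > 0:
            lo = mid
        else:
            hi = mid
    x = 0.5 * (lo + hi)
    ag = ks0 / x
    s = ag + sum(b * ag * x**i for i, b in enumerate(betas, start=1))
    return s, ag, x
```

With no complexes, the result reduces to S = √Ks0. The neutral AgX(aq) species (i = 1) adds β1·Ks0 to S without shifting the balance, while anionic complexes (i ≥ 2) raise [Ag+] above [X−]. This is one reason solubilities need not track Ks0 alone across solvent mixtures.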


4:45 113 Possible contributions from the Solubility Data Project for arsenic and carbon dioxide environmental impacts mitigation

M. Clara F. Magalhães, mclara@ua.pt, Department of Chemistry and CICECO, University of Aveiro, Aveiro, Portugal

Arsenic is progressively becoming a major environmental problem in some parts of the world, having been responsible for large-scale disasters involving thousands of people in particular regions of Asia, Africa, and Central and South America. The crystallization of arsenic-containing solid phases can be the process that ensures long-term immobilization of arsenic in the environment. However, to understand which solid phases are stable under given environmental conditions, the stability conditions of each solid phase must be known. Solubility data are very important tools for calculating the thermodynamic parameters that help in constructing stability field diagrams. Nevertheless, only a few data exist, and much more work is needed on this subject. On the other hand, the progressive increase in the concentration of carbon dioxide in the atmosphere has been considered one of the causes of the increase in the global average atmospheric temperature. Oceans have been considered as possible sinks for carbon dioxide. To decrease the amount of atmospheric carbon dioxide, new technologies are being developed in which carbon dioxide is injected under pressure into deep geological structures. To optimize the predictive models currently used, accurate data for the high-pressure solubility of carbon dioxide in different solvents are needed. Arsenic and carbon dioxide are two of many examples of global environmental problems where critically evaluated solubility and related equilibrium data can provide useful information for their mitigation.

Wednesday, August 13, 2014

ChemEpInformatics: In the Pursuit of Epidrugs Using Chemoinformatics and Computational Approaches - PM Session

Palace Hotel
Room: Presidio

Jose Medina-Franco, Nathalie Meurice, Organizers
Jose Medina-Franco, Nathalie Meurice, Presiding
1:15 pm - 5:20 pm
1:15 Introductory Remarks
1:20 114 Targeting HDAC8 with derivatives of valproic acid: Design, synthesis, theoretical, and experimental evaluation as anticancer agents

Jose Correa Basurto, corrjose@gmail.com, SEPI, Escuela Superior de Medicina, IPN, Mexico, D.F. 11340, Mexico

Valproic acid (VPA) is extensively used as an anticonvulsive agent and as a treatment for other neurological disorders. It has been shown that VPA exerts an anti-proliferative effect on several types of cancer cells by inhibiting the activity of histone deacetylases (HDACs). However, VPA has some disadvantages, among them poor water solubility and hepatotoxicity. Therefore, the aim of our research was to explore the binding site of VPA on HDAC8 using docking and molecular dynamics simulations. A set of VPA derivatives was then designed and evaluated computationally with the aim of improving physicochemical properties and anti-proliferative activity, and these derivatives were synthesized and tested biologically. The results demonstrate that VPA is recognized in the HDAC8 hydrophobic channel, whereas its derivatives bind at different HDAC8 sites through hydrogen bonds, hydrophobic interactions and π–π interactions. The IC50 values of the VPA derivatives determined using HeLa cells are in the mM range. This result indicates that the VPA derivatives have greater anti-proliferative effects than VPA. Hence, these results suggest that these VPA derivatives may represent a good alternative for anticancer treatment.

1:45 115 Development of isoform-selective protein arginine methyltransferase inhibitors

Y. George Zheng1, yzheng@uga.edu, Ivaylo Ivanov2, Kun Qian1, Chunli Yan2. (1) Department of Pharmaceutical & Biomedical Sciences, University of Georgia, Athens, Georgia 30602, United States, (2) Department of Chemistry, Georgia State University, Atlanta, Georgia 30302, United States

Protein arginine methylation is an epigenetic modification mark critical for a variety of biological processes, including chromatin restructuring, RNA splicing, and signal transduction. Misregulation of protein arginine methyltransferase (PRMT) expression and activity has been linked to cardiovascular disorders, immune disease, cancer, and many other pathological conditions. PRMTs catalyze arginine methylation by transferring the methyl group from S-adenosyl-L-methionine (AdoMet) to the guanidino group of arginine residues in protein substrates, producing mono- and dimethylarginine residues in the substrate proteins. Several PRMT genes and proteins have been validated as new disease biomarkers and therapeutic targets in various cancer models. Developing selective PRMT inhibitors is a challenging task for both academic labs and pharmaceutical companies. Most current PRMT inhibitors display limited specificity and selectivity, indiscriminately targeting many methyltransferase enzymes that use AdoMet as a cofactor. We report the identification and characterization of diamidine compounds that specifically inhibit PRMT1, the primary type I arginine methyltransferase. Docking, molecular dynamics and MM/PBSA analysis were conducted to understand the binding modes of these inhibitors and the molecular basis of their selective inhibition of PRMT1. Kinetic assays showed that furamidine, one lead inhibitor, acts predominantly by competing with the H4 peptide substrate rather than with the cofactor AdoMet (SAM) for binding to the enzyme. Furthermore, cellular studies revealed that the inhibitor permeates the plasma membrane, effectively inhibits intracellular PRMT1 activity, and blocks cell proliferation in a set of leukemia cell lines with different genetic lesions. This work reports one of the best isoform-selective PRMT inhibitors to date.
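The kinetic signature behind a conclusion such as "furamidine competes with the H4 peptide" is summarized, in the simplest single-substrate case, by the textbook rate law for competitive inhibition. It is shown here for orientation only and is not the authors' fitted model. With the peptide as the varied substrate S, inhibitor I, and inhibitor dissociation constant $K_i$:

```latex
v = \frac{V_{\max}[\mathrm{S}]}{K_m\left(1 + \dfrac{[\mathrm{I}]}{K_i}\right) + [\mathrm{S}]},
\qquad
K_m^{\mathrm{app}} = K_m\left(1 + \frac{[\mathrm{I}]}{K_i}\right),
\qquad
V_{\max}^{\mathrm{app}} = V_{\max}
```

Increasing inhibitor concentration therefore raises the apparent $K_m$ for the competing substrate while leaving $V_{\max}$ unchanged. Seeing this pattern when the peptide, rather than AdoMet, is varied is the classic diagnostic for peptide-competitive inhibition.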

2:10 116 DNA methyltransferase Dnmt1: Regulation and novel drug-design strategies

Zeljko M Svedruzic1, zsvedruz@biol.pmf.hr, Patrik Nikolic2. (1) Faculty of Medicine, Biomedical Technology, University of Rijeka, Rijeka, Croatia, (2) Faculty of Medicine, Biomedical Technology, University of Rijeka, Campus Trsat, Radmile Matejčić 2, Rm 823, Rijeka, Croatia

DNA methyltransferase Dnmt1 is the key enzyme in functional organization of the human genome. For almost 30 years, Dnmt1 has been unsuccessfully targeted in different drug-design efforts. Most of those failures can be attributed to inadequate understanding of the diverse mechanisms that regulate the enzyme activity. Dnmt1 is a large enzyme with multiple flexible domains and phosphorylation sites. Dnmt1 can functionally interact with about 40 different molecules involved in DNA repair, chromatin organization, or RNA-directed DNA methylation. The interacting molecules include proteins, poly(ADP-ribose), and methylated and unmethylated single-stranded and double-stranded non-coding RNA molecules. We are developing inhibitors and activators of Dnmt1 using a combination of enzyme assays, computational structural analysis, and numerical evaluation of enzyme activity. Knowledge of dynamic reversible processes in the active site of Dnmt1 was used to create novel mechanism-based inhibitors that do not cause DNA damage. Molecular docking and QM/MM MD simulations were used to optimize the inhibitors' geometry, flexibility, interaction with the solvent in the active site, and, finally, formation of a reversible covalent adduct with the enzyme. Different strategies for optimization of the PK/PD properties have been suggested. A dynamic active-site loop has been identified as part of the lock mechanism that controls allosteric regulation and processivity. Enzymatic assays and numerical methods have been developed to study processivity on the DNA substrate and the affinity for binding at the allosteric site. The presented approach allows screening for specific allosteric activators and inhibitors.

2:35 117 Identification and design of new C5-DNA methyltransferase inhibitors and their biological activity

Paola Barbara Arimondo, paola.arimondo@etac.cnrs.fr, USR3388 - ETaC, CNRS-Pierre Fabre, Toulouse, France

DNA methylation is involved in the regulation of gene expression and plays an important role in normal developmental processes and disease. In particular, the epigenetic landscape is altered in cancers, where abnormal hypermethylation silences certain genes such as tumor suppressor genes. In mammals, DNA methyltransferases (DNMTs) are the enzymes responsible for methylating position 5 of cytosine in a CpG context. Few direct enzyme inhibitors are known, and those have several drawbacks. To identify novel inhibitors, we developed three chemical strategies. First, we ran a fluorescence-based high-throughput screen for inhibition of the murine catalytic Dnmt3a/3L complex against the chemical library of the Muséum national d'Histoire naturelle and found twelve hits with low micromolar activities. Interestingly, they showed little cytotoxicity. Dichlone, a small halogenated naphthoquinone classically used as a pesticide and fungicide, showed the lowest EC50, at 460 nM. Two molecules, including Dichlone, efficiently reactivated YFP gene expression in a stable HEK293 cell line by promoter demethylation, with efficacy comparable to that of the reference DNMT inhibitor 5-azacytidine. Second, molecular modeling of the quinoline inhibitor SGI-1027 in the crystal structure of the M.HhaI C5 DNA methyltransferase suggested that the quinoline and aminopyrimidine moieties are important for interaction with the substrates and the protein; on this basis, we synthesized twenty-five new derivatives. Among them, four compounds showed activity comparable to that of the parent compound SGI-1027. The compounds were more potent against human catalytic DNMT3A than against human DNMT1 and induced re-expression of a reporter gene, controlled by a methylated CMV promoter, in leukemia KG-1 cells. We carried out structure-activity relationship studies to identify the substituents important for inhibition.
Third, we carried out a modulation study of the non-nucleoside inhibitor N-phthaloyl-L-tryptophan (RG108). The indole, carboxylate and phthalimide moieties were modified, and homologated and conformationally constrained analogs were prepared. Among them, two constrained compounds and two NPys derivatives were at least 10-fold more potent than the reference compound. The cytotoxicity of the most potent inhibitors on the DU145 tumor cell line correlated with their inhibitory potency. Finally, docking studies were conducted to understand their binding mode. Altogether, these studies provide insights for the design of next-generation DNMT inhibitors.
References:
Ceccaldi A, Rajavelu A, Ragozin S, Sénamaud-Beaufort C, Bashtrykov P, Testa N, Dali-Ali H, Maulay-Bailly C, Amand S, Guianvarc'h D, Jeltsch A, Arimondo PB. Identification of Novel Inhibitors of DNA Methylation by Screening of a Chemical Library. ACS Chem Biol 2013;8(3):543-8.
Rilova E, Erdmann A, Gros C, Masson V, Aussagues Y, Poughon-Cassabois V, Rajavelu A, Jeltsch A, Menon Y, Novosad N, Gregoire JM, Vispé S, Schambel P, Ausseil A, Sautel F, Arimondo PB, Cantagrel F. Design, synthesis and biological evaluation of 4-amino-N-(4-aminophenyl)benzamide analogues of quinoline-based SGI-1027 as inhibitors of DNA methylation. ChemMedChem 2014, in press.
Asgatay S, Champion C, Marloie G, Drujon T, Senamaud-Beaufort C, Ceccaldi A, Erdmann A, Rajavelu A, Schambel P, Jeltsch A, Lequin O, Karoyan P, Arimondo PB, Guianvarc'h D. Synthesis and Evaluation of Analogues of N-Phthaloyl-l-tryptophan (RG108) as Inhibitors of DNA Methyltransferase 1. J Med Chem 2014;57(2):421-34.

3:00 118 Drug repurposing and epigenetics: Olsalazine is a hypomethylating compound active in a cellular context

José L. Medina-Franco1, Oscar Méndez-Lucio2, Joachim Petit1, Jeremy Tran3, James Bogenberger1, Mark Muller3, Raoul Tibes1, Nathalie Meurice1, Meurice.Nathalie@mayo.edu. (1) Department of Research, Mayo Clinic, Scottsdale, Arizona 85259, United States, (2) Facultad de Química, Universidad Nacional Autónoma de México, Mexico City, Mexico, (3) College of Medicine, University of Central Florida, Orlando, Florida 32827, United States

DNA hypomethylating drugs that inhibit DNA methyltransferases (DNMTs) are promising compounds for the treatment of cancer and other diseases. Herein, we describe the characterization of olsalazine, an approved anti-inflammatory drug, as a novel DNA hypomethylating agent [1]. Olsalazine was identified by a fast, computer-guided similarity search of DrugBank, a database of approved drugs, using a previously identified DNMT inhibitor as the query [2]. To examine the ability of olsalazine to inhibit DNMT activity, we used a novel DNA methylation reprogramming system that operates in the context of living cells [3]. The cell-based screen used in this study is highly tractable, internally controlled and well suited to a drug repurposing strategy in epigenetics. Olsalazine very closely mimics the action of 5-aza-2'-deoxycytidine, a known hypomethylating drug used clinically to treat myelodysplastic syndromes (MDS) and acute myeloid leukemia (AML), with minimal cytotoxicity at the concentrations tested. Based on this proof of concept, a more systematic study of the similarity relationships between approved drugs and known epigenetic agents has been undertaken and will assess the significance of these compounds for MDS and AML. [1] Mendez-Lucio, O. et al. ChemMedChem, 2014, 9, 560. [2] Kuck, D. et al. Bioorg. Med. Chem. 2010, 18, 822. [3] Morano, A. et al. Nucl. Acids Res., 2014, 42, 804.
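The similarity search described above can be illustrated with a minimal sketch that ranks library entries by Tanimoto similarity to a query fingerprint. The fingerprints are hypothetical sets of "on" bit indices, and the Tanimoto coefficient is an assumed choice of metric; the abstract does not state which fingerprint or similarity measure was used for the DrugBank search.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient |a & b| / |a | b| between two sets of 'on' bits."""
    shared = len(a & b)
    union = len(a) + len(b) - shared
    return shared / union if union else 0.0


def similarity_search(query: Set[int], library: Dict[str, Set[int]],
                      top_k: int = 3) -> List[Tuple[str, float]]:
    """Return the top_k library entries most similar to the query, best first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical bit sets; real ones would come from a fingerprinting toolkit.
query_fp = {1, 4, 7, 9, 12}  # stands in for the known DNMT inhibitor
approved_drugs = {
    "drug_A": {1, 4, 7, 9, 13},
    "drug_B": {2, 3, 5},
    "drug_C": {1, 4, 12},
}
hits = similarity_search(query_fp, approved_drugs, top_k=2)
# hits == [("drug_A", 0.666...), ("drug_C", 0.6)]
```

In a real search the query bit set would be computed from the previously identified inhibitor and the library from all DrugBank entries, with top-ranked drugs passed on to experimental testing.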

3:25 Intermission
3:40 119 Computer-aided hit/probe discovery for methyllysine reader proteins

Bradley M. Dickson, Lindsey I. James, Brandi M. Baughman, Stephen V. Frye, Dmitri Kireev, dmitri.kireev@unc.edu. Center for Integrative Chemical Biology and Drug Discovery, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States

Chemical marks on histones, collectively called the histone code, are involved in the regulation of gene expression, the cell cycle and genome stability. Proteins that mediate histone signaling by binding to the histone marks constitute a largely untapped source of therapeutic targets. The Center for Integrative Chemical Biology and Drug Discovery has been involved in the discovery of chemical probes for methyllysine (Kme) readers since 2009. In silico approaches have been instrumental at all stages of this endeavor. For instance, pharmacophore- and structure-based virtual screening has led to the identification of small-molecule hits for L3MBTL1 [1], a malignant brain tumor (MBT) protein. Structure-based design was used throughout probe development for L3MBTL3 [2], another MBT protein. Molecular simulations helped elucidate structural mechanisms of UHRF1, a ubiquitin ligase [3]. Current work includes hit/probe design for UHRF1 and identification of endogenous histone substrates for dual-domain readers. The talk will provide a brief overview of past work and a discussion of new results. [1] Kireev et al., Identification of non-peptide malignant brain tumor (MBT) repeat antagonists by virtual screening of commercially available compounds. J Med Chem 2010, 53 (21), 7625-7631. [2] James et al., Discovery of a chemical probe for the L3MBTL3 methyllysine reader domain. Nat Chem Biol 2013, 9 (3), 184-191. [3] Rothbart et al., Multivalent histone engagement by the linked tandem Tudor and PHD domains of UHRF1 is required for the epigenetic inheritance of DNA methylation. Genes & Development 2013, 27 (11), 1288-1298.

4:05 120 Molecular design approaches to target histone methyltransferases

Alberto Del Rio, alberto.delrio@gmail.com, Institute of Organic Synthesis and Photoreactivity (ISOF), National Research Council (CNR), Bologna, Bologna 40129, Italy and Department of Experimental, Diagnostic and Specialty Medicine (DIMES), Alma Mater Studiorum, University of Bologna, Bologna, Bologna 40126, Italy

In recent years, the study of histone post-translational modifications (PTMs) has progressed extraordinarily, and a great number of epigenetic modifications have been identified and associated with fundamental biological processes and pathological conditions. Accordingly, several drug discovery programs aimed at devising small molecules that pharmacologically modulate epigenetic enzymes have been initiated. In this framework, computer-aided approaches, ranging from ligand-based to structure-based methods, promise to play an essential role in speeding up the identification of new chemical entities. Herein we provide an overview of molecular design approaches to target protein lysine methyltransferases (PKMTs), a group of histone-modifying enzymes that have recently emerged as important targets for cancer therapy. We will present a typical computer-aided approach to search for novel inhibitors of SMYD3, a PKMT found to be upregulated in colorectal cancer cells and tissues. Results of in vitro and cell-based assays demonstrate that small molecules identified with in silico techniques can modulate the activity of SMYD3, supporting the use of computational approaches as a valuable set of tools for the early-stage identification of new epihits.

4:30 121 Structural basis for selective inhibition of hSirt2 by ligand induced rearrangement of the active site

Tobias Rumpf1, tobias.rumpf@pharmazie.uni-freiburg.de, Matthias Schiedel1, Berin Karaman2, Claudia Rössler3, Brian J North5, Kathrin I Ladwein1, Markus Gaier1, David A Sinclair5, Mike Schutkowski3, Wolfgang Sippl2, Oliver Einsle4, Manfred Jung1. (1) Institute of Pharmaceutical Sciences, Albert-Ludwigs-University Freiburg, Freiburg, Baden-Württemberg 79104, Germany, (2) Institute of Pharmacy, Martin-Luther-University Halle-Wittenberg, Halle, Sachsen-Anhalt 06120, Germany, (3) Institute for Biochemistry and Biotechnology, Martin-Luther-University Halle-Wittenberg, Halle, Sachsen-Anhalt 06120, Germany, (4) Institute for Biochemistry BIOSS Centre for Biological Signalling Studies, Albert-Ludwigs-University Freiburg, Freiburg, Baden-Württemberg 79104, Germany, (5) Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115, United States

Sirtuins are a unique and highly conserved class of NAD+-dependent lysine deacylases. Among them, the human isotype Sirt2 has been implicated in the pathogenesis of cancer, inflammation and neurodegenerative diseases, but a rational basis for developing optimized inhibitors has so far been lacking. Here we present the first high-resolution structures of human Sirt2 in complex with highly selective drug-like inhibitors that show a unique inhibitory mechanism. Their potency and unprecedented selectivity for Sirt2 are based on a ligand-induced structural rearrangement of the active site. Application of the most potent of these Sirtuin-rearranging ligands, termed SirReal2, leads to tubulin hyperacetylation in HeLa cells and induces degradation of the checkpoint protein BubR1, consistent with Sirt2 inhibition in vivo.

4:55 122 Decoding genetic and epigenetic networks for vitamin C mediated cell reprogramming

Panwen Wang, Yan Wang, Junwen Wang, junwen@hku.hk. Department of Biochemistry, The University of Hong Kong, Hong Kong, Hong Kong NA, China

Somatic cells can be converted into induced pluripotent stem cells (iPSCs), a process called reprogramming. Researchers have recently found that vitamin C, a natural molecule essential for human health, can promote reprogramming. However, the complete mechanism underlying this process remains unknown. Here we aim to develop a computational framework to uncover this mechanism, including the signaling pathway and the genetic and epigenetic regulatory networks, based on our previous work, ChIP-Array (Qin, et al., 2011) and EpiRegNet (Wang, et al., 2011). ChIP-Array uses ChIP-seq and transcriptome profiling data to find direct and indirect targets of a particular transcription factor. EpiRegNet explores the regulatory relations between histone modifiers and target genes by investigating the correlations between the modifiers' ChIP-seq signals and the expression of their potential targets. We will extend these strategies to identify signaling, regulatory and epi-regulatory networks for vitamin C mediated cell reprogramming. For the signaling pathway, we use a tissue-specific protein-protein interaction network weighted by known pathways and co-expression correlation, and apply a fast Random Walk with Restart algorithm to it. The initial pathways are generated and then optimized by a Depth-First Search algorithm. Several case studies demonstrate the effectiveness of our computational framework. The framework should lead to a more thorough understanding of the mechanism of cell reprogramming and help discover transcription factors or epigenetic modifiers that are suitable as potential drug targets.
Reference: Qin, J., et al. ChIP-Array: combinatory analysis of ChIP-seq/chip and microarray gene expression data to discover direct/indirect targets of a transcription factor. Nucleic Acids Res 2011;39(Web Server issue):W430-436. Wang, L.Y., et al. EpiRegNet: constructing epigenetic regulatory network from high throughput gene expression data for humans. Epigenetics 2011;6(12):1505-1512.
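The signaling-pathway step applies Random Walk with Restart (RWR) to a weighted protein-protein interaction network. A minimal power-iteration sketch of RWR follows; it is not the authors' implementation. The toy graph, its weights, the restart probability of 0.3 and the seed choice are hypothetical, edge weights are assumed positive and symmetric, and the Depth-First Search optimization step is not shown.

```python
from typing import Dict, List, Tuple

Graph = Dict[str, Dict[str, float]]


def undirected_graph(weighted_edges: List[Tuple[str, str, float]]) -> Graph:
    """Build a symmetric adjacency map from (u, v, weight) triples."""
    graph: Graph = {}
    for u, v, w in weighted_edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph


def random_walk_with_restart(graph: Graph, seeds: List[str], restart: float = 0.3,
                             tol: float = 1e-12, max_iter: int = 10_000) -> Dict[str, float]:
    """Iterate p <- restart * p0 + (1 - restart) * W p until it converges.

    W moves probability from each node to its neighbours in proportion to
    edge weight; p0 is uniform over the seed nodes.
    """
    nodes = sorted(graph)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    strength = {n: sum(graph[n].values()) for n in nodes}  # total edge weight per node
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            for v, w in graph[u].items():
                nxt[v] += (1.0 - restart) * p[u] * w / strength[u]
        converged = sum(abs(nxt[n] - p[n]) for n in nodes) < tol
        p = nxt
        if converged:
            break
    return p


# Hypothetical toy network: a chain A-B-C-D with unit weights, seeded at A.
toy = undirected_graph([("A", "B", 1.0), ("B", "C", 1.0), ("C", "D", 1.0)])
scores = random_walk_with_restart(toy, seeds=["A"])
ranking = sorted(scores, key=scores.get, reverse=True)  # ['A', 'B', 'C', 'D']
```

Nodes close to the seeds, through heavily weighted edges, accumulate the most steady-state probability, which is what makes RWR useful for prioritizing candidate pathway members before any further optimization.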

Wednesday, August 13, 2014

Exploring the Application of New Technologies in Chemical Research and Education - PM Session

Palace Hotel
Room: California Parlor
Cosponsored by CHED

David Martinsen, Organizers
David Martinsen, Presiding
1:30 pm - 5:15 pm
1:30 Introductory Remarks
1:35 123 New target prediction and visualization tools incorporating open source molecular fingerprints for TB mobile version 2

Sean Ekins1, ekinssean@yahoo.com, Alex M Clark2, Malabika Sarker3. (1) Collaborative Drug Discovery, Inc., Burlingame, CA, United States, (2) Molecular Materials Informatics, Montreal, Quebec, Canada, (3) SRI International, CA, United States

We recently developed a freely available mobile app (TB Mobile) for both iOS and Android platforms that displays Mycobacterium tuberculosis (Mtb) active molecule structures and their targets, with links to associated data. The app was developed to make target information available to as large an audience as possible and is therefore useful for both education and research. We now report a major update of the iOS version of the app, including enhancements that use an implementation of ECFP_6 fingerprints that we have made open source. Using these fingerprints, users can propose compounds with possible anti-TB activity and view them within a cluster landscape. Proposed compounds can also be compared with existing target data, using a naive Bayesian scoring system to rank probable targets. We have curated an additional 60 compounds and their Mtb targets and added these to the original set of 745 compounds. We have also curated 20 further compounds (many without targets in TB Mobile) to evaluate this version of the app, which contains 805 compounds and associated targets. TB Mobile can now manage a small collection of compounds that can be imported from external sources or exported by various means, such as email or app-to-app inter-process communication. This means that TB Mobile can be used as a node within a growing ecosystem of mobile apps for cheminformatics. It can also cluster compounds and use internal algorithms to help identify potential targets based on molecular similarity. TB Mobile represents a valuable dataset, data-visualization aid and prediction tool.
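The target-ranking step can be sketched with one widely used Laplacian-corrected naive Bayesian formulation, in which each fingerprint bit i receives the weight log((A_i + 1) / (T_i * P + 1)), where A_i counts active compounds containing the bit, T_i counts all compounds containing it and P is the baseline active rate. Whether TB Mobile uses exactly this estimator is an assumption, and the compounds, bit sets and target names below are hypothetical stand-ins for ECFP_6 fingerprints and curated Mtb targets.

```python
import math
from typing import Dict, List, Set, Tuple


def laplacian_bayes_weights(actives: List[Set[int]],
                            all_compounds: List[Set[int]]) -> Dict[int, float]:
    """Per-bit weights log((A_i + 1) / (T_i * P + 1)) for one target."""
    baseline = len(actives) / len(all_compounds)  # P: fraction active on this target
    total: Dict[int, int] = {}
    active: Dict[int, int] = {}
    for fp in all_compounds:
        for bit in fp:
            total[bit] = total.get(bit, 0) + 1
    for fp in actives:
        for bit in fp:
            active[bit] = active.get(bit, 0) + 1
    return {bit: math.log((active.get(bit, 0) + 1) / (t * baseline + 1))
            for bit, t in total.items()}


def rank_targets(query: Set[int], compounds: Dict[str, Set[int]],
                 targets: Dict[str, Set[str]]) -> List[Tuple[str, float]]:
    """Score the query against a model built for each target; highest score first."""
    all_fps = list(compounds.values())
    ranking = []
    for target, members in targets.items():
        weights = laplacian_bayes_weights([compounds[c] for c in members], all_fps)
        # Bits never seen in the training data contribute nothing to the score.
        ranking.append((target, sum(weights.get(bit, 0.0) for bit in query)))
    ranking.sort(key=lambda item: item[1], reverse=True)
    return ranking


# Hypothetical training data: four compounds annotated with two targets.
compounds = {
    "cpd1": {1, 2, 3},
    "cpd2": {1, 2, 4},
    "cpd3": {5, 6, 7},
    "cpd4": {5, 6, 8},
}
target_members = {"target_X": {"cpd1", "cpd2"}, "target_Y": {"cpd3", "cpd4"}}
ranked_targets = rank_targets({1, 2, 9}, compounds, target_members)
```

The Laplacian correction keeps rarely seen bits from dominating the score, which matters when some targets have only a handful of annotated compounds.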

2:05 124 Scilligence's ELN for research laboratories and academia

Rajeev Hotchandani, rhotchandani@scilligence.com, Scilligence, Burlington, MA 01803, United States

The dawn of the digital age and innovative companies like Google have made information much easier to access in our daily lives. However, many laboratories and academic research groups are lagging behind this trend. Although adoption has started among academic research groups and organic chemistry labs, many of these institutions still use paper notebooks to document their everyday work, which makes knowledge preservation and sharing much more difficult. Through case study examples, this presentation will show how Scilligence informatics can make knowledge preservation, IP protection and team collaboration efficient in your laboratory environment.

2:35 125 Accessing 3D printable chemical structures online

Vincent F. Scalfani1, vfscalfani@ua.edu, Antony J. Williams2, Robert M. Hanson3, Jason E. Bara4, Aileen Day2, Valery Tkachenko2. (1) University Libraries, University of Alabama, Tuscaloosa, AL 35487, United States, (2) eScience, Royal Society of Chemistry, Cambridge, Cambridgeshire CB4 0WF, United Kingdom, (3) Department of Chemistry, St. Olaf College, Northfield, MN 55057, United States, (4) Department of Chemical & Biological Engineering, University of Alabama, Tuscaloosa, AL 35487, United States

We have been exploring routes to create 3D printable chemical structure files (.WRL and .STL). These digital 3D files can be generated directly from crystallographic information files (.CIF) using a variety of software packages, such as Jmol, PyMOL or Chimera. After proper conversion to the .STL (or .WRL) file format, the chemical structures can be fabricated into tangible plastic models using 3D printers. In principle, this technique can be used for any molecular or solid-state structure, so researchers and educators are no longer limited to building models with traditional piecewise plastic model kits. 3D printed molecular models therefore have tremendous value for teaching and research. As the number of available 3D printable structures continues to grow, there is a need for a robust chemical database to store these files. This presentation will discuss our efforts to incorporate 3D printable chemical structures into the Royal Society of Chemistry's online compound database.

3:05 Intermission
3:15 126 Libraries as hubs for emerging technologies

Jeffrey R. Lancaster, jeffrey.lancaster@columbia.edu, Science & Engineering Libraries, Columbia University Libraries, New York, NY 10027, United States

Libraries occupy a neutral space that is increasingly recognized as a fertile home for innovation. Though this new position is sometimes at odds with their traditional identity, libraries are beginning to develop services built around emerging technologies, changing user needs, and evolving modes of research. I will discuss what Columbia University Libraries is currently doing to support these changing patron expectations, and I will address our current thinking around building service models for these new technologies. With several case studies, I will focus on how chemists – faculty and students – are adapting to using the new technologies for their research, teaching, and personal interests. Additionally, I will discuss some possible future directions for these technologies, reasons why libraries will continue to add value to new technologies, and areas where libraries can partner with others, including publishers, professional societies, and industrial organizations, in supporting and developing emerging technologies.

3:45 127 Global science capacity building and the Maker Movement: Do-it-yourself lab equipment with Tekla Labs

Julea Vlassakis, jvlassakis@berkeley.edu, Bioengineering, UC Berkeley, United States

Scientific research and technological innovation are critical for sustainable economic growth in resource-limited settings, but commercial laboratory equipment is prohibitively expensive. Similarly, educators around the world cannot afford to purchase basic laboratory equipment that would enhance their ability to bring science to life in their classrooms. Tekla Labs, a UC Berkeley/UCSF-based student organization, has started an online community for sharing designs to build reliable, low-cost laboratory equipment. Our goal is to empower scientists and educators to build their own research infrastructure using local and repurposed supplies. Our talk will describe our efforts to engage and connect the DIY community with scientists and educators through our online community and through competitions such as our recent international BuildMyLab design contest, held with Instructables.com, which drew 174 creative designs and instructions for building DIY laboratory equipment. Winning designs, such as a hotplate and a laminar flow hood, and other entries, such as centrifuges and digital microscopes, are essential items in many life sciences laboratories. These DIY alternatives are constructed from readily available raw materials and electronic components and cost a small fraction of similar commercial systems. We will integrate these instructions into our existing repository of DIY lab equipment designs available to researchers and educators worldwide. We will also discuss the challenges of mediating international partnerships, with examples from our current efforts to perform test builds of our designs with collaborators in need worldwide. Finally, we will share our vision for the future of DIY laboratory equipment as an enabler of technological and economic development in resource-poor settings and of improved, immersive scientific education around the world.

4:15 128 Google Glass based immunochromatographic diagnostic test analysis

Steve Feng1,2, stevewfeng@gmail.com, Romain Caire1,2, Bingen Cortazar1,2, Mehmet Turan1,2, Andrew Wong1,2, Aydogan Ozcan1,2,3,4. (1) Department of Electrical Engineering, University of California, Los Angeles, Los Angeles, CA 90095, United States, (2) Department of Bioengineering, University of California, Los Angeles, Los Angeles, CA 90095, United States, (3) California NanoSystems Institute, University of California, Los Angeles, Los Angeles, CA 90095, United States, (4) Department of Surgery, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, United States


Recent advances in wearable computing devices such as smart glasses provide new opportunities for performing various diagnostic tasks. We demonstrate the ability of one such smart device, Google Glass, to make qualitative decisions and quantitative measurements on lateral flow immunochromatographic diagnostic tests, using a hands-free, voice-controllable rapid diagnostic test (RDT) reader application without any external hardware attachments. After users run our application to image RDTs with attached Quick Response (QR) code identifiers generated through a web tool, the images are digitally transferred to our servers for automated evaluation of all RDTs in the image. The resulting diagnoses are returned to the user's Glass device and made available, along with their QR code data and other associated information (e.g., geographic), on a central server, which presents the test results in tabular and geospatial (i.e., world map with geo-tagged data) form. We demonstrate this system's ability to evaluate qualitative (i.e., yes/no) human immunodeficiency virus (HIV) tests and quantitative prostate-specific antigen (PSA) tests. For the quantitative PSA tests, we imaged and measured both Free and Total PSA tests at concentrations ranging from 0 to 200 ng/mL and generated calibration curves mapping image line intensities to absolute PSA concentrations. By providing real-time spatiotemporal tracking of various diseases and medical conditions, hands-free sensing and imaging platforms on wearable computing devices such as Google Glass could prove quite useful for epidemiology, mobile health and telemedicine applications.
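The calibration step, mapping test-line intensity to PSA concentration, can be sketched as an ordinary least-squares line that is then inverted to read unknown samples. The linear model and the calibration points are hypothetical; the abstract does not state the functional form of the calibration curves, and lateral flow responses can be nonlinear over a 0 to 200 ng/mL range, in which case a saturating curve would fit better.

```python
from typing import List, Tuple


def fit_calibration(concentrations: List[float],
                    intensities: List[float]) -> Tuple[float, float]:
    """Least-squares fit of intensity = slope * concentration + intercept."""
    n = len(concentrations)
    mean_c = sum(concentrations) / n
    mean_i = sum(intensities) / n
    s_cc = sum((c - mean_c) ** 2 for c in concentrations)
    s_ci = sum((c - mean_c) * (i - mean_i) for c, i in zip(concentrations, intensities))
    slope = s_ci / s_cc
    return slope, mean_i - slope * mean_c


def estimate_concentration(intensity: float, slope: float, intercept: float) -> float:
    """Invert the calibration line to turn a measured intensity into ng/mL."""
    return (intensity - intercept) / slope


# Hypothetical calibration standards (ng/mL) and line intensities (arbitrary units).
standards = [0.0, 4.0, 10.0, 25.0]
line_intensity = [2.0, 14.0, 32.0, 77.0]
slope, intercept = fit_calibration(standards, line_intensity)
unknown_psa = estimate_concentration(50.0, slope, intercept)  # about 16 ng/mL for this toy line
```

Separate curves would be fitted for the Free and Total PSA tests, since each assay has its own intensity response.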

4:45 129 Frugal science and global health: Democratizing access to scientific tools

Manu Prakash, manup@stanford.edu, Stanford University, United States

Somebody once said, “What a damn fool can do for a dollar, an engineer can do for a nickel.” Thinking about cost as an engineering constraint brings new life to ideas. This is what makes the difference between an idea influencing a hundred people and one influencing a billion. With our planet literally teeming with problems, it's time to take cost constraints into serious consideration. As physicists, we like to make stuff. We use these skills (and field work) to design solutions for extremely resource-constrained settings, especially in the field of global health. I will discuss our current work, from field diagnostics to high-throughput vector ecology and hands-on science education.

Thursday, August 14, 2014

General Papers - AM Session
Computational Chemistry and Informatics

Palace Hotel
Room: Marina
Erin Bolstad, Organizers
Erin Bolstad, Presiding
9:40 am - 12:05 pm
9:40 Introductory Remarks
9:45 130 Ten years of innovative collaborative drug discovery

Barry A. Bunin, bbunin@collaborativedrug.com, Management, Collaborative Drug Discovery (CDD), Burlingame, CA 94010, United States

Today, as chemical and biological information grows, researchers increasingly need simple tools to manage their own and their collaborators' data simultaneously. In addition, there is a growing need to securely integrate public data with private data. New approaches that would allow scientists to do research more effectively need to handle chemical complexity (registration, stereochemistry, mixtures, batches) and biological complexity (enzyme, phenotypic, cell, animals, IC50, Z/Z', and metadata), as well as natural workflows for secure collaborations. This presentation will describe the creation of, and use cases for, hosted web-database technologies such as the CDD Vault, which provides a more collaborative hosted informatics technology for commercial and neglected disease applications. The major components of effective scientific-community based research include: (1) a unifying goal or focus on common therapeutic areas/diseases; (2) multiple research areas/expertise; (3) a uniform database platform for effective data accumulation and management; (4) easy access to and sharing of information; (5) potential for unlimited growth. The Collaborative Drug Discovery (CDD) Vault was built using innovative web technologies to provide a platform that allows scientists to archive, mine, and securely share research data, with an initial focus on infectious diseases of the developing world. Since the technology is “therapeutic area agnostic”, it has proven equally applicable to commercial applications. Collaborative technology allows researchers to securely collaborate with technical experts around therapeutic or target areas, advancing research and facilitating the discovery of new drug candidates. It also allows scientists to speed up research by sharing unpublished data. Approved examples of work with small companies, academics, large collaborations and funding bodies using the CDD Vault database platform will be presented.

10:05 131 Multicriteria drug discovery: Using CoMFA models to drive target specificity

Lei Wang, lei.wang@certara.com, Brian Masek, Fabian Boes, Bernd Wendt, Stephan Nagy. Certara, St. Louis, MO 63101, United States

A successful drug candidate must not only have acceptable ADME, physical and safety properties but often also needs to achieve a selectivity profile against related targets. The approach we present here combines a de novo design method with a powerful ligand-based scoring function, consisting of three 3D QSAR models for different targets, to generate selective inhibitors. A training set of 50 molecules with activities for Factor Xa, trypsin and thrombin was compiled from the literature and public sources, and three Topomer CoMFA models were built from it. By combining predictions of the activity profile with different weights and a penalty score, we created a scoring function that can drive the invention of new, selective compounds.
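The multicriteria idea, combining per-target predictions with weights and a penalty, can be sketched as follows. The functional form, the weights, the one-log-unit selectivity window and the penalty value are all hypothetical; the abstract does not give the actual scoring function, and the predicted pIC50 values stand in for Topomer CoMFA model outputs.

```python
from typing import Dict


def selectivity_score(predicted: Dict[str, float], target: str, weights: Dict[str, float],
                      min_window: float = 1.0, penalty: float = 2.0) -> float:
    """Reward predicted on-target potency, discount off-target potency, and
    penalize any off-target whose predicted pIC50 comes within `min_window`
    log units of the target's."""
    score = weights[target] * predicted[target]
    for name, p_activity in predicted.items():
        if name == target:
            continue
        score -= weights[name] * p_activity
        if predicted[target] - p_activity < min_window:
            score -= penalty
    return score


# Hypothetical weights and predicted pIC50 profiles for two designed candidates.
weights = {"FXa": 1.0, "trypsin": 0.5, "thrombin": 0.5}
candidates = {
    "cand_1": {"FXa": 8.0, "trypsin": 5.0, "thrombin": 6.0},
    "cand_2": {"FXa": 8.0, "trypsin": 5.0, "thrombin": 7.5},
}
ranked = sorted(candidates,
                key=lambda c: selectivity_score(candidates[c], "FXa", weights),
                reverse=True)
```

Both candidates are equally potent on Factor Xa, but the second loses ground because its predicted thrombin activity falls inside the selectivity window, which is the behavior a de novo design loop would use to steer toward selective structures.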

10:25 132 WITHDRAWN
10:45 Intermission
11:00 133 Conditional structure-activity-similarity (SAS) maps

Gerry Maggiora2,3, gerry.maggiora@gmail.com, Martin Vogt1, Preeti Iyer1, Jürgen Bajorath1. (1) Department of Life Science Informatics, Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany, (2) University of Arizona Bio5 Research Institute, Tucson, Arizona 85721, United States, (3) Translational Genomics Research Institute, Phoenix, Arizona 85004, United States

SAS maps provide a simple and compact two-dimensional graphical portrayal of the activity differences and similarities of compound pairs. Different regions of these maps correspond to different pairwise features, namely activity cliffs, similarity cliffs, and “smooth” SAR. Although this provides a useful representation of SARs, it suffers from the fact that all of the data are pairwise. A new method is described for generating SAS maps with respect to specific reference compounds, yielding unique maps that are “conditioned on” the reference compounds used in their construction. This approach provides a more compound-centric view that facilitates the analysis of SAR properties. An analysis is also presented that describes a means of assessing the statistical properties of the specific features in each conditional SAS map.
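A conditional SAS map can be sketched by fixing one reference compound and placing every other compound at (similarity to the reference, absolute activity difference from the reference), then labeling the region it falls in. The Tanimoto metric, the thresholds (0.6 similarity, 1 log unit of activity) and the toy data are hypothetical, the statistical assessment described in the abstract is not shown, and low-similarity pairs with large activity differences are simply labeled "other".

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient between two sets of 'on' fingerprint bits."""
    shared = len(a & b)
    union = len(a) + len(b) - shared
    return shared / union if union else 0.0


def conditional_sas_map(reference: str, fingerprints: Dict[str, Set[int]],
                        activities: Dict[str, float], sim_cut: float = 0.6,
                        act_cut: float = 1.0) -> List[Tuple[str, float, float, str]]:
    """Place every compound relative to one reference compound and label its region."""
    ref_fp, ref_act = fingerprints[reference], activities[reference]
    points = []
    for name, fp in fingerprints.items():
        if name == reference:
            continue
        sim = tanimoto(ref_fp, fp)
        diff = abs(activities[name] - ref_act)
        if sim >= sim_cut:
            region = "activity cliff" if diff >= act_cut else "smooth SAR"
        else:
            region = "similarity cliff" if diff < act_cut else "other"
        points.append((name, sim, diff, region))
    return points


# Hypothetical fingerprints (bit sets) and pIC50 values; "R" is the reference.
fps = {"R": {1, 2, 3, 4, 5}, "A": {1, 2, 3, 4, 6}, "B": {1, 2, 3, 4, 5, 6},
       "C": {7, 8, 9}, "D": {1, 8, 9}}
pact = {"R": 7.0, "A": 5.0, "B": 7.3, "C": 6.8, "D": 4.0}
sas_points = conditional_sas_map("R", fps, pact)
```

Repeating the call with each compound as the reference yields one map per compound, which is what gives the compound-centric view.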

11:20 134 Fragment-based computational approach to study the phase behavior of biopolymers

Jan-Willem Handgraaf1, janwillem.handgraaf@culgi.com, Rubèn Serral Gracià1, Peter Schiffels2, Johannes G. E. M. Fraaije1,3. (1) Culgi B.V., Leiden, The Netherlands, (2) Fraunhofer-Institut für Fertigungstechnik und Angewandte Materialforschung, Bremen, Germany, (3) Leiden Institute of Chemistry, Leiden University, Leiden, The Netherlands

The computational study of molecular processes in nature, such as polymer dynamics or aggregation, is for the most part still out of reach for conventional simulation methods such as Molecular Dynamics or Monte Carlo. Here we present a fragmentation engine designed to handle, in principle, any molecular architecture. The methodology is based on the well-known notion that atomic groups in molecules can be lumped into single particles, or “beads” [1]. This so-called “coarse-graining” allows (much) larger length and time scales than can typically be attained by conventional simulation methods.
The problem of fragmenting an atomistically detailed molecule is transformed into a global minimization problem. A scoring function determines how good or bad a given fragmentation is, and the problem then reduces to finding the global minimum of the scoring function. The global minimization is performed by a Monte Carlo evolution in which the lowest-scoring fragmentation is stored [3]. This typically succeeds in finding the global minimum for a given molecular architecture within a reasonable amount of time.
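The global-minimization loop described above can be sketched as Metropolis Monte Carlo over fragmentations that records the lowest-scoring one seen. To keep it short, the sketch assumes a linear chain of atoms whose beads are contiguous segments and uses a hypothetical score (squared deviation from a preferred bead size of four atoms); the actual scoring function and move set are not described in the abstract.

```python
import math
import random
from typing import List, Set, Tuple


def beads_from_cuts(n_atoms: int, cuts: Set[int]) -> List[List[int]]:
    """Split a linear chain of atoms 0..n_atoms-1 into contiguous beads.

    A cut at position i separates atom i-1 from atom i.
    """
    beads, start = [], 0
    for i in sorted(cuts):
        beads.append(list(range(start, i)))
        start = i
    beads.append(list(range(start, n_atoms)))
    return beads


def fragmentation_score(beads: List[List[int]], target_size: int = 4) -> int:
    """Toy score: squared deviation of each bead from a preferred size."""
    return sum((len(bead) - target_size) ** 2 for bead in beads)


def monte_carlo_fragment(n_atoms: int, steps: int = 5000, temperature: float = 1.0,
                         seed: int = 0) -> Tuple[List[List[int]], int]:
    """Metropolis Monte Carlo over cut sets, remembering the lowest score seen."""
    rng = random.Random(seed)
    cuts: Set[int] = set()
    current = fragmentation_score(beads_from_cuts(n_atoms, cuts))
    best_cuts, best = set(cuts), current
    for _ in range(steps):
        trial = cuts ^ {rng.randint(1, n_atoms - 1)}  # toggle one cut position
        score = fragmentation_score(beads_from_cuts(n_atoms, trial))
        if score <= current or rng.random() < math.exp((current - score) / temperature):
            cuts, current = trial, score
            if score < best:
                best_cuts, best = set(trial), score
    return beads_from_cuts(n_atoms, best_cuts), best


beads, best_score = monte_carlo_fragment(12)
```

Accepting occasional uphill moves lets the search escape local minima, while storing the best state seen means the returned fragmentation never depends on where the walk happens to end.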
From a fundamental chemical informatics point of view, it is interesting to note that we can represent a collection of different molecules by a much smaller set of distinct fragments, and that the frequency distribution of fragments follows Heaps' power law. Extrapolation indicates that one could fragment the entire PubChem database into a mere few thousand distinct fragments. The great practical advantage is that, through pre-parameterization, one can in principle speed up molecular affinity calculations of thermodynamic accuracy by many orders of magnitude while still maintaining chemical specificity.
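Heaps' law states that the number of distinct items V grows with the number of items examined n as V = K n^β, with 0 < β < 1. A minimal sketch of fitting K and β by linear regression in log-log space, then extrapolating, is shown below; the counts are hypothetical and chosen to follow the law exactly, so this does not reproduce the PubChem extrapolation.

```python
import math
from typing import List, Tuple


def fit_heaps_law(n_molecules: List[float], n_distinct: List[float]) -> Tuple[float, float]:
    """Fit V = K * n**beta by least squares on log V = log K + beta * log n."""
    xs = [math.log(n) for n in n_molecules]
    ys = [math.log(v) for v in n_distinct]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    beta = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
            / sum((x - mean_x) ** 2 for x in xs))
    k = math.exp(mean_y - beta * mean_x)
    return k, beta


# Hypothetical counts of distinct fragments after fragmenting n molecules.
molecules = [100, 400, 1600, 6400]
distinct_fragments = [50, 100, 200, 400]  # exactly 5 * n**0.5
k, beta = fit_heaps_law(molecules, distinct_fragments)
projected = k * 1e8 ** beta  # extrapolated distinct fragments for 10**8 molecules
```

The smaller the fitted exponent β, the more slowly the fragment vocabulary grows, which is what makes pre-parameterizing a finite fragment set practical.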
So far we have successfully applied the fragment-based computational approach to molecular systems and architectures found in the chemical [3], oil [4] and pharmaceutical industries. The method has been refined and validated against industrially relevant data. Here we will demonstrate the versatility of the methodology by applying it to the biopolymer lignin. First we discuss the problem of generating atomic structures of lignin oligomers from 2D NMR data, followed by a description of the actual fragmentation into coarse-grained lignin models. Finally, we discuss the application of these coarse-grained lignin models in simulations of their phase behavior in aqueous solution.
1. See: Carbone, P.; Karimi-Varzaneh, H. A.; Müller-Plathe, F. Faraday Discuss. 2010, 144, 25–42, and references therein.
2. Fraaije, J. G. E. M., Nath, S. K., van Male, J., Becherer, P., Klein Wolterink, J., Handgraaf, J.-W., Case F., Tanase,, C. Serral Gracià, R., Culgi Manual version 8.0, www.culgi.com (2013) (ISBN: 978-90-817846-0-3).
3. Handgraaf, J.-W.; Serral Gracià, R.; Nath, S. K.; Chen Z.; Chou, S.-H.; Ross, R. B.; Schultz, N. E.; Fraaije, J. G. M. E., Macromolecules 2011, 44, 1053-1061.
4. Fraaije, J. G. E. M.; Tandon, K.; Jain, S.; Handgraaf, J.-W.; Buijse, M., Langmuir 2013, 29, 2136-2151.

11:40 135 Future of large-scale computational screening of porous materials

Jihan Kim, jihankim@kaist.ac.kr, Department of Chemical and Biomolecular Engineering, KAIST, Daejeon, Republic of Korea

Porous materials such as zeolites and metal-organic frameworks are seen as promising materials for a wide range of energy- and environment-related applications, including carbon capture, gas storage and separations. Unfortunately, the number of porous materials that can be experimentally synthesized is extremely large, making it very time-consuming to identify the optimal structures for a given application. In this presentation, I will discuss some of the computational methods and techniques developed to screen large databases of porous materials efficiently. I will also discuss future technologies and methods that could lead to further advances in efficient materials screening. Finally, I will introduce new ideas and concepts that could increase the general public's interest in porous materials screening, to stimulate further discussion.
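A common pattern in large-scale screening is a funnel: apply a cheap descriptor filter first, and spend more expensive evaluation only on the survivors. The sketch below assumes that each material already has a pore limiting diameter (PLD) and Henry coefficients for CO2 and N2 computed elsewhere. It ranks candidates by the ratio of these coefficients, which gives the adsorption selectivity at infinite dilution. The `Material` fields, the `min_pld` cutoff and the toy values are all hypothetical, and the sketch is not presented as the speaker's method.

```python
from dataclasses import dataclass


@dataclass
class Material:
    name: str
    pld: float     # pore limiting diameter, angstrom
    kh_co2: float  # Henry coefficient for CO2, mol/(kg Pa)
    kh_n2: float   # Henry coefficient for N2, mol/(kg Pa)


def screen(materials, min_pld=3.8, top_n=10):
    """Two-stage funnel: geometric filter, then ranking by infinite-dilution CO2/N2 selectivity."""
    # Stage 1: discard frameworks whose pores are assumed too narrow for the guest molecules
    passed = [m for m in materials if m.pld >= min_pld]
    # Stage 2: rank survivors by K_H(CO2) / K_H(N2)
    ranked = sorted(passed, key=lambda m: m.kh_co2 / m.kh_n2, reverse=True)
    return [(m.name, m.kh_co2 / m.kh_n2) for m in ranked[:top_n]]


library = [
    Material("M1", 5.0, 1e-4, 1e-5),
    Material("M2", 3.0, 5e-4, 1e-5),  # fails the PLD filter despite high selectivity
    Material("M3", 6.0, 3e-4, 1e-5),
]
results = screen(library)
print(results)
```

In a real workflow, the second stage would typically be replaced or followed by full adsorption simulations on the short list. The ordering of stages is the point: the cheapest test runs on the largest set.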

Thursday, August 14, 2014

General Papers - AM Session
Toolkits and Databases

Palace Hotel
Room: Presidio
Erin Bolstad, Organizer
Erin Bolstad, Presiding
9:45 am - 12:25 pm
9:45 Introductory Remarks
9:50 136 Integrating Jmol/JSpecView into the Eureka Research Workbench

Stuart Chalk1, schalk@unf.edu, Matthew Morse1, Israel Hurst1, Anthony Williams2, Valery Tkachenko2, Alexey Pshenichnov2, Robert Hanson3. (1) Department of Chemistry, University of North Florida, Jacksonville, FL 32224, United States, (2) Royal Society of Chemistry, Wake Forest, NC 27587, United States, (3) Department of Chemistry, St. Olaf College, Northfield, MN 55057, United States

The Eureka Research Workbench (http://eureka.sourceforge.net) is an online environment, currently under development, for capturing the scientific process. This presentation discusses the integration of Jmol/JSpecView into the environment so that experimental data stored in ExptML files (http://exptml.sourceforge.net) can be viewed. Ideas for semantically linking data into Jmol will also be discussed.

10:15 137 Open innovation and chemistry data management contributions from the Royal Society of Chemistry resulting from the Open PHACTS project

Antony J Williams1, tony27587@gmail.com, Valery Tkachenko1, Ken Karapetyan1, Alexey Pshenichnov1, Colin Batchelor2, Jon Steele2, David Sharpe2. (1) Cheminformatics, Royal Society of Chemistry, Wake Forest, North Carolina 27587, United States, (2) eScience, Royal Society of Chemistry, Cambridge, Cambridgeshire, United Kingdom

The Royal Society of Chemistry was pleased to contribute to the Open PHACTS project, a three-year project funded by the Innovative Medicines Initiative of the European Union. Over those three years we developed our existing platforms, created new and innovative widgets and data platforms to handle chemistry data, extended existing chemistry ontologies and embraced the open standards of the semantic web. As a result, RSC served as the centralized chemistry data hub for the project. Now that the Open PHACTS project has concluded, we will report on our experiences and provide an overview of the tools, capabilities and data released to the community as a result of our participation, and of how these may influence future projects. This will include the Open PHACTS open chemistry data dump, which provides the chemistry-related data in formats consumable by both chemistry and semantic web software, as well as some of the resulting chemistry software released to the community. The Open PHACTS project made significant contributions to the chemistry community as well as to the supporting pharmaceutical companies and the biomedical community.

10:35 138 Using outreach to inform, maintain, and evaluate your collection

Kiyomi Deards, kdeards2@unl.edu, Department of Research and Instructional Services, University of Nebraska-Lincoln, Lincoln, NE 68506, United States

Increasing responsibilities, coupled with larger student bodies and corresponding increases in teaching and research faculty and staff, have made it difficult for many librarians and information professionals to find time for collection development. This presentation will demonstrate how both formal and informal outreach efforts can inform the development of chemistry collections, support the evaluation of existing resources, and guide collection maintenance.

10:55 139 PubChem: Celebrating ten years online

Evan Bolton, bolton@ncbi.nlm.nih.gov, National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894, United States

PubChem is an open archive of chemical substances and their bioactivities that is celebrating ten years of service. From humble beginnings, PubChem has grown substantially over the years, entirely through voluntary contributions from the chemical biology community, including chemical suppliers, universities, publishers, government agencies, and domestic and international resources. This talk will give a brief overview of PubChem and explore some historical aspects of the project as well as its future directions.

11:15 Intermission
11:25 140 Clustering the Royal Society of Chemistry chemical repository to enable enhanced navigation across millions of chemicals

Ken Karapetyan1, karapetyank@rsc.org, Valery Tkachenko1, Antony J Williams1, Oliver Kohlbacher2, Phillipp Thiel2. (1) Cheminformatics, Royal Society of Chemistry, Wake Forest, North Carolina 27587, United States, (2) Applied Bioinformatics Group, University of Tuebingen, Tuebingen, Germany

The Royal Society of Chemistry has hosted the ChemSpider database and associated platforms for over five years. Technologies have made significant progress over that period but, more importantly, community needs have grown, both in the variety of data types and in search performance. Preprocessing chemicals for improved similarity searching and compound database navigation is one crucial component of a major development effort to architect a new data repository. This component has been engineered and implemented in collaboration with the group of Professor Oliver Kohlbacher at the University of Tübingen, which has developed an approach for clustering large chemical libraries based on a fast, parallel, purely CPU-based algorithm for calculating 2D binary fingerprint similarity. Using this method, the complete similarity network of our seed set of tens of millions of chemicals has been analyzed at a Tanimoto threshold of 0.6, and all similarity links have been fed into our database. Storing these links is highly beneficial and will allow us to create richer, more complex visualizations of similar compounds, together with associated bioactivity data and physicochemical properties, for users of the RSC chemical repository. This presentation will provide an overview of our experiences in applying clustering to our compound data and of how it will be used to enrich data navigation in the RSC data repository.
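The Tanimoto coefficient of two binary fingerprints is the number of bits set in both, divided by the number of bits set in either. The plain-Python sketch below stores fingerprints as integers and computes all pairwise links at or above the 0.6 threshold. It then groups compounds into connected components of the resulting similarity network, which amounts to single-linkage clustering. This is a brute-force O(n²) illustration, not the fast parallel algorithm developed at Tübingen. The fingerprints and function names are hypothetical.

```python
from itertools import combinations


def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for fingerprints stored as integer bit sets."""
    union = bin(fp_a | fp_b).count("1")
    if union == 0:
        return 0.0
    return bin(fp_a & fp_b).count("1") / union


def similarity_links(fingerprints, threshold=0.6):
    """All pairs whose similarity reaches the threshold (brute force, O(n^2))."""
    links = []
    for (id_a, fp_a), (id_b, fp_b) in combinations(sorted(fingerprints.items()), 2):
        sim = tanimoto(fp_a, fp_b)
        if sim >= threshold:
            links.append((id_a, id_b, sim))
    return links


def clusters_from_links(ids, links):
    """Connected components of the similarity network (single-linkage clusters) via union-find."""
    parent = {i: i for i in ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b, _ in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a
    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return sorted(sorted(g) for g in groups.values())


fps = {"A": 0b1111, "B": 0b0111, "C": 0b110000, "D": 0b100000}
links = similarity_links(fps)
print(links, clusters_from_links(fps, links))
```

Storing the links themselves, rather than only cluster labels, keeps the option of re-clustering or drawing neighborhood graphs later without recomputing similarities.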

11:45 141 Experiences and adventures with noSQL and its applications to cheminformatics data

Valery Tkachenko1, tkachenkov@rsc.org, Antony Williams1, Ken Karapetyan1, Alexey Pshenichnov1, Mikhail Rybalkin2. (1) Cheminformatics, Royal Society of Chemistry, Wake Forest, North Carolina 27587, United States, (2) Cheminformatics, GGA Software Services, St Petersburg, Russian Federation

The Royal Society of Chemistry hosts an increasing number of chemistry-related databases and has generally used SQL-based technologies for its development platforms. In recent years, interest in noSQL databases has exploded as the associated technologies have matured and shown great promise for enhanced performance. We have collaborated with GGA Software Services to implement their noSQL technologies and have integrated them into the compound repository currently being developed as part of the underpinning architecture for compound data management at the RSC. This presentation will provide an overview of why we integrated a noSQL solution, a quantitative analysis of the benefits of doing so, and our thoughts on further approaches to optimize search performance for the chemical compound repository.

12:05 142 Building an online data repository for 100,000 dyes

David Hinks1, david_hinks@ncsu.edu, Valery Tkachenko2. (1) North Carolina State University, Raleigh, NC, United States, (2) Royal Society of Chemistry, Wake Forest, NC 27587, United States

The Max Weaver Dye Library, housed at the College of Textiles at North Carolina State University, contains a physical sample library of almost 100,000 dyes. Over the next few years these dyes will be mirrored as a web-accessible collection of electronic chemical structure representations and, as data are generated, the collection will include access to associated spectroscopic, crystallographic and organic synthesis data. Because this is one of the world's largest physical collections of dyes, releasing the data to the community will provide a database suitable for molecular modeling. This presentation will provide an overview of progress in putting the repository of chemical and analytical data online as a community resource, and of our future plans for this rich data collection.

Thursday, August 14, 2014

General Papers - PM Session
Toolkits and Databases

Palace Hotel
Room: Marina
Erin Bolstad, Organizer
Erin Bolstad, Presiding
1:30 pm - 3:20 pm
1:30 143 Mining PubChem data: Interfaces, approaches, and best practices

Evan Bolton, bolton@ncbi.nlm.nih.gov, National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894, United States

PubChem is a sizeable resource of chemical biology information, with many tens of millions of chemical structures, hundreds of millions of biological activity outcomes, and many billions of cross-references: to external resources, biological assays, related chemicals, the biomedical literature, proteins, genes, pathways, and more. How does one efficiently access such a massive corpus of information? This talk will give an overview of PubChem's programmatic interfaces, data access approaches, and best practices to help researchers efficiently access the information contained within PubChem.
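One of PubChem's programmatic interfaces, PUG REST, encodes a request in the URL path as input (for example `compound/name/aspirin`), then operation (for example `property/MolecularWeight`), then output format (for example `JSON`). The minimal sketch below only builds such URLs and makes no network call. It illustrates one generally applicable practice: batching many CIDs into one comma-separated request instead of issuing one request per compound. The helper names and the batch size of 100 are assumptions, and the sketch is not necessarily what the talk covers. Check PubChem's PUG REST documentation and usage policy for current limits.

```python
from urllib.parse import quote

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def property_url(namespace, identifier, properties, output="JSON"):
    """Build <prolog>/compound/<namespace>/<identifier>/property/<props>/<format>."""
    ident = quote(str(identifier), safe=",")  # keep commas so CID lists stay unescaped
    return f"{PUG_REST}/compound/{namespace}/{ident}/property/{','.join(properties)}/{output}"


def batched_cid_urls(cids, properties, batch_size=100):
    """Group CIDs into comma-separated batches (batch_size is an assumed, tunable value)."""
    return [
        property_url("cid", ",".join(str(c) for c in cids[i:i + batch_size]), properties)
        for i in range(0, len(cids), batch_size)
    ]


print(property_url("name", "aspirin", ["MolecularFormula", "MolecularWeight"]))
print(batched_cid_urls([1, 2, 3], ["MolecularWeight"], batch_size=2))
```

Each URL can then be fetched with, for example, `urllib.request.urlopen` and parsed with `json.loads`, pausing between requests to stay within PubChem's published rate limits.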

1:50 144 The Royal Society of Chemistry and its adoption of semantic web technologies for chemistry at the epoch of a federated world

Antony J Williams, tony27587@gmail.com, Valery Tkachenko, Ken Karapetyan, Alexey Pshenichnov. Cheminformatics, Royal Society of Chemistry, Wake Forest, North Carolina 27587, United States

Semantic web technologies have quickly penetrated all areas of traditional and new database systems and have become the de facto standard in information exchange and communication. The Royal Society of Chemistry has built a new chemistry data repository with the semantic web at the core of the system. Every module of the data repository contains a semantic web layer and is able to interact internally and externally using standard approaches and formats including RDF, appropriate ontologies, SPARQL querying and so on. In this presentation we will review the challenges associated with developing this new system based on semantic web technologies and how the approach that we have taken offers distinct advantages over the original data model designed to produce the ChemSpider database. Its advantages include extensibility, an ontological underpinning, federated integration and the adoption of modern standards rather than the constraints of a standard SQL model.
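At the core of RDF is the triple (subject, predicate, object). At the core of SPARQL is the basic graph pattern: a conjunction of triple patterns whose variables must bind consistently. The toy in-memory store below illustrates that data model and matching by backtracking. It is a sketch for explanation only, not the RSC repository's implementation. The `ex:` identifiers and the class and function names are hypothetical, and real systems use an RDF store and a full SPARQL engine.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]


class TripleStore:
    """Toy in-memory store of (subject, predicate, object) triples."""

    def __init__(self) -> None:
        self.triples: set = set()

    def add(self, s: str, p: str, o: str) -> None:
        self.triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterator[Triple]:
        """Yield triples matching the given terms; None acts as a wildcard."""
        for t in sorted(self.triples):
            if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o):
                yield t


def is_var(term: str) -> bool:
    return term.startswith("?")


def query(store: TripleStore, patterns: List[Triple], binding: Optional[Dict[str, str]] = None):
    """Backtracking evaluation of a basic graph pattern (a conjunction of triple patterns)."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    # Substitute variables already bound by earlier patterns
    resolved = [binding.get(t, t) if is_var(t) else t for t in first]
    s, p, o = (None if is_var(t) else t for t in resolved)
    for triple in store.match(s, p, o):
        extended = dict(binding)
        consistent = True
        for term, value in zip(resolved, triple):
            # setdefault also enforces consistency for a variable repeated within one pattern
            if is_var(term) and extended.setdefault(term, value) != value:
                consistent = False
                break
        if consistent:
            yield from query(store, rest, extended)


store = TripleStore()
store.add("ex:cmpd1", "rdf:type", "ex:Compound")
store.add("ex:cmpd1", "ex:name", '"caffeine"')
store.add("ex:cmpd2", "rdf:type", "ex:Compound")
store.add("ex:cmpd2", "ex:name", '"aspirin"')
store.add("ex:spectrum1", "ex:describes", "ex:cmpd1")
# Roughly: SELECT * WHERE { ?c rdf:type ex:Compound . ?s ex:describes ?c . ?c ex:name ?n }
results = list(query(store, [("?c", "rdf:type", "ex:Compound"),
                             ("?s", "ex:describes", "?c"),
                             ("?c", "ex:name", "?n")]))
print(results)
```

Because every module speaks the same triple model, data from different modules (here, compounds and spectra) can be joined by shared identifiers without a predefined relational schema. This is the extensibility argument made in the abstract.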

2:10 Intermission
2:20 145 Dereplication applications for computer-assisted structure elucidation (CASE) and the ChemSpider database

Patrick Wheeler1, info@acdlabs.com, Arvin Moser1, Joe DiMartio1, Mikhail Elyashberg2, Kirill Blinov3, Sergey Molodstov4, Anthony Williams5. (1) Advanced Chemistry Development, Toronto, ON, Canada, (2) Moscow Department, Advanced Chemistry Development, Russian Federation, (3) Pr-t, 33 k.1 kv. 51, Molecule Apps, LLC, Moscow, Russian Federation, (4) Novosibirsk Institute of Organic Chemistry, Siberian Division, Russian Academy of Sciences, Novosibirsk, Russian Federation, (5) Royal Society of Chemistry, Wake Forest, NC 27587, United States

Many Computer-Assisted Structure Elucidation (CASE) applications focus on the structure generation process [1,2] while neglecting a crucial dereplication step. Structure dereplication is a pre-screening search to locate an identical structure or fragment, thus speeding up the elucidation of a “known unknown” compound and potentially saving many hours of work.
ACD/Structure Elucidator, a widely used CASE tool, already supports such dereplication methods through internal databases and a database available from PubChem. A key collaboration between ACD/Labs and the Royal Society of Chemistry extends the ACD/Structure Elucidator tools to search a ChemSpider library using MS and/or NMR data. This presentation will highlight the workflow, substructure filtering and the ranking process for structure identification. ChemSpider is a free chemical structure database of over 22,000,000 compounds, so this addition vastly expands the structure space that can be searched by this already useful CASE tool.
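The general shape of MS-plus-NMR dereplication can be sketched in a few lines: filter candidates by accurate mass within a ppm tolerance, then rank the survivors by agreement between experimental and predicted 13C shifts. This is a simplified illustration, not ACD/Structure Elucidator's algorithm. The sorted-list pairing of shifts, the mean-absolute-deviation score, the 5 ppm default and the toy candidates are all hypothetical choices.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    name: str
    mono_mass: float         # monoisotopic neutral mass, Da
    c13_shifts: List[float]  # predicted 13C shifts, ppm


def ppm_error(observed: float, theoretical: float) -> float:
    return (observed - theoretical) / theoretical * 1e6


def dereplicate(observed_mass, exp_shifts, candidates, tol_ppm=5.0):
    """Return (name, mean absolute shift deviation) for mass-matched candidates, best first."""
    hits = []
    for cand in candidates:
        # Stage 1 (MS): keep candidates whose mass agrees within tol_ppm
        if abs(ppm_error(observed_mass, cand.mono_mass)) > tol_ppm:
            continue
        # Stage 2 (NMR): skip if the number of carbon signals differs
        if len(cand.c13_shifts) != len(exp_shifts):
            continue
        # Pair shifts in sorted order (a crude stand-in for proper signal assignment)
        pairs = zip(sorted(exp_shifts), sorted(cand.c13_shifts))
        mad = sum(abs(e - c) for e, c in pairs) / len(exp_shifts)
        hits.append((round(mad, 2), cand.name))
    hits.sort()
    return [(name, mad) for mad, name in hits]


toy_library = [
    Candidate("toy_A", 180.0634, [10.0, 20.0, 30.0]),
    Candidate("toy_B", 180.0640, [12.0, 22.0, 32.0]),
    Candidate("toy_C", 200.0000, [10.0, 20.0, 30.0]),  # excluded by mass
]
ranking = dereplicate(180.0634, [10.5, 20.0, 29.5], toy_library)
print(ranking)
```

Running the cheap mass filter first keeps the shift comparison, the more expensive step in practice, to a small candidate list. This matters when the searchable library grows to tens of millions of structures.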

2:40 146 Faculty profiling and searching in the Eureka Research Workbench using VIVO and ScientistsDB

Stuart Chalk1, schalk@unf.edu, Matthew Morse1, Israel Hurst1, Anthony Williams2, Valery Tkachenko2, Alexey Pshenichnov2. (1) Department of Chemistry, University of North Florida, Jacksonville, FL 32224, United States, (2) Royal Society of Chemistry, Wake Forest, NC 27587, United States

The Eureka Research Workbench (http://eureka.sourceforge.net) is an online environment, currently under development, for capturing the scientific process. This presentation discusses the integration of the VIVO platform (http://vivoweb.org) and ScientistsDB (http://www.scientistsdb.com) to support collaboration between scientists, extending the scope of Eureka beyond a faculty member's own research group. The new VIVO API is used to provide federated search of faculty profiles, identify related scientists, and subsequently extract their recent publications and grants. Infobox metadata from ScientistsDB are searched using the MediaWiki API. Management of a faculty member's VIVO/ScientistsDB profile information will also be discussed.
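The MediaWiki API can return a page's wikitext through `action=query&prop=revisions&rvprop=content`, and infobox fields can then be read from the `{{Infobox ...}}` template. The sketch below builds such a request URL and parses a flat infobox. Its assumptions are labeled: the `api.php` path on scientistsdb.com is hypothetical, the parser is naive (it ignores nested templates and links containing `|`), and it is not Eureka's actual code.

```python
import re
from urllib.parse import urlencode

API_BASE = "http://www.scientistsdb.com/api.php"  # hypothetical endpoint path


def wikitext_query_url(title, base=API_BASE):
    """MediaWiki API request for the current wikitext of one page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "titles": title,
        "format": "json",
    }
    return base + "?" + urlencode(params)


def parse_infobox(wikitext):
    """Naively extract key = value fields from the first flat {{Infobox ...}} template."""
    m = re.search(r"\{\{\s*Infobox[^|}]*\|(.*?)\}\}", wikitext, re.S | re.I)
    if not m:
        return {}
    fields = {}
    for part in m.group(1).split("|"):
        if "=" in part:
            key, _, value = part.partition("=")
            fields[key.strip()] = value.strip()
    return fields


sample = """{{Infobox scientist
| name = Jane Doe
| field = Analytical chemistry
| institution = Example University
}}"""
print(wikitext_query_url("Jane Doe"))
print(parse_infobox(sample))
```

For production use, a dedicated wikitext parser would handle nested templates, links and comments that this regular expression does not.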

3:00 147 Semantic enrichment of ChemSpider data: Usage and applications

Valery Tkachenko, TkachenkoV@rsc.org, Royal Society of Chemistry, Wake Forest, NC 27587, United States

ChemSpider is a free online database of chemical information maintained by the Royal Society of Chemistry. Previously, ChemSpider data were represented only in HTML, a human-readable format, with a small fraction of the data made accessible to machines through a mixture of WSDL-described web services and REST web services. This reduced the potential for data integration and restricted the ability of software developers to reuse ChemSpider data in their consumer applications. In this paper, we describe the enrichment of the ChemSpider database using Semantic Web technologies. We also present an exemplar consumer application that supports data discovery and visualisation. The ChemSpider data are given structure and semantics (meaning) using a suite of schemata and ontologies developed in RDF and OWL, and are exposed as Linked Data via dereferenceable URIs and a SPARQL endpoint. As a larger fraction of the ChemSpider data is represented in machine-accessible formats, both the potential for data integration and the ability of programmers to develop consumer applications are greatly enhanced. The new system is also highly extensible and scalable, and was designed to be part of larger federated Linked Data systems.
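A consumer application typically talks to a SPARQL endpoint through the standard SPARQL protocol: send the query as a `query` parameter of an HTTP GET and ask for `application/sparql-results+json`. It then reads the `head.vars` and `results.bindings` members of the W3C JSON results format. The sketch below builds such a request and flattens a sample response. No network call is made. The endpoint URL is a placeholder, not ChemSpider's real endpoint, and the sample response is invented.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "https://example.org/sparql"  # placeholder, not a real ChemSpider URL


def build_sparql_request(query, endpoint=ENDPOINT):
    """SPARQL 1.1 Protocol query via GET, requesting JSON results (request is not sent)."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def bindings_to_rows(results_json):
    """Flatten SPARQL JSON results into dicts; unbound (e.g. OPTIONAL) variables become None."""
    doc = json.loads(results_json)
    variables = doc["head"]["vars"]
    return [
        {v: (b[v]["value"] if v in b else None) for v in variables}
        for b in doc["results"]["bindings"]
    ]


req = build_sparql_request("SELECT ?cmpd ?name WHERE { ?cmpd ?p ?name } LIMIT 10")
sample_response = json.dumps({
    "head": {"vars": ["cmpd", "name"]},
    "results": {"bindings": [
        {"cmpd": {"type": "uri", "value": "http://example.org/cmpd/1"},
         "name": {"type": "literal", "value": "caffeine"}},
        {"cmpd": {"type": "uri", "value": "http://example.org/cmpd/2"}},
    ]},
})
rows = bindings_to_rows(sample_response)
print(req.full_url)
print(rows)
```

Passing `req` to `urllib.request.urlopen` would perform the actual call. Keeping the parsing separate from the transport lets the same code consume any standards-compliant endpoint in a federated setup.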