Vol. 67, No. 3: Fall 2015

Chemical Information Bulletin

A Publication of the Division of Chemical Information of the ACS



Boston skyline photo from Flickr user Brian Talbot

Vincent F. Scalfani, Editor,
University of Alabama
vfscalfani@ua.edu

 

ISSN: 0364–1910

Chemical Information Bulletin, © Copyright 2015 by the Division of Chemical Information of the American Chemical Society

Message from the Chair


As of this writing, we are approximately one month from the Boston meeting, and CINF attendees have a great deal to look forward to in the Fall 2015 program. I want to highlight a few of the special symposia, activities, and events planned for Boston.

A special event, planned in conjunction with the ACS Graduate and Undergraduate Programs Offices and CHED, is the Careers in Chemical Information and Cheminformatics Panel Discussion & Brunch, scheduled for Sunday, August 16, 9am–11am (Boston Convention and Exhibition Center, Room 52AB). This will be an opportunity for students to gain insight into career opportunities in the field of chemical information and to ask questions about career paths and choices. Panelists include Christopher A. Lipinski, Ph.D., LLC., Scientific Advisor, Melior Discovery, and Retired Senior Research Fellow, Pfizer; Tom Marman, PhD, Manager, Patent Searching & Information, Pfizer Legal Division; Rajarshi Guha, Ph.D., National Center for Advancing Translational Sciences (NCATS), NIH; Sean Ekins, PhD, DSc, Collaborations In Chemistry; Carmen Nitsche, CINforma Consulting and North America Pistoia Alliance, Inc.; Erja Kajosalo, Chemistry & Chemical Engineering Librarian, MIT; Melissa Landon, PhD, Schrödinger Inc.; and Kevin Theisen, iChemLabs. Please share this information with any students or professionals who are interested in learning about career opportunities or making a career change. Thanks to Lori Betsock and Joe Sostaric of the ACS undergraduate and graduate program offices for their support of this event.

This meeting in particular will highlight Wikipedia and Chemistry, with both a symposium on Sunday afternoon (organized by Ye Li and Martin Walker and cosponsored by CHED) and a Wikipedia Edit-a-thon for Notable Chemists and Chemistry on Wednesday afternoon. The Edit-a-thon will consist of a training and editing session to improve coverage of notable chemists and chemistry topics on Wikipedia. Attendees may come and go during the session, but instructions will be provided during the first hour, and you will need to bring your own laptop. Space is somewhat limited, so please register for this event. For information or to register, contact Keith Lindblom in the ACS Office of Public Affairs at k_lindblom@acs.org or 202-872-6214. The Edit-a-thon is sponsored by the ACS Office of Public Affairs, the ACS Division of Chemical Information (CINF), and the ACS Committee on Public Relations and Communications (CPRC).

On Monday, we will have a very special symposium to honor the memory of Jean-Claude Bradley, "Father of Open Notebook Science (ONS)." Dr. Bradley was an active promoter of open science in the chemistry community and in 2013 was invited to the White House for an "Open Science Poster Session." Jean-Claude was on the faculty of Drexel University and served as E-Learning Coordinator. Please join us on Monday, August 17th, for a symposium organized in his memory and to honor his significant contributions.

The CINF Tuesday Luncheon promises to be very special indeed. Our luncheon speaker will be Michele Derrick, Schorr Family Associate Research Scientist at the Museum of Fine Arts, Boston (MFA, http://www.mfa.org/), a conservation scientist and one of the developers of the CAMEO database (http://cameo.mfa.org/wiki/About_CAMEO). The Conservation and Art Materials Encyclopedia Online (CAMEO) is an electronic database that compiles, defines, and disseminates technical information on the distinct collection of terms, materials, and techniques used in the fields of art conservation and historic preservation. Please don't forget to sign up for the CINF Luncheon when you register with ACS, or contact me to see if additional tickets are available.

I hope to see you at our Welcoming Reception on Sunday, August 16, 6:30-8:30pm (Lighthouse Ballroom 1, Seaport World Trade Center), at the Herman Skolnik Award Symposium and Reception honoring Dr. Jürgen Bajorath (reception in Room 254A, Boston Convention and Exhibition Center), or at the luncheon on Tuesday. Please introduce yourself if you attend any of these CINF events. We always welcome new members and volunteers, so please contact me if you are interested.

 

Looking forward to meeting you.

Rachelle Bienstock,
Chair, ACS Division of Chemical Information
Rachelleb1@gmail.com

 

 

Letter from the Editor

Thank you for reading the ACS Chemical Information Bulletin (CIB). I always enjoy editing and typesetting the CIB because it is an important contribution to the chemical information world. Moreover, I frequently learn something new about typography and the desktop publishing software I use, Adobe InDesign. For example, in this issue I added a linked table of contents, so you can now skip directly to the section of interest by clicking on the chapter name. It is a small addition, but it should make navigating the PDF version of the CIB easier and more efficient. Creating an EPUB version of the CIB is still on my list of goals for the CINF Communications and Publications Committee, and I hope to have news on this in the near future.

I am very excited to release this issue of the CIB. In addition to the core sections of the CIB, we have two book reviews by our expert book reviewer, Bob Buntrock. He reviewed Svetla Baykoucheva’s Managing Scientific Information and Research Data and 100 Chemical Myths: Misconceptions, Misunderstandings, Explanations by Lajos Kovacs et al. Bob also graciously wrote a nice tribute to the late Bob Massie. We are very thankful for all of Bob Buntrock’s contributions to the CIB!

Those familiar with the CIB will know that Svetla Baykoucheva introduced interviews into the CIB and has conducted numerous interviews with researchers, information specialists, and librarians since 2006 (http://www.acscinf.org/content/interviews). These interviews have become an integral component of the CIB. More recently, Svetlana Korolev has contributed excellent interviews as well. For this issue, I interviewed Svetla Baykoucheva and Kitty Porter. I think you will enjoy these interviews! It was a pleasure and an honor to contribute to the interviews section.

Lastly, thanks to our sponsors and everyone who contributed to this issue of the CIB! Now that the Bulletin is finished, I should get to work on my ACS talk. See you in Boston.

Vincent F. Scalfani, Editor
The University of Alabama
vfscalfani@ua.edu

 

Assistant Editor's Column: Science and Popular Culture Part II

It has been another interesting year for science and popular culture, though with less chemistry now that Breaking Bad has ended.

Esther Inglis-Arkell offered a reminder that Superman's kryptonite doesn't exist, and explained why. At last fall's Convergence, there was a panel discussion on popular culture and forensic science, "Getting Away with Murder."

Epic science fiction films that are heavy on hard science are becoming a new fall tradition. Last year's film was Interstellar, with physicist Kip Thorne serving as a consultant and executive producer. He also authored a book and an article about the science behind the film. In a second article, he and his co-authors suggested that Interstellar's wormhole visualizations could be used to teach those concepts to students. Not surprisingly, a number of scientists and science journalists reviewed the film and its science, including Katie Mack, Phil Plait, Annalee Newitz (plus a follow-up), the Smithsonian's Cathleen Lewis, Discovery.com's Ian O'Neill, the Library of Congress' David Grinspoon (in an interview with Mother Jones), and Robert Naeye for Sky and Telescope. The Kavli Institute held a Google Hangout with several astrophysicists answering questions about the wormholes and black holes. Adam Rogers wrote about Thorne and the film for Wired. Lee Billings' blog post at Scientific American about what the movie got wrong about interstellar travel was followed a few weeks later by an interview with Thorne. Neil deGrasse Tyson offered his thoughts via Twitter (Storified here). The debate over the science in Interstellar led James Erwin at Slate to recommend less scientific nitpicking of science fiction films. Then earlier this year, Charlie Jane Anders discussed a Berkeley Science Review essay suggesting that scientific accuracy is one of many valid ways to judge a story.

Not to ignore older works, marine biologist David Shiffman wrote about the shadow that Jaws has cast over the last forty years with its inaccuracies about shark behavior. And with the release of Jurassic World this summer, it was an opportunity to revisit the not-so-great science of Jurassic Park with pieces by Brian Switek and Phil Plait, as well as a piece by the film’s technical consultant Jack Horner on the improved plausibility of the new film.

Of the articles this year about Mad Max: Fury Road, a few addressed its science. Noah Gittell wrote about how it addressed climate change, and Kyle Hill speculated on the disease afflicting the War Boys. Jupiter Ascending was not as critically acclaimed, but Astra Bryant's grant review summary for canine-human hybrids was an entertaining complement to the film, possibly more so.

The University of Nebraska-Lincoln Libraries, with co-sponsorship from the chemistry and physics departments and Doane College, continued their Sci Pop Talks! series with speakers discussing Game of Thrones, Hollywood fires and explosions, and Marvel comics and radiation. The six talks from earlier this year are available on YouTube.

Orphan Black returned this spring with more discussion about the genetic engineering technology behind the show, from io9, The Mary Sue (where Casey Griffin and Nina Nesseth recapped the episodes), and a Longreads interview with science consultant Cosima Herter. The new summer series Humans is exploring artificial intelligence, and Jovana Grbic interviewed the show’s creators.

A few weeks ago at Comic-Con here in San Diego, I attended two NASA panels, including one about future Mars exploration and The Martian, both the novel and the upcoming movie. Panelists included two NASA scientists, new astronaut Victor Glover, novelist Andy Weir, and one of the film's producers. It was a delight to hear about NASA's future plans for reaching Mars, and the science behind the film will get a lot of coverage this fall.

Teri Vogel, Assistant Editor
tmvogel@ucsd.edu

 

ACS CINF Social Networking Events at the Fall ACS Meeting

 


Please join us at these Division of Chemical Information Events!

The ACS Division of Chemical Information is pleased to host the following social networking events at the Fall 2015 ACS National Meeting in Boston, MA.

Sunday Welcoming Reception & Scholarships for Scientific Excellence Posters. 6:30-8:30 pm, Sunday, August 16th – Lighthouse Ballroom 1, Seaport Hotel World Trade Center

Reception co-sponsored by Journal of Cheminformatics (Springer), Optibrium, PerkinElmer, Thieme Chemistry, and AAAS/Science.

Scholarships for Scientific Excellence sponsored exclusively by Royal Society of Chemistry.

Careers in Chemical Information and Cheminformatics Panel Discussion & Brunch. 9:00-11:00 am Sunday, August 16th – Room 52AB, Boston Convention and Exhibition Center

Sponsored by:

ACS Graduate & Postdoctoral Scholars Office and ACS Undergraduate Programs Office.

Tuesday Luncheon (Ticketed Event – Contact Division Chair, Rachelle Bienstock). 12:00-1:30 pm Tuesday, August 18th – Room 52A, Boston Convention and Exhibition Center

Sponsored exclusively by the Royal Society of Chemistry.

Speaker: Michele Derrick

Schorr Family Associate Research Scientist at the Museum of Fine Arts, Boston

Presentation: CAMEO: A Database for Technical Information on Materials in Museums

Herman Skolnik Award Symposium & Reception honoring Prof. Dr. Jürgen Bajorath.

Symposium: 8:00 am-5:00 pm Tuesday, August 18th – Room 104A, Boston Convention and Exhibition Center

Reception: 6:30-8:30 pm Tuesday, August 18th – Room 254A, Boston Convention and Exhibition Center

Sponsored by: ACS Publications, Cresset, Novartis, Pfizer and Schrödinger

 

Awards and Scholarships


Chemical Structure Association Trust

Applications Invited for CSA Trust Grant for 2016

The Chemical Structure Association (CSA) Trust is an internationally recognized organization established to promote the critical importance of chemical information to advances in chemical research. In support of its charter, the Trust has created a unique Grant Program and is now inviting the submission of grant applications for 2016.

Purpose of the Grants

The Grant Program has been created to provide funding for the career development of young researchers who have demonstrated excellence in their education, research or development activities that are related to the systems and methods used to store, process and retrieve information about chemical structures, reactions and compounds.  One or more Grants will be awarded annually up to a total combined maximum of ten thousand U.S. dollars ($10,000).  Grantees have the option of payments being made in U.S. dollars or in British Pounds equivalent to the U.S. dollar amount. Grants are awarded for specific purposes, and within one year each grantee is required to submit a brief written report detailing how the grant funds were allocated. Grantees are also requested to recognize the support of the Trust in any paper or presentation that is given as a result of that support.

Who is Eligible?

Applicant(s), age 35 or younger, who have demonstrated excellence in their chemical information related research and who are developing careers that have the potential to have a positive impact on the utility of chemical information relevant to chemical structures, reactions and compounds, are invited to submit applications.  While the primary focus of the Grant Program is the career development of young researchers, additional bursaries may be made available at the discretion of the Trust.  All requests must follow the application procedures noted below and will be weighed against the same criteria.

Which Activities are Eligible?

Grants may be awarded to acquire the experience and education necessary to support research activities; for example, travel to collaborate with research groups, to attend a conference relevant to one’s area of research (including the presentation of an already-accepted research paper), to gain access to special computational facilities, or to acquire unique research techniques in support of one’s research.

Application Requirements: 

Applications must include the following documentation:
1. A letter that details the work upon which the Grant application is to be evaluated, as well as details on research recently completed by the applicant;
2. The amount of Grant funds being requested and details regarding the purpose for which the Grant will be used (e.g., cost of equipment, travel expenses if the request is for financial support of meeting attendance, etc.). The relevance of this stated purpose to the Trust's objectives and the clarity of this statement are essential in the evaluation of the application;
3. A brief biographical sketch, including a statement of academic qualifications;
4. Two reference letters in support of the application.

Additional materials may be supplied at the discretion of the applicant only if relevant to the application and if such materials provide information not already included in items 1-4.   A copy of the completed application document must be supplied for distribution to the Grants Committee and can be submitted via regular mail or e-mail to the Committee Chair (see contact information below).

Deadline for Applications: 

The deadline for applications for the 2016 Grant is March 25, 2016. Successful applicants will be notified no later than May 2, 2016.

Address for Submission of Applications: 

The application documentation should be forwarded to:  Bonnie Lawlor, CSA Trust Grant Committee Chair, 276 Upper Gulph Road, Radnor, PA 19087, USA.  If you wish to enter your application by e-mail, please contact Bonnie Lawlor at chescot@aol.com prior to submission so that she can contact you if the e-mail does not arrive.

Chemical Structure Association Trust:  Recent Grant Awardees

2015 – Dr. Marta Enciso

Molecular Modeling Group, Department of Chemistry, La Trobe Institute for Molecular Science, La Trobe University, Australia was awarded a Grant to cover travel costs to visit collaborators at universities in Spain and Germany and to present her work at the European Biophysical Societies Association Conference in Dresden, Germany in July 2015.

2015 – Jack Evans

School of Physical Sciences, University of Adelaide, Australia, was awarded a Grant to spend two weeks collaborating with the research group of Dr. François-Xavier Coudert (CNRS, Chimie ParisTech).

2015 – Dr. Olexandr Isayev

Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, was awarded a Grant to attend summer classes at the Deep Learning Summer School 2015 (University of Montreal) to expand his knowledge of machine learning to include Deep Learning (DL). His goal is to apply DL to chemical systems to improve predictive models of chemical bioactivity.

2015 – Aleix Gimeno Vives

Cheminformatics and Nutrition Research Group, Biochemistry and Biotechnology Dept., Universitat Rovira i Virgili, was awarded a Grant to attend the Cresset European User Group Meeting in June 2015 in order to improve his knowledge of the software he is using to determine what makes an inhibitor selective for PTP1B.

2014 – Dr. Adam Madarasz

Institute of Organic Chemistry, Research Centre for Natural Sciences, Hungarian Academy of Sciences. He was awarded a Grant for travel to study at the University of Oxford with Dr. Robert S. Paton, a 2013 CSA Trust Grant winner, in order to increase his  experience in the development of computational methodology which is able to accurately model realistic and flexible transition states in chemical and biochemical reactions.

2014 – M. José Ojeda Montes

Department of Biochemistry and Biotechnology, University Rovira i Virgili, Spain. She was awarded a Grant for travel expenses to study for four months at the Freie University of Berlin to enhance her experience and knowledge regarding virtual screening workflows for predicting therapeutic uses of natural molecules in the field of functional food design.

2014 – Dr. David Palmer

Department of Chemistry, University of Strathclyde, Scotland. He was awarded a Grant to present a paper at the fall 2014 meeting of the American Chemical Society on a new approach for representing molecular structures in computers based on ideas from the Integral Equation Theory of Molecular Liquids.

2014 – Sona B. Warrier

Departments of Pharmaceutical Chemistry, Pharmaceutical Biotechnology, and Pharmaceutical Analysis, NMIMS University, Mumbai. She was awarded a Grant to attend the International Conference on Pure and Applied Chemistry to present a poster on her research on inverse virtual screening in drug repositioning.

2013 – Dr. Johannes Hachmann

Department of Chemistry and Chemical Biology at Harvard University, Cambridge, MA.   He was awarded the Grant for travel to speak on “Structure-property relationships of molecular precursors to organic electronics” at a workshop sponsored by the Centre Européen de Calcul Atomique et Moléculaire (CECAM) that took place October 22 – 25, 2013 in Lausanne, Switzerland.

2013 – Dr. Robert S. Paton

University of Oxford, UK.  He was awarded the Grant to speak at the Sixth Asian Pacific Conference of Theoretical and Computational Chemistry in Korea on July 11, 2013. Receiving the invitation for this meeting provided Dr. Paton with an opportunity to further his career as a Principal Investigator.

2013 – Dr. Aaron Thornton

Materials Science and Engineering at CSIRO in Victoria, Australia. He was awarded the Grant to attend the 2014 International Conference on Molecular and Materials Informatics at Iowa State University with the objective of expanding his knowledge of web semantics, chemical mark-up language, resource description frameworks, and other online sharing tools. He also visited Dr. Maciej Haranczyk, a prior CSA Trust Grant recipient, who is one of the world leaders in virtual screening.

2012 – Tu Le

CSIRO Division of Materials Science & Engineering, Clayton, VIC, Australia. Tu was awarded the Grant for travel to attend a cheminformatics course at Sheffield University and to visit the Membrane Biophysics group of the Department of Chemistry at Imperial College London.

2011 – J. B. Brown

Kyoto University, Kyoto, Japan. J.B. was awarded the Grant for travel to work with Professor Ernst-Walter Knapp at the Freie University of Berlin and Professor Jean-Philippe Vert of MINES ParisTech to continue his work on the development of atomic partial charge kernels.

2010 – Noel O’Boyle

University College Cork, Ireland. Noel was awarded the grant to both network and present his work on open source software for pharmacophore discovery and searching at the 2010 German Conference on Cheminformatics.

2009 – Laura Guasch Pamies

University Rovira i Virgili, Catalonia, Spain. Laura was awarded the Grant to do three months of research at the University of Innsbruck, Austria.

2008 – Maciej Haranczyk

University of Gdansk, Poland. Maciej was awarded the Grant to travel to Sheffield University, Sheffield, UK, for a 6-week visit for research purposes.

2007 – Rajarshi Guha

Indiana University, Bloomington, IN, USA. Rajarshi was awarded the Grant to attend the Gordon Research Conference on Computer-Aided Design in August 2007.

2006 – Krisztina Boda

University of Erlangen, Erlangen, Germany. Krisztina was awarded the Grant to attend the 2006 spring National Meeting of the American Chemical Society in Atlanta, GA, USA.

2005 – Dr. Val Gillet and Professor Peter Willett

University of Sheffield, Sheffield, UK.  They were awarded the Grant for student travel costs to the 2005 Chemical Structures Conference held in Noordwijkerhout, the Netherlands.

2004 – Dr. Sandra Saunders

University of Western Australia, Perth, Australia. Sandra was awarded the Grant to purchase equipment needed for her research.

 

2003 – Prashant S. Kharkar

Institute of Chemical Technology, University of Mumbai, Matunga, Mumbai. Prashant was awarded the Grant to attend the conference, Bioactive Discovery in the New Millennium, in Lorne, Victoria, Australia (February 2003) to present a paper, “The Docking Analysis of 5-Deazapteridine Inhibitors of Mycobacterium avium complex (MAC) Dihydrofolate reductase (DHFR).”

2001 – Georgios Gkoutos

Imperial College of Science, Technology and Medicine, Department of Chemistry, London, UK. Georgios was awarded the Grant to attend the conference, Computational Methods in Toxicology and Pharmacology Integrating Internet Resources (CMTPI-2001), in Bordeaux, France, to present part of his work on internet-based molecular resource discovery tools.


CINF Scholarship for Scientific Excellence Co-sponsored by InfoChem and Springer

The international scholarship program of the Division of Chemical Information (CINF) of the American Chemical Society (ACS) co-sponsored by InfoChem (www.infochem.de) and Springer (www.springer.com) is designed to reward students in chemical information and related sciences for scientific excellence and to foster their involvement in CINF.

Up to three scholarships valued at $1,000 each will be awarded at the 251st ACS National Meeting in San Diego, CA, March 13 - 17, 2016.  Student applicants must be enrolled at a certified college or university; postdoctoral fellows are also invited to apply. They will present a poster during the Welcoming Reception of the Division on Sunday evening at the National Meeting.  Additionally, they will have the option to show their poster at the Sci-Mix session on Monday night.  Abstracts for the poster must be submitted electronically through MAPS, the abstract submission system of ACS.

To apply, please inform the Chair of the selection committee, Stuart Chalk at schalk@unf.edu, that you are applying for a scholarship. Submit your abstract at http://maps.acs.org using your ACS ID.  If you do not have an ACS ID, follow the registration instructions and submit your abstract in the CINF program in the session “CINF Scholarship for Scientific Excellence. Student Poster Competition.”  MAPS will open for abstract submissions in August 2015. Please check the ACS website for the exact dates of the abstract submission period. Additionally, please send to me by January 31, 2016, a 2,000-word abstract in electronic form describing the work to be presented. Any questions related to applying for one of the scholarships should be directed to the same e-mail address.

Winners will be chosen based on the content, presentation, and relevance of the poster, and they will be announced during the Sunday reception. The content should reflect the student's own work and describe research in the field of cheminformatics and related sciences.

Stuart Chalk

About InfoChem
InfoChem GmbH based in Munich, Germany, is a market leader in structure and reaction handling and retrieval. Founded in 1989, InfoChem focuses on the production and marketing of new chemical information products, including structural and reaction databases, and the development of software tools required for these applications. The main software tools provided are the InfoChem-Fast Search Engine (ICFSE), the InfoChem Chemistry Cartridge for Oracle (ICCARTRIDGE) and the widely used InfoChem reaction classification algorithm CLASSIFY. InfoChem distributes one of the largest structural and reaction files worldwide, currently containing 7 million organic compounds and facts and 4 million reactions covering the chemical literature published since 1974 (SPRESI). In addition, InfoChem provides tools for the automatic recognition and extraction of chemical entities and their conversion into chemical structures as well as the semantic enrichment of chemical science documents. Springer GmbH (Berlin) has held a majority interest in InfoChem since 1991. For more information go to www.infochem.de.

Committee Reports

Education Committee

Dear Colleagues,

Originally compiled by Dr. Gary Wiggins of Indiana University, Chemical Information Sources was published as a book in 1991. Gary released it as a Wikibook in 2007. Chemical Information Sources continued as a collaborative effort by chemistry librarians, mostly from the Association of Research Libraries (ARL), under Ben Wagner's leadership from 2011 to 2014. Thank you, Ben, for all your leadership and efforts during this period! Many thanks, too, to all the authors who helped maintain its content.

In the fall of 2014, to provide a long-term sustainable plan for maintaining the content, the ACS Chemical Information (CINF) Division’s Education Committee agreed to “adopt” three Wikibooks that had been created by Gary Wiggins:  

Chemical Information Sources Wikibook

SIRCh: Selected Internet Resources for Chemistry

Chemical Information Instructional Materials

Initial efforts by the CINF Education Committee will focus on the first two wikibooks.

Editor-in-Chief: We are pleased to announce that the ACS CINF Education Committee formally approved appointing Chuck Huber for a 3-year term as Editor-in-Chief for the Chemical Information Sources Wikibook during the ACS Spring Meeting in Denver (March 21, 2015).

Technical Editor: Effective January 2015, Dr. Martin Walker has agreed to serve as Technical Editor for the Chemical Information Sources Wikibook. In anticipation of forthcoming changes, Martin created an archived version on March 7, 2015. Among the planned changes, we hope to modernize the navigation and make it mobile-friendly. We also plan to use metadata templates for commonly cited sources so that the URL for each resource is maintained in one location, and to merge the selected internet resources with the Chemical Information Sources Wikibook content covering the same topics.

Authors: As part of these efforts, chapter editor assignments will be reviewed. If you have edited a chapter previously and wish to continue doing so, please contact Chuck Huber. If you are interested in becoming a new editor, please contact Chuck and indicate which chapter. His email address is huber@library.ucsb.edu. If you have suggestions for new chapters, please also contact Chuck.

Because editorial contributions and authorship of wikibook content are visible only in the "view history" tab, we plan to recognize this work more formally and give credit to authors' efforts by creating a table of contents on CINF's Chemical Information Literacy page that will include the names of chapter authors.

To help keep the Chemical Information Sources Wikibook useful, the content must be updated regularly.  The ACS CINF Education Committee is honored to support this key resource and to provide a long-term sustainable path for maintaining it.  We thank everyone who has supported it to date and look forward to working with authors and readers in the future!

Grace Baysinger
Chair, CINF Education Committee

Book Reviews

Baykoucheva, S., Managing Scientific Information and Research Data, Chandos Publishing, Amsterdam, Boston, 2015 (paperback, $67.11; ISBN 9780081001950).

When Svetla asked me to write a pre-publication review of this book for the CIB, I gladly accepted. Since the keyword "managing" was prominent, and since interviews with information industry experts were included, I assumed that the book would be about information managers. Although I was never a manager in my career, I have plenty of experience with information managers, most good, some not so good. When I received the manuscript, I was pleasantly surprised to see that it was about managing information at the personal, group, and organizational levels, especially the personal level, giving it much broader appeal.

Given the increasing prevalence of electronic data acquisition as well as publishing and information retrieval, the need for individual scientists, especially chemists, to manage the increasing flood of information and data is paramount to success.

The introductory chapter gives an account of the author's career as well as a chronicle of the interviews she has done, both in previous issues of the CIB and in this book. Chapter 2 cuts a timely swath through a wide variety of issues in scientific communication, several of them controversial, including challenges to traditional models and modes, refereeing, peer review, Open Access and the upsurge in Open Access journals (including "predatory" journals), and a three-page list of new models of communication and publishing. Social media (blogs, Twitter, Facebook, Instagram, etc.) are discussed, including the reasons for using them and who uses them. Further aspects of Open Access and the reviewing process are discussed in the chapter conclusion.

Ethics are discussed in Chapter 3, including debates on priority, editorial bias, manipulation of impact factors, peer review, detection of scientific fraud, citing decisions, pseudoscience vs. fraud vs. hoaxes, and a list of six organizations working to prevent unethical publication. Chapter 5 covers finding and acquiring information as well as managing the retrieval process. Resources, both free and fee-based, are briefly described; the list includes PubMed/MEDLINE, PubChem, Google Scholar, Reaxys, SciFinder, Scopus, and Web of Science. Retrieval comparisons are presented for sample searches on CAplus (STN), MEDLINE, and CAplus/MEDLINE searched together, and between MEDLINE, Scopus, and Web of Science. The use of Chemical Abstracts Registry Numbers (CASRN) is also discussed.

The continuing importance of scientific information literacy and the evolving roles of academic librarians are discussed in Chapter 6. The emergence of end-user-friendly resources has at times led to the unfortunate perception that library input is no longer needed. However, librarians are needed to assist in training and education for both the resources and the necessary information management tools. Training and education in information literacy cover the gamut of retrieval, evaluation, and management of information as well as ethics and writing skills. A long list of the functions performed by citation management programs is followed by a list and brief descriptions of the popular programs, with illustrated examples. The design of information literacy instruction is presented, as is the evaluation of these programs’ effectiveness. This includes data from student evaluations at the author’s school, the University of Maryland, College Park (UMCP). The chapter concludes with a list of sample questions assigned in chemistry courses at UMCP.

Coping with Big Data is the theme of Chapter 8. Types of data are discussed first, followed by the curation, authentication, management, archiving, and preservation of data. The chapter then discusses the management of metadata (data about data), data provenance, the use of digital object identifiers (DOIs), data standards, and 10 criteria for citing data. The roles of libraries and librarians in data curation, preservation, and storage are discussed, concluding with the question of whether academic librarians are promising more than they can deliver.

Electronic laboratory notebooks (ELNs) are discussed extensively in Chapter 9. The pluses and minuses of both ELNs and paper notebooks are elaborated, including their acceptance by researchers, schools, and other organizations. The chapter also discusses how ELNs affect not only data acquisition but also other aspects of data management. These include preservation, determination of provenance, capture of all data (especially from the apparatus that generates them), and the detection and reduction of erroneous or even fraudulent data.

Chapter 11 covers the controversial subject of measuring the impact of academic research (although the discussion could easily be broadened to all research). The chapter focuses on the use and misuse of Impact Factors and related metrics. Eugene Garfield has observed that Impact Factors were designed to evaluate journals, not the contributions of individual authors. However, even the evaluation of journals is subject to spin, including spin from publishers. Resources described include ISI, the Science Citation Index (SCI), Journal Impact Factors (including their drawbacks), Journal Citation Reports, Essential Science Indicators, the h-index, and Google Scholar Citations. Alternatives, including Journal Metrics from Elsevier, are also discussed. The pressure on researchers to publish in journals with high Impact Factors is considerable.

Other aspects of reader “attention” to publications, including social media and altmetrics, are covered in Chapter 14. Publication is increasingly drifting into alternative media: social media such as Facebook and Twitter, online libraries, and science blogs and their comments. This creates a need to evaluate impact and readership. Altmetrics, which monitor attention to scholarly output on social media, attempt to do just that. Is a simple compilation of hits and visits to a site truly indicative of interest? Probably even less so than simple citation counts for traditionally published articles. In both cases, a reader may have any of several reasons for citing a resource. One provenance issue is that visits by bots or crawlers can’t be distinguished from personal visits by individuals. Traditional citation metrics accumulate over years, but these metrics can begin to be tracked within days or even hours after publication. The stakes are high, since grants, funding, researcher reputation, promotion, etc. can depend on such numbers. The author states that “counts mean nothing unless they can be interpreted”; the same can be said for citation data.

Chapter 15 covers unique identifiers for author names, documents, and chemical compounds. Author names are subject to wide variation due to editorial policies, name changes, transliteration protocols, cultural influences (especially for Asian names), and outright errors. Publisher treatment of names is discussed for SciFinder, Scopus, and Web of Science, with illustrations of how the author’s own name is handled. (Even with a unique name, publisher variations in my name are extensive.) Unique name identifiers, including ORCID, the International Standard Name Identifier, and ResearcherID, are described. (I’m suitably inspired to get my ORCID number.) The availability of most articles in digital form has led to the digital object identifier (DOI). Identifiers for chemical compounds include CASRN, InChI (IUPAC International Chemical Identifier), and SMILES.

Chapter 16, the Epilog, again stresses the need for teaching information literacy and management, a most exciting area for libraries and librarians.

Five of the chapters are interviews. John Fourkas, associate editor of the Journal of Physical Chemistry, provides his views on the publishing process and on the ethical questions raised in Chapter 3. Chérifa Boukacem-Zeghmouri provides insight on the evolution of electronic resources, including social media, in Europe. “The Complexities of Chemical Information” is an interview with Gary Wiggins, founder, inter alia, of CHMINF-L. I especially identify with his quote, “Many intricacies of the CA and MEDLINE files are masked by user-friendly systems like SciFinder.” Eugene Garfield, always interesting, gives another fascinating interview, “From the Science Citation Index to the Journal Impact Factor and Web of Science.” I concur with the need to at least scan journal tables of contents, and I regularly did that even after I left the lab. The last interview in the book is with Bonnie Lawlor, “What it Looked Like to Work at ISI,” amplifying her excellent chapter in the ACS Symposium Series book (1).

I have just a few minor quibbles. The discussion of Open Access in Chapter 2 could include citations to the work of Henry Rzepa and Peter Murray-Rust. It could also cite the latter’s chemical structure and data checking program, which is used to encourage authors and publishers to correct errors. Categories of Open Access journals are discussed, but there is no comparison of author publication fees. In Chapter 6, CAS Registry Numbers are presented as a tool for searching chemical properties, but they have much more extensive use. They are the search item of choice for all aspects of chemical compounds in any file in which they exist.

I found some valuable insights in reading this book. Thanks to mentors, I developed excellent paper notebook practices in both academic and commercial labs. I was also the victim of notebook fraud committed by a predecessor, which affected both company research and my job. Although decades removed from the lab, I can see the advantages of ELNs in modern research for managing the mountains of data being produced. I’m way overdue on acquiring and using citation management software, and I’m prepared to take the plunge. Other than reading select blogs for scientific information, I don’t use social media for those purposes, but I appreciate the need for documentation in those media. Even though I was never an academic, I’ve been involved in chemical and patent information literacy education for decades (including college and high school classes and end-user training). I appreciate the author’s activities in this area, especially in Chapter 6. I’ve just submitted a chapter on citation practices for the chemical information issue of the Journal of Chemical Education, so I also appreciate the book’s coverage of those subjects, and I’ll add a note and citation in proof to my article. I also identify with the quote by Joshua Schimel in Chapter 8 on the progression from data to understanding via information and knowledge, a concept I’ve used frequently. As a decades-long user of abstracted and indexed resources, I agree with Gary Wiggins that those resources are superior to end-user-friendly files for optimal retrieval of chemical information.

Bibliographies appear at the end of each chapter and an index is included.  I highly recommend this book.  Librarians should acquire three copies: one for their collections, one for themselves, and one for their administrators in order to furnish information management advice at that level and avoid mismanagement.

(1) Lawlor, B. The Institute for Scientific Information: A Brief History. In The Future of the History of Chemical Information; McEwen, L. R., Buntrock, R. E., Eds.; ACS Symposium Series 1164, American Chemical Society, Washington, DC, 2014; pp 109-126.

Robert E. (Bob) Buntrock
Buntrock Associates
Orono, ME

Kovacs, L., Csupor, D., Lente, G., Gunda, T., 100 Chemical Myths: Misconceptions, Misunderstandings, Explanations, Springer: Heidelberg, New York, 2014. 396 p. + xxii. ISBN 978-3-319-08418-3 Hardcover, 51.99 Euros ($66.49 Amazon).

Translated from the Hungarian (original 2011), this excellent book indeed covers 100 “myths” in chemistry and related sciences. The myths/chapters are grouped in four sections: General, Food, Medicines, and Catastrophes and Poisons. Chemicals and topics mentioned in one chapter are cross-referenced to the other chapters in which they are described. Structures are shown for all chemicals as 2D drawings (with stereochemistry indicated), space-filling models, or both. All concepts appear in boldface when first mentioned and are also described in the Glossary. The topics are of general interest in both the U.S. and Europe, but given the authorship, a few are of more interest to Hungarian readers. Source references and references for further reading, organized by chapter, appear at the end of the book.

The stated mission is to correct chemical myths, especially the all-too-rampant chemophobia. The early chapters deal with more general topics, including chemophobia, “natural” vs. synthetic chemicals, delusions about the safety of natural products, health factors of organic foods, toxicity and dosage, facts vs. misinterpretations about TSCA chemicals, REACH regulation, and the relative risks of food additives. Topics covered in subsequent chapters include the Ozone Hole, lead toxicity, food dyes and preservatives, fats and oils, sweeteners, MSG, salt, caffeine, food safety and fraud, generic vs. proprietary medicines, various medicinals and placebos, herbals, homeopathy, vitamins, the hoax of detox, antioxidants, poisons, mercury, ozone, diamonds, water, war gases, organophosphates, and BPA (safe at regulated levels). The book ends with Erin Brockovich and chromates. Definitive answers are not always given, but the discussions and further reading should help readers make their own decisions.

Did I always agree with the authors’ findings? No, but my disagreements were few and unimportant. For example, I don’t agree with the research on the source of the low sliding friction in ice skating; the researchers apparently never skated outdoors at a range of temperatures. Overall, this is an excellent catalog of both chemical information and misinformation. Current hot topics in the U.S., including the chemophobia of the “Food Babe” and the pros and cons of vaccination, are not included, but many other newsworthy topics are. Highly recommended for a wide variety of audiences.

Robert E. (Bob) Buntrock
Buntrock Associates
Orono, ME

Robert J. Massie, 1949-2015

When Vin Scalfani asked for someone to write a tribute to Bob Massie, I said I could if no one else was able to. Maybe this “veteran” searcher (aka dinosaur) was not the best choice, but I worked with Bob throughout his career at CAS and knew him reasonably well. I spent several terms on the ACS Joint Board-Council Committee on CAS (CCAS), including a few during Bob’s administration. In my career, I interacted with at least four heads of CAS, and he was the best. I had typically worked for chemists, and I wondered what it would be like to work with a CAS head who was a career manager, but the experience was great. Over the decades, I also interacted with CAS and its staff, both as a user of CAS-produced information and, a few times, as a consultant.

For obituaries of Robert J. Massie (known to many both in and outside of CAS as Bob), see those in C&EN (http://cen.acs.org/articles/93/web/2015/06/Robert-Massie-Dies-66.html) and the Columbus Dispatch (http://www.dispatch.com/content/stories/business/2015/06/11/chemical-abstracts-leader-a-civic-titan.html).

In addition, an announcement of Bob’s retirement in 2014 was also a tribute to him and his career at CAS (http://cen.acs.org/articles/91/i28/CAS-Head-Retire-2014.html). Bob received several awards for his work, including:

- the 2011 International Patent Information Award from PIUG, the Patent Information Users Group (http://cen.acs.org/articles/89/i25/Patent-Information-Award-Robert-Massie.html);
- the 2003 Patterson Crane Award from the Columbus and Dayton Sections of the ACS (http://pubs.acs.org/isubscribe/journals/cen/81/i19/html/8119awards2.html); and
- the Miles Conrad Award (http://www.nfais.org/miles-conrad-lectures).

Throughout the years, CAS and its management, as a branch of the ACS, have been criticized both within and outside of the ACS for being too revenue oriented for a scientific organization. Although I was never involved in any management decisions, I was made aware that not-for-profit publishers were increasingly required to compete with for-profits. In addition, ACS governance set revenue goals for CAS publications, which fell mainly on CAS and affected its pricing and marketing. Several tributes to Bob observed that CAS (and therefore ACS) was in trouble financially and in marketing when Bob took over in 1992. His leadership is credited with turning CAS around and contributing to its current (and hopefully continuing) success. Improved relations with and operations of STN are cited (I can testify to that observation), as well as the release and development of SciFinder.

Bob’s activities were not confined to CAS. The Columbus Dispatch obituary is titled, “Chemical Abstracts leader was a civic titan.”  He served on several boards in the Columbus area ranging from the Columbus Symphony to hospitals and education.  A tribute to Bob on his retirement by the head of Battelle also cited his CAS and civic accomplishments (http://www.dispatch.com/content/stories/editorials/2014/03/31/chemical-abstracts-owes-much-to-massie.html).

Even as a CAS outsider, I found working with Bob to be a great experience. His management skills were always apparent. He was a good listener, welcomed debate, and was able to summarize discussions and build consensus for the resulting decisions. Although not a techie, he was quick to learn. He provided sound business management while retaining everyone’s respect and recognizing the need to manage a scientific service organization in a proper and mutually productive manner.

My memories and impressions of Bob are similar to those of a Columbus board member quoted in the Columbus Dispatch obituary: “He consistently made things better, and his quick wit and ornery twinkle made the journey all the more enjoyable.” My best memory of Bob’s wit dates from when SciFinder was being developed. Someone at CAS told me that the working name for the project was Artemis, for the Goddess of the Hunt. They also said that when Bob saw the demo, he exclaimed, “This dog can hunt.” The search was on for a product name. I found that, in one legend, Laelaps was the hunting dog of Artemis, so I submitted that as a product name. By the next CCAS meeting, SciFinder had been rolled out, and I asked Bob about the origin of his endorsement. With his characteristic twinkle and wit, he said that he’d always been impressed with Sam Ervin’s typical pronouncement at the Watergate hearings, “This dog can hunt.”

I last saw Bob in 2012 at the ACS Meeting in Philadelphia.  We had planned to meet for lunch, but since I only have a “dumb” phone, that didn’t happen.  We did have a brief chat and promised to try to get together soon.  Due to my only occasional meeting attendance and Bob’s retirement and illness, that never happened.  I was also to have interviewed him last year on his retirement, but that regretfully never happened either.

From a fellow Bob, Bob Massie: thanks for the memories. You’ll be missed.

Robert E. (Bob) Buntrock
Buntrock Associates
Orono, ME

An Interview with Svetla Baykoucheva

Chemist, Microbiologist, Translator, Editor, Interviewer, Librarian, and Author Extraordinaire

By Vincent F. Scalfani


Svetla Baykoucheva with her dog Max (right) and a friend’s dog, Cosmo, at Great Falls, Maryland

Bio: Svetla Baykoucheva (Baykousheva) has BA and MS degrees in chemistry, a PhD in microbiology, and an MLIS. For more than 20 years she performed lab-bench research on biological membranes and lipid metabolism and published more than 40 articles in peer-reviewed scientific journals such as the Journal of Biological Chemistry, Biochemistry, Journal of Chromatography, and FEBS Letters. She spent a significant part of her career at the Institute of Microbiology of the Bulgarian Academy of Sciences. Her initial research focused on the chemical basis of bacterial pathogenicity and the mechanisms by which virulent strains of bacteria survive and overcome the body’s defense systems. As a postdoctoral fellow of the International Atomic Energy Agency, she spent one year at the University of Paris VI specializing in the use of isotopes to study bacterial membranes. While performing research on the metabolism of polyunsaturated fatty acids at Ohio State University, she enrolled in Kent State University’s Library & Information Science Program. For eight years she was manager of the ACS Library and Information Center in Washington, D.C. Since 2005 she has been the head of the White Memorial Chemistry Library at the University of Maryland, College Park, where she manages a busy academic branch library and teaches scientific information. Svetla also served for five years as editor of the Chemical Information Bulletin (CIB). In recognition of her work, and particularly her efforts to transition the CIB from print to online, she received the prestigious Val Metanomski Meritorious Service Award. This award is given to members of the ACS Division of Chemical Information (CINF) who have made outstanding contributions to the Division. Under Svetla’s editorship, the CIB gained an attractive new look and layout and was enriched with a wider range of content, including many interviews with scientists, librarians, and publishers.
Her book “Managing Scientific Information and Research Data” was just published by Elsevier (Chandos Publishing Imprint).

“We should really feel lucky that we are living at a time when so much scientific information is available and so many sophisticated tools allow us to retrieve, refine, and manage it.”

            —Svetla Baykoucheva, Managing Scientific Information and Research Data (2015)

Vincent F. Scalfani:

Svetla, I first met you in New Orleans at the 245th American Chemical Society National Meeting in 2013. You presented a talk in the Division of Chemical Information entitled “Role of personal interests, motivation, and timing in the transitioning to a new career.” I remember being fascinated by the diversity of your background as well as the depth of your knowledge in numerous areas such as microbiology, publishing, and scientific information. Can you give us a brief overview of your journey to becoming the Head of the White Memorial Chemistry Library at the University of Maryland, College Park?

Svetla Baykoucheva:

My bio describes my career path. What it does not say, though, is why I chose this path. My “dual life” started when I had to decide what undergraduate education I should pursue. My mother was a journalist, and I was brought up in an environment where literature, history, philosophy, and languages were discussed all the time. I was also very interested in the sciences, and particularly in chemistry. My high school in Bulgaria was an English-language school, where everything (except for Bulgarian history and literature) was taught in English. Our chemistry teacher challenged us both scientifically and personally. He had no mercy for us: if someone said something wrong or stupid, he would call it wrong or stupid. He might have influenced my decision to choose chemistry for my undergraduate education. Another role model for me was Marie Curie, whose life and work inspired me to become a scientist.

Early in my research career I started following the essays that Eugene Garfield was publishing in Current Contents. Although he called them “Essays of an information scientist” (Garfield, 2014), they were devoted to many other topics. I still vividly remember some of these essays: “Are you what you wear?”, “I never forget a face!”, and “Memory and super memory: I’ll never forget what’s his name!” Garfield wrote about ice cream, the hazards of sunbathing, and windsurfing. But he also wrote about scientometrics, Nobel Prizes, citation indexing, scientific publishing, and the British Library. In his essays, he demonstrated that you can talk about serious things without being boring, and that you can write in such a way that even people who are not experts in the field can understand what you are saying.

Reading Garfield’s essays and discussing them with colleagues at the beach while attending international scientific conferences on the Black Sea (I have described this elsewhere (Baykoucheva, 2007)) became a favorite pastime for me. Along with performing research in the lab, I began writing articles for popular science and literary journals on a broad range of topics. My interest in languages created a parallel career for me as a translator and editor of scientific and other publications.

My stay in Paris for one year as a postdoctoral fellow of the International Atomic Energy Agency allowed me to learn new research techniques and broadened my interests in literature and history. Upon my return to Bulgaria, I published articles about cultural life in France and historical places that I had visited.  I wrote about the megaliths in Brittany, Carcassonne (an ancient fortress in the South of France), French literary awards, the French literary program “Apostrophes,” the events of May ’68 in Paris, and many other things. All these topics required extensive research in history, literature and even politics. When looking back, I can now see that the seeds for my transition from the lab bench to information science were planted at that time.

VFS:

Congratulations on your new book, “Managing Scientific Information and Research Data.” In the past, you have been very active in publishing peer-reviewed articles. Is this your first book? What was your experience like transitioning from writing articles to writing a monograph?

SB:

Writing and publishing a book is a very different experience from writing an article for a peer-reviewed journal. This is my first book, and when I was writing it, I kept in mind that it might be read by people who are not experts in the topics I was discussing. A book gives you more flexibility to approach topics from different angles. Sometimes, I was not sure where to draw the line between presenting an issue and sharing my personal experiences. When writing a paper for a peer-reviewed journal, you don’t have these concerns. The production stage was a very challenging experience for me. There are many citations in the book, and making sure that all of them were correct was a daunting task. I had to go through the whole book many, many times, and each time I found something that needed to be corrected. I have read somewhere that an author never finishes a book; he just lets it go. The challenge is even bigger when you are writing on topics that are changing so quickly. I agreed to do the book index myself and had to learn how to use a program that could index large PDF files. This made my involvement in the production even more intense. While working on the book, I learned many new things that I might never have learned without doing the research for it. I had complete freedom to write the book as I wanted. The editor, George Knott, and the publisher, Glyn Jones, as well as the staff of Elsevier/Chandos Publishing, provided me with great support throughout the whole process.

VFS:

Why did you write “Managing Scientific Information and Research Data?”  What gap does your book fill in the scholarly literature? 

SB:

With science becoming more and more interdisciplinary and the volume of data growing so fast, there is a need to look at scientific information and how we manage it from a new and broader perspective. The book discusses this topic from many different angles. Scientific communication, ethics in publishing, new communication models, peer review, data management, eScience, and electronic laboratory notebooks are all discussed in the context of the main theme of the book. A critical analysis of the traditional metrics for evaluating research, as well as the new area of Altmetrics, which measures attention to research, are discussed in several chapters. The format of the book is unusual: some of the chapters are reviews of particular areas, others are interviews with experts in scientific information and publishing, and there are also chapters that provide practical information that can be used to teach information literacy or just to improve your own research skills. 

VFS:

What messages do you hope readers will take home from “Managing Scientific Information and Research Data?” Moreover, what questions or opportunities do you hope your book creates?

SB:

My goal in writing this book was to introduce students, researchers, and librarians to some new areas of publishing and scientific information. I also wanted academic librarians to see their roles in a new light and get them excited to do new things. I hope I was able to convey my enthusiasm for teaching information literacy, as I am convinced that information literacy, as it is discussed in the book, will be one of the most interesting areas of engagement for academic librarians, but they have to do it with passion.

VFS:

When you were Editor of the ACS Chemical Information Bulletin from 2005 to 2010, you conducted numerous interviews with scientists, editors, and scientific information experts (www.acscinf.org/content/interviews). After 2010, you continued to do such interviews as a contributor to the CIB. How have these interviews advanced your understanding of how to manage scientific information and research data? Further, your interviews are an integral component of your new book. How do these interviews fit into managing scientific information and data?

SB:

The interviews that I did for the Bulletin provided me with many ideas and helped me see STEM publishing and scientific information from many different sides. The interviews included in the book are very important, as they approach these topics in different ways. One of the interviews is with John Fourkas, associate editor of the ACS Journal of Physical Chemistry, who shares an insider’s view on how articles submitted for publication are processed and evaluated. Chérifa Boukacem-Zeghmouri gives an interesting perspective on how graduate students and experienced researchers in French academic institutions gather information and use social media. Gary Wiggins discusses the challenges presented by the complexity of chemical information and the changing role of science librarians. Eugene Garfield describes how he came up with the idea of using citations in scientific articles to organize and manage information and to create the Science Citation Index. The latter became the foundation on which Web of Science and other important information products were built. The interview with Bonnie Lawlor was previously published in the Bulletin (Baykoucheva, 2010). Bonnie worked at the Institute for Scientific Information (ISI) for 28 years and in her interview she vividly describes the atmosphere there in the 1960s, when the Science Citation Index, Current Contents, and other innovative products were created. She also talks about what it was like to work with Eugene Garfield. All of these interviews, in one way or another, cast light on the central topic of the book: managing scientific information and research data.

VFS:

In your introductory chapter you wrote that “organizing scientific information is at the core of doing science.” This is a surprisingly simple statement, but also incredibly profound. Has this thought guided your career?

SB:

I would like to answer this question with a quote from my book:

“We cannot imagine what science would have looked like today without the Periodic Table of the Elements in which Dmitrii Mendeleev not only arranged the existing chemical elements, but also included reserved spaces for those not yet discovered... The management of scientific information starts with how scientists gather information, organize their data, and communicate their findings. Today, they can “hang out” in the same environment where they can do so many things: search for literature and property information at the same time; see how many times an article they were looking at has been viewed, downloaded, and cited; forward an interesting article to others and comment on it; and find out what others are saying about their own research on Twitter and Facebook. Creating, organizing, searching, finding, and managing scientific information are all “moments” that blend seamlessly with research activity at the lab bench and into our lives.”

VFS:

Over the past several decades you have been part of many changes in the way scientific information and research data are disseminated and managed. For example, you led the transition of the CIB from print to online publishing in 2010 and have more recently introduced electronic laboratory notebooks at the University of Maryland, College Park. What do you see as some of our biggest challenges moving forward with managing scientific information and research data? What advice would you give to current and future researchers, librarians, and information specialists?

SB:

The biggest challenge is keeping up with the technology. Each time a new tool or device comes out, many people rush to use it. We should not miss, though, what is important in terms of content.

In 2012 I attended the Annual Conference of the International Federation of Library Associations and Institutions (IFLA) in Helsinki, Finland. An interesting discussion took place when a librarian from a Finnish public library reported that the library had introduced two new services. One was called “Ask Us Anything,” and the other was promoted as “Ask a Librarian.” The first service received many requests, while people rarely used the “Ask a Librarian” service. The conclusion was that the word “librarian” might have somehow made people more reluctant to seek help from that service. Academic libraries are confronted with many challenges and are forced to redefine their role in supporting research and education in their institutions. This will require re-skilling of librarians and new attitudes.

We are at a stage when academic libraries are trying to play a role in data management and eScience. Time will tell whether these efforts are worth pursuing further.

VFS:

Lastly, in my subtitle I characterized you as a Chemist, Microbiologist, Translator, Editor, Interviewer, Librarian, and Author. Is there anything I missed? What else brings you joy either professionally or in your personal life?

SB: 

I have been through many hobbies, interests, sometimes even obsessions. I rarely read novels anymore. I read history books, literary criticism, books on technology, scientific writing, and biographies. I like gadgets and have done a lot of digital photography and digital videos. When working at ACS, I made a video of an interview with Peter Stang, the editor of the Journal of the American Chemical Society (JACS). This video was shown at formal events at an ACS national meeting in New York. Video editing takes a lot of time, and with my writing, I don’t have time to do it very often anymore. I have posted some of my photographs of libraries and some cultural landmarks (Musée d’Orsay, the grave of Scott and Zelda Fitzgerald) on the web page of the Chemistry Library of the University of Maryland, College Park (http://www.lib.umd.edu/chemistry/photo-gallery/home). I also love dogs, hiking, and traveling.

As I have described elsewhere (Baykoucheva, 2007), the scope of my research has allowed me to establish close professional and even personal ties with many scientists in the United States, France, and many other European countries. Without the support of many organizations, administrators, friends, colleagues, and my family, I wouldn’t have been able to achieve what I have achieved. When working at the ACS, I was able to gain an insider’s view of the scientific publishing field, attend many professional conferences in the United States and abroad, and establish long-lasting connections with many scientists, editors of scientific journals, publishers, and librarians. CINF has supported many of my activities and allowed me to use extensive quotes in the book from my interviews published in the Bulletin. I was very fortunate to come across so many interesting opportunities and meet such extraordinary people.

Links to articles related to “Managing Scientific Information and Research Data”

From the Science Citation Index to the Journal Impact Factor and Web of Science

http://scitechconnect.elsevier.com/science-citation-index-impact-factor/

Scientific Fraud: “Why researchers Do It?” [sic]

http://scitechconnect.elsevier.com/scientific-fraud-researchers/ (includes a free copy of Chapter 3 of the book, “Ethics in publishing”)

Untangling authors’ names

http://scitechconnect.elsevier.com/untangling-authors-names/  

References

Baykoucheva, S. (2007). A Career in Science and Politics: Guy Ourisson (1926-2006). Chemical Information Bulletin, 59(2), 4-6. http://hdl.handle.net/1903/11414.

Baykoucheva, S. (2010). From the Institute for Scientific Information (ISI) to the National Federation of Advanced Information Services (NFAIS): Interview with Bonnie Lawlor. Chemical Information Bulletin, 62. Retrieved from http://acscinf.org/content/institute-scientific-information-isi-national-federation-advanced-information-services-nfais

Garfield, E. (2014). Essays of an Information Scientist. Retrieved August 31, 2014, from http://www.garfield.library.upenn.edu/essays.html

An Interview with Kitty Porter

Chemical Information Expert and Retired Reference Librarian at Vanderbilt University
By Vincent F. Scalfani

Bio: Kitty received a BS in Chemistry from Denison University and an MLS from the University of North Carolina (UNC). In 1974 she became a cataloger at the Duke Medical Center Library, and in 1980 she took over the Chemistry Library. In 1998 she and her husband moved to Vanderbilt University, where she joined the staff of the Sarah Shannon Stevenson Science and Engineering Library as a reference librarian and liaison for Chemistry and Chemical and Biomolecular Engineering. Later she assumed responsibility for Materials Science and Mechanical Engineering. She was active in SLA, where she served as Treasurer and Chair of the Chemistry Division, and in the American Chemical Society, where she was chair of the Education Committee in the Division of Chemical Information.

Vincent F. Scalfani:

Kitty, congratulations on your retirement! While we have never met, I have been aware of your great work in the chemical information field ever since I began my career as a Science & Engineering Librarian three years ago. Can you give us an overview of your career as a librarian?

Kitty Porter:

I graduated from UNC in 1973, having studied an amazing new entity, OCLC. Hard to imagine life before that!! My first job was cataloging at the Duke Medical Center Library from 5 PM to 9 PM, as there were no desks free during the day. We made our own catalog cards, and one of my first tasks was revising filing in the card catalog. BORING! They were good about giving me six months off, though, to go live in the Netherlands with my family while my husband took a sabbatical. In 1980 I took over the Chemistry Library. I had a half-time assistant, a catalog of only my own branch, a Silent 700 terminal for searching, and a building full of great chemists. Before I left, we had a locally developed online catalog with two dedicated computers and a bank of Macs for patrons. During the Duke years I got active in both SLA and ACS. In 1998 my husband was recruited by Vanderbilt, and we moved lock, stock, and barrel. I traded my branch library for a central science facility, a good trade for me. I really liked having on-site colleagues to share the desk schedule and the library life. After a year or so the Chemistry Department asked me to teach Chem 250, Chemical Information. I jumped at the chance and had a great time with it. I learned a lot at Vanderbilt: the system was generous with travel money, so I got to attend a lot of meetings, and the divisional library setup gave us a fair amount of independence to try things, especially when we were lucky enough to hire Tracy Primich to be our director. I’d have to say the years until she left for greener pastures were the best for me in terms of my professional life.

VFS:

Your “Finding Physical & Chemical Properties” guide is extremely useful and has been linked to by many other university libraries (http://researchguides.library.vanderbilt.edu/Property). Even with modern chemistry databases and property search options, it seems that oftentimes nothing beats a manually curated guide for locating property data. What prompted you to develop this guide? I’m curious whether you have any thoughts about other comprehensive guides or small databases that chemistry librarians should focus on creating, perhaps unrelated to physical and chemical properties?

KP:

I created the guide while I was at Duke.  Evenings and weekends the chemistry library was staffed by student assistants and they often had problems helping people find properties. The first version was linked to the Duke collection. In those days there weren’t a lot of things on the open web so that part of the guide was pretty small.  When I moved to Vanderbilt, I took the guide along with me and adapted it to the Library of Congress classification and the Vandy collection. Over the years the online part has grown as more and more of the traditional resources went online. There are always resources that disappear or are moved to storage, so every summer I would update the information. I don’t know what will happen to it now, or even if it is really needed anymore. I think that this is a very individual kind of resource.  Such things are created to help with special kinds of information or collections. For us at Vanderbilt another important small database is one about maps as there is a medium-sized map collection and the map librarian is not always around to answer questions and help find specific maps.

VFS:

How has chemical librarianship changed over the past several decades?

KP:

Ha!! I will sound like a dinosaur here, I think. It is hard to imagine what a library was like back in 1974 when I started as a medical library cataloger. When I started at Duke in the Chemistry Library, we didn’t have an online catalog. In fact, all we had was our own shelflist. If I wanted to find a book, even one in my own library, I had to call the main library reference desk and ask them to look it up for me. This was a problem at first until I got to know the collection. There was no end-user searching, only a Silent 700 acoustic-coupler terminal for searching. One fun thing I did was to go to Columbus to CAS for a week’s training in how to name compounds and do CA searches. They trained us so well that I became a point person for grad students naming their compounds. Today no one has to do this, given the available software for naming compounds. I was on the original Web task force in 1993 and spent a lot of time creating a web presence for my library. In those days there was time to do a special page every Friday on what was new and fun. I remember including Driveways of the Rich and Famous in one week’s offering. It wasn’t too long until there was way too much being introduced to gather and summarize on a weekly basis. Too bad, because that was really fun. I also learned to write my own HTML doing this, and that proved very valuable. In those years, I saw a lot of the grad students and even some faculty, as they had to come into the library for everything. I had a good grasp of what everyone was doing and could see the books they scanned on the new book shelf and what they checked out. That changed pretty quickly when we moved to Vanderbilt. The Science Library is on the ground level of the science complex, just an elevator ride from the chemistry labs, so it isn’t distance that kept people away. Vanderbilt was quick to jump on ejournals, and it wasn’t long until we had online most of the things that chemists needed. 
So, I had to work harder to keep up with what people were doing and who they all were. When I came we had a large section for display of current journal issues, more than seven double ranges three slanted shelves high. Today there is one range with current issues only on one side.  That’s one huge change.  The other is the availability of user-friendly databases, especially SciFinder.  That pretty much ended mediated searching. Over the years, the number of reference questions has dwindled and the content of the questions has changed.  Now we librarians are pretty much the “go-to” people for all kinds of electronic resource problem solving.  This means we had to learn a whole new set of skills.  While still at Duke I was lucky enough to go through an Apple network training course so I could be a point person for our switch from PCs to Macs. That proved to be a really useful opportunity even though the practice changed and at VU we were definitely NOT a Mac shop.

These last several things are part of the biggest change, the declining importance of the physical collection. This is a hard one for those of us well-steeped in library tradition to accept. Over the years I have bought a lot of chemistry books and seen them used heavily on reserve and checked out often. Now those very volumes are seldom read. We had to do a lot of weeding to make space for other kinds of services, so we would send them to storage, offer them to patrons, or even pitch them. Our monograph circulation has dropped way off. Just as with journals, many people would rather get a book online. So why should we keep shelf after shelf of dusty books? The books that we have been buying are mostly ebook collections and individual volumes. While some of us might regret that, I have to admit that when I do research, I vastly prefer to use ejournals and ebooks myself.

VFS:

Can you tell us about your involvement in the ACS Chemical Information Division over the course of your career? Were you active in any other professional societies?

KP:

I joined the ACS as soon as I started in the Duke Chemistry Library in 1980 and was active in the Chemical Information Division Education Committee.   Most years I went to both annual meetings.  They were always lots of fun.  In addition to the programs, the chance to get together with other chemistry librarians, all of whom are great fun at meetings, was something special.  Also I enjoyed the interactions with vendors. This is really the way to get noticed and have your opinion matter.  It was satisfying being someone a vendor called to get a comment on a new or proposed product, or the chance to be a beta test site. 

I also belonged to SLA and was treasurer and also Division chair and program planner. I enjoyed that as well. SLA meetings, being smaller, were held in smaller cities, so I got to see Winnipeg, Canada, and San Antonio, Texas.

I served on the ACS Library Advisory Board in its first iteration. The yearly meetings in DC were really interesting as well as fun. It was another opportunity to work closely with librarians in government and industry and to appreciate the very different problems they face.

VFS:

What advice would you give to new chemistry librarians? What are the biggest challenges we need to overcome?

KP:

We need to remain useful. The new librarians already come with a set of skills that we oldies had to learn on the job. We should be learning a lot about data curation and offering to help our users meet the data requirements of granting agencies. We should be out talking to people in their labs and meetings. I really enjoyed my teaching, and the students always told me how valuable it was for them. So we should be out there convincing departments that a basic information course for first-year grad students and upper-level undergraduates is really essential (if they don’t already know that). We need to be able to at least help people with media production and 3D printing.

They need to keep their eye on the world out there and pay attention to what people are doing. Some of the most interesting new developments are coming out of research groups or being done by groups of grad students.
I would also point out to them that it is really important to have fun and to get to know your people.  Chemists are interesting people who have a lot to teach about life as well as chemistry. 

VFS:

What was your favorite job responsibility as a reference librarian?  What are you most proud of accomplishing during your career as a librarian? 

KP:

My favorite job responsibility was teaching, whether a seminar for incoming grad students in chemistry or in chemical engineering, a one-shot presentation to a drug development class, an engineering class, or even a freshman writing seminar or my one-credit chemical information class. Especially with the grad students, you form a relationship, and they come back to you with their questions as long as they are around. I think I am most proud of my Chem 250 course. I tried to keep it always fresh and up-to-date. I had a student who took it twice (paid for it both times!), and he said it was very useful both times as it had changed so much.

VFS:

So what’s next for Kitty Porter? What do you hope to accomplish during retirement? What do you enjoy doing in your personal life?

KP:

Well, I hope to visit my grandkids more. They are now in high school and middle school and very involved with soccer year-round. So if I want to see them outside of Christmas week in Tennessee or a week or two at Sunset Beach, North Carolina, in the summer, I have to travel to Oakland, CA, and Columbia, MD. Also, I am taking piano lessons, and I finally have enough time to practice as much as I want to. I take a great exercise class three days a week at the local rec center so I can stay mobile. Then there are my dogs to keep busy with walks and ball games. I would like to join a book group, take watercolor lessons, and get back to my Spanish studies. I still have the last two Harry Potter novels, The Lord of the Rings, and the Narnia chronicles, plus a pile of novels, to read in Spanish. I have never had time to do any volunteering, so I’d like to find some way to use my expertise and experience to help out, maybe at the public library and the animal shelter. I joined the board of the Friends of Fort Negley, a Union Army Civil War fort in Nashville. Vanderbilt has a retirement learning institute that is part of the Osher program and offers interesting classes. I just took one on soul food cooking taught by Alice Randall and her daughter that was wonderful fun. My games closet is full of new jigsaw puzzles, and there is a bike in the garage and an elliptical trainer in our tiny exercise corner. There are lots of geological sites in Tennessee that I want to see. So I think I will keep busy.

 

Notes From Our Sponsors


Division of Chemical Information Sponsors Fall 2015

The American Chemical Society Division of Chemical Information is very fortunate to receive generous financial support from our sponsors. Their support allows us to maintain the high quality of the Division’s programming, to promote communication between members at social functions at the ACS Fall 2015 National Meeting in Boston, MA, and to support other divisional activities during the year, including scholarships to graduate students in chemical information.

The Division gratefully acknowledges contributions from the following sponsors:

Gold
ACS Publications
Royal Society of Chemistry

Bronze
ACS Graduate & Postdoctoral Scholars Office
ACS Undergraduate Programs Office
Journal of Cheminformatics (Springer)
Cresset
Novartis
Optibrium
PerkinElmer
Pfizer
Thieme Chemistry

Contributors
AAAS/Science
Bio-Rad Laboratories

 

Opportunities are available to sponsor Division of Chemical Information events, speakers, and material. Our sponsors are acknowledged on the CINF web site, in the Chemical Information Bulletin, on printed meeting materials, and at any events for which we use their contribution. For more information please review the sponsorship brochure at http://www.acscinf.org/PDF/CINF_Sponsorship_Brochure.pdf. Please feel free to contact me if you would like more information about supporting CINF.

Phil Heller
Chair, Fundraising Committee  
Email: sponsorship@acscinf.org
Tel: 917-450-4591

The ACS CINF Division is a non-profit tax-exempt organization with taxpayer ID no. 52-6054220

 

The RSC Historical Collection: a new digital archive


A brand new digital archive is now available from the Royal Society of Chemistry.

The Historical Collection brings together the most important works from our archives and features over 380,000 pages giving access to 500 years of scientific history, making it a fascinating and invaluable addition to any science library.

The Historical Collection includes:

Society Publications (1949–2012)

Browse copies of Chemistry in Britain from the ’60s, Monographs for Teachers and Royal Institute of Chemistry lecture notes.

Society Minutes (1841–1966)

Explore the history of the Royal Society of Chemistry.

Historical Books and Papers (1505–1991)

Access some of the oldest chemical science publications in existence and gain insight into the key moments that shaped our current understanding of chemistry.

Access to the Historical Collection will give real insight into the history and development of science, an understanding that may help shape future discoveries.

Interested in gaining access to this wealth of information? Contact sales@rsc.org for more information.

The Royal Society of Chemistry is the world’s leading chemistry community, advancing excellence in the chemical sciences. With over 51,000 members and a knowledge business that spans the globe, we are the United Kingdom’s professional body for chemical scientists, supporting and representing our members and bringing together chemical scientists from all over the world.

A not-for-profit organization with a heritage that spans 170 years, we have an ambitious international vision for the future. Around the world, we invest in educating future generations of scientists. We raise and maintain standards. We partner with industry and academia, promoting collaboration and innovation. We advise Government on scientific policy. And we promote the talent, information and ideas that lead to great advances in science.

In a complex and changing world, chemistry and the chemical sciences are essential. They are vital in our everyday lives and will be vital in helping the world respond to some of its biggest challenges.

We’re working to shape the future of the chemical sciences, for the benefit of science and humanity.

Our global publishing business helps us to do just that. We publish more than 40 peer-reviewed journals, two magazines, and over 1,300 books, spanning analytical science, biological chemistry, catalysis, chemical biology & medicinal chemistry, energy, engineering, environmental science, food science, general chemistry, inorganic chemistry, materials science, nanoscience, organic chemistry and physical chemistry.

To find out more about our products and services, visit our website www.rsc.org/publishing

ACS Graduate & Postdoctoral Scholars Office Highlights


www.acs.org/grad

Academic Employment Initiative (AEI) poster session (at SciMix) www.acs.org/aei. Faculty from departments that are currently searching for new faculty are highly encouraged to attend.

The ACS Graduate & Postdoctoral Chemist magazine www.acs.org/gradchemist

The ACS Preparing for Life After Graduate School program (www.acs.org/gradworkshop), a career development workshop from ACS.

This two-day workshop is designed to inform chemistry graduate students and postdocs about their career options and how to prepare for them:

Examining careers for PhD chemists
Describing careers in business and industry
Knowing critical non-technical skills
Finding employment opportunities

To bring this workshop to your department, see www.acs.org/gradworkshop or contact GradEd@acs.org; 202-872-7707.

Springer Chemistry News

 

Improved Impact Factors for Springer Chemistry open access journals

Journal of Cheminformatics has received an improved 2014 Impact Factor of 4.55. There are also great articles published in the cross-journal Jean-Claude Bradley Memorial Series (http://bit.ly/JeanClaudeBradley).


Chemistry Central Journal’s latest Impact Factor is 2.19.


Changing landscape of scientific publishing: Open access, open data, and more

Learn more at the talk by Springer publishing editor Charlotte Hollingworth on Monday, August 17, 2015, at 5:25 pm during the CINF session “The Growing Impact of Openness in Chemistry: A Symposium in Honor of JC Bradley” (Room 103, Boston Convention & Exhibition Center).

Learn even more by visiting Springer booth #649 at the ACS Fall exhibition, open Sunday 6–8:30 PM and Monday and Tuesday 9 AM–5 PM.

Steffen Pauly, Editorial Director Chemistry

ORCID iD: http://orcid.org/0000-0001-9768-9315

www.springer.com
www.chemistrycentral.com

Optibrium and NextMove Software collaborate to introduce matched series analysis within StarDrop


Optibrium Ltd. is pleased to announce the launch of version 6.1 of its StarDrop platform to guide researchers in the design of high-quality compounds in drug discovery. This latest release introduces the Matsy and SAR transfer technology, developed by Dr. Roger Sayle and his team at NextMove Software Ltd. Based on matched molecular series analysis, these algorithms predict new chemical substitutions that are likely to improve target activity.

The addition of matched series analysis extends the capabilities of StarDrop’s Nova module, which automatically generates and prioritizes novel compound ideas. Matched series analysis goes beyond conventional “matched pair analysis” by using data from longer series of matched compounds (not just pairs) to make more relevant predictions for a particular chemical series. In addition, all predictions are backed by experimental results, which can be viewed and assessed when considering the suggestions (J. Med. Chem., 2014, 57(6), 2704–2713). StarDrop’s unique capabilities for multi-parameter optimisation and predictive modeling enable efficient prioritization of the resulting ideas to identify high-quality compounds with the best chance of success.
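To illustrate the general matched-series principle described above, here is a deliberately simplified Python sketch. It is not NextMove’s Matsy algorithm or any StarDrop API. It assumes that compounds have already been grouped into series that share a scaffold and differ at one position, with each series stored as a mapping from a substituent label to an activity value (higher meaning more potent). The function names, substituent labels, and activity values are all hypothetical. A reference series “matches” the query only if it contains every query substituent in the same activity order. Substituents that outrank the query’s best member in matching series then receive votes as candidate next substitutions.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical data model: a series maps a substituent label (e.g. "Cl") to an
# activity value such as a pIC50 (higher = more potent). Illustration only.
Series = Dict[str, float]


def activity_order(series: Series) -> List[str]:
    """Return the substituents sorted from least to most active."""
    return sorted(series, key=lambda sub: series[sub])


def suggest_substituents(query: Series, references: List[Series]) -> Counter:
    """Vote for substituents that may improve on the query's best member.

    A reference series matches only if it contains every query substituent
    and ranks them in the same activity order. In each matching series, every
    substituent absent from the query that is more active than the query's
    best substituent receives one vote.
    """
    query_order = activity_order(query)
    best = query_order[-1]
    votes: Counter = Counter()
    for ref in references:
        # Skip reference series that lack any query substituent.
        if any(sub not in ref for sub in query):
            continue
        # Compare the activity order of the shared substituents only.
        shared_order = [sub for sub in activity_order(ref) if sub in query]
        if shared_order != query_order:
            continue
        for sub, activity in ref.items():
            if sub not in query and activity > ref[best]:
                votes[sub] += 1
    return votes


if __name__ == "__main__":
    query = {"H": 5.0, "F": 5.5, "Cl": 6.0}
    references = [
        {"H": 4.0, "F": 4.5, "Cl": 5.0, "Br": 5.6},
        {"H": 6.0, "F": 6.4, "Cl": 6.9, "Br": 7.1, "Me": 6.0},
        {"H": 5.0, "F": 4.0, "Cl": 5.5, "OMe": 7.0},  # different order: ignored
        {"H": 1.0, "Cl": 2.0, "I": 3.0},  # lacks F: ignored
    ]
    print(suggest_substituents(query, references).most_common())
```

Here both matching reference series rank Br above Cl, so Br collects two votes. OMe is never counted because its series orders H, F, and Cl differently from the query. The published method works from real structures and statistics over much larger datasets. This toy version only shows why a longer series carries more information than a single matched pair.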

Dr. Matthew Segall, Optibrium’s CEO, commented: “We are delighted to collaborate with NextMove Software, which has a proven track record of developing innovative informatics solutions for pharma companies worldwide. We are committed to working with NextMove to provide access to the leading compound optimization technologies through StarDrop’s unique environment that guides efficient discovery of novel, high quality drugs.”

Dr. Roger Sayle, CEO of NextMove Software, added: “We are excited to be working together with Optibrium and combining the Matsy technology with StarDrop’s intuitive interface for compound generation.”

For further information on Optibrium and StarDrop, please visit www.optibrium.com/stardrop/, or contact info@optibrium.com


PerkinElmer’s ChemDraw Software Marks 30-Year Milestone with Launch of Enhanced Solution


No. 1 Chemical Structure Drawing Tool Offers New Features for Chemistry and Biology Researchers 

WHAT:          

PerkinElmer, Inc., a global leader focused on improving the health and safety of people and the environment, recently announced the launch of an enhanced version of its ChemDraw software. For more than 30 years, ChemDraw software has been the chemical structure drawing solution of choice for scientists. It allows for quick and accurate creation of publication-ready, chemically intelligent drawings for use in electronic laboratory notebooks, databases and publications, and for querying chemical databases. 

Scientists across academia, government, pharmaceuticals, biotech, chemical processing, environmental, food & beverage, and oil production use ChemDraw software as their chemical structure drawing application.  Key applications include: drug discovery, water and soil safety, creation of new food flavorings, environmental pollutant detection, industrial analysis, and review of by-products from oil residue. 

“ChemDraw software has evolved from a desktop offering in 1985 to a best-in-class solution that can be integrated with our electronic lab notebook, cloud-based platform and data visualization tools, along with mobile devices for learning and sharing in the classroom,” said Karen Madden, President, Informatics, PerkinElmer.  “Our informatics offerings help scientists capture, visualize, analyze, and convert their data into knowledge for better outcomes in human and environmental health.”

KEY FEATURES:   Features of the enhanced ChemDraw software include:

Streamlined product packaging: ChemDraw Prime (complete entry-level version), ChemDraw Professional (full-featured, advanced version) and ChemOffice Professional (single, top-level product).

Support for MacOS 10.10 Yosemite and Apple Retina display: Mac customers using this operating system and Retina-enabled devices can now take advantage of ChemDraw software’s features and functionality.

Enhanced handling of advanced stereochemistry: ensures that significant stereochemical information is retained when structural information is moved between applications in standard structure-data files (SDfiles).

Ability to search for compounds and reactions in the SciFinder database from the Chemical Abstracts Service (CAS) directly from within the ChemDraw software, without time-consuming cutting and pasting.

MORE:           

ChemDraw software supports Mac and PC platforms and has a mobile version for iPad devices that features Flick-to-Share technology, enabling users to draw structures anywhere and share them with colleagues and classmates. It is also embedded in PerkinElmer’s E-Notebook software, Elements platform, and Lead Discovery software powered by the TIBCO Spotfire platform. ChemDraw software is also the standard drawing tool for submitting new chemical compounds to the U.S. Patent and Trademark Office and for reviewing them.

ABOUT PERKINELMER: 

PerkinElmer, Inc. is a global leader focused on improving the health and safety of people and the environment.  The Company reported revenue of approximately $2.2 billion in 2014, has about 7,700 employees serving customers in more than 150 countries, and is a component of the S&P 500 Index.  Additional information is available through 1-877-PKI-NYSE, or at www.perkinelmer.com.


Thieme Chemistry Releases Science of Synthesis 4.1 with New Content and Enhanced Usability

Thieme Chemistry has announced Science of Synthesis (SOS) 4.1, the latest release of its unique full-text resource for methods and experimental procedures in synthetic organic chemistry. Available as of June, Science of Synthesis 4.1 includes the latest Knowledge Updates and additions from the Reference Library, a total of approximately 1,650 printed pages of new material. An enhanced interface design and increased content linking through Digital Object Identifiers further enrich the user experience.

The latest release of Science of Synthesis will see the addition of SOS Knowledge Updates comprising approximately 500 printed pages. It includes an entirely new chapter on five-five-fused hetarenes featuring examples of more unusual selenium and tellurium systems. The use of supercritical carbon dioxide as a reaction medium for organic synthesis is another focus. These are only the latest in a continuous series of knowledge updates that follow proven editorial processes and strict criteria for method selection to ensure consistently high content quality. New content will frequently be added to the digital version, which continues to be the most up-to-date evaluated digital reference work available, reflecting the latest developments in synthetic methodology.

The available content from the Science of Synthesis Reference Library has also been expanded to include two new volumes comprising a total of 1,168 printed pages. C-1 Building Blocks in Organic Synthesis (2 vols.), edited by Piet W. N. M. van Leeuwen and written by 54 experts, reviews a wide range of reactions to form C—C bonds, including reactions involving catalytic methods, an area that has seen significant developments in recent years. The authoritative overview includes contributions on the first catalysts to enable the introduction of fluoromethyl groups in aromatics.

Science of Synthesis 4.1 also comes with an enhanced interface design that features book covers with zoom functionality to facilitate navigation and allow for a quick overview of volume editors. The linking to the primary literature through Digital Object Identifiers is increased to further enrich the user experience. The latest SOS version also includes a number of bug fixes and general software improvements to ensure smooth and stable product performance. Many of the suggested fixes arose after intensive discussions with the customer base worldwide (through WebEx sessions, company visits and international roadshows).

To get access to Science of Synthesis 4.1 or a free trial please visit: http://sos.thieme.com

For more information about Science of Synthesis please visit the Website at www.thieme-chemistry.com/sos/

 

 

 

 

Technical Program Listing

ACS Chemical Information Division (CINF)
250th ACS National Meeting, Fall 2015
Boston, MA (August 16-20, 2015)

CINF Symposia

Erin Davis, Program Chair

[Created Wed Aug 12 2015, Subject to Change; Check ACS Online Program for Latest Changes]

CINF: Substance Identifiers, Addressing the Challenges Presented by Chemically Modified Biologics: The Role of InChI & Related Technologies 8:30am - 10:10am
Sunday, August 16
Room 104A - Boston Convention & Exhibition Center
Stephen Heller, Keith Taylor, Organizing
Stephen Heller, Keith Taylor, Presiding
8:30am-8:35am Introductory Remarks
8:35am-9:05am CINF 1: Generating canonical identifiers for glycoproteins and other chemically modified biopolymers
Roger Sayle1 , roger@nextmovesoftware.com, John May1 , Noel O'Boyle1
Abstract | Slides (pdf)
9:05am-9:35am CINF 2: Toward addressing informatics challenges presented by antibody drug conjugates
Sai Chetan Sukuru1 , chetan.sukuru@pfizer.com, Tianhong Zhang2 , Lawrence Tumey1 , Elwira Muszynska3 , Megan Tran4 , Frank Loganzo3
Abstract| Slides (pdf)
9:35am-10:05am CINF 3: Representation of chemically modified proteins in the Substance Index SPL Files
Yulia Borodina1 , yulia.borodina@fda.hhs.gov, Gunther Schadow2
Abstract| Slides (pdf)
10:05am-10:10am Concluding Remarks
CINF: The Growing Impact of Big Data in the World of Chemical Information 8:30am - 11:50am
Sunday, August 16
Room 104B - Boston Convention & Exhibition Center
Sean Ekins, Rudolph Potenzone, Antony Williams, Organizing
Sean Ekins, Rudolph Potenzone, Antony Williams, Presiding
8:30am-8:35am Introductory Remarks
8:35am-9:00am CINF 7: Challenges in big data chemistry using publicly available chemical information
Sunghwan Kim1 , kimsungh@ncbi.nlm.nih.gov, Gang Fu1 , Volker Hähnke1 , Lianyi Han1 , Bo Yu1 , Lewis Geer1 , Benjamin Shoemaker1 , Asta Gindulyte1 , Siqian He1 , Paul Thiessen1 , Evan Bolton1 , Stephen Bryant1
Abstract| Slides (pdf)
9:00am-9:25am CINF 8: Multiplexing analysis of 1000 approved drugs across 70 million PubChem entries: Will the correct structures please stand up?
Christopher Southan1 , cdsouthan@gmail.com
Abstract| Slides (pdf)
9:25am-9:50am CINF 9: How the availability of online data and datasets can underpin a platform of connected data
Antony Williams12 , tony27587@gmail.com
Abstract
9:50am-10:05am Intermission
10:05am-10:30am CINF 10: Applying cheminformatics and bioinformatics approaches to neglected tropical disease big data
Sean Ekins12 , ekinssean@yahoo.com, Jair Lage De Siqueira3 , Laura-Isobel McCall3 , Malabika Sarker4 , Maneesh Yadav4 , Elizabeth Ponder5 , Adam Kallel1 , Barry Bunin1 , James McKerrow3 , Carolyn Talcott4
Abstract
10:30am-10:55am CINF 11: Chemocentric informatics analysis of 'omics' data identifies novel associations between histone deacetylase inhibitors and neurodisease
Mary Bradley1 , mary_p_bradley@hotmail.com
Abstract| Slides (pdf)
10:55am-11:20am CINF 12: Chemical biology informatic approaches to identify and validate new therapeutic targets
Peter Kutchukian1 , peterkutchukian@hotmail.com
Abstract
11:20am-11:45am CINF 13: Analyzing ToxCast data using nebula (neighbor-edges based and unbiased leverage algorithm)

Huixiao Hong1 , Huixiao.Hong@fda.hhs.gov
Abstract

11:45am-11:50am Closing Remarks
CINF: Applications of Cheminformatics to the Diverse World of Natural Products 10:30am - 11:55am
Sunday, August 16
Room 104A - Boston Convention & Exhibition Center
Roger Schenck, Antony Williams, Organizing
Valerie Biehl, Antony Williams, Presiding
10:30am-10:35am Introductory Remarks
10:35am-11:00am CINF 4: Naming algorithms for derivatives of peptide-like natural products
Roger Sayle1 , roger@nextmovesoftware.com, Noel O'Boyle1 , Christopher Southan2
Abstract| Slides (pdf)
11:00am-11:25am CINF 5: Applications of cheminformatics to the diverse world of natural products
Antony Williams13 , tony27587@gmail.com, Serin Dabb2
Abstract
11:25am-11:50am CINF 6: Reliable structure characterization and elucidation: Finding and confirming the truth
Patrick Wheeler1 , pwheeler@yahoo.com, Antony Williams2
Abstract
11:50am-11:55am Concluding Remarks
CINF: Careers in Chemical Information and Cheminformatics Panel Discussion & Brunch 9:00am - 11:00am
Sunday, August 16
Room 52AB - Boston Convention & Exhibition Center
CINF: Visualizing Chemistry Data to Guide Optimization 1:00pm - 5:10pm
Sunday, August 16
Room 104B - Boston Convention & Exhibition Center
Erin Davis, Matthew Segall, Organizing
Erin Davis, Matthew Segall, Presiding
1:00pm-1:05pm Introductory Remarks
1:05pm-1:30pm CINF 25: Integrating data visualization into the drug discovery workflow
Patrick Walters1 , pat_walters@vrtx.com, Guy Bemis2 , Jun Feng2 , Brian Goldman2 , Georgia McGaughey2 , Jeff Orr2 , Emanuele Perola2 , Susan Roberts2 , Jason Yuen2 , Jonathan Weiss2
Abstract
1:30pm-1:55pm CINF 26: Data visualization: New directions or just familiar routes?
Edmund Champness1 , ed.champness@optibrium.com, Peter Hunt1 , Matthew Segall1
Abstract| Slides (pdf)
1:55pm-2:20pm CINF 27: Reaction discovery and optimization tools for visualizing chemistry data
Joshua Bishop1 , josh.bishop@perkinelmer.com, Phil McHale1 , Philip Skinner2 , Megean Schoenberg1
Abstract
2:20pm-2:45pm CINF 28: Visualization of structure-activity relationship patterns and compound design using the SAR Matrix method
Dilyana Dimova1 , dimova@bit.uni-bonn.de, Jürgen Bajorath1 , bajorath@bit.uni-bonn.de
Abstract
2:45pm-3:00pm Intermission
3:00pm-3:25pm CINF 29: Visualization and manipulation of Matched Molecular Series for decision support
Noel O'Boyle1 , baoilleach@gmail.com, Roger Sayle1
Abstract| Slides (pdf)
3:25pm-3:50pm CINF 30: Design and characterization of chemical space networks
Martin Vogt2 , martin.vogt@bit.uni-bonn.de, Gerald Maggiora1 , Jürgen Bajorath2
Abstract| Slides (pdf)
3:50pm-4:15pm CINF 31: Interactive web-based tools for navigating the biologically relevant chemical space

Obdulia Rabal1 , orabal@unav.es, Julen Oyarzabal1
Abstract| Slides (pdf)

4:15pm-4:40pm CINF 32: Compact models for compact devices: Visualisation of SAR data using mobile apps
Alex Clark1 , aclark.xyz@gmail.com
Abstract| Slides (pdf)
4:40pm-5:05pm CINF 33: Fast, visual, and compelling analysis of datasets from similarity to SAR
Mike Hartshorn1 , Daniel Ormsby1 , Christoph Mueller1 , Rob Brown2 , rob.brown@dotmatics.com, Jesse Gordon3 , Tamsin Mansley3 , Clare Tudge3
Abstract
5:05pm-5:10pm Concluding Remarks
CINF: Wikipedia and Chemistry: Collaborations in Science and Education 1:00pm - 5:05pm
Sunday, August 16
Room 104A - Boston Convention & Exhibition Center
Ye Li, Martin Walker, Organizing
Ye Li, Martin Walker, Presiding
Cosponsored by: CHED
1:00pm-1:05pm Introductory Remarks
1:05pm-1:25pm CINF 14: Chemistry and Wikipedia: Coverage, evolution, and citations
Elsa Alvaro2 , elsa.alvaro@northwestern.edu, Angel Yanguas-Gil1
Abstract
1:25pm-1:45pm CINF 15: Chemistry collaborations on Wikipedia
Martin Walker1 , walkerma@potsdam.edu
Abstract
1:45pm-2:05pm CINF 16: Improving the knowledge about chemistry: The two leading encyclopedias, Wikipedia and RÖMPP, cooperate in Germany
Guido Herrmann1 , guido.herrmann@thieme.de
Abstract
2:05pm-2:25pm CINF 17: PubChem Wikipedia integration and potential for future collaboration
Jian Zhang1 , jiazhang@ncbi.nlm.nih.gov, Paul Thiessen1 , Asta Gindulyte1 , Evan Bolton1
Abstract
2:25pm-2:45pm CINF 18: Wikipedia and Wiktionary as resources for chemical text mining
Roger Sayle1 , roger@nextmovesoftware.com, Daniel Lowe1
Abstract
2:45pm-3:00pm Intermission
3:00pm-3:20pm CINF 19: Tools and strategies: Incorporating Wikipedia-based assignments into a course

Eryk Salvaggio1 , eryk@wikiedu.org, Jami Mathewson1 , jami@wikiedu.org
Abstract

3:20pm-3:40pm CINF 20: Wikipedia editing in chemistry classrooms: Resonance and gaps between educational needs and Wikipedia community practices
Ye Li1 , liye@umich.edu
Abstract| Slides (pdf)
3:40pm-4:00pm CINF 21: Improving Wikipedia topics, a chemistry outreach activity
Keith Lindblom1 , k_lindblom@acs.org
Abstract
4:00pm-4:20pm CINF 22: Value of the Mediawiki platform for providing content to the chemistry community
Antony Williams12 , tony27587@gmail.com
Abstract
4:20pm-4:40pm CINF 23: Chemical collaborations in the wiki realm
Andy Mabbett1 , mabbetta@rsc.org
Abstract
4:40pm-5:00pm CINF 24: Panel Discussion: Wikipedia and MediaWiki: Collaborations and Education in Chemistry
Ye Li2 , liye@umich.edu, Martin Walker1 , walkerma@potsdam.edu
Abstract
5:00pm-5:05pm Concluding Remarks
CINF: CINF Scholarships for Scientific Excellence: Student Poster Competition 6:30pm - 8:30pm
Sunday, August 16
Lighthouse Ballroom 1 - Seaport Hotel and World Trade Center
6:30pm-8:30pm CINF 34: P-OSRA: Polymer Optical Structure Recognition Application

Bryn Reinstadler21 , br6@williams.edu, Hans Horn2
Abstract

6:30pm-8:30pm CINF 35: Withdrawn
6:30pm-8:30pm CINF 36: Knowledge-based approach to the parameterization of small molecule force fields based on crystal structures
Florian Roessler2 , fdr20@cam.ac.uk, Oliver Korb1 , Robert Glen3 , Peter Bond4
Abstract
6:30pm-8:30pm CINF 37: Pilot study of clustering based safety assessment for fragrance ingredients
Jie Shen1 , jshen@rifm.org, Lambros Kromidas1
Abstract
6:30pm-8:30pm CINF 38: Investigation of the endocrine disruption potential of bisphenol A replacement compounds
Hui Wen Ng2 , ng.huiwen33@gmail.com, Roger Perkins2 , Weida Tong2 , Huixiao Hong1
Abstract
6:30pm-8:30pm CINF 39: Chemical alerts and QSAR models based on dynamically-generated annotated linear structural fragments
Darshan Mehta1 , mehta.182@osu.edu, James Rathman1 , Chihae Yang1
Abstract
6:30pm-8:30pm CINF 40: Developing group contributions for predicting transition state structures
Pierre Bhoorasingh1 , bhoorasingh.p@husky.neu.edu, Richard West1
Abstract| Slides (pdf)
6:30pm-8:30pm CINF 41: Changes in scholarly publishing practices in the chemical sciences: A focus on early career chemists
Marianne Noel1 , noel@ifris.org
Abstract
6:30pm-8:30pm CINF 42: Predicting Tox21 assay outcome by quantitative structure-activity relationship and machine learning methods
Mikyung Lee1 , mikyung.lee11@gmail.com, Dac-Trung Nguyen2 , Ruili Huang2
Abstract
6:30pm-8:30pm CINF 43: Chess-like algorithms behind Chematica's retrosynthetic planning

Sara Szymkuc1 , sara.szymkuc@icho.edu.pl, Ewa Gajewska1 , Tomasz Klucznik1 , Piotr Dittwald1 , Michal Startek3 , Karol Molga1 , Michal Bajczyk1 , Bartosz Grzybowski21
Abstract

6:30pm-8:30pm CINF 44: Retrosynthesis of complex molecules using Chematica
Ewa Gajewska1 , ewa.p.gajewska@gmail.com, Sara Szymkuc1 , Tomasz Klucznik1 , Piotr Dittwald1 , Michal Startek3 , Karol Molga1 , Michal Bajczyk1 , Bartosz Grzybowski21
Abstract
6:30pm-8:30pm CINF 45: Mining chemical databases to obtain knowledge based information of non-covalent interactions

Mathew Koebel1 , mathew.koebel@stlcop.edu, Suman Sirimulla1
Abstract

6:30pm-8:30pm CINF 46: In silico assessment of toxicity endpoints: Case-studies using CORINA Symphony and ChemTunes Studio

Christof Schwab1 , Joerg Marusczyk1 , Aleksey Tarkhov1 , Thomas Kleinoeder1 , Dimitar Hristozov4 , Bruno Bienfait5 , Oliver Sacher1 , James Rathman34 , rathman.1@osu.edu, Chihae Yang24
Abstract

6:30pm-8:30pm CINF 47: Chemogenomics-assisted anti-obesity drug discovery
Rima Hajjo1 , hajjo@email.unc.edu, Alexander Tropsha1 , alex_tropsha@unc.edu
Abstract
CINF: Workflow Tools & Data Pipelining in Drug Discovery 8:00am - 10:20am
Monday, August 17
Room 103 - Boston Convention & Exhibition Center
Erin Davis, Tim Dudgeon, Organizing
Erin Davis, Tim Dudgeon, Presiding
8:00am-8:05am Introductory Remarks
8:05am-8:30am CINF 59: When command line tools meet KNIME: Using the best of the two worlds to support drug discovery teams
Man-Ling Lee1 , man-ling.lee@gmx.net
Abstract| Slides (pdf)
8:30am-8:55am CINF 60: Pipelining in mind: Compound library preprocessing in an interactive workflow
Matthias Hilbig1 , Matthias Rarey1 , rarey@zbh.uni-hamburg.de
Abstract
8:55am-9:20am CINF 61: New web based collaborative environment for cheminformatics workflows
Tim Dudgeon1 , tdudgeon@informaticsmatters.com
Abstract| Slides (pdf)
9:20am-9:30am Intermission
9:30am-9:55am CINF 62: Workflows supporting drug discovery against malaria
Barry Hardy1 , barry.hardy@douglasconnect.com
Abstract
9:55am-10:20am CINF 63: Accessing knowledge and design insights from a fully-annotated kinase-focused compound collection
Natasja Brooijmans1 , nbrooijmans@blueprintmedicines.com
Abstract
CINF: Retrosynthesis, Synthesis Planning, Reaction Prediction: When Will Computers Meet the Needs of the Synthetic Chemist? 9:00am - 11:50am
Monday, August 17
Room 104A - Boston Convention & Exhibition Center
David Evans, Wendy Warr, Organizing
David Evans, Wendy Warr, Presiding
9:00am-9:05am Introductory Remarks
9:05am-9:30am CINF 48: What are the next steps in your synthesis? The Reaxys experience
Juergen Swienty-Busch1 , juergen@swienty-busch.de
Abstract
9:30am-9:55am CINF 49: Green chemistry in synthesis planning systems: A role for biocatalysis data and sustainability metrics?
Peter Johnson1 , p.johnson@leeds.ac.uk, Vilmos Valko1 , Anthony Cook23
Abstract
9:55am-10:20am CINF 50: Synthetically accessible virtual inventory (SAVI)
Yuri Pevzner1 , Wolf-Dietrich Ihlenfeldt2 , Marc Nicklaus1 , mn1@helix.nih.gov
Abstract| Slides (pdf)
10:20am-10:35am Intermission
10:35am-11:00am CINF 51: Analyzing success rates of supposedly 'easy' reactions
Roger Sayle1 , roger@nextmovesoftware.com
Abstract| Slides (pdf)
11:00am-11:25am CINF 52: Computer-inspired organic synthesis: Building on success
Jonathan Goodman1 , jmg11@cam.ac.uk
Abstract| Slides (pdf)
11:25am-11:50am CINF 53: Using reaction driven de novo design as a “retrosynthetic” analysis tool
Brian Masek1 , brian.masek@certara.com, Stephan Nagy1 , David Baker1 , Roman Dorfman1 , Farhad Soltanshahi1 , Karen Dubrucq1
Abstract
CINF: Enabling Machines to 'Read' the Chemical Literature: Techniques, Case Studies & Opportunities 9:30am - 11:55am
Monday, August 17
Room 104B - Boston Convention & Exhibition Center
Daniel Lowe, Organizing
Daniel Lowe, Presiding
9:30am-9:35am Introductory Remarks
9:35am-10:00am CINF 54: CHEMDNER-Patents: Automatic recognition of chemical and biological entities in patents

Martin Krallinger2 , Florian Leitner3 , Obdulia Rabal1 , orabal@unav.es, Miguel Vazquez2 , Julen Oyarzabal1 , Alfonso Valencia2
Abstract

10:00am-10:25am CINF 55: SureChEMBL: An open patent chemistry resource
George Papadatos1 , georgep@ebi.ac.uk, Mark Davies1 , Nathan Dedman1 , Anne Hersey1 , John Overington1
Abstract| Slides (pdf)
10:25am-10:50am CINF 56: Deuterogate: Causes and consequences of automated extraction of patent-specified virtual deuterated drugs feeding into PubChem
Christopher Southan1 , cdsouthan@gmail.com
Abstract| Slides (pdf)
10:50am-11:05am Intermission
11:05am-11:30am CINF 57: Evaluating US patent full text documents with chemical ontologies
Lutz Weber1 , lutz.weber@ontochem.com
Abstract
11:30am-11:55am CINF 58: Text-mining to produce large chemistry datasets for community access
Antony Williams2 , tony27587@gmail.com, Daniel Lowe1 , Igor Tetko3 , Carlos Coba4 , Valery Tkachenko2 , Alexey Pshenichnov2 , Ken Karapetyan2
Abstract
CINF: CINFlash: Workflow Tools Lightning Round 10:30am - 12:00pm
Monday, August 17
Room 103 - Boston Convention & Exhibition Center
Erin Davis, Organizing
Erin Davis, Presiding
10:30am-10:35am Introductory Remarks
10:35am-12:00pm CINF 64: CINFlash: Workflow tools lightning round
Erin Davis1 , erinbolstad@gmail.com
Abstract
CINF: Retrosynthesis, Synthesis Planning, Reaction Prediction: When Will Computers Meet the Needs of the Synthetic Chemist? 1:30pm - 4:40pm
Monday, August 17
Room 104A - Boston Convention & Exhibition Center
David Evans, Wendy Warr, Organizing
David Evans, Wendy Warr, Presiding
1:30pm-1:55pm CINF 65: SynTree, chemical synthesis on a PC
John Figueras1 , jjfigueras@gmail.com
Abstract| Slides (pdf)
1:55pm-2:20pm CINF 66: Empowering chemists in synthesis planning – lessons from the evolution of ARChem
Orr Ravitz2 , orr.ravitz@gmail.com, Anthony Cook3 , Zsolt Zsoldos1 , Peter Johnson3
2 John Wiley & Sons, Toronto, Ontario, Canada; 3 School of Chemistry, University of Leeds, Leeds, United Kingdom

Abstract

2:20pm-2:45pm CINF 67: Computer-aided synthesis design (CASD) and forward reaction prediction tools for both idea generation in new synthesis route planning and for de novo molecule design
Valentina Eigner-Pitto1 , ve@infochem.de, Fernando Huerta2 , Mike Hutchings1 , Heinz Saller1 , Peter Loew1
Abstract| Slides (pdf)
2:45pm-3:10pm CINF 68: Chematica – the Deep Blue of chemistry
Bartosz Grzybowski12 , nanogrzybowski@gmail.com
Abstract
3:10pm-3:25pm Intermission
3:25pm-3:50pm CINF 69: Reaction mining with condensed graphs of reactions: Problems and perspectives

Alexandre Varnek1 , varnek@unistra.fr
Abstract

3:50pm-4:15pm CINF 70: Assessment of optimal conditions for selective deprotection reactions resulted from analysis of large reaction database
Timur Madzhidov1 , tmadzhidov@gmail.com, Arkadii Lin12 , Igor Antipin1 , Olga Klimchuk2 , Alexandre Varnek2
Abstract
4:15pm-4:40pm CINF 71: Energy refinement of reactive molecular dynamics pathways
Lee-Ping Wang3 , officer.ping@gmail.com, Robert McGibbon4 , Vijay Pande1 , Todd Martinez2
Abstract
CINF: The Growing Impact of Openness in Chemistry: A Symposium in Honor of JC Bradley 1:00pm - 5:50pm
Monday, August 17
Room 103 - Boston Convention & Exhibition Center
Andrew Lang, Antony Williams, Organizing
Andrew Lang, Antony Williams, Presiding
1:00pm-1:05pm Introductory Remarks
1:05pm-1:25pm CINF 78: Contributions of Jean-Claude Bradley to the vision and execution of Open Notebook Science
Antony Williams12 , tony27587@gmail.com, Andrew Lang3
Abstract| Slides (pdf)
1:25pm-1:45pm CINF 79: Making it open: Putting cheminformatics to use against the Ebola virus
Sean Ekins1 , ekinssean@yahoo.com
Abstract
1:45pm-2:05pm CINF 80: Opening up and connecting up antimalarial data: Progress but with caveats

Christopher Southan1 , cdsouthan@gmail.com
Abstract| Slides (pdf)

2:05pm-2:25pm CINF 81: Context of crowdsourcing: A driver of organizational openness?
David Thompson1 , d.c.thompson.00@gmail.com, Jorg Bentzien2
Abstract| Slides (pdf)
2:25pm-2:35pm Intermission
2:35pm-2:55pm CINF 82: Promoting, supporting, and incentivizing openness in scientific research
Sara Bowman1 , sed8n@virginia.edu
Abstract
2:55pm-3:15pm CINF 83: OpenTox - an open community and framework supporting predictive toxicology and safety assessment
Barry Hardy1 , barry.hardy@douglasconnect.com
Abstract
3:15pm-3:35pm CINF 84: Topliss batchwise scheme reviewed in the era of Open Data
Lars Richter2 , Gerhard Ecker1 , gerhard.f.ecker@univie.ac.at
Abstract
3:35pm-3:55pm CINF 85: Anatomy of a chemical reaction: Dissection by machine learning algorithms
Alex Clark1 , aclark.xyz@gmail.com
Abstract
3:55pm-4:15pm CINF 86: Cheminformatics OLCC
Robert Belford4 , rebelford@ualr.edu, David Wild8 , Leah McEwen2 , Antony Williams3 , Stuart Chalk6 , Jennifer Muzyka1 , John Penn7 , Jon Holmes5
Abstract
4:15pm-4:25pm Intermission
4:25pm-4:45pm CINF 87: PubChem project and annotations
Jian Zhang1 , jiazhang@ncbi.nlm.nih.gov, Paul Thiessen1 , Sunghwan Kim1 , Asta Gindulyte1 , Renata Geer1 , Evan Bolton1
Abstract
4:45pm-5:05pm CINF 88: Open Spectral Database: Open data, open code, open concept
Stuart Chalk1 , schalk@unf.edu
Abstract| Slides (pdf)
5:05pm-5:25pm CINF 89: DeepLit WikiHyperGlossary
Michael Bauer1 , mbauer2@uams.edu, Andrew Cornell2 , Dan Berleant3 , Robert Belford2
Abstract| Slides (pdf)
5:25pm-5:45pm CINF 90: Changing landscape of scientific publishing: Open access, open data, and more
Charlotte Hollingworth1 , charlotte.hollingworth@springer.com
Abstract
5:45pm-5:50pm Concluding Remarks
CINF: Enabling Machines to 'Read' the Chemical Literature: Techniques, Case Studies & Opportunities 1:30pm - 4:15pm
Monday, August 17
Room 104B - Boston Convention & Exhibition Center
Daniel Lowe, Organizing
Daniel Lowe, Presiding
1:30pm-1:55pm CINF 72: Identifying chemical species in combustion models
Richard West1 , r.west@neu.edu
Abstract
1:55pm-2:20pm CINF 73: Text mining the chemical literature to find chemicals in context
Tong-Ying Wu1 , tony.wu@linguamatics.com, Andrew Hinton2 , David Milward2
Abstract
2:20pm-2:45pm CINF 74: Unlocking chemical information from tables and legacy articles
Daniel Lowe1 , daniel@nextmovesoftware.com, Roger Sayle1 , Antony Williams2
Abstract| Slides (pdf)
2:45pm-3:00pm Intermission
3:00pm-3:25pm CINF 75: Chemical structure identification and retrieval with OSRA
Igor Filippov2 , igor.v.filippov@gmail.com, Iwona Weidlich1
Abstract| Slides (pdf)
3:25pm-3:50pm CINF 76: P-OSRA: Translating polymer images to text using extensions of open source software
Bryn Reinstadler21 , br6@williams.edu, Hans Horn2
Abstract| Slides (pdf)
3:50pm-4:15pm CINF 77: Practical case studies of the application of CLiDE for the efficient extraction of chemical structures from documents
Aniko Valko1 , Aniko.Valko@keymodule.co.uk, Peter Johnson2
Abstract| Slides (pdf)
CINF: Sci-Mix 8:00pm - 10:00pm
Monday, August 17
Hall C - Boston Convention & Exhibition Center
8:00pm-10:00pm CINF 109: Dark chemical matter: Could 'inactive' compounds be good starting points for drug discovery?

Anne Wassermann1 , anne.wassermann@pfizer.com

8:00pm-10:00pm CINF 118: Chemical Information Sources Wikibook - the open source created by chemical information professionals for chemical information professionals

Charles Huber1 , huber@library.ucsb.edu
Slides (pdf)

8:00pm-10:00pm CINF 128: Scaffold-based analytics: Enabling hit-to-lead decisions by visualizing chemical series linked across large datasets

Deepak Bandyopadhyay1 , Deepak.2.Bandyopadhyay@gsk.com, Constantine Kreatsoulas1 , Pat Brady1 , Genaro Scavello1 , Dac-Trung Nguyen2 , Tyler Peryea2 , Ajit Jadhav2

8:00pm-10:00pm CINF 138: Linking transporter interaction profiles to in vivo side effects

Eleni Kotsampasakou1 , Sylvia Escher3 , Andreas Jurik2 , Harald Sitte4 , Lukas Pezawas5 , Gerhard Ecker1 , gerhard.f.ecker@univie.ac.at

8:00pm-10:00pm CINF 13: Analyzing ToxCast data using nebula (neighbor-edges based and unbiased leverage algorithm)

Huixiao Hong1 , Huixiao.Hong@fda.hhs.gov

8:00pm-10:00pm CINF 149: Data driven multi-object optimization (MOO) in drug design

Shahar Keinan1 , skeinan@cloudpharmaceuticals.com, Elizabeth Hobbs1 , Elizabeth Hatcher-Frush1

8:00pm-10:00pm CINF 160: From QSAR to big data: Developing mechanism-driven predictive models for animal toxicity

Marlene Kim2 , Hao Zhu1 , hao.zhu99@rutgers.edu

8:00pm-10:00pm CINF 166: CIIPro: An online cheminformatics portal for large scale chemical data analysis

Daniel Russo1 , danrusso@scarletmail.rutgers.edu, Wenyi Wang1 , Marlene Kim1 , Daniel Pinolini1 , Hao Zhu12

8:00pm-10:00pm CINF 168: “Graphical abstracts only”: The changing use of periodicals among early career chemists

Marianne Noel1 , noel@ifris.org

8:00pm-10:00pm CINF 19: Tools and strategies: Incorporating Wikipedia-based assignments into a course

Eryk Salvaggio1 , eryk@wikiedu.org, Jami Mathewson1 , jami@wikiedu.org

8:00pm-10:00pm CINF 31: Interactive web-based tools for navigating the biologically relevant chemical space

Obdulia Rabal1 , orabal@unav.es, Julen Oyarzabal1

8:00pm-10:00pm CINF 34: P-OSRA: Polymer Optical Structure Recognition Application

Bryn Reinstadler21 , br6@williams.edu, Hans Horn2

8:00pm-10:00pm CINF 35: Withdrawn
8:00pm-10:00pm CINF 43: Chess-like algorithms behind Chematica's retrosynthetic planning

Sara Szymkuc1 , sara.szymkuc@icho.edu.pl, Ewa Gajewska1 , Tomasz Klucznik1 , Piotr Dittwald1 , Michal Startek3 , Karol Molga1 , Michal Bajczyk1 , Bartosz Grzybowski21

8:00pm-10:00pm CINF 45: Mining chemical databases to obtain knowledge based information of non-covalent interactions

Mathew Koebel1 , mathew.koebel@stlcop.edu, Suman Sirimulla1

8:00pm-10:00pm CINF 46: In silico assessment of toxicity endpoints: Case-studies using CORINA Symphony and ChemTunes Studio

Christof Schwab1 , Joerg Marusczyk1 , Aleksey Tarkhov1 , Thomas Kleinoeder1 , Dimitar Hristozov4 , Bruno Bienfait5 , Oliver Sacher1 , James Rathman34 , rathman.1@osu.edu, Chihae Yang24

8:00pm-10:00pm CINF 54: CHEMDNER-Patents: Automatic recognition of chemical and biological entities in patents

Martin Krallinger2 , Florian Leitner3 , Obdulia Rabal1 , orabal@unav.es, Miguel Vazquez2 , Julen Oyarzabal1 , Alfonso Valencia2

8:00pm-10:00pm CINF 69: Reaction mining with condensed graphs of reactions: Problems and perspectives

Alexandre Varnek1 , varnek@unistra.fr

8:00pm-10:00pm CINF 80: Opening up and connecting up antimalarial data: Progress but with caveats

Christopher Southan1 , cdsouthan@gmail.com

8:00pm-10:00pm CINF 91: Chemistry enabling Chinese, Japanese, and Korean patents
Daniel Lowe1 , daniel@nextmovesoftware.com, Roger Sayle1
Abstract
8:00pm-10:00pm CINF 94: Non-specificity of drug-target interactions: Consequences for drug discovery

Gerald Maggiora12 , gerry.maggiora@gmail.com, Vijay Gokhale3

CINF: Herman Skolnik Award Symposium 8:00am - 12:00pm
Tuesday, August 18
Room 104A - Boston Convention & Exhibition Center
Jürgen Bajorath, Veerabahu Shanmugasundaram, Organizing
Veerabahu Shanmugasundaram, Presiding
Cosponsored by: COMP and MEDI
Financially supported by: Pfizer
8:00am-8:05am Introductory Remarks
8:05am-8:45am CINF 92: Withdrawn
8:45am-9:25am CINF 93: Paradigm which permits the parsing of information content arising from receptor-independent ligand activity models and receptor-dependent activity models
Anton Hopfinger1 , hopfingr@unm.edu
Abstract
9:25am-10:05am CINF 94: Non-specificity of drug-target interactions: Consequences for drug discovery

Gerald Maggiora12 , gerry.maggiora@gmail.com, Vijay Gokhale3
Abstract

10:05am-10:45am CINF 95: Molecular similarity approaches in chemoinformatics: Early history and bibliometric analysis
Peter Willett1 , p.willett@sheffield.ac.uk
Abstract
10:45am-11:00am Intermission
11:00am-11:30am CINF 96: Generative topographic mapping: Universal tool for chemical space analysis
Alexandre Varnek1 , varnek@unistra.fr
Abstract
11:30am-12:00pm CINF 97: Development of a knowledge-generating platform driven by big data in drug discovery through production processes
Kimito Funatsu1 , funatsu@chemsys.t.u-tokyo.ac.jp
Abstract
CINF: Scientific Integrity: Can We Rely on the Published Scientific Literature? 9:00am - 12:25pm
Tuesday, August 18
Room 104B - Boston Convention & Exhibition Center
Judith Currano, William Town, Organizing
William Town, Presiding
Cosponsored by: COMSCI, ETHC and PROF
9:00am-9:05am Introductory Remarks
9:05am-9:30am CINF 98: Integrity, ethics, and trust in scientific research literature
Christopher Leonard1 , christopher.j.leonard@gmail.com
Abstract
9:30am-9:55am CINF 99: Policy making at the American Chemical Society: Developing a statement on scientific integrity
Sarah Cooney1 , sarah_cooney@bat.com, Christopher Proctor1 , christopher_proctor@bat.com
Abstract
9:55am-10:20am CINF 100: Publishability
Martin Hicks1 , mhicks@beilstein-institut.de
Abstract
10:20am-10:35am Intermission
10:35am-11:00am CINF 101: What is the role of peer review in protecting the integrity of scientific research?
Na Qin1 , qinna@msu.edu
Abstract
11:00am-11:25am CINF 102: Open, network-based answer to the reproducibility crisis: The ScienceOpen peer review concept
Stephanie Dawson1 , stephanie.dawson@scienceopen.com
Abstract| Slides (pdf)
11:25am-11:50am CINF 103: Managing new threats to the integrity of the scientific literature
Judith Currano1 , currano@pobox.upenn.edu, Kenneth Foster2
Abstract
11:50am-11:55am Concluding Remarks
CINF: Herman Skolnik Award Symposium 1:00pm - 5:00pm
Tuesday, August 18
Room 104A - Boston Convention & Exhibition Center
Jürgen Bajorath, Veerabahu Shanmugasundaram, Organizing
Veerabahu Shanmugasundaram, Presiding
Cosponsored by: COMP and MEDI
Financially supported by: Pfizer
1:00pm-1:30pm CINF 104: Enabling drug discovery by computational molecular design
Gisbert Schneider1 , gisbert@ethz.ch, Petra Schneider1
Abstract
1:30pm-2:00pm CINF 105: Integrating public data sources into the drug discovery workflow
Patrick Walters1 , pat_walters@vrtx.com, Alex Aronov1 , Brian Goldman1 , Jun Feng2 , Brian McClain2 , Lidio Meireles2 , Hsin-Pei Shih2 , Jonathan Weiss2
Abstract
2:00pm-2:30pm CINF 106: Going beyond R-group tables: Close-in analog prioritization using neighborhood information derived from SAR matrices
Liying Zhang1 , Kjell Johnson1 , Jeremy Starr1 , Chris Poss1 , Jared Milbank1 , Max Kuhn1 , Veerabahu Shanmugasundaram1 , Veerabahu.Shanmugasundaram@pfizer.com
Abstract
2:30pm-2:45pm Intermission
2:45pm-3:15pm CINF 107: AnalogExplorer: A new method for graphical analysis of analog series and associated structure-activity relationship information
Ye Hu1 , pauline810805@googlemail.com
Abstract| Slides (pdf)
3:15pm-3:45pm CINF 108: How many fingers does a compound have? The various ways to define molecular similarity
Eugen Lounkine1 , lounkine@gmail.com
Abstract| Slides (pdf)
3:45pm-4:15pm CINF 109: Dark chemical matter: Could 'inactive' compounds be good starting points for drug discovery?

Anne Wassermann1 , anne.wassermann@pfizer.com
Abstract

4:15pm-4:45pm CINF 110: Complexity and heterogeneity of data for chemical information science
Jürgen Bajorath1 , bajorath@bit.uni-bonn.de
Abstract
4:45pm-5:00pm Awards Presentation
CINF: Scientific Integrity: Can We Rely on the Published Scientific Literature? 1:30pm - 5:20pm
Tuesday, August 18
Room 104B - Boston Convention & Exhibition Center
Judith Currano, William Town, Organizing
Judith Currano, Presiding
Cosponsored by: COMSCI, ETHC and PROF
1:30pm-1:35pm Introductory Remarks
1:35pm-2:00pm CINF 111: Toward a more reproducible corpus of scientific literature
Cesar Berrios1 , cesar.berrios-otero@f1000.com
Abstract
2:00pm-2:25pm CINF 112: Extraordinary public access to scientific evidence in the FDA modified risk tobacco product process
James Solyst1 , jim.solyst@smna.com
Abstract
2:25pm-2:50pm CINF 113: Validation and fraud in small-molecule crystallography
Sean Conway1 , sc@iucr.org
Abstract
2:50pm-3:15pm CINF 114: Scientific integrity: A crystallographic perspective
Ian Bruno1 , bruno@ccdc.cam.ac.uk
Abstract| Slides (pdf)
3:15pm-3:30pm Intermission
3:30pm-3:55pm CINF 115: Ways publishers help, maintain, and support responsible research
Raymond Boucher1 , rboucher@wiley.com
Abstract
3:55pm-4:20pm CINF 116: Integrity, trust, and reproducibility: How scientific publishers can contribute
Guido Herrmann1 , guido.herrmann@thieme.de
Abstract
4:20pm-4:45pm CINF 117: The write stuff – scientific integrity and publishing
Jamie Humphrey2 , humphreyj@rsc.org, Richard Kidd1 , kiddr@rsc.org
Abstract
4:45pm-4:50pm Concluding Remarks
CINF: Computational Toxicology: From QSAR Models to Adverse Outcome Pathways 8:15am - 11:35am
Wednesday, August 19
Room 103 - Boston Convention & Exhibition Center
Mohamed AbdulHameed, Organizing
Mohamed AbdulHameed, Presiding
Cosponsored by: AGRO, COMP, ENVR and MEDI
8:15am-8:20am Introductory Remarks
8:20am-8:40am CINF 132: Using mode-of-action (MOA) data to guide the development of local quantitative structure-activity relationship (QSAR) models for molecular and early cellular events in an adverse outcome pathway (AOP)
Jay Tunkel1 , tunkel@srcinc.com, Julie Melia1 , Kelly Salinas1 , Laura Morlacci1 , Jennifer Rhoades1 , Mary Kawa1 , Catherine Rudisill1 , Heather Carlson-Lynch1
Abstract
8:40am-9:00am CINF 133: QSAR models could replace LLNA test for predicting human skin sensitization potential of chemicals
Vinicius Alves2 , viniciusm.alves@gmail.com, Rodolpho Braga2 , Eugene Muratov4 , murik@email.unc.edu, Denis Fourches5 , Nicole Kleinstreuer6 , Judy Strickland6 , Carolina Andrade1 , Alexander Tropsha3 , alex_tropsha@unc.edu
Abstract
9:00am-9:20am CINF 134: Assessing skin sensitization potential by combining AOP-informed chemotype alerts, QSAR models, and in vitro biological assay data
James Rathman34 , rathman.1@osu.edu, Chihae Yang24 , Aleksandra Mostrag-Szlichtyng4 , Bruno Bienfait2 , Joerg Marusczyk2 , Christof Schwab1
Abstract
9:20am-9:40am CINF 135: Using OpenTox to map toxicity data to adverse outcome pathways
Barry Hardy1 , barry.hardy@douglasconnect.com
Abstract
9:40am-10:00am CINF 136: Cheminformatic tools in support of pharmacokinetics and ADME profiling
Michael Goldsmith1 , r.goldsmith@chemcomp.com, Daniel Chang2
Abstract
10:00am-10:15am Intermission
10:15am-10:35am CINF 137: Predicting off target profiles using local 3D QSAR models generated 'on the fly'
Brian Masek1 , brian.masek@certara.com, Alexander Steudle1 , Lei Wang1 , Bernd Wendt1
Abstract
10:35am-10:55am CINF 138: Linking transporter interaction profiles to in vivo side effects

Eleni Kotsampasakou1 , Sylvia Escher3 , Andreas Jurik2 , Harald Sitte4 , Lukas Pezawas5 , Gerhard Ecker1 , gerhard.f.ecker@univie.ac.at
Abstract

10:55am-11:15am CINF 139: Enhancing structural alerts for toxicity with mechanism-based metabolism and reactivity models
S. Joshua Swamidass2 , swamidass@wustl.edu, Tyler Hughes2 , Grover Miller1
Abstract
11:15am-11:35am CINF 140: Toxicity biomarker identification and drug repurposing using gene co-expression modules
Gregory Tawa2 , gtawa@hotmail.com, Mohamed AbdulHameed1 , Danielle Ippolito3 , Kamal Kumar1 , John Lewis3 , Jonathan Stallings3 , Anders Wallqvist1
Abstract
CINF: Find the Needle in a Haystack: Mining Data from Large Chemical Spaces 8:30am - 11:50am
Wednesday, August 19
Room 104B - Boston Convention & Exhibition Center
David Deng, Organizing
David Deng, Presiding
8:30am-8:35am Introductory Remarks
8:35am-9:05am CINF 126: Frequency of activity cliffs and distribution over different potency ranges
Dagmar Stumpfe1 , stumpfe@bit.uni-bonn.de, Dilyana Dimova1 , Jürgen Bajorath1
Abstract
9:05am-9:35am CINF 127: Random indexing for comparing path-based chemical fingerprints
Patrick Devaney2 , p.devaney@formatherapeutics.com, David Lancia2 , Jared Milbank2 , Mary Bradley1
Abstract
9:35am-10:05am CINF 128: Scaffold-based analytics: Enabling hit-to-lead decisions by visualizing chemical series linked across large datasets

Deepak Bandyopadhyay1 , Deepak.2.Bandyopadhyay@gsk.com, Constantine Kreatsoulas1 , Pat Brady1 , Genaro Scavello1 , Dac-Trung Nguyen2 , Tyler Peryea2 , Ajit Jadhav2
Abstract| Slides (pdf)

10:05am-10:20am Intermission
10:20am-10:50am CINF 129: Resolving cryptic needles to molecular structures: The GtoPdb experience
Christopher Southan1 , cdsouthan@gmail.com, Adam Pawson1 , Joanna Sharman1 , Helen Benson1 , Elena Faccenda1
Abstract| Slides (pdf)
10:50am-11:20am CINF 130: Current and future developments of Markush technology in drug discovery
David Deng1 , ddeng@chemaxon.com, Árpád Figyelmesi1
Abstract
11:20am-11:45am CINF 131: GPU-accelerated virtual screening: Rationale, challenges, and case studies
Olexandr Isayev1 , olexandr@olexandrisayev.com, Denis Fourches2
Abstract
11:45am-11:50am Concluding Remarks
CINF: Chemical Information Skills: The Essential Toolkit for Chemical Research— A Joint CINF-CSA Trust Symposium 9:00am - 12:40pm
Wednesday, August 19
Room 104A - Boston Convention & Exhibition Center
Grace Baysinger, Jonathan Goodman, Organizing
Grace Baysinger, Jonathan Goodman, Presiding
Financially supported by: CSA Trust
9:00am-9:05am Introductory Remarks
9:05am-9:25am CINF 118: Chemical Information Sources Wikibook - the open source created by chemical information professionals for chemical information professionals

Charles Huber1 , huber@library.ucsb.edu
Abstract| Slides (pdf)

9:25am-9:45am CINF 119: Soft skills of chemical research: Academic integrity and research ethics
Donna Wrublewski1 , dtwrub@caltech.edu, Michelle Leonard2 , Amy Buhler2 , Neelam Bharti2 , neelambh@ufl.edu
Abstract
9:45am-10:05am CINF 120: Integrating bibliographic management tools in chemical information literacy instruction
Svetla Baykoucheva1 , sbaykouc@umd.edu, Joseph Houck2
Abstract| Slides (pdf)
10:05am-10:25am CINF 121: Replacing the traditional graduate chemistry literature seminar with a chemical information literacy course
Vincent Scalfani3 , vfscalfani@ua.edu, Stephen Woski1 , Patrick Frantom2
Abstract| Slides (pdf)
10:25am-10:40am Intermission
10:40am-11:00am CINF 122: Chemical information skills: A searcher’s perspective
Elaine Cheeseman1 , echeesema@cas.org
Abstract
11:00am-11:20am CINF 123: Withdrawn
11:20am-11:40am CINF 124: Patents - the essential multifunctional tool for science, business, and intellectual property information
Edlyn Simmons1 , edlyns@earthlink.net
Abstract
11:40am-12:00pm CINF 125: Career information resources for graduate students and postdocs
Grace Baysinger1 , graceb@stanford.edu
Abstract
CINF: Find the Needle in a Haystack: Mining Data from Large Chemical Spaces 1:00pm - 4:30pm
Wednesday, August 19
Room 104B - Boston Convention & Exhibition Center
David Deng, Organizing
David Deng, Presiding
1:00pm-1:05pm Introductory Remarks
1:05pm-1:30pm CINF 149: Data driven multi-object optimization (MOO) in drug design

Shahar Keinan1 , skeinan@cloudpharmaceuticals.com, Elizabeth Hobbs1 , Elizabeth Hatcher-Frush1
Abstract

1:30pm-1:55pm CINF 150: Multiobjective transformation based de novo design: A case study of surfactants
Christos Kannas1 , chriskannas@gmail.com, Warren Read23 , Noel Ruddock3 , Martyn Fletcher4 , Tom Jackson4 , Robert Stevens2 , Jerry Winter3 , Peter Willett1 , Val Gillet1
Abstract| Slides (pdf)
1:55pm-2:20pm CINF 151: Mapping chemical data with Diversity Genie
Igor Filippov2 , igor.v.filippov@gmail.com, Iwona Weidlich1
Abstract| Slides (pdf)
2:20pm-2:30pm Intermission
2:30pm-2:55pm CINF 152: Extraction of structure-activity relationship information from activity cliff clusters
Dilyana Dimova1 , dimova@bit.uni-bonn.de, Dagmar Stumpfe1 , Jürgen Bajorath1
Abstract| Slides (pdf)
2:55pm-3:25pm CINF 153: Withdrawn
3:25pm-3:35pm Intermission
3:35pm-4:00pm CINF 154: Drug discovery tool pipeline - the best of all worlds
Carsten Detering1 , detering@biosolveit.com
Abstract
4:00pm-4:25pm CINF 155: 3D characteristics of efficient protein-protein interactions inhibitors: A big data analysis
Melaine KUENEMANN12 , melaine.kuenemann@univ-paris-diderot.fr, Laura M. L. Bourbon12 , Céline M. Labbé12 , Bruno O. Villoutreix12 , Olivier Sperandio12
Abstract
4:25pm-4:30pm Concluding Remarks
CINF: Chemical Information Skills: The Essential Toolkit for Chemical Research— A Joint CINF-CSA Trust Symposium 1:30pm - 5:15pm
Wednesday, August 19
Room 104A - Boston Convention & Exhibition Center
Grace Baysinger, Jonathan Goodman, Organizing
Grace Baysinger, Jonathan Goodman, Presiding
Financially supported by: CSA Trust
1:30pm-1:35pm Introductory Remarks
1:35pm-1:55pm CINF 141: So I have an SD File...what do I do next?
Rajarshi Guha1 , rajarshi.guha@gmail.com, Noel O'Boyle2 , baoilleach@gmail.com
Abstract| Slides (pdf)
1:55pm-2:15pm CINF 142: Chemical literacy for the ages: Essential skills in 2D chemical representation
Leah McEwen1 , lrm1@cornell.edu, Evan Hepler-Smith2
Abstract| Slides (pdf)
2:15pm-2:35pm CINF 143: From lab to the libraries: A new journey
Neelam Bharti1 , neelambh@ufl.edu
Abstract
2:35pm-2:55pm CINF 144: Experiments with chemists and information
Jonathan Goodman1 , jmg11@cam.ac.uk
Abstract| Slides (pdf)
2:55pm-3:10pm Intermission
3:10pm-3:30pm CINF 145: ChemData: A web application for learning chemical informatics
Stuart Chalk1 , schalk@unf.edu
Abstract| Slides (pdf)
3:30pm-3:50pm CINF 146: Improving geographically distributed research with real time collaboration
Andras Stracz1 , astracz@chemaxon.com, Aurora Costache2
Abstract| Slides (pdf)
3:50pm-4:10pm CINF 147: Chemical research toolkit: An end-to-end solution
Joshua Bishop1 , josh.bishop@perkinelmer.com, Phil McHale1 , Pierre Morieux1
Abstract
4:10pm-4:30pm CINF 148: ELN, RegMol and inventory: From synthesis to registration to inventory
Rajeev Hotchandani1 , hotchandani@yahoo.com
Abstract
4:30pm-4:35pm Concluding Remarks
CINF: Computational Toxicology: From QSAR Models to Adverse Outcome Pathways 1:30pm - 5:10pm
Wednesday, August 19
Room 103 - Boston Convention & Exhibition Center
Mohamed AbdulHameed, Organizing
Mohamed AbdulHameed, Presiding
Cosponsored by: AGRO, COMP, ENVR and MEDI
1:30pm-1:35pm Introductory Remarks
1:35pm-1:55pm CINF 156: Differential network analysis of chemical-mediated cancer induction
Francesca Mulas1 , fra.mulas@gmail.com, Daniel Gusenleitner1 , gusef@bu.edu, Stefano Monti12 , smonti@bu.edu
Abstract
1:55pm-2:15pm CINF 157: Massively orthogonal search engine for mechanism of action and toxicity studies
Douglas Selinger2 , douglas.selinger@novartis.com, Varun Shivashankar1 , Mustapha Larbaoui3 , Igor Mendelev1 , Michael Steeves1 , Stephen Litster1 , Philippe Marc3
Abstract| Slides (pdf)
2:15pm-2:35pm CINF 158: Combining predicted biological descriptors with chemical descriptors affords reliable hybrid QSAR models of rodent carcinogenicity
Regina Politi1 , reginap@email.unc.edu, Stephen Capuzzi1 , Sherif Farag1 , Alexander Tropsha1
Abstract
2:35pm-2:55pm CINF 159: Mining big datasets to create and validate machine learning models
Alex Clark2 , Sean Ekins1 , ekinssean@yahoo.com
Abstract
2:55pm-3:15pm CINF 160: From QSAR to big data: Developing mechanism-driven predictive models for animal toxicity

Marlene Kim2 , Hao Zhu1 , hao.zhu99@rutgers.edu
Abstract

3:15pm-3:30pm Intermission
3:30pm-3:50pm CINF 161: ChEMBL database and its application in toxicity assessment
Patricia Bento1 , patricia@ebi.ac.uk
Abstract
3:50pm-4:10pm CINF 162: Modeling ABC transporters as potential DILI targets
Matthew Segall1 , matthew.d.segall@gmail.com, Peter Hunt2 , Jon Tyzack2
Abstract| Slides (pdf)
4:10pm-4:30pm CINF 163: Addressing a key hurdle in translational research: Predicting mouse liver microsomal stability using machine learning
Alexander Perryman2 , alp168@njms.rutgers.edu, Sean Ekins13 , Joel Freundlich42
Abstract
4:30pm-4:50pm CINF 164: Using supervised Latent Dirichlet Allocation for structure-activity relation modeling in Tox21 2014 data challenge
Iwona Weidlich1 , iweidlic@gmail.com, Igor Filippov2
Abstract| Slides (pdf)
4:50pm-5:10pm CINF 165: Cheminformatics-based signal boosting for predicting drug adverse events
Andrew Fant1 , Andrew.Fant@fda.hhs.gov, Naomi Kruhlak1 , Keith Burkhart1
Abstract
CINF: General Papers 9:00am - 11:30am
Thursday, August 20
Room 104A - Boston Convention & Exhibition Center
Erin Davis, Organizing
Erin Davis, Presiding
9:00am-9:30am CINF 166: CIIPro: An online cheminformatics portal for large scale chemical data analysis

Daniel Russo1 , danrusso@scarletmail.rutgers.edu, Wenyi Wang1 , Marlene Kim1 , Daniel Pinolini1 , Hao Zhu12
Abstract| Slides (pdf)

9:30am-10:00am CINF 167: Improving virtual screening performance through identification of molecular descriptor features sensitive to specific biological activities
Martin Vogt1 , martin.vogt@bit.uni-bonn.de, Jürgen Bajorath1
Abstract| Slides (pdf)
10:00am-10:30am CINF 168: “Graphical abstracts only”: The changing use of periodicals among early career chemists

Marianne NOEL1 , noel@ifris.org
Abstract

10:30am-11:00am CINF 169: QSPR/QSAR studies of antifouling/fouling-release surface coatings containing quaternary ammonium salts
Farukh Jabeen3 , farukh.jabeen@ndsu.edu, Bakhtiyor Rasulev2 , Martin Ossowski2 , Bret Chisholm1 , Shane Stafslien1 , Philip Boudjouk4
Abstract
CINF: General Papers 1:00pm - 3:00pm
Thursday, August 20
Room 104A - Boston Convention & Exhibition Center
Erin Davis, Organizing
Erin Davis, Presiding
1:00pm-1:30pm CINF 170: Experimental chemoinformatics study of tautomerism of commercial screening samples
Laura Guasch1 , lguasch@helix.nih.gov, Marc Nicklaus1
Abstract
1:30pm-2:00pm CINF 171: Which kinase to hit in NCI-60? From a selectivity problem to a multitarget solution
Oscar Méndez Lucio2 , oscarmen@comunidad.unam.mx, Aakash Chavan Ravindranath2 , Qurrat Ul Ain2 , Kristian Birchall3 , Chido Mpamhanga3 , Stefan Knapp1 , Andreas Bender2
Abstract
2:00pm-2:30pm CINF 172: HackaMol: An object-oriented Modern Perl library for molecular hacking on multiple scales
Demian Riccardi1 , demianriccardi@gmail.com
Abstract
2:30pm-3:00pm CINF 173: Programmatic access to chemical information in PubChem
Sunghwan Kim1 , kimsungh@ncbi.nlm.nih.gov, Paul Thiessen1 , Evan Bolton1 , Stephen Bryant1
Abstract| Slides (pdf)

Cosponsored Symposia

COMP: Best in Class Computational Software by Integration 8:00am - 12:00pm
Sunday, August 16
Room 156B - Boston Convention & Exhibition Center
Alberto Gobbi, Patrick Walters, Organizing
Alberto Gobbi, Patrick Walters, Presiding
Cosponsored by: CINF
8:00am-8:30am COMP 17: Integrated suite of modeling tools that empower scientists in structure- and property-based drug design
JW Feng1 , jw.a.feng@gmail.com
8:30am-9:00am COMP 18: AIDEAS: An integrated cheminformatics solution
Rishi Gupta1 , rishirg@yahoo.com
9:00am-9:30am COMP 19: Autocorrelator v2.0: Adapting for a resource limited environment
Matthew Lardy1 , mlardy@gmail.com
9:30am-10:00am COMP 20: What's old is new again: Cheminformatics and the ‘modern’ web
Paul Watson1 , pwatson@arenapharm.com
10:00am-10:15am Intermission
10:15am-10:45am COMP 21: Developing an integrated software ecosystem at Merck
Scott Johnson1 , sajohn2@gmail.com
10:45am-11:15am COMP 22: Pharmit: Bring virtual screening to your browser
David Koes1 , dkoes@pitt.edu
11:15am-11:45am COMP 23: Building an integrated information environment for drug discovery
Jonathan Weiss1 , jonathan_weiss@vrtx.com, Guy Bemis2 , Carlos Faerman1 , Jun Feng1 , Brian Goldman3 , Xiaodan Zhang1 , Patrick Walters1
11:45am-12:00pm Panel Discussion
COMP: Integrated Approaches in Structure-Based Drug Design 8:00am - 12:00pm
Sunday, August 16
Room 156A - Boston Convention & Exhibition Center
Veerabahu Shanmugasundaram, Felix Vajdos, Organizing
Veerabahu Shanmugasundaram, Felix Vajdos, Presiding
Cosponsored by: CINF and MEDI
Financially supported by: Pfizer
8:00am-8:05am Introductory Remarks
8:05am-8:45am COMP 5: Wscore: integration of active site water structure into an empirical scoring function for calculating protein-ligand binding affinity
Richard Friesner1 , rich@chem.columbia.edu
8:45am-9:20am COMP 6: Water, thermodynamics, and drugs, oh my
Eric Manas1 , eric.manas@gmail.com, Alan Graves2
9:20am-9:55am COMP 7: Discovery and optimisation of a series of potent and selective Pan-Trk ligands
Sarah Skerratt1 , sarah.skerratt@pfizer.com
9:55am-10:10am Intermission
10:10am-10:50am COMP 8: In silico identification of Nav 1.7 inhibitors – building a homology model (Part I) and structure-based virtual screening (Part II)
Daniel La1 , daniel.la@amgen.com
10:50am-11:25am COMP 9: Using computational chemistry to drive design in the discovery of a potent, selective, brain penetrant and in vivo active LRRK2 kinase inhibitor
Bethany Kormos1 , bkormos@gmail.com, Jaclyn Henderson1 , Matthew Hayward2 , Karen Coffman2 , Jayasankar Jasti2 , Ravi Kurumbail2 , Travis Wager1 , Patrick Verhoest1 , Stephen Noell2 , Paul Galatsis1
11:25am-12:00pm COMP 10: Decision support for drug discovery: Some recent advances
Mark Murcko12 , mark_murcko@comcast.net
COMP: Integrated Approaches in Structure-Based Drug Design 1:30pm - 5:30pm
Sunday, August 16
Room 156A - Boston Convention & Exhibition Center
Veerabahu Shanmugasundaram, Felix Vajdos, Organizing
Veerabahu Shanmugasundaram, Felix Vajdos, Presiding
Cosponsored by: CINF and MEDI
Financially supported by: Pfizer
1:30pm-1:35pm Introductory Remarks
1:35pm-2:15pm COMP 42: Structure, enzymology, and biophysical characterization of a Jak3-Type II inhibitor complex
Felix Vajdos1 , fvajdos@gmail.com
2:15pm-2:50pm COMP 43: Structure and computationally guided design of potent non-nucleoside inhibitors with improved pharmacological properties that target HIV reverse transcriptase and drug-resistant variants
Karen Anderson1 , karen.anderson@yale.edu
2:50pm-3:25pm COMP 44: The devil is in the detail – two short stories on using direct binding data in lead optimization
Uli Schmitz1 , uli.schmitz@gilead.com, Jayaraman Chandrasekhar1 , Anita Niedziela-Majka1 , Roman Sakowicz1 , Sarah Boyce2 , Chris Higgs2 , Woody Sherman2 , Eric Lansdon1
3:25pm-3:40pm Intermission
3:40pm-4:20pm COMP 45: Transition state structure in the design of drug candidates
Peter C. Tyler2 , Gary B. Evans2 , Richard Furneaux2 , Vern Schramm1 , vern.schramm@einstein.yu.edu
4:20pm-4:55pm COMP 46: Using Ensemble-Docking and NMR constraints to generate high quality models of antagonist-bound HDM2 complexes
Xavier Fradera1 , xavier.fradera@merck.com
4:55pm-5:30pm COMP 47: Structure activity relationships of nuclear receptor, GPCR and kinase modulators revealed with differential HDX
Patrick Griffin1 , pgriffin@scripps.edu
PRES: 21st Century Chemistry Education: Formal and Informal 1:30pm - 5:00pm
Sunday, August 16
Room 158 - Boston Convention & Exhibition Center
George Bodner, Ingrid Montes, Organizing
Ingrid Montes, Presiding
Cosponsored by: AGRO, CARB, CHAS, CHED, CINF, COLL, ENFL, PROF, SOCED and WCC
1:30pm-1:40pm Introductory Remarks
1:40pm-2:10pm PRES 48: A community for teachers of chemistry by teachers of chemistry
Barbara Sitzman1 , sitzman@usc.edu
2:10pm-2:40pm PRES 49: Young chemists in action: The benefits of informal chemistry education
Sally Mitchell1 , sbmitchell2@gmail.com
2:40pm-3:10pm PRES 50: Promoting excellence in chemistry teaching through in-service professional development
Jesse Bernstein1 , bernsteinj@miamicountryday.org
3:10pm-3:20pm Intermission
3:20pm-3:50pm PRES 51: Making connections: Mentoring, networking, and presenting makes a difference for us and others as educators
Laura Slocum1 , leslocum621@gmail.com
3:50pm-4:20pm PRES 52: Teacher-tested, but student-blackbox online professional development for chemistry teachers
William Hunter1 , whunter@ilstu.edu
4:20pm-4:50pm PRES 53: Engaging researchers and students as partners in education and outreach
Carol Alpert1 , calpert@mos.org
4:50pm-5:00pm Concluding Remarks
PRES: 21st Century Chemistry Education: Formal and Informal 8:30am - 12:00pm
Monday, August 17
Room 158 - Boston Convention & Exhibition Center
George Bodner, Ingrid Montes, Organizing
George Bodner, Presiding
Cosponsored by: AGRO, CARB, CHAS, CHED, CINF, COLL, ENFL, PROF, SOCED and WCC
8:30am-8:40am Introductory Remarks
8:40am-9:10am PRES 59: Inspiring and motivating chemistry learning through visualization and rich contexts
Peter Mahaffy1 , peter.mahaffy@kingsu.ca
9:10am-9:40am PRES 60: Strategies to effectively incorporate learner-centered instruction into chemistry service courses
Maria Oliver-Hoyo1 , maria_oliver@ncsu.edu
9:40am-10:10am PRES 61: Opportunities of formal and informal chemistry education at the two-year college
Amina El-Ashmawy1 , ael-ashmawy@collin.edu
10:10am-10:20am Intermission
10:20am-10:50am PRES 62: Encouraging diversity in the chemical sciences
Carlos Gutierrez1 , cgutier@exchange.calstatela.edu
10:50am-11:20am PRES 63: Informal STEM education: Theory to outcome
Matthew Miller1 , matt.miller@sdstate.edu
11:20am-11:50am PRES 64: Overcoming popular myths about education
George Bodner1 , gmbodner@purdue.edu
11:50am-12:00pm Concluding Remarks
CHAS: Current Topics in Chemical Safety Information 9:00am - 11:30am
Tuesday, August 18
Waterfront 1A/1B - Seaport Hotel and World Trade Center
Leah McEwen, Ralph Stuart, Organizing
Leah McEwen, Ralph Stuart, Presiding
Cosponsored by: AGFD, CCS, CHED and CINF
9:00am-9:05am Introductory Remarks
9:05am-9:25am CHAS 22: Organizing chemical information to support lab safety
Ralph Stuart2 , secretary@dchas.org, Leah McEwen1
9:25am-9:45am CHAS 23: Keeping your kids away from poisonous chemicals: Chemical safety in the household
Na Qin1 , qinna@msu.edu
9:45am-10:05am CHAS 24: Updating NFPA 45: Fire protection for laboratories using chemicals
Laura Montville1 , lmontville@nfpa.org
10:05am-10:25am CHAS 25: Blueprint for successful chemical management at Yale’s West Campus
Christopher Incarvito1 , chris.incarvito@yale.edu, Kimberly Heard1
10:25am-10:40am Intermission
10:40am-11:00am CHAS 26: Chemistry lab safety information resources for academic user
Grace Baysinger1 , graceb@stanford.edu
11:00am-11:20am CHAS 27: Teaching future chemists how to create meaningful risk assessment tools
Samuella Sigmann1 , sigmannsb@appstate.edu
11:20am-11:30am Panel Discussion
CHAS: Current Topics in Chemical Safety Information 1:30pm - 5:00pm
Tuesday, August 18
Waterfront 1A/1B - Seaport Hotel and World Trade Center
Leah McEwen, Ralph Stuart, Organizing
Leah McEwen, Ralph Stuart, Presiding
Cosponsored by: CCS, CHED and CINF
1:30pm-1:50pm CHAS 28: Designing a hazard and risk assessment protocol for undergraduate instruction and use
David Finster1 , dfinster@wittenberg.edu
1:50pm-2:10pm CHAS 29: Experience with data handling in large chemical databases
Neal Langerman1 , neal@chemical-safety.com
2:10pm-2:30pm CHAS 30: Ensuring that lessons learned are not forgotten: Leveraging ELN to transform the safety paradigm
Mark Manfredi1 , mark.manfredi@bms.com, Ramesh Durvasula1 , William Bullock2 , Bob Cavallaro1 , Carol Mcnab1 , Matthias Nolte1 , Dana Vanderwall1
2:30pm-2:50pm CHAS 31: Encoding reactive chemical hazards and incompatibilities in an alerting system
John May1 , john@nextmovesoftware.com, Roger Sayle1
2:50pm-3:05pm Panel Discussion
3:05pm-3:20pm Intermission
3:20pm-3:40pm CHAS 32: Biological and ecological toxicity of engineered nanomaterials
Ian Gunsolus1 , Tian Qiu1 , Vivian Feng1 , Christy Haynes1 , chaynes@umn.edu
3:40pm-4:00pm CHAS 33: eNanoMapper: A database and ontology framework for nanomaterials design and safety assessment
Barry Hardy1 , barry.hardy@douglasconnect.com, Egon Willighagen3 , Janna Hastings2 , Markus Hegi1 , Nina Jeliazkova4 , Haralambos Sarimveis5
4:00pm-4:20pm CHAS 34: Data, data everywhere, nor any bit processable: Opportunities for amalgamating and opening up chemical data and information relevant to hazard recognition and safety planning
Jian Zhang2 , Paul Thiessen2 , Gang Fu2 , Evan Bolton2 , bolton@ncbi.nlm.nih.gov, Leah McEwen1
4:20pm-4:40pm CHAS 35: It's all in how you do it: Annotating process conditions in laboratory chemical hazard recognition and risk management
Leah McEwen1 , lrm1@cornell.edu, Ye Li2
4:40pm-5:00pm Panel Discussion

Technical Program with Abstracts

ACS Chemical Information Division (CINF)
250th ACS National Meeting, Fall 2015
Boston, MA (August 16-20, 2015)

CINF Symposia

Erin Davis, Program Chair

[Created Wed Aug 12 2015, Subject to Change; Check ACS Online Program for Latest Changes]

CINF: Substance Identifiers, Addressing the Challenges Presented by Chemically Modified Biologics: The Role of InChI & Related Technologies
8:30am - 10:10am
Sunday, August 16

Room 104A - Boston Convention & Exhibition Center
Stephen Heller, Keith Taylor, Organizing
Stephen Heller, Keith Taylor, Presiding
8:30am-8:35am Introductory Remarks

8:35am-9:05am
CINF 1: Generating canonical identifiers for glycoproteins and other chemically modified biopolymers

Roger Sayle1 , roger@nextmovesoftware.com, John May1 , Noel O'Boyle1
1 NextMove Software, Cambridge, United Kingdom

Bioinformatics dogma asserts that all-atom representations, capable of encoding details such as disulfide bridging and post-translationally modified amino acids, are too unwieldy to be of practical use. In this presentation, we show how recent advances in computer power, software algorithms and storage technology require us to question this precept. We show how InChI, InChIKeys and canonical SMILES can be generated for the largest known proteins, and even for nucleic acid sequences as large as viral and prokaryotic genomes. Indeed, unique identifiers derived from all-atom nucleic acid representations allow the capture of epigenetic methylation information and circular DNA, feats that are impossible with the one-letter codes used by bioinformaticians. These unique identifiers allow mature antibodies to be linked to the unique identifiers of the plasmids used to express them. Finally, we discuss the possibility of polymer-specific implementations/optimizations of standard InChI, by showing how InChIs and InChIKeys may be generated efficiently for specific classes of polymer with over a million atoms.
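To make the circular-DNA point concrete: a circular sequence has no natural start position, so any identifier must first choose one canonically. The Python sketch below is a toy illustration of that single idea on one-letter sequences (it is not NextMove's all-atom method and ignores methylation): it takes the lexicographically smallest rotation over both strands, so every rotation of the same double-stranded circle maps to one string. The function names are hypothetical.

```python
# Complement table for the four DNA bases (assumes A/C/G/T input only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def min_rotation(seq: str) -> str:
    """Lexicographically smallest rotation (simple O(n^2) search)."""
    if not seq:
        return seq
    doubled = seq + seq  # every rotation is a length-n window of seq+seq
    n = len(seq)
    return min(doubled[i:i + n] for i in range(n))


def canonical_circular(seq: str) -> str:
    """Canonical string for a circular, double-stranded sequence:
    the smallest rotation found on either strand."""
    seq = seq.upper()
    return min(min_rotation(seq), min_rotation(reverse_complement(seq)))
```

For example, `canonical_circular("GATTACA")` and `canonical_circular("TACAGAT")` (a rotation of the same circle) both return `"AATCTGT"`. The quadratic rotation search keeps the sketch short; a linear-time minimal-rotation method such as Booth's algorithm would be preferable for genome-sized inputs.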


9:05am-9:35am
CINF 2: Toward addressing informatics challenges presented by antibody drug conjugates

Sai Chetan Sukuru1 , chetan.sukuru@pfizer.com, Tianhong Zhang2 , Lawrence Tumey1 , Elwira Muszynska3 , Megan Tran4 , Frank Loganzo3
1 Worldwide Medicinal Chemistry, Pfizer Inc., Groton, Connecticut, United States; 2 Research Informatics, Pfizer Inc., Cambridge, Massachusetts, United States; 3 Oncology Research Unit, Pfizer Inc., Pearl River, New York, United States; 4 Research Informatics, Pfizer Inc., Pearl River, New York, United States

Recent advances and drug approvals have led to an increased interest in Antibody Drug Conjugates (ADCs). Even though a lot of promising preclinical and clinical data has been reported on numerous ADCs, there is still a paucity of in silico descriptors and tools that could help navigate the molecular complexity of ADCs as well as analyze the data around them. To tackle the informatics challenges arising from new ADC discovery projects, we have developed a novel in silico tool called Antibody Conjugate Tracker (ACT). ACT is designed to efficiently characterize each ADC and its molecular components, namely the antibody, linker-payload and payload. ACT provides a unique in silico environment with structured metadata that enables comprehensive data analytics on ADCs. Based on the versatile applications and impact of this tool in-house, from compound registration and management to data visualization and analysis, we propose novel descriptors to parse and analyze ADC data that could improve our understanding and accelerate the discovery of potential therapeutic ADCs.

9:35am-10:05am
CINF 3: Representation of chemically modified proteins in the Substance Index SPL Files

Yulia Borodina1 , yulia.borodina@fda.hhs.gov, Gunther Schadow2
1 FDA, Catonsville, Maryland, United States; 2 Pragmatic Data LLC, Indianapolis, Indiana, United States

Chemically modified proteins often have complex structures and/or are described by a combination of structural and non-structural descriptors. Commonly used chemical data formats, such as MOLFILE and SMILES, are inadequate for representing such information. HL7 SPL is a structured document format adopted by FDA and used by manufacturers around the world for exchanging data about all medical products available in the USA. SPL uses an information modeling framework that encourages a realist description of things at any scale, from organizations and devices in the macroscopic world to molecular assemblies in the microscopic world. Substance Index SPL files contain both structured information about chemically modified proteins and a hash code computed from this structured information.
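The abstract does not say how the Substance Index hash code is computed. As a generic sketch only, assuming a JSON-like record and SHA-256 (both assumptions, not the SPL specification), the key step is to serialize the structured data canonically before hashing, so that identical content always yields an identical code regardless of how fields were ordered. The field names below are hypothetical.

```python
import hashlib
import json


def substance_hash(record: dict) -> str:
    """Deterministic SHA-256 hex digest of a structured substance record.

    sort_keys=True orders dictionary keys at every nesting level and the
    fixed separators remove whitespace variation, so two records with the
    same content produce the same digest.
    """
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical records: same content, different key order.
a = {"sequence": "MKTAYIAK", "modifications": [{"site": 3, "type": "PEGylation"}]}
b = {"modifications": [{"type": "PEGylation", "site": 3}], "sequence": "MKTAYIAK"}
```

Here `substance_hash(a) == substance_hash(b)`. Lists are still hashed in order, so a real scheme would also need a rule for ordering repeated elements such as modification sites.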
10:05am-10:10am Concluding Remarks
CINF: The Growing Impact of Big Data in the World of Chemical Information
8:30am - 11:50am
Sunday, August 16

Room 104B - Boston Convention & Exhibition Center
Sean Ekins, Rudolph Potenzone, Antony Williams, Organizing
Sean Ekins, Rudolph Potenzone, Antony Williams, Presiding
8:30am-8:35am Introductory Remarks

8:35am-9:00am
CINF 7: Challenges in big data chemistry using publicly available chemical information

Sunghwan Kim1 , kimsungh@ncbi.nlm.nih.gov, Gang Fu1 , Volker Hähnke1 , Lianyi Han1 , Bo Yu1 , Lewis Geer1 , Benjamin Shoemaker1 , Asta Gindulyte1 , Siqian He1 , Paul Thiessen1 , Evan Bolton1 , Stephen Bryant1
1 NCBI / NLM / NIH, Warrenton, Virginia, United States

The term “big data” is used to indicate a collection of data sets that are too large and complex to deal with using traditional data management approaches. For the past few years, this term has been drawing much attention from the scientific community as well as the general public. Importantly, every year a substantial amount of public money is invested in biomedical research projects, resulting in a large amount of data that are freely available to the public. Therefore, researchers in the biomedical sciences have a great interest in exploiting these publicly available big data to make scientific breakthroughs that would significantly improve public health. However, to achieve this goal, it is essential to address many issues of big data, including analysis, search, sharing, storage, transfer, and visualization, among others. In this presentation, these big data issues are discussed in the context of publicly available small molecule data stored in PubChem (https://pubchem.ncbi.nlm.nih.gov), a public repository of information on chemical substances and their biological activities at the National Library of Medicine, National Institutes of Health. PubChem has the largest corpus of publicly available chemical information, with more than 180 million depositor-contributed substance descriptions, 60 million unique chemical structures, one million biological assays, and 225 million biological activity result outcomes, covering more than nine thousand target protein sequences. It also contains significant amounts of scientific research data and the inter-relationships between chemicals, proteins, genes, scientific literature, patents and more. Therefore, PubChem has been facing typical big data issues in collecting various data from different sources, organizing them to remove data redundancy, ambiguity, and inconsistency, and disseminating them to the public.
This presentation provides an overview of PubChem’s strategies to address these big data issues.
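One way PubChem disseminates its data is the PUG-REST web service. The Python sketch below (an illustration, not material from the talk) builds a PUG-REST URL that looks up a compound by name and requests selected computed properties as JSON; the helper names are hypothetical, and only the fetch function touches the network.

```python
import json
import urllib.parse
import urllib.request

BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def property_url(name: str, properties: list[str]) -> str:
    """Build a PUG-REST URL: look up compounds by name and return the
    requested computed properties as JSON."""
    quoted_name = urllib.parse.quote(name, safe="")  # e.g. spaces -> %20
    return f"{BASE}/compound/name/{quoted_name}/property/{','.join(properties)}/JSON"


def fetch_properties(name: str, properties: list[str]) -> list[dict]:
    """Fetch the property table (requires internet access)."""
    with urllib.request.urlopen(property_url(name, properties), timeout=30) as resp:
        payload = json.load(resp)
    # Each entry is a dict with a "CID" key plus the requested properties.
    return payload["PropertyTable"]["Properties"]
```

For example, `property_url("aspirin", ["MolecularFormula", "InChIKey"])` yields `https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/aspirin/property/MolecularFormula,InChIKey/JSON`. A name can match more than one compound, which is why the result is a list.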

9:00am-9:25am
CINF 8: Multiplexing analysis of 1000 approved drugs across 70 million PubChem entries: Will the correct structures please stand up?

Christopher Southan1 , cdsouthan@gmail.com
1 IUPHAR/BPS Guide to PHARMACOLOGY, University of Edinburgh, Göteborg, Sweden

Database molecular entries for approved drugs are the Crown Jewels of over 50 years of global R&D. However, a surprising degree of uncertainty surrounds exact numbers and explicit chemical structures. Choosing a representative approved drug or clinical candidate is becoming harder because of different molecular representations (i.e. structural multiplexing). In this work, results will be presented from the analysis of a 1000-drug set compiled inside PubChem. This showed that each structure had been submitted (on average) 81 times. In addition, the “same connectivity” operator indicated 21 canonically related CIDs, and each drug was represented in 44 mixtures. We can also detect the “split bioactivity” problem: of 135 CIDs related to taxol, 12 have bioassay results. As the totality of public chemical structures pushes towards 100 million, we can track a constellation of problems related to the type of statistics above. In particular, the recently increased open availability of patent-extracted chemistry and broadening vendor choice are generally welcomed by database users. However, analysing entries related to the 1000 drugs across time indicates that both types of expansion come with a cost. For example, the 55 million vendor CIDs show increased unresolved chirality (i.e. flat versions) and/or unresolved E/Z positions (crossed bonds). In addition, noticeable “patent-picking”, including mixtures, suggests that vendor submissions are increasingly of virtual rather than extant structures. The 21 million automated and manual patent extractions also bring in a variety of artefacts, such as shotgun exemplifications of mixtures, chiral permutations and virtual deuteration. Also, 85% are devoid of BioAssay data links. As a solution to at least some of these problems, PubChem facilitates particularly effective query selects and filters predicated on its advanced relationship rules.
Notwithstanding, the inexorable increase in multiplexing can confound the less experienced and is arguably reaching problematic proportions across all “big data” chemical resources.
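The “same connectivity” relationship groups records that share a skeleton but differ in, for example, stereochemistry. A rough stand-in, assumed here and not PubChem's own implementation, is to group records by the first block of the standard InChIKey: that 14-character block hashes the core connectivity layers, while stereo and isotopic layers are encoded in the second block. The record IDs and keys below are placeholders, not real data.

```python
from collections import defaultdict


def connectivity_block(inchikey: str) -> str:
    """First 14-character block of a standard InChIKey (connectivity hash)."""
    return inchikey.split("-")[0]


def group_by_connectivity(records: dict[str, str]) -> dict[str, list[str]]:
    """Map each connectivity block to the sorted record IDs that share it."""
    groups: dict[str, list[str]] = defaultdict(list)
    for record_id, key in records.items():
        groups[connectivity_block(key)].append(record_id)
    return {block: sorted(ids) for block, ids in groups.items()}


# Hypothetical records with placeholder, InChIKey-shaped strings.
records = {
    "rec1": "AAAAAAAAAAAAAA-UHFFFAOYSA-N",  # placeholder: stereo unspecified
    "rec2": "AAAAAAAAAAAAAA-BBBBBBBBSA-N",  # placeholder: one stereo variant
    "rec3": "AAAAAAAAAAAAAA-CCCCCCCCSA-N",  # placeholder: another variant
    "rec4": "ZZZZZZZZZZZZZZ-UHFFFAOYSA-N",  # placeholder: unrelated skeleton
}
```

`group_by_connectivity(records)` places `rec1`, `rec2` and `rec3` in one group and `rec4` alone, which is the kind of fan-out the abstract counts when it reports connectivity-related CIDs per drug.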

9:25am-9:50am
CINF 9: How the availability of online data and datasets can underpin a platform of connected data

Antony Williams12 , tony27587@gmail.com
1 Cheminformatics, Royal Society of Chemistry, Wake Forest, North Carolina, United States; 2 ChemConnector Inc., Wake Forest, North Carolina, United States

Nowadays there are many public databases online containing tens of millions of chemicals (for example, PubChem and ChemSpider), and contained within these databases are hundreds of millions of associated data points; for example, PubChem contains over 200 million biological assay screening endpoints and continues to grow. Many of these databases have been assembled via contributions from various data sources of differing data quality. What has been learned during the process of producing these databases? What have we learned with regard to applying algorithms for data checking and standardization? Can crowdsourced curation and annotation be used to improve and enhance data quality? Since text-mining and graphical conversion activities are increasingly used to extract data from both publication and patent corpora to build up integrated databases of chemicals and related property data, what cautions need to be taken prior to accepting the data? This presentation will give an overview of how large-scale chemical databases can be assembled nowadays, some of the necessary cautions in preparing and integrating the data, and the necessity of considering the new world of Linked Data and the semantic web.
9:50am-10:05am Intermission

10:05am-10:30am
CINF 10: Applying cheminformatics and bioinformatics approaches to neglected tropical disease big data

Sean Ekins1,2 , ekinssean@yahoo.com, Jair Lage De Siqueira3 , Laura-Isobel McCall3 , Malabika Sarker4 , Maneesh Yadav4 , Elizabeth Ponder5 , Adam Kallel1 , Barry Bunin1 , James McKerrow3 , Carolyn Talcott4
1 Collaborative Drug Discovery, Inc, Burlingame, California, United States; 2 Collaborations in Chemistry, Fuquay Varina, North Carolina, United States; 3 Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, California, United States; 4 SRI International, Menlo Park, California, United States; 5 ChEM-H, Shriram Center, Stanford University, Stanford, California, United States

Seventeen neglected tropical diseases (NTDs) prioritized by the WHO are endemic in 149 countries and affect more than 1.4 billion people globally, costing these developing economies billions of dollars annually. Chagas disease, caused by the eukaryotic parasite Trypanosoma cruzi, is one of these NTDs. The current clinical and preclinical pipeline for T. cruzi is extremely sparse and lacks drug target diversity. Several whole-cell, phenotypic high-throughput screens have been completed for T. cruzi, including a screen of over 300,000 molecules in the search for chemical probes. We have compiled and curated relevant biological and chemical compound screening data, including (i) compounds and biological activity data from the literature, (ii) high-throughput screening datasets, and (iii) predicted metabolites of T. cruzi metabolic pathways. This information was used to help us identify compounds and their potential targets. We have constructed a Pathway Genome Data Base for T. cruzi. In addition, we developed Bayesian machine learning models that were used to virtually screen libraries of compounds. Ninety-seven compounds were selected for in vitro testing and 11 of these were found to have EC50 […] in vivo mouse efficacy model and validated that the machine learning model could identify in vitro active compounds not in the training set as well as known positive controls. One molecule appeared to possess 85.2% efficacy in the mouse model. We have also proposed potential targets (for future verification) for this compound based on structural similarity to known compounds with targets in T. cruzi. The approach we have taken is broadly applicable to mining all the NTDs, and we propose how this might be achieved.
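The abstract does not say which Bayesian formulation was used. One widely used choice for this kind of virtual screening is the Laplacian-corrected naive Bayesian classifier over fragment fingerprints, and the sketch below assumes that formulation. Feature IDs and the toy training set are hypothetical; a real model would use hashed substructure fingerprints from a cheminformatics toolkit.

```python
import math
from collections import Counter

def train_laplacian_bayes(fingerprints, labels):
    """Per-feature log weights for a Laplacian-corrected naive Bayes model.

    fingerprints: list of sets of integer feature IDs (e.g. hashed fragments)
    labels: list of bools, True for compounds active in the screen
    """
    n_total = len(labels)
    n_active = sum(labels)
    if n_total == 0 or n_active == 0:
        raise ValueError("training data must contain at least one active")
    base_rate = n_active / n_total     # p: overall fraction of actives
    seen = Counter()                   # T_f: compounds containing feature f
    seen_active = Counter()            # A_f: actives containing feature f
    for features, is_active in zip(fingerprints, labels):
        seen.update(features)
        if is_active:
            seen_active.update(features)
    # Weight = log[(A_f + 1) / (T_f * p + 1)]; rare features shrink toward 0.
    return {f: math.log((seen_active[f] + 1) / (t * base_rate + 1))
            for f, t in seen.items()}

def bayes_score(features, weights):
    """Score a compound by summing weights; unseen features contribute 0."""
    return sum(weights.get(f, 0.0) for f in features)

# Hypothetical toy data: four compounds with placeholder feature IDs.
fps = [{1, 2}, {1, 3}, {2, 4}, {4, 5}]
labels = [True, True, False, False]
weights = train_laplacian_bayes(fps, labels)
ranked = sorted(fps, key=lambda fp: bayes_score(fp, weights), reverse=True)
print(ranked[0])  # {1, 3}
```

In a virtual screen, the same `bayes_score` would be applied to every library compound and the top of the ranking sent for in vitro testing.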

10:30am-10:55am
CINF 11: Chemocentric informatics analysis of 'omics' data identifies novel associations between histone deacetylase inhibitors and neurodisease

Mary Bradley1 , mary_p_bradley@hotmail.com
1 Informatics, FORMA Therapeutics, Brookline, Massachusetts, United States

It has been suggested that small-molecule inhibition of histone deacetylases (HDACs) may provide therapeutic alternatives for treating schizophrenia and enhancing cognition. We have applied a chemocentric informatics protocol to identify networks of genes that may collectively define the mechanisms of action of isoform-selective HDAC inhibitors in modulating schizophrenia. Our approach combined chemical genomics, network mining, and biomedical text mining. Gene signatures were generated for silencing or inhibiting HDAC1-11 isoforms individually or collectively. Simultaneously, gene signatures for schizophrenia and known antipsychotic agents were obtained from the biomedical literature. All gene signatures were used to query the LINCS gene signature database (http://www.lincsproject.org) to identify novel chemical-gene-disease connections in schizophrenia. Our analysis identified strong functional connections between isoform-selective HDAC inhibition and specific disease pathways in schizophrenia and led to the prioritization of 5 structurally different HDAC inhibitors as therapeutic alternatives for treating schizophrenia with reduced potential for side effects.
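Querying a signature database amounts to asking whether a reference perturbation moves the same genes in the same (or opposite) direction as the query. The sketch below uses a deliberately simple signed set overlap to show the idea; it is an assumption for illustration, not the LINCS scoring method, which relies on rank-based enrichment statistics. Gene identifiers are hypothetical.

```python
def connectivity_score(query_up, query_down, ref_up, ref_down):
    """Simplified signed overlap between two up/down gene signatures.

    Assumes each signature's up and down sets are disjoint. Returns a value
    in [-1, 1]: +1 when the reference mimics the query, -1 when it fully
    reverses it (the pattern sought for a candidate therapeutic).
    """
    query_up, query_down = set(query_up), set(query_down)
    ref_up, ref_down = set(ref_up), set(ref_down)
    size = len(query_up) + len(query_down)
    if size == 0:
        raise ValueError("empty query signature")
    agree = len(query_up & ref_up) + len(query_down & ref_down)
    disagree = len(query_up & ref_down) + len(query_down & ref_up)
    return (agree - disagree) / size

# Hypothetical disease signature and a compound signature that reverses it.
disease_up, disease_down = {"G1", "G2"}, {"G3", "G4"}
compound_up, compound_down = {"G3", "G4"}, {"G1", "G2"}
print(connectivity_score(disease_up, disease_down, compound_up, compound_down))  # -1.0
```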

10:55am-11:20am
CINF 12: Chemical biology informatic approaches to identify and validate new therapeutic targets

Peter Kutchukian1 , peterkutchukian@hotmail.com
1 Merck, Castleton on Hudson, New York, United States

The identification of novel therapeutic targets for disease intervention has become paramount in the current drug discovery environment. Chemical Biology Informatics serves as a nexus between two fields that pursue complementary approaches to target discovery: human genomics and phenotypic screening. At the heart of genomics is the identification of genetic variations between diseased and non-diseased populations that modulate the disease state, thus revealing potential therapeutic targets. Phenotypic screening, on the other hand, aims to identify new targets by screening chemical matter against cells or organisms that model a disease state as faithfully as possible. A suite of integrated data repositories and tools has been developed to empower these endeavors at Merck. A chemogenomics database, CHEMGENIE, that captures compound-target interactions has been implemented by integrating and harmonizing internal and external data sources. This allows one to identify the chemical matter that perturbs a target, as well as to view the activity profile of a compound against all measured targets. Based on these activity profiles, an algorithm to identify tool compounds that are potent and specific has been designed. This algorithm can be used to identify tool compounds for genetic targets of interest, in order to validate the targets in cell-based assays. Furthermore, it has been applied to design a set of tool compounds that has been tested in dozens of phenotypic screens. Target Enrichment Analysis (TEA) has been developed to deconvolute the Mode of Action (MoA) of active compounds from these screens using the aforementioned set of tool compounds. In short, TEA identifies targets whose associated chemical matter is enriched in the actives of a phenotypic screen. This approach has been used both to validate phenotypic screens and to uncover targets for them; examples of both will be discussed.
In addition, TEA informs genomics endeavors, since it can be used to assess genetic hits and quickly identify whether perturbing putative targets in historical phenotypic screens has engendered an outcome biologically relevant to the current disease state of interest.
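The abstract does not describe the statistics behind TEA. A one-sided hypergeometric test is a common way to ask whether a target's annotated tool compounds are over-represented among screen actives, and the sketch below assumes that test (without multiple-testing correction). Compound IDs and target names are hypothetical.

```python
from math import comb

def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    total = comb(population, draws)
    return sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    ) / total

def target_enrichment(target_to_compounds, screened, actives):
    """Rank targets by enrichment of their tool compounds among actives.

    Returns (target, hits, annotated, p_value) tuples, smallest p first.
    """
    screened = set(screened)
    actives = set(actives) & screened
    results = []
    for target, compounds in target_to_compounds.items():
        annotated = set(compounds) & screened
        if not annotated:
            continue
        hits = len(annotated & actives)
        p = hypergeom_sf(hits, len(screened), len(annotated), len(actives))
        results.append((target, hits, len(annotated), p))
    return sorted(results, key=lambda r: r[3])

# Hypothetical screen: 20 tool compounds, 4 actives, two annotated targets.
screened = [f"c{i}" for i in range(20)]
actives = ["c0", "c1", "c2", "c3"]
annotations = {"T1": ["c0", "c1", "c2"], "T2": ["c10", "c11", "c12"]}
for row in target_enrichment(annotations, screened, actives):
    print(row)
```

Here all three T1 compounds are active (p = 17/4845, about 0.0035), while T2 has no active compounds (p = 1.0), so T1 would be the candidate mode of action.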

11:20am-11:45am
CINF 13: Analyzing ToxCast data using nebula (neighbor-edges based and unbiased leverage algorithm)


Huixiao Hong1 , Huixiao.Hong@fda.hhs.gov
1 FDA, Jefferson, Arkansas, United States

Most so-called “big datasets” are also incomplete, with varying degrees of sparsity. For convenience we call these sparse datasets; they pose special difficulties across the board for traditional machine learning and classification algorithms. Yet big data is already here, which makes finding approaches for overcoming the challenges of analyzing sparse datasets urgent. In this study we developed the Neighbor-Edges Based and Unbiased Leverage Algorithm (Nebula) to tackle sparse big data. The U.S. Environmental Protection Agency’s (EPA) ToxCast project evaluated a diverse set of chemicals, including both environmental chemicals and drugs, using a broad panel of high-throughput in vitro assays. ToxCast data have been studied to characterize the toxicological profiles of environmental chemicals. However, the dataset has a high proportion of missing values and is thus sparse. To make full use of the ToxCast dataset, which was generated through a large EPA investment in the risk assessment of chemicals, novel and comprehensive analyses of the dataset are needed. Therefore, as a test, we applied Nebula and modularity analysis to the ToxCast data. We found that the chemical-assay network could be decomposed into seven densely connected modules based on its topological properties. Moreover, each of the seven modules was associated with a different set of adverse outcome pathways (AOPs) as well as chemical structural descriptors. Leave-one-out cross-validation showed high consistency between experimental AC50 values from ToxCast and predicted AC50 values from Nebula. Our study demonstrated Nebula to be an efficient algorithm for analyzing sparsely populated big data and, thus, useful in the big data era. Our results also indicated that ToxCast data could be used for toxicological profiling of chemicals that have not been assayed in ToxCast, to assist in the risk assessment of chemicals.
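Modularity analysis scores how well a partition splits a network into densely connected modules. The sketch below computes the standard Newman-Girvan modularity Q for a given partition of an undirected graph; it does not reproduce Nebula, and how the seven ToxCast modules were actually found (the search over partitions) is not shown. The toy graph is hypothetical.

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman-Girvan modularity Q of an undirected, unweighted graph.

    edges: iterable of (u, v) pairs, each edge listed once
    community: dict mapping every node to a module label
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")
    degree = defaultdict(int)
    internal = defaultdict(int)      # edges with both ends in the same module
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    degree_sum = defaultdict(int)    # total degree of each module
    for node, k in degree.items():
        degree_sum[community[node]] += k
    # Q = sum over modules c of [L_c / m - (d_c / 2m)^2]
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)

# Hypothetical network: two triangles joined by a single bridging edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(round(modularity(edges, community), 4))  # 0.3571
```

In a chemical-assay network, nodes would be chemicals and assays, and a high Q for a partition indicates modules of chemicals sharing assay activity patterns.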
11:45am-11:50am Closing Remarks
CINF: Applications of Cheminformatics to the Diverse World of Natural Products
10:30am - 11:55am
Sunday, August 16

Room 104A - Boston Convention & Exhibition Center
Roger Schenck, Antony Williams, Organizing
Valerie Biehl, Antony Williams, Presiding
10:30am-10:35am Introductory Remarks

10:35am-11:00am
CINF 4: Naming algorithms for derivatives of peptide-like natural products

Roger Sayle1 , roger@nextmovesoftware.com, Noel O'Boyle1 , Christopher Southan2
1 NextMove Software, Cambridge, United Kingdom; 2 IUPHAR/BPS Guide to PHARMACOLOGY, University of Edinburgh, Göteborg, Sweden

The nomenclature of natural products is a highly specialized field of biochemistry. Fortunately, some classes of natural products are more amenable to computer analysis than others. Non-ribosomal peptides and heavily post-translationally modified peptides, such as derivatives of the homodetic cyclic peptide gramicidin S, the cyclic depsipeptide valinomycin, and the cyclic isopeptides anantin and sungsanpin, push the current state of the art in automated natural product naming. Where a compound is structurally related to an existing peptide, perceiving this relationship is required for generating succinct, human-understandable names. In this talk, we describe the use of databases/dictionaries based upon HELM notation and IUPAC's condensed line notations for specifying 'parent' peptides from which derivatives and analogues can be named. Using the described techniques, the name '[5-L-valine]dichotomin C' may be assigned to the cyclic peptide CHEMBL478596. These techniques have been successfully used to identify and correct naming issues in the UniProt and IUPHAR/BPS Guide to PHARMACOLOGY databases, which have then been updated by their curators.
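Once a derivative has been aligned to its parent peptide, a replacement name such as '[5-L-valine]dichotomin C' follows from the positions that differ. The sketch below covers only that final step for equal-length, substitution-only analogues; it is not NextMove's implementation, it assumes the alignment and residue names are already known, and it joins multiple substitutions with commas as an assumed convention. The parent "examplin" and its sequence are hypothetical.

```python
def replacement_name(parent_name, parent_residues, derivative_residues):
    """Build a replacement name such as "[5-L-valine]parent".

    Residues are full names (e.g. "L-valine") in sequence order, numbered
    from 1. Only same-length, substitution-only analogues are handled.
    """
    if len(parent_residues) != len(derivative_residues):
        raise ValueError("only equal-length substitution analogues are handled")
    changes = [
        f"{pos}-{new}"
        for pos, (old, new) in enumerate(zip(parent_residues, derivative_residues), 1)
        if old != new
    ]
    if not changes:
        return parent_name
    return f"[{','.join(changes)}]{parent_name}"

# Hypothetical parent peptide "examplin" and two analogues.
parent = ["glycine", "L-leucine", "L-proline", "L-phenylalanine", "L-isoleucine"]
analogue = parent[:4] + ["L-valine"]
print(replacement_name("examplin", parent, analogue))  # [5-L-valine]examplin
```

Handling cyclization, deletions, insertions, and non-standard residues is where dictionary lookups against HELM-encoded parents become necessary.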

11:00am-11:25am
CINF 5: Applications of cheminformatics to the diverse world of natural products

Antony Williams1,3 , tony27587@gmail.com, Serin Dabb2
1 Cheminformatics, Royal Society of Chemistry, Wake Forest, North Carolina, United States; 2 The Royal Society of Chemistry, Cambridge, United Kingdom; 3 ChemConnector Inc., Wake Forest, North Carolina, United States

The Royal Society of Chemistry has been working on a number of the informatics challenges associated with the hosting and searching of natural product structures and related data. In particular, we have been investigating the possibility of using available data to assist in structure verification (i.e. dereplication) as well as in the structure elucidation of novel natural products. RSC provides access to a carefully curated dataset of Marine Natural Products hosted in the MarinLit commercial database. We also provide free-access natural product data in the ChemSpider database, aggregated from community resources and users and segregated from the entire collection of over 30 million chemicals. This presentation will provide an overview of the natural product data accessible via RSC systems and the benefits of moving to a web-based delivery system for MarinLit. The ability to perform structure identification by interrogating spectral features will be examined, as well as the benefits of integrating the ChemSpider data into a computer-assisted structure elucidation system, ACD/Structure Elucidator.

11:25am-11:50am
CINF 6: Reliable structure characterization and elucidation: Finding and confirming the truth

Patrick Wheeler1 , pwheeler@yahoo.com, Antony Williams2
1 Product Development, Advanced Chemistry Development, Encinitas, California, United States; 2 Royal Society of Chemistry, London, United Kingdom

Thanks to advances in NMR probes, consoles, and experimental techniques, it is possible to attempt structure elucidation on smaller samples than were ever accessible in the past.
Still, the interpretation of the resulting data can be arduous, require great expertise, and be error-prone. Of course, other techniques are used to confirm structures as well: synthetic reproduction of natural products has a long tradition of success in the elucidation of molecules containing intricate elements, including multiple stereocenters. However, despite rigorous analysis by qualified chemists, these methods still sometimes arrive at erroneous results (1-5). X-ray crystallography is highly reliable, but requires still larger amounts of relatively pure sample.
Astute application of modern technology can speed the rate at which structures are solved, while also crucially reducing errors that result either from synthetic methods or from unassisted analysis of instrumental data. Computer-Assisted Structure Elucidation (CASE) has developed over the past decades to relieve the burden of proving correct structures. In this presentation, we discuss how CASE is used to objectively analyze complex sets of NMR data in order to test structural hypotheses, conduct de novo structure elucidation, and query large databases of known structures for matches to already identified natural products. Results are available more quickly and more reliably than with previously available methods.

1. Nicolaou, K. C.; Snyder, S. A. Angew. Chem. Int. Ed. 2005, 44, 1012-1014.
2. Sakano, Y. et al. J. Antibiot. 2004, 57, 564-568.
3. Ōmura, S. et al. Chem. Abstr. 2004, 141, 122412m.
4. Snider, B.; Gao, X. Org. Lett. 2005, 7 (20), 4419-4422.
5. Kim et al. Org. Lett. 2013, 15 (1), 100-103.
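The hypothesis-testing step described in the CINF 6 abstract typically compares predicted and experimental chemical shifts for each candidate structure and ranks candidates by their deviation. The sketch below assumes the shift predictions already exist (the hard part, done by the CASE software) and pairs shifts in sorted order instead of using an explicit atom assignment; it is not ACD/Labs' algorithm, and all shift values are hypothetical.

```python
def mean_abs_deviation(experimental, predicted):
    """Mean absolute difference (ppm) after pairing shifts in sorted order."""
    if len(experimental) != len(predicted) or not experimental:
        raise ValueError("shift lists must be non-empty and of equal length")
    pairs = zip(sorted(experimental), sorted(predicted))
    return sum(abs(e - p) for e, p in pairs) / len(experimental)

def rank_candidates(experimental, candidates):
    """Return (name, deviation) pairs, best-matching hypothesis first."""
    scored = [(name, mean_abs_deviation(experimental, shifts))
              for name, shifts in candidates.items()]
    return sorted(scored, key=lambda item: item[1])

# Hypothetical 13C data for a three-carbon fragment and two hypotheses.
experimental = [170.1, 128.5, 21.0]
candidates = {"A": [169.0, 129.0, 20.0], "B": [180.0, 140.0, 35.0]}
for name, dev in rank_candidates(experimental, candidates):
    print(name, round(dev, 2))
```

A small deviation for one candidate and large deviations for the rest is the kind of objective evidence that helps catch the misassignments cited in references 1-5.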
11:50am-11:55am Concluding Remarks
CINF: Careers in Chemical Information and Cheminformatics Panel Discussion & Brunch
9:00am - 11:00am
Sunday, August 16

Room 52AB - Boston Convention & Exhibition Center
CINF: Visualizing Chemistry Data to Guide Optimization
1:00pm - 5:10pm
Sunday, August 16

Room 104B - Boston Convention & Exhibition Center
Erin Davis, Matthew Segall, Organizing
Erin Davis, Matthew Segall, Presiding
1:00pm-1:05pm Introductory Remarks

1:05pm-1:30pm
CINF 25: Integrating data visualization into the drug discovery workflow

Patrick Walters1 , pat_walters@vrtx.com, Guy Bemis2 , Jun Feng2 , Brian Goldman2 , Georgia McGaughey2 , Jeff Orr2 , Emanuele Perola2 , Susan Roberts2 , Jason Yuen2 , Jonathan Weiss2
1 Vertex Pharmaceuticals, Boston, Massachusetts, United States; 2 Vertex Pharmaceuticals, Boston, Massachusetts, United States

Drug discovery involves the simultaneous optimization of many parameters. In order to produce a drug candidate, a team must consider many factors, including activity, selectivity against anti-targets, in vivo activity, physical properties and pharmacokinetics. Ultimately, a team needs to understand the relationships between chemical structure and biological activity, and exploit these relationships throughout the optimization process. Over the last twenty years, data visualization has been a key component of the drug discovery process. Starting from simple plots of single parameters, the field has evolved to utilize sophisticated visualizations that can incorporate knowledge of chemistry and biology. While off-the-shelf solutions can have a significant impact on a drug discovery effort, custom visualizations are still necessary. Novel visualization techniques, often borrowed from other fields, can provide unique insights. This presentation will provide an overview of a few of the visualization tools that we have integrated into our drug discovery infrastructure.

1:30pm-1:55pm
CINF 26: Data visualization: New directions or just familiar routes?

Edmund Champness1 , ed.champness@optibrium.com, Peter Hunt1 , Matthew Segall1
1 R&D, Optibrium Limited, Cambridgeshire, United Kingdom

Data visualization tools make it very easy to represent our data graphically and present it in a way that clearly communicates patterns and trends. However, there is a risk that visualizations may be used, in practice, to confirm or justify our own hypotheses and biases. Instead, can data visualizations bring to light patterns in our data, drive new hypotheses and show us things we weren’t expecting? In this presentation we will look at a number of common data analyses and visualizations used within the drug discovery process. We will illustrate some of the ways that these approaches can be misleading, with examples showing how inappropriate use of data visualization can lead us to conclusions which aren’t necessarily supported by our data. We will discuss alternative, visual methods to guide our decisions in drug discovery and consider ways in which these can enable us to drive the analysis of data without introducing any of our own biases.

1:55pm-2:20pm
CINF 27: Reaction discovery and optimization tools for visualizing chemistry data

Joshua Bishop1 , josh.bishop@perkinelmer.com, Phil McHale1 , Philip Skinner2 , Megean Schoenberg1
1 Informatics, PerkinElmer, Waltham, Massachusetts, United States; 2 Informatics, PerkinElmer, San Diego, California, United States

Electronic data collection in both industry and academia has increased the need for modern analysis and visualization tools that can handle chemistry information. Traditional visualization and analysis platforms (Excel, PowerPoint) lack the ability to easily and intelligibly depict and interact with chemical information. Today’s researchers need chemically alert tools that can provide a simple and intuitive user experience. The benefit of such a tool is two-fold: 1) scientists can have their data represented in a familiar and scientifically intelligent fashion, and 2) they gain access to powerful statistical tools, which are key to chemistry research. In this paper we will describe methods using modern data analysis and visualization tools to rapidly identify reactivity trends and pathways for transformation optimization, thereby saving chemists time that can be more productively spent in the lab.
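A basic building block behind identifying reactivity trends is aggregating outcomes by reaction condition and ranking the combinations. The sketch below assumes reactions are already available as records with catalyst, solvent, and yield fields; these field names and all data are hypothetical and do not reflect the PerkinElmer tools described in the abstract.

```python
from collections import defaultdict
from statistics import mean

def yield_table(reactions):
    """Average yield (%) for each (catalyst, solvent) combination."""
    grouped = defaultdict(list)
    for r in reactions:
        grouped[(r["catalyst"], r["solvent"])].append(r["yield"])
    return {condition: mean(ys) for condition, ys in grouped.items()}

def best_conditions(reactions, top=3):
    """Return the `top` condition combinations by average yield."""
    table = yield_table(reactions)
    return sorted(table.items(), key=lambda item: item[1], reverse=True)[:top]

# Hypothetical screening results for one transformation.
reactions = [
    {"catalyst": "Pd-A", "solvent": "THF", "yield": 62},
    {"catalyst": "Pd-A", "solvent": "THF", "yield": 70},
    {"catalyst": "Pd-A", "solvent": "DMF", "yield": 40},
    {"catalyst": "Pd-B", "solvent": "THF", "yield": 85},
    {"catalyst": "Pd-B", "solvent": "THF", "yield": 81},
    {"catalyst": "Pd-B", "solvent": "DMF", "yield": 55},
]
print(best_conditions(reactions, top=1))  # [(('Pd-B', 'THF'), 83)]
```

The same table can feed a heatmap (catalyst by solvent), which is usually where such trends become visible at a glance.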

2:20pm-2:45pm
CINF 28: Visualization of structure-activity relationship patterns and compound design using the SAR Matrix method

Dilyana Dimova1 , dimova@bit.uni-bonn.de, Jürgen Bajorath1 , bajorath@bit.uni-bonn.de
1 Life Science Informatics, University of Bonn, B-IT, Bonn, Germany

The Structure-Activity Relationship (SAR) Matrix (SARM) methodology was designed to extract compound series from screening or chemical optimization data and organize structurally related series and associated SAR information in an intuitive matrix format. SARMs systematically capture substructure relationships contained in compound data sets and reveal SAR patterns and trends. The data structure can be easily interpreted from a medicinal chemistry viewpoint. SARM calculations also yield many virtual candidate compounds that form a chemical space envelope around a given data set. These virtual candidates provide immediate suggestions for compound design. Different methods have been developed for activity prediction of virtual compounds from SARMs for hit expansion and lead optimization. The evolution of the SARM methodology is presented and results of prospective applications are reported.

References
Gupta-Ostermann, D.; Bajorath, J. The ‘SAR Matrix’ Method and its Extensions for Applications in Medicinal Chemistry and Chemogenomics [v2; ref status: indexed, http://f1000r.es/3gh] F1000Research 2014, 3: 113 (eCollection 2014).
Gupta-Ostermann, D.; Balfer, J.; Bajorath, J. Hit Expansion from Screening Data Based upon Conditional Probabilities of Activity Derived from SAR Matrices. Mol. Inf. 2015, 34, 134-146.
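The matrix layout described in the CINF 28 abstract can be illustrated once compounds have been fragmented into cores and substituents: cores become rows, substituents become columns, and empty cells are the virtual candidates. The sketch below assumes fragmentation has been done upstream, and it fills empty cells with a simple additive (row mean + column mean - overall mean) estimate; that heuristic is an assumption for illustration, not the conditional-probability method of the second reference. All data are hypothetical.

```python
from statistics import mean

def build_sar_matrix(records):
    """Arrange (core, substituent, potency) records as a core x substituent grid.

    Returns (cores, subs, grid) where grid[core][sub] is a potency value or
    None for an unexplored combination (a virtual compound).
    """
    cores = sorted({c for c, _, _ in records})
    subs = sorted({s for _, s, _ in records})
    grid = {c: {s: None for s in subs} for c in cores}
    for core, sub, potency in records:
        grid[core][sub] = potency
    return cores, subs, grid

def predict_virtual(cores, subs, grid):
    """Additive estimate for each empty cell: row + column - overall mean."""
    filled = [v for c in cores for v in grid[c].values() if v is not None]
    overall = mean(filled)
    row_mean = {c: mean(v for v in grid[c].values() if v is not None) for c in cores}
    col_mean = {s: mean(grid[c][s] for c in cores if grid[c][s] is not None)
                for s in subs}
    return {(c, s): row_mean[c] + col_mean[s] - overall
            for c in cores for s in subs if grid[c][s] is None}

# Hypothetical pIC50 data: two analogous cores, two substituents.
records = [("core1", "F", 6.0), ("core1", "Cl", 6.5), ("core2", "F", 7.0)]
print(predict_virtual(*build_sar_matrix(records)))  # {('core2', 'Cl'): 7.0}
```

Every row and column contains at least one measured value by construction, so each virtual cell always receives an estimate.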
2:45pm-3:00pm Intermission

3:00pm-3:25pm
CINF 29: Visualization and manipulation of Matched Molecular Series for decision support

Noel O'Boyle1 , baoilleach@gmail.com, Roger Sayle1
1 NextMove Software, Cambridge, United Kingdom

A Matched Molecular Series (MMS) is a set of molecules that differ by substitution at the same scaffold location [1]. For two molecules, this is equivalent to a Matched Molecular Pair.

We present a graphical interface for querying a database of bioactivity or physicochemical property data using a matched series. Predictions are made from the database using the Matsy method [2], which suggests which R groups are likely to improve the property value of interest.

An interesting aspect of our approach is that the interface treats the distinct R groups attached to a particular scaffold as first-class entities that can be manipulated and rearranged to see the effect on the predictions. This makes it easy, for example, to compare predictions based simply on matched-pair information with those based on information from longer series.

References:
[1] Wawer, M.; Bajorath, J. J. Med. Chem. 2011, 54, 2944.
[2] O’Boyle, N.M.; Boström, J.; Sayle, R.A.; Gill, A. J. Med. Chem. 2014, 57, 2704.


Setting up a query matched series
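The Matsy-style prediction in the CINF 29 abstract can be sketched in simplified form: find database series that contain the query's R groups in the same activity order, and count which extra R groups appear above the query's best one. This is a simplified illustration under that assumption, not the published Matsy algorithm or its statistics; the R-group labels and series are hypothetical.

```python
from collections import Counter

def suggest_r_groups(query_order, database_series):
    """Vote for R groups that rank above the query's most active R group.

    query_order: R groups ordered from least to most active in the query.
    database_series: lists of R groups, each ordered least to most active.
    """
    votes = Counter()
    query_set = set(query_order)
    for series in database_series:
        if not query_set.issubset(series):
            continue
        # Keep only series that order the query's R groups the same way.
        if [r for r in series if r in query_set] != list(query_order):
            continue
        top = series.index(query_order[-1])
        votes.update(series[top + 1:])  # R groups more active than the query's best
    return votes.most_common()

# Hypothetical matched series from a property database.
database = [
    ["H", "F", "Cl", "Br"],
    ["F", "H", "Cl", "Br"],          # different order: ignored
    ["H", "Me", "F", "Cl", "OMe"],
    ["H", "F", "Br", "Cl"],          # nothing above Cl
    ["H", "F", "Cl", "Br", "I"],
]
print(suggest_r_groups(["H", "F", "Cl"], database))
```

For the query series H < F < Cl, Br collects two votes and OMe and I one each, so Br would be the top suggestion.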

3:25pm-3:50pm
CINF 30: Design and characterization of chemical space networks

Martin Vogt2 , martin.vogt@bit.uni-bonn.de, Gerald Maggiora1 , Jürgen Bajorath2
1 Cancer Biology, Translational Genomics Research Institute, Tucson, Arizona, United States; 2 Life Science Informatics, University of Bonn, B-IT, Bonn, Germany

For chemical space display and navigation, networks are an attractive alternative to coordinate-based representations. In chemical space networks (CSNs), vertices represent compounds and edges represent pairwise similarity relationships. In CSNs, the chemical neighborhood of a compound is formed by a set of similar compounds; in coordinate-based views, it is represented by a parameterized multi-dimensional subspace. Threshold CSNs are conveniently generated on the basis of Tanimoto similarity using fingerprints as molecular descriptors. Similarity threshold values need to be defined to control the edge density of CSNs and generate interpretable topologies. As an alternative, CSNs can also be generated using substructure-based similarity measures, which are not dependent on pre-defined similarity threshold values. At constant edge density, CSNs representing different regions of chemical space can be directly compared using statistical measures from network science. The application of statistical parameters such as the clustering coefficient (transitivity) and modularity makes it possible to better understand the distribution of compounds in chemical space. In addition, the homophily principle is a major determinant of CSN topology, typically resulting in the formation of well-defined community structures. CSNs of bioactive compounds systematically differ from CSNs of random samples of chemical space.

References:
(1) Maggiora, G. M.; Bajorath, J. Chemical Space Networks: A Powerful New Paradigm for the Description of Chemical Space. J. Comput.-Aided Mol. Des. 2014, 28, 795-802.
(2) Zwierzyna, M.; Vogt, M.; Maggiora, G. M.; Bajorath, J. Design and Characterization of Chemical Space Networks for Different Compound Data Sets. J. Comput.-Aided Mol. Des. 2015, 29, 113-125.
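A threshold CSN as described in the CINF 30 abstract can be built directly from fingerprint similarity: an edge joins two compounds whose Tanimoto coefficient meets the threshold, and the threshold controls edge density. The sketch below represents fingerprints as sets of on-bits and reports edge density and transitivity (global clustering coefficient); the fingerprints are hypothetical and would normally come from a cheminformatics toolkit.

```python
from itertools import combinations

def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def threshold_csn(fingerprints, threshold):
    """Adjacency sets for a threshold CSN: edge if Tanimoto >= threshold."""
    adj = {name: set() for name in fingerprints}
    for (n1, f1), (n2, f2) in combinations(fingerprints.items(), 2):
        if tanimoto(f1, f2) >= threshold:
            adj[n1].add(n2)
            adj[n2].add(n1)
    return adj

def edge_density(adj):
    """Fraction of all possible compound pairs that are connected."""
    n = len(adj)
    edges = sum(len(nbrs) for nbrs in adj.values()) / 2
    return edges / (n * (n - 1) / 2) if n > 1 else 0.0

def transitivity(adj):
    """Global clustering coefficient: 3 x triangles / connected triples."""
    closed = triples = 0
    for nbrs in adj.values():
        k = len(nbrs)
        triples += k * (k - 1) // 2
        closed += sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return closed / triples if triples else 0.0

# Hypothetical fingerprints: A, B, C are close analogues; D is unrelated.
fps = {"A": {1, 2, 3, 4}, "B": {1, 2, 3, 5}, "C": {1, 2, 3, 6}, "D": {7, 8, 9}}
csn = threshold_csn(fps, threshold=0.5)
print(edge_density(csn), transitivity(csn))  # 0.5 1.0
```

Raising the threshold to 0.7 removes every edge in this toy set, which shows why the threshold has to be tuned to reach a comparable edge density across data sets.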

3:50pm-4:15pm
CINF 31: Interactive web-based tools for navigating the biologically relevant chemical space


Obdulia Rabal1 , orabal@unav.es, Julen Oyarzabal1
1 Small Molecule Discovery Platform, Center for Applied Medical Research (CIMA) - University of Navarra, Pamplona, Spain

One of the most common tasks that cheminformatics experts face is the analysis and visualization of large collections of molecules, with the purpose of analyzing data from large screening campaigns, establishing structure-activity relationships, acquiring new compounds to enhance the internal compound archive, designing libraries of analogues, finding bioisosteric replacements and many others.

Here, we first present BRCS navigator: an interactive web-based tool for visualizing the biologically relevant chemical space (BRCS) covered by compounds, with a special focus on the needs of medicinal chemists: minimal input, intuitive use and independent representation of any reference space. Applications of BRCS navigator to the key navigation tasks of the drug discovery process mentioned above are shown, with a special focus on the analysis of compounds extracted from patents.

As an alternative to full molecular navigation, a second web-based tool for inspecting databases of scaffolds is presented. Underlying this method is a novel 2D Scaffold Fingerprint (SFP) that describes rings according to their topology, shape and pharmacophoric features, as well as the position and nature of their growth vectors. Two case studies in which this approach was applied to bioisosteric replacement and compared with well-validated current standard 2D and 3D methodologies are discussed.

4:15pm-4:40pm
CINF 32: Compact models for compact devices: Visualisation of SAR data using mobile apps

Alex Clark1 , aclark.xyz@gmail.com
1 Independent, Montreal, Quebec, Canada

Since smartphones and tablets began to take a prominent role in the computing landscape, their raw power and capacity have expanded rapidly, and in many ways their capabilities are becoming directly competitive with entry-level workstations. In one way they have not kept pace: mobile devices are still poorly suited for directly handling large data collections. Fortunately, a number of structure-activity visualisation methods are based on the use of models that are much smaller and more portable than the source data used to build them, and these techniques include approaches such as Bayesian modelling, QSAR, and scaffold analysis. By taking advantage of the increasing ability of mobile apps to offer content-creation functionality (e.g. drawing molecular fragments) and localised implementations of fundamental algorithms (e.g. fingerprint/descriptor generation), it is becoming straightforward to use data sharing services to import models built from large data collections onto a touchscreen device. This opens the door to the full array of visualisation methods that can be applied to the small-to-medium collections of structures that are typically used within a mobile app.

This talk will describe a variety of visualisation methods that can be used to study structure-activity relationships on mobile devices, including atom colouring using Bayesian models from fragment-derived fingerprints, bulk analysis of molecule collections, scaffold/fragment matrix heatmaps, and graphical similarity clustering, among others. The ability to make use of such visualisation techniques on a mobile device considerably broadens the applicability of cheminformatics software, since the cost and requisite specialisation are greatly reduced. Also, being able to implement sophisticated algorithms natively on the device itself means that it is no longer necessary to rely on a web service to provide the heavy lifting, which removes most of the security issues as well as the inconvenience of needing to maintain a remote server and ensure consistent network connectivity.
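Atom colouring from a fragment-based Bayesian model means mapping each fragment's weight back onto the atoms that fragment covers. The sketch below assumes weights are split evenly across a fragment's atoms and uses a simple green-white-red scale; both choices are assumptions for illustration, since the abstract does not say how the apps distribute contributions. Feature IDs, atom indices, and weights are hypothetical.

```python
def atom_contributions(feature_atoms, weights, n_atoms):
    """Spread each feature's model weight evenly over the atoms it covers.

    feature_atoms: dict feature ID -> tuple of atom indices the fragment spans
    weights: dict feature ID -> model weight (e.g. a Bayesian log contribution)
    """
    scores = [0.0] * n_atoms
    for feature, atoms in feature_atoms.items():
        if not atoms:
            continue
        share = weights.get(feature, 0.0) / len(atoms)
        for i in atoms:
            scores[i] += share
    return scores

def colour(score, limit=1.0):
    """Map a score to an RGB hex string: green positive, white 0, red negative."""
    t = max(-1.0, min(1.0, score / limit))
    if t >= 0:
        r, g = int(255 * (1 - t)), 255
    else:
        r, g = 255, int(255 * (1 + t))
    b = int(255 * (1 - abs(t)))
    return f"#{r:02x}{g:02x}{b:02x}"

# Hypothetical 4-atom molecule with two fragment features.
feature_atoms = {10: (0, 1), 20: (1, 2, 3)}
weights = {10: 0.8, 20: -0.6}
scores = atom_contributions(feature_atoms, weights, n_atoms=4)
print([colour(s) for s in scores])
```

Because only the small weight table and the molecule are needed, this calculation fits comfortably on a phone without a server round trip.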

4:40pm-5:05pm
CINF 33: Fast, visual, and compelling analysis of datasets from similarity to SAR

Mike Hartshorn1 , Daniel Ormsby1 , Christoph Mueller1 , Rob Brown2 , rob.brown@dotmatics.com, Jesse Gordon3 , Tamsin Mansley3 , Clare Tudge3
1 Dotmatics, Ltd, Bishop's Stortford, United Kingdom; 2 Dotmatics, Inc, San Diego, California, United States; 3 Dotmatics, Inc, Woburn, Massachusetts, United States

Modern tools provide many opportunities for comparing structures and their properties, and highly visual output enhances decision making. We will present case studies where scientists can analyze large datasets to identify lead series and understand the associated SAR. Techniques employed will include matched series, card view, Bayesian classifiers and high-performance similarity, substructure and superstructure searching, all wrapped into compelling, shareable, visual workspaces.
5:05pm-5:10pm Concluding Remarks
CINF: Wikipedia and Chemistry: Collaborations in Science and Education
1:00pm - 5:05pm
Sunday, August 16

Room 104A - Boston Convention & Exhibition Center
Ye Li, Martin Walker, Organizing
Ye Li, Martin Walker, Presiding
Cosponsored by: CHED
1:00pm-1:05pm Introductory Remarks

1:05pm-1:25pm
CINF 14: Chemistry and Wikipedia: Coverage, evolution, and citations

Elsa Alvaro2 , elsa.alvaro@northwestern.edu, Angel Yanguas-Gil1
1 Northwestern Argonne Institute of Science and Engineering, Northwestern University, Evanston, Illinois, United States; 2 Northwestern University Library, Northwestern University, Evanston, Illinois, United States

Wikipedia has become one of the most popular internet sites and the world’s largest encyclopedia. Containing a wealth of chemical information, it is frequently used by chemistry students and professionals. In this work we will study the chemistry coverage of Wikipedia, both from the point of view of its categories and of individual pages, including its evolution as a function of time. We will also focus on the role of editors, in particular on the distribution of contributions and whether chemistry contributions are widely distributed or concentrated among a few editors. Finally, we will determine the prevalence of Wikipedia citations in the chemistry scholarly literature, and how Wikipedia is being cited in chemistry journals. The goal is to obtain a complete description of how chemistry is represented in Wikipedia, its impact on the research literature, and how the content has evolved with time.
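Whether contributions are widely distributed or concentrated among a few editors can be summarized with a single inequality measure. The abstract does not name the measure the authors use; the sketch below assumes the Gini coefficient of per-editor contribution counts, and the counts are hypothetical.

```python
def gini(counts):
    """Gini coefficient of non-negative contribution counts.

    0 means every editor contributed equally; values approaching 1 mean a
    few editors made almost all of the contributions.
    """
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # G = sum_i (2i - n - 1) * x_i / (n * sum x), i = 1..n over sorted values
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(values, start=1))
    return weighted / (n * total)

# Hypothetical edit counts for four editors of a set of chemistry pages.
print(gini([5, 5, 5, 5]))   # 0.0  (evenly shared)
print(gini([0, 0, 0, 10]))  # 0.75 (one editor did everything)
```

For n editors the maximum value is (n - 1)/n, so comparisons between page sets with very different editor counts should keep that ceiling in mind.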

1:25pm-1:45pm
CINF 15: Chemistry collaborations on Wikipedia

Martin Walker1 , walkerma@potsdam.edu
1 SUNY Potsdam, Potsdam, New York, United States


Wikipedia is a valuable resource for the chemistry community, but it is most effective when it can draw upon the wide chemistry community and the full range of chemical information sources. This presentation will provide a history of collaborations with organizations such as ChemSpider, CAS and IUPAC, and will show how these have made Wikipedia more reliable and authoritative. In return, Wikipedia has provided a high-profile way of disseminating chemical information to the public. In education, a productive collaboration with a college class can improve Wikipedia while teaching writing and information literacy. However, collaborations must involve both groups working together with mutual respect and understanding. If a group seeks to promote its ideas or product through Wikipedia, it will sow distrust among Wikipedians and shut down collaboration. If a college instructor fails to work with Wikipedians when running a class project, mass deletions and hostility may result. This presentation will show how it is possible for the communities to come together to improve and enrich Wikipedia for all chemists.

1:45pm-2:05pm
CINF 16: Improving the knowledge about chemistry: The two leading encyclopedias, Wikipedia and RÖMPP, cooperate in Germany

Guido Herrmann1 , guido.herrmann@thieme.de
1 Georg Thieme Verlag Kg, Stuttgart, Germany

Thieme has been a chemistry publisher since 1909 and publishes scientific information in various formats: journals, reference works, encyclopaedias, monographs and textbooks. The chemical encyclopedia “RÖMPP” was founded by Dr. Hermann Römpp in 1947. About 250 authors have contributed to the work in recent years, and today RÖMPP contains more than 65,000 entries and 14,500 structural formulas. RÖMPP has been published online (https://roempp.thieme.de/) for more than a decade and is now updated monthly.

Since 2014, there has been a cooperation with the chemistry project of the German-language edition of Wikipedia. The aim is to improve the content of both encyclopedias and strengthen the links between the two works. There has been an intensive exchange between Wikipedians and RÖMPPians.

The talk will present background information and will highlight some of our key findings and best practises.

2:05pm-2:25pm
CINF 17: PubChem Wikipedia integration and potential for future collaboration

Jian Zhang1 , jiazhang@ncbi.nlm.nih.gov, Paul Thiessen1 , Asta Gindulyte1 , Evan Bolton1
1 NCBI-NLM/NIH, Bethesda, Maryland, United States

Wikipedia is a popular information source, and major search engines such as Google and Bing integrate some degree of Wikipedia content into their results. Of nearly ten thousand chemical entries in Wikipedia, more than 70% contain PubChem CIDs as a major identifier. As part of an ongoing collaboration with Wikipedia, PubChem has integrated Wikipedia links into the PubChem Compound Summary page. This talk will give an overview of PubChem’s Wikipedia data integration project and discuss the potential for improved collaboration between the two sites.

2:25pm-2:45pm
CINF 18: Wikipedia and Wiktionary as resources for chemical text mining

Roger Sayle1 , roger@nextmovesoftware.com, Daniel Lowe1
1 NextMove Software, Cambridge, United Kingdom

The resources provided by the Wikimedia Foundation are an unprecedented asset for chemists, information professionals and natural language processing researchers working on the annotation of pharmaceutically relevant information in documents. A widely publicized example of the use of Wikipedia in artificial intelligence research is the participation of IBM's Watson in the Jeopardy! quiz show. In this presentation, we present several chemical research applications of Wikipedia-derived data sets, including named-entity dictionaries and synonym lists for linking ontologies. The global community of volunteer contributors to these projects deserves continual recognition for the invaluable resource they enable.
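
Named-entity dictionaries and synonym lists of the kind described above are commonly applied by dictionary matching. The following is a minimal sketch, assuming a plain whitespace tokenizer and a tiny hypothetical synonym table; it is not NextMove's software or data, only an illustration of greedy longest-match tagging.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, ...]


def normalise(text: str) -> List[str]:
    """Split on whitespace and strip surrounding punctuation from each token."""
    tokens = (t.strip(".,;:()[]\"'") for t in text.split())
    return [t for t in tokens if t]


def build_lookup(synonyms: Dict[str, List[str]]) -> Dict[Key, str]:
    """Map every lower-cased, tokenised synonym to its canonical entry."""
    lookup: Dict[Key, str] = {}
    for canonical, names in synonyms.items():
        for name in names:
            lookup[tuple(t.lower() for t in normalise(name))] = canonical
    return lookup


def tag(text: str, lookup: Dict[Key, str]) -> List[Tuple[str, str]]:
    """Greedy longest-match dictionary tagging; returns (surface text, canonical entry)."""
    tokens = normalise(text)
    longest = max((len(k) for k in lookup), default=0)
    hits: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        # Try the longest span first so a multi-word name wins over shorter entries.
        for n in range(min(longest, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in lookup:
                hits.append((" ".join(tokens[i:i + n]), lookup[key]))
                i += n
                break
        else:
            i += 1  # no dictionary entry starts at this token
    return hits


# Hypothetical synonym table, standing in for lists harvested from Wikipedia.
SYNONYMS = {
    "aspirin": ["aspirin", "acetylsalicylic acid"],
    "acetic acid": ["acetic acid", "ethanoic acid", "AcOH"],
}
```

With `lookup = build_lookup(SYNONYMS)`, tagging "Aspirin (acetylsalicylic acid) is made from salicylic acid, not acetic acid." yields three hits, both aspirin names and acetic acid, while "salicylic acid" stays untagged because it is not in this toy dictionary.
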
2:45pm-3:00pm Intermission

3:00pm-3:20pm
CINF 19: Tools and strategies: Incorporating Wikipedia-based assignments into a course


Eryk Salvaggio1 , eryk@wikiedu.org, Jami Mathewson1 , jami@wikiedu.org
1 Wiki Education Foundation, San Francisco, California, United States

As the fifth most-visited website, with over 4.5 million articles, Wikipedia is the largest source of shared information on the planet. Because of this unique position as a shared knowledge resource, there is tremendous academic interest in ensuring that the information on Wikipedia is accurate, complete, and well-sourced.

Wikipedia writing assignments are one model for connecting academic knowledge to the public. In such an assignment, a student's traditional term paper is replaced, or complemented, by writing for Wikipedia. Students gain the experience of communicating with an authentic audience. They develop skills in digital literacy, library research and citation methodology, and peer review of writing. The assignment also encourages a clear distillation of academic concepts through writing aimed at a general audience.

Through my role at the Wiki Education Foundation, I have worked with hundreds of instructors to create successful assignments for courses in the USA and Canada. In this presentation, I will highlight methods of incorporating Wikipedia-based assignments into a course based on the research and experience of the Wiki Education Foundation. I'll present tools, useful information and strategies for instructors to get started.

3:20pm-3:40pm
CINF 20: Wikipedia editing in chemistry classrooms: Resonance and gaps between educational needs and Wikipedia community practices

Ye Li1 , liye@umich.edu
1 3162 Shapiro Science Library, University of Michigan, Ann Arbor, Michigan, United States

Instructors in higher education have found it productive to assign Wikipedia editing as a project for classes in chemistry and many other fields. Editing or creating Wikipedia entries relevant to concepts learned in class enables students to deepen their understanding of the subject matter as well as improve their information literacy, writing, peer review and teamwork skills. Students often embrace Wikipedia projects because they see their contributions to the public good immediately and can engage with the broader Wikipedia community in the real world. The Wikipedia community also welcomes positive and well-informed contributions from educational activities, and programs such as those of the Wiki Education Foundation have been initiated to support them. However, students’ contributions may not always meet the standards of the Wikipedia community, and educational practices may not always fit the community's norms. Limitations of time, resources and communication can lead to unproductive editing and even heated disputes between student editors and other Wikipedians. Based on my experience supporting courses that use Wikipedia editing projects, collaboration and transparent communication among instructors, librarians, the Wiki Education Foundation and interested Wikipedians are essential to minimize the gaps between classroom needs and the practices of the Wikipedia community. This presentation will examine these courses to review their successes as well as the inevitable challenges and conflicts. The unique advantages and disadvantages of using Wikipedia editing projects in chemistry classes will also be discussed. Finally, we will explore strategies to ensure that students’ contributions to Wikipedia are applauded by the community while creating productive learning experiences for the students.

3:40pm-4:00pm
CINF 21: Improving Wikipedia topics, a chemistry outreach activity

Keith Lindblom1 , k_lindblom@acs.org
1 American Chemical Society, Washington, District of Columbia, United States

Wikipedia is one of the most visited websites for general information about a variety of fields, including chemistry. Because of its popularity, Wikipedia is an influential resource on chemistry topics for students and the public. Chemists and other subject matter experts can volunteer as contributors to articles related to their fields and interests. Keith Lindblom of the ACS Office of Public Affairs will describe his experience as a Chemistry Ambassador working to improve the quality of Wikipedia resources, and explain how others can get involved.

4:00pm-4:20pm
CINF 22: Value of the Mediawiki platform for providing content to the chemistry community

Antony Williams12 , tony27587@gmail.com
1 Cheminformatics, Royal Society of Chemistry, Wake Forest, North Carolina, United States; 2 ChemConnector Inc., Wake Forest, North Carolina, United States

At this time, in a culture where online access is now imperative, Wikipedia has become the definitive encyclopedia. Its chemistry coverage is rich in encyclopedic pages, including named reactions, chemical and drug pages, articles about chemists and many other forms of chemistry-related information. Wikipedia is hosted on MediaWiki, an open source platform that anybody can use as the basis of their own hosted content collection. Chemists have used MediaWiki as a collaborative environment to create a number of resources that have become very popular with the chemistry community. These include VIPEr to support inorganic chemistry, ChemWiki as an online textbook and other educational resources, and a Chemical Information Wikibook. The author has also used MediaWiki to host open source collections of data on scientists, scientific databases and mobile apps for science: the ScientistsDB, SciDBs and SciMobileApps wikis. This presentation will provide an overview of some of the chemistry resources that presently exist and celebrate the major contributions that Wikipedia and MediaWiki have made to the collaborative dissemination of chemistry.

4:20pm-4:40pm
CINF 23: Chemical collaborations in the wiki realm

Andy Mabbett1 , mabbetta@rsc.org
1 Royal Society of Chemistry, Birmingham, United Kingdom

Andy Mabbett has been Wikimedian in Residence at the Royal Society of Chemistry since September 2014 (the role is ongoing). He has also been, and continues to be, Wikipedian in Residence with a number of museums and galleries, as well as with ORCID. In this presentation, he will explain why the Royal Society of Chemistry, a learned society whose roots go back to 1842, chose to be involved with the Wikimedia movement, and how the Society's interest has grown over a number of years, despite some initially unwelcoming reactions from some Wikipedians. He will compare the role of a Wikimedian in Residence at a learned society, one which is also a major scientific publisher, with that of Wikipedians in Residence at more 'traditional' host venues such as museums, archives and art galleries. He will give a frank appraisal of some of the obstacles he has encountered and overcome, and will give details of a donation by the Royal Society of Chemistry of many thousands of dollars' worth of journal access to Wikipedians.

One of Andy's initial goals upon accepting the post was to foster collaboration between Wikimedians working in different languages, and he will show how this was achieved through collaboration with the Amical (Catalan) Wikimedia community, as well as with editors using Italian and other languages.

Other topics will include the non-Wikipedia side of the residency, in which contributions to Wikidata and Wikimedia Commons have been key, alongside contributions to Wikibooks, Wikiquote and other projects. There have even been contributions to OpenStreetMap and other open-content projects outside the Wikimedia Foundation's purview.

Finally, Andy will explain why complaints that the Royal Society of Chemistry's Wikimedian in Residence was not a chemist were irrelevant, and how non-scientists can be encouraged to contribute to other science collaborations in Wikimedia projects.

His talk will be framed as a model that other institutions considering hosting a Wikimedian in Residence may choose to follow, or to borrow from.

4:40pm-5:00pm
CINF 24: Panel Discussion: Wikipedia and MediaWiki: Collaborations and Education in Chemistry

Ye Li2 , liye@umich.edu, Martin Walker1 , walkerma@potsdam.edu
1 SUNY Potsdam, Potsdam, New York, United States; 2 3162 Shapiro Science Library, University of Michigan, Ann Arbor, Michigan, United States

A panel of speakers from the symposium Wikipedia and Chemistry: Collaborations in Science and Education will interact with the audience and answer questions about Wikipedia and MediaWiki and their roles in scientific collaboration and education in chemistry. Questions include, but are not limited to: How can chemists' efforts to improve Wikipedia or other MediaWiki platforms be coordinated? How can Wikipedia-based assignments be designed and implemented in chemistry classes? How can Wikipedia content be used wisely, for both human and computer consumption?
5:00pm-5:05pm Concluding Remarks
CINF: CINF Scholarships for Scientific Excellence: Student Poster Competition
6:30pm - 8:30pm
Sunday, August 16

Lighthouse Blrm 1 - Seaport Hotel and World Trade Center

6:30pm-8:30pm
CINF 34: P-OSRA: Polymer Optical Structure Recognition Application


Bryn Reinstadler21 , br6@williams.edu, Hans Horn2
1 Williams College, Williamstown, Massachusetts, United States; 2 IBM - Almaden, San Jose, California, United States

The discovery and synthesis of novel polymers is a crucial step in advancing many diverse areas of chemistry, from plastics to drug-delivery mechanisms. Currently, the abstruse nomenclature rules for polymers make searching the polymer literature arduous at best; in many cases, the difficulty of polymer nomenclature discourages scientists from naming new polymers in their papers. The difficulty of searching the literature slows progress in this field. P-OSRA, the Polymer Optical Structure Recognition Application, built on OSRA [1], is open-source software that aims to process images of polymers mined from the literature, output the recognized structures in SMILES format, and store those results in a database available for user queries. The current implementation takes polymer images as direct input, processes them and stores the structural results.
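
The storage step, keeping recognized structures available for later queries, can be pictured as a small relational table. The sketch below uses Python's built-in sqlite3; the table layout, the column names and the asterisk-capped repeat-unit SMILES convention are assumptions for illustration, not P-OSRA's actual schema.

```python
import sqlite3
from typing import List, Tuple


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create (if needed) an assumed table of recognised polymer structures."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS polymer_structures (
               id         INTEGER PRIMARY KEY,
               source     TEXT NOT NULL,   -- e.g. a paper identifier plus figure label
               image_name TEXT NOT NULL,
               smiles     TEXT NOT NULL    -- repeat unit, '*' marking attachment points
           )"""
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_polymer_smiles ON polymer_structures (smiles)"
    )
    return conn


def store_result(conn: sqlite3.Connection, source: str, image_name: str, smiles: str) -> int:
    """Insert one recognition result and return its row id."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO polymer_structures (source, image_name, smiles) VALUES (?, ?, ?)",
            (source, image_name, smiles),
        )
    return cur.lastrowid


def find_by_smiles(conn: sqlite3.Connection, smiles: str) -> List[Tuple[str, str]]:
    """Exact-string lookup; assumes SMILES were canonicalised before storage."""
    return conn.execute(
        "SELECT source, image_name FROM polymer_structures WHERE smiles = ? ORDER BY id",
        (smiles,),
    ).fetchall()
```

Exact string matching only works if every SMILES is canonicalized first, for example with a cheminformatics toolkit; that step is outside this sketch.
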

[1] Optical Structure Recognition Software To Recover Chemical Information: OSRA, An Open Source Solution. Igor V. Filippov and Marc C. Nicklaus, Journal of Chemical Information and Modeling 2009 49 (3), 740-743. DOI: 10.1021/ci800067r

6:30pm-8:30pm
CINF 35: Withdrawn

6:30pm-8:30pm
CINF 36: Knowledge-based approach to the parameterization of small molecule force fields based on crystal structures

Florian Roessler2 , fdr20@cam.ac.uk, Oliver Korb1 , Robert Glen3 , Peter Bond4
1 Cambridge Crystallographic Data Centre, Cambridge, United Kingdom; 2 University Cambridge, Cambridge, United Kingdom; 3 University of Cambridge, Cambridge, United Kingdom; 4 Chemistry, University of Cambridge Unilever Centre, Cambridge, United Kingdom

Computer-aided drug discovery has been increasingly successful over the last three decades, but still faces major obstacles. Molecular dynamics (MD) simulation approaches are routinely used to rationalize the conformational dynamics and energetics associated with the physical interactions between small molecules and the binding sites of biomolecules. Accurate parameterization of molecular force fields (FFs), particularly for novel small molecules, is currently a resource- and time-intensive task. Several current approaches rely on using or adapting pre-calculated parameters for common molecular functionalities, which are used as building blocks for larger molecules.

In contrast to these theory-based approaches, this research project focuses on generating parameters for a wide range of small molecules using a fast knowledge-based approach. For this purpose we mine the conformational information in the Cambridge Structural Database, which contains over 750,000 curated small molecule crystal structures. The focus of this work is on assessing and improving the bonded parameters, i.e. bond lengths, valence angles and dihedrals, of existing modern simulation FFs. A comparison between experimental and currently available pre-calculated parameters is a preliminary step and provides insights into the differences between the respective parameter spaces.

The comparison identified a large number of cases in which the bonded parameters taken from existing force fields agree with those determined by the knowledge-based approach; however, significant discrepancies were also observed. In response to these initial results, the impact of the differences on additional MD-accessible observables (e.g., thermodynamic properties) was assessed. For this purpose, a dataset of chemical structures covering both similar and dissimilar bonded parameters was assembled. Preliminary results indicate that systems for which sufficient crystallographic data are available can be readily and accurately parameterized with a knowledge-based approach. This could potentially lead to improvements over existing force fields.
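
One standard way to turn a distribution of observed geometries into a harmonic bonded term is Boltzmann inversion. The abstract does not say the authors use this route, and treating the spread of crystallographic bond lengths as if it were thermal is a strong assumption, since crystal data also reflect chemical environment and packing; the sketch only shows the arithmetic. For a Gaussian with mean r0 and standard deviation σ, −RT ln P(r) = ½(RT/σ²)(r − r0)² + const, so k = RT/σ².

```python
import statistics
from typing import Sequence, Tuple

R_KJ_PER_MOL_K = 0.008314462618  # molar gas constant, kJ mol^-1 K^-1


def harmonic_bond_from_observations(
    lengths_nm: Sequence[float], temperature_k: float = 298.15
) -> Tuple[float, float]:
    """Fit U(r) = 0.5 * k * (r - r0)**2 to observed bond lengths.

    Assumes the observed lengths follow a Gaussian and treats its spread as if it
    were Boltzmann-distributed, so -RT ln P(r) gives k = RT / sigma**2.
    Returns (r0 in nm, k in kJ mol^-1 nm^-2).
    """
    if len(lengths_nm) < 2:
        raise ValueError("need at least two observations to estimate a spread")
    r0 = statistics.fmean(lengths_nm)
    sigma = statistics.stdev(lengths_nm)  # sample standard deviation
    return r0, R_KJ_PER_MOL_K * temperature_k / sigma ** 2


def relative_difference(observed: float, force_field: float) -> float:
    """Signed relative deviation of a knowledge-based value from a force-field value."""
    return (observed - force_field) / force_field
```

A per-parameter `relative_difference` between the fitted r0 and a force field's equilibrium value is one simple way to rank where the two parameter spaces disagree most.
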

6:30pm-8:30pm
CINF 37: Pilot study of clustering based safety assessment for fragrance ingredients

Jie Shen1 , jshen@rifm.org, Lambros Kromidas1
1 Research Institute for Fragrance Materials, Inc., Woodcliff Lake, New Jersey, United States

With developments in science and technology and changes in regulatory standards, there is a growing need to evaluate or re-evaluate large inventories of chemicals. Evaluating a set of similar chemicals together (i.e., clustering or grouping) is considered to save time and resources. This practice is also encouraged by regulatory agencies such as ECHA and the US EPA, and ECHA and the OECD have issued guidance on chemical grouping and categorization. The underlying hypothesis is that chemicals with similar structures are expected to have similar toxicological profiles. However, chemical grouping, clustering or categorization usually depends on the chemist's experience and may be subjective. A transparent, objective, comprehensive and reproducible method is therefore preferable. The Chemical Assessment Clustering Engine (ChemACE) was developed by the U.S. EPA Office of Pollution Prevention and Toxics (OPPT) to assist in the review and prioritization of large inventories of structurally diverse chemicals. In this study, 2664 fragrance ingredients were clustered using ChemACE. This resulted in 381 clusters, and 774 ingredients were considered “orphans”, i.e., chemicals that could not be clustered. When a less restrictive parameter was used, 378 clusters and 739 orphans were obtained. The clustering, or lack of clustering, of some materials was questionable.

To improve the clustering outcome, we developed a set of restrictive criteria that resulted in fewer orphans and more appropriate clustering with respect to a toxicological end point.

In conclusion, clustering based only on chemical similarity is not enough to form reliable clusters. Additional factors, such as structural alerts, metabolism and physicochemical properties, should be taken into consideration.
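
To make the cluster/orphan trade-off concrete, here is a minimal threshold-clustering sketch. It uses Tanimoto similarity on hypothetical feature-bit sets and single-linkage grouping, with singletons reported as orphans; it is not ChemACE's actual procedure, and the fingerprints are invented.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # indices of "on" feature bits (hypothetical features)


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Shared bits divided by the union of bits."""
    if not a and not b:
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def threshold_clusters(
    fps: Dict[str, Fingerprint], threshold: float
) -> Tuple[List[List[str]], List[str]]:
    """Single-linkage grouping (not ChemACE's algorithm): link any pair with
    similarity >= threshold; groups of one are returned as orphans."""
    names = sorted(fps)
    parent = {n: n for n in names}

    def root(n: str) -> str:
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving keeps the trees shallow
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if tanimoto(fps[a], fps[b]) >= threshold:
                parent[root(a)] = root(b)

    groups: Dict[str, List[str]] = {}
    for n in names:
        groups.setdefault(root(n), []).append(n)
    clusters = sorted((g for g in groups.values() if len(g) > 1), key=lambda g: g[0])
    orphans = sorted(g[0] for g in groups.values() if len(g) == 1)
    return clusters, orphans
```

With four invented fingerprints where A and B share 3 of 5 bits (similarity 0.6) and C shares 2 of 6 bits with each (about 0.33), a threshold of 0.5 gives one cluster {A, B} and orphans C and D, while 0.3 merges C into that cluster and leaves only D orphaned, the same direction of change as the less restrictive ChemACE run.
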

6:30pm-8:30pm
CINF 38: Investigation of the endocrine disruption potential of bisphenol A replacement compounds

Hui Wen Ng2 , ng.huiwen33@gmail.com, Roger Perkins2 , Weida Tong2 , Huixiao Hong1
1 FDA, Jefferson, Arkansas, United States; 2 NCTR, FDA, White Hall, Arkansas, United States

Bisphenol A (BPA), a compound once widely used in many consumer, medical and dental products, has been a public and regulatory concern for the past two decades because of its potential to interfere with endocrine function. Consequently, many products now replace BPA with other compounds, some of which are BPA analogues. Unfortunately, the replacements have not been subjected to thorough safety testing, portending that scientific and public concerns will arise anew. This study aimed to assess the estrogen-mimicking potential of BPA replacement compounds in order to assist in their safety evaluation. The BPA replacement compounds were identified from a literature review. Experimental estrogenic activity data were obtained from the curated Estrogenic Activity Database for some of the BPA replacement compounds. A docking-based in silico model was developed to predict the estrogenic activity of the compounds lacking experimental data, and molecular dynamics (MD) simulations were then performed to study the potentially key interactions involved in the binding and activation of the estrogen receptor (ER). The results showed good correlation between docking predictions and experimental findings. The MD simulations elucidated the role of potentially important interactions, such as hydrogen bonding and hydrophobic contacts, in ligand-ER binding. Our study indicated that a considerable number of BPA replacement compounds are estrogenic, warranting thorough safety evaluations of these compounds to protect the environment and the public.

6:30pm-8:30pm
CINF 39: Chemical alerts and QSAR models based on dynamically-generated annotated linear structural fragments

Darshan Mehta1 , mehta.182@osu.edu, James Rathman1 , Chihae Yang1
1 Ohio State University, Columbus, Ohio, United States

Chemical descriptors capture and represent structural information for use in chemoinformatics applications such as similarity searching, identifying alerts and QSAR modeling. One approach to constructing a structural “fingerprint” for a compound is to match the molecule of interest against a library of predefined structural fragments, such as MACCS keys or ToxPrint Chemotypes. A second approach is to generate structural fragments dynamically, extracting them “on the fly” from the compound set of interest. Following the latter approach, we present a method for the dynamic generation of linear fragments: linear subgraphs of chemical structures that can be annotated with atom-based features such as atom identity, connectivity and partial charge. The choice of annotation scheme provides considerable flexibility in defining chemical fragments, and the linear nature of these fragments allows us to apply sequence analysis techniques, for example to describe the similarity between molecules in terms of alignment scores, analogous to methods used in bioinformatics for the analysis of nucleic acids and proteins. The performance of these fragments is evaluated on two toxicity endpoints: skin sensitization and Ames mutagenicity. Results show that these annotated linear fragments are able to extract meaningful chemical alerts and also provide sets of descriptors useful for QSAR modeling. Further exploiting the linear nature of these descriptors, we have developed a simplified Markov chain model using the annotated linear fragments. Applying this approach to the Hansen Ames mutagenicity dataset yields cross-validation performance (sensitivity 74%, specificity 64%) better than many results reported in the literature for the same dataset. Further work is under way to extend this method to more complex Markov models, with a focus on improving the predictivity of models for chemical toxicity.
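
A minimal sketch of the two ideas in this abstract follows, assuming an annotation of element symbol plus heavy-atom degree (partial charges omitted) and a generic first-order Markov chain with add-one smoothing; the authors' annotation schemes and their simplified Markov model may well differ. Linear fragments are enumerated as simple paths in the molecular graph, and each class gets its own chain of label-to-label transition probabilities.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence

Graph = Dict[int, List[int]]  # atom index -> neighbouring atom indices


def annotate(labels: Dict[int, str], graph: Graph, atom: int) -> str:
    """Assumed annotation scheme: element symbol plus heavy-atom degree, e.g. 'C2'."""
    return f"{labels[atom]}{len(graph[atom])}"


def linear_fragments(labels: Dict[int, str], graph: Graph, max_atoms: int = 4) -> Counter:
    """Count annotated simple paths of 2..max_atoms atoms, each path counted once."""
    frags: Counter = Counter()

    def extend(path: List[int]) -> None:
        if len(path) >= 2 and path[0] < path[-1]:  # keep one traversal direction
            seq = tuple(annotate(labels, graph, a) for a in path)
            frags[min(seq, seq[::-1])] += 1  # direction-independent key
        if len(path) < max_atoms:
            for nb in graph[path[-1]]:
                if nb not in path:
                    extend(path + [nb])

    for start in graph:
        extend([start])
    return frags


class MarkovChainClassifier:
    """First-order Markov chain over annotation transitions, one chain per class,
    with add-alpha smoothing and equal class priors (a generic stand-in)."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha
        self.transitions: Dict[Hashable, Counter] = {}
        self.vocab: set = set()

    def fit(self, fragment_counts: Sequence[Counter], labels: Sequence[Hashable]):
        for frags, label in zip(fragment_counts, labels):
            counts = self.transitions.setdefault(label, Counter())
            for frag, n in frags.items():
                for a, b in zip(frag, frag[1:]):
                    counts[(a, b)] += n
                    self.vocab.update((a, b))
        return self

    def _log_p(self, label: Hashable, a: str, b: str) -> float:
        counts = self.transitions[label]
        out_of_a = sum(v for (x, _), v in counts.items() if x == a)
        return math.log((counts[(a, b)] + self.alpha) / (out_of_a + self.alpha * len(self.vocab)))

    def score(self, frags: Counter) -> Dict[Hashable, float]:
        """Log-likelihood of the molecule's fragment transitions under each class chain."""
        return {
            label: sum(n * self._log_p(label, a, b)
                       for frag, n in frags.items() for a, b in zip(frag, frag[1:]))
            for label in self.transitions
        }

    def predict(self, frags: Counter) -> Hashable:
        scores = self.score(frags)
        return max(scores, key=scores.get)
```

A molecule is assigned to the class whose chain gives its fragments the higher log-likelihood. In a toy run trained on one nitro-containing and one alcohol-containing heavy-atom graph, each training molecule is assigned back to its own class.
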

6:30pm-8:30pm
CINF 40: Developing group contributions for predicting transition state structures

Pierre Bhoorasingh1 , bhoorasingh.p@husky.neu.edu, Richard West1
1 Dept of Chemical Engineering, 313 SN, Northeastern University, Boston, Massachusetts, United States

Kinetic calculations involve transition state searches that depend on manually estimated starting geometries, a process that is time consuming and potentially error-prone.
We have developed a group contribution method to predict the unknown reaction center distances of a transition state, then use the predictions to create transition state geometry estimates.

Mechanism generation is a useful tool for understanding complex chemical systems, and the process has been automated in software such as Reaction Mechanism Generator.
The mechanisms require a large number of kinetic parameters, many of which have to be estimated when known values are not available.
Kinetic estimation methods cannot be reliably extended to new chemical systems, limiting the use of mechanism generation software.
Manual kinetics calculations help address this issue, but a high-throughput method is required to provide sufficient kinetic parameters.

Predicting geometries for stable molecules is well understood, and can be applied to inert atoms in the transition state.
It is at the active site, where bonds are being formed and broken, that chemical knowledge is used to manually predict the transition state geometry.
This knowledge is replaced with molecular group contributions for a high-throughput automated procedure.
For a given reaction family, reaction center distances of known transition state geometries are collated to train the molecular groups in a hierarchical tree.
The trained group values are used to estimate reaction center distances for any reaction that belongs to the reaction family.
The predictions are then used with distance geometry to create transition state geometry estimates.
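
The training-and-lookup step can be sketched as a hierarchical tree in which every node stores the distances of the training reactions beneath it, and a query falls back to the nearest ancestor that has data. This is a simplified stand-in assumed for illustration: the group names and distances in the example are invented, and the actual group contribution scheme may combine values from several groups rather than taking one node's mean.

```python
from collections import defaultdict
from statistics import fmean
from typing import Dict, List, Optional, Sequence, Tuple

GroupPath = Tuple[str, ...]  # ancestry in the group tree, most general group first


class GroupTree:
    """Simplified hierarchical estimate for one reaction-centre distance (assumed scheme)."""

    def __init__(self, min_count: int = 1) -> None:
        self.min_count = min_count
        self.data: Dict[GroupPath, List[float]] = defaultdict(list)

    def train(self, examples: Sequence[Tuple[GroupPath, float]]) -> None:
        for path, distance in examples:
            # Each ancestor node also receives the value, so general groups
            # summarise all the more specific data beneath them.
            for depth in range(1, len(path) + 1):
                self.data[path[:depth]].append(distance)

    def predict(self, path: GroupPath) -> Optional[Tuple[GroupPath, float]]:
        """Return (node used, estimate) from the most specific node with enough data."""
        for depth in range(len(path), 0, -1):
            values = self.data.get(path[:depth], [])
            if len(values) >= self.min_count:
                return path[:depth], fmean(values)
        return None  # no trained ancestor: no estimate for this reaction
```

The fallback illustrates why predictions sharpen with more data: as training examples fill specific nodes, queries are answered from closer matches instead of from broad, averaged ancestors.
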

The geometry estimates are optimized using density functional theory, then validated with a path analysis calculation.
The path analysis geometries are converted into chemical graphs for comparison via graph isomorphism with the starting reaction in order to validate the transition state.
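
The validation step compares the two endpoints of the path analysis with the expected reactants and products as graphs. The sketch below is a small backtracking isomorphism check on element-labelled graphs with explicit hydrogens (bond orders ignored), adequate only for small species; it is an illustration, not the authors' implementation, and production code would normally use a dedicated graph library.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

Mol = Tuple[Dict[int, str], Dict[int, FrozenSet[int]]]  # (element labels, adjacency)


def mol(labels: Dict[int, str], bonds: Iterable[Tuple[int, int]]) -> Mol:
    """Build an undirected, possibly disconnected, graph with explicit hydrogens."""
    adj: Dict[int, set] = {a: set() for a in labels}
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    return labels, {a: frozenset(n) for a, n in adj.items()}


def is_isomorphic(m1: Mol, m2: Mol) -> bool:
    """Element-preserving graph isomorphism by backtracking (fine for small species)."""
    (la, aa), (lb, ab) = m1, m2
    if sorted(la.values()) != sorted(lb.values()):
        return False
    if sorted(len(n) for n in aa.values()) != sorted(len(n) for n in ab.values()):
        return False
    order = sorted(la)
    mapping: Dict[int, int] = {}

    def fits(a: int, b: int) -> bool:
        if la[a] != lb[b] or len(aa[a]) != len(ab[b]):
            return False
        # Adjacency to every already-mapped atom must agree in both graphs.
        return all((a2 in aa[a]) == (b2 in ab[b]) for a2, b2 in mapping.items())

    def search(i: int) -> bool:
        if i == len(order):
            return True
        a = order[i]
        for b in lb:
            if b not in mapping.values() and fits(a, b):
                mapping[a] = b
                if search(i + 1):
                    return True
                del mapping[a]
        return False

    return search(0)


def endpoints_match(end1: Mol, end2: Mol, reactants: Mol, products: Mol) -> bool:
    """True if the two path-analysis endpoints are the reactants and products, in either order."""
    return (is_isomorphic(end1, reactants) and is_isomorphic(end2, products)) or (
        is_isomorphic(end1, products) and is_isomorphic(end2, reactants)
    )
```

Explicit hydrogens matter here: for a hydrogen abstraction such as OH + H2 → H2O + H, the heavy-atom skeletons on both sides are identical, and only the hydrogen connectivity distinguishes reactants from products.
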

The group contribution method has been tested on three reaction families and can reliably predict transition states for each.
Consistent with the group contribution hypothesis, the predictions improve as more data is used to determine the molecular group values.

6:30pm-8:30pm
CINF 41: Changes in scholarly publishing practices in the chemical sciences: A focus on early career chemists

Marianne NOEL1 , noel@ifris.org
1 LISIS & IFRIS, Université Paris-Est, Marne-la-Vallée Cedex 2, France

The proposed poster is a follow-up to a 4-year collective study (ANR PrestEnce 2010-2013) in which we studied the organizational construction of academic quality in high-standing chemistry departments (Paradeise and Thoenig 2013, Paradeise et al., 2014).
Specifically, it focuses on early career researchers and draws on a field visit to a chemistry department in the USA in 2012-2013. In this study, I use data from 13 single interviews with PhD students and postdocs. Five semi-structured interviews were conducted in English and eight in French. They lasted on average between 45 minutes and an hour and a half, and were tape-recorded and then fully transcribed. For confidentiality reasons, data are anonymized and names have been changed.
Interview transcripts are coded thematically using NVivo qualitative analysis software. Coding is the process of analyzing data by identifying particular themes, phrases, language and stories in the interviews and tagging them with short identifiers that can be catalogued and assembled to collate similar data across interviews. Collating the data in this way helped us formulate an analysis of the substance of the conversations and, ultimately, an argument. In this poster I will describe in detail two subsets of tags (on research work and on informational practices) used to extract passages from the interview transcripts.
Results suggest that reading, writing and, to a lesser extent, publishing feed into a non-linear process in which inputs (publications and results) are constantly revisited to “build up a story”. The PhD candidates and postdocs interviewed described extensive browsing, in some cases going through huge quantities of papers (up to 1000 per week) and focusing on graphical abstracts only. In many interviews, time was considered a crucial aspect. Surprisingly, interviewees did not relate this to the time pressures of the publishing process but rather to the epistemic nature of the objects and techniques they used and studied.
This research aims to outline the changing use of periodicals and their role in defining “value” in scholarly communication. More generally, I question the emergence of a modern press culture and of a civilisation of periodicity.

6:30pm-8:30pm
CINF 42: Predicting Tox21 assay outcome by quantitative structure-activity relationship and machine learning methods

Mikyung Lee1 , mikyung.lee11@gmail.com, Dac-Trung Nguyen2 , Ruili Huang2
1 NCATS, National Institutes of Health, Rockville, Maryland, United States; 2 NCATS, NIH, Rockville, Maryland, United States

People are exposed to various chemicals through a variety of sources, including food, household cleaning products and medicines. Some of these chemicals can be toxic, so assessing the risks of new chemical entities remains a mission of paramount importance. To address this toxicity assessment challenge, the U.S. Tox21 program has profiled a collection of approximately 10,000 chemicals against a panel of 12 human nuclear receptor and stress response assays in a quantitative high-throughput screening (qHTS) format to assess their toxic potential. In this study, we built quantitative structure-activity relationship (QSAR) models on the qHTS datasets used in the recent Tox21 Challenge (https://tripod.nih.gov/tox21/challenge) to predict potential chemical toxicity. Different machine learning methods, including linear discriminant analysis (LDA) and random forest (RF), were applied to multiple descriptor sets and fingerprints (computed with, e.g., MOE and RDKit), using 30 rounds of random undersampling to the size of the minority class to attenuate the effect of the imbalanced data sets. The best models were obtained with a combination of descriptors and achieved an average accuracy of ~70% on the external test set. Here we compare the results from our models with those of the Tox21 Challenge participants.
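
Random undersampling balances each training round by keeping the minority class and drawing an equally sized random subset of the majority class. The sketch below runs 30 such rounds and combines them by majority vote; a nearest-centroid model stands in for LDA or random forest, and the two-dimensional descriptor vectors in the example are invented, so it illustrates the resampling scheme only.

```python
import random
from collections import Counter
from typing import Dict, Hashable, List, Sequence

Vector = Sequence[float]


def centroid(rows: Sequence[Vector]) -> List[float]:
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


class UndersampledEnsemble:
    """n_rounds balanced random undersamplings, one nearest-centroid model each
    (a stand-in for LDA/RF), combined by majority vote."""

    def __init__(self, n_rounds: int = 30, seed: int = 0) -> None:
        self.n_rounds = n_rounds
        self.seed = seed
        self.models: List[Dict[Hashable, List[float]]] = []

    def fit(self, X: Sequence[Vector], y: Sequence[Hashable]) -> "UndersampledEnsemble":
        rng = random.Random(self.seed)
        sizes = Counter(y)
        minority = min(sizes, key=sizes.get)
        n_min = sizes[minority]
        by_class = {c: [x for x, t in zip(X, y) if t == c] for c in sizes}
        self.models = []
        for _ in range(self.n_rounds):
            # Keep all minority examples; draw an equally sized subset, without
            # replacement, from every other class.
            self.models.append({
                c: centroid(rows if c == minority else rng.sample(rows, n_min))
                for c, rows in by_class.items()
            })
        return self

    def predict(self, x: Vector) -> Hashable:
        votes = Counter(min(m, key=lambda c: sq_dist(x, m[c])) for m in self.models)
        return votes.most_common(1)[0][0]
```

Each round sees a balanced view of the data, so the vote is not dominated by the majority class, while using many rounds means that most majority-class examples contribute to at least one model.
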

6:30pm-8:30pm
CINF 43: Chess-like algorithms behind Chematica's retrosynthetic planning


Sara Szymkuc1 , sara.szymkuc@icho.edu.pl, Ewa Gajewska1 , Tomasz Klucznik1 , Piotr Dittwald1 , Michal Startek3 , Karol Molga1 , Michal Bajczyk1 , Bartosz Grzybowski21
1 Institute of Organic Chemistry, Polish Academy of Sciences, Warsaw, Poland; 2 Chemistry, UNIST, Ulsan, Korea (the Republic of); 3 Mathematics and Computer Science, University of Warsaw, Warsaw, Poland

One of the key challenges of computer-assisted retrosynthetic planning is the exponential increase in the number of possibilities with every reaction step taken. In the 1970s and 1980s this problem was deemed impossible to overcome, given the limited power of the computers then available. Yet computing speeds have since improved manifold, and nowadays simple cell phones have the power of the once-famous Cray supercomputers. Concomitant development of efficient algorithms has led to several major breakthroughs, including computers performing complex symbolic math operations (e.g., in Mathematica) and defeating human chess champions (e.g., Deep Blue). Our group has taken inspiration (and hope) from these advances and has for several years been developing algorithms that can efficiently plan chemical syntheses. In my poster, I will describe how these algorithms have finally come to fruition and why they are now capable of planning syntheses of truly complex molecules, including natural products. Our algorithms navigate chemical space intelligently, scoring synthetic “positions” and strategizing to avoid sequences of poor synthetic choices. They can venture into one branch of retrosynthetic possibilities but, if this branch proves unpromising, can backtrack and explore alternative strategies that ultimately converge on optimal synthetic solutions. The poster will discuss the nuts and bolts of the algorithms (including various reaction scoring functions we developed) and will supplement these fundamental considerations with examples of actual retrosynthetic pathways designed by Chematica. A computer demo of Chematica will also be provided.
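
The branch-and-backtrack behaviour described above can be illustrated with a generic best-first search, in which a "position" is the set of molecules still to be made and a priority queue always expands the most promising one. The reaction table, costs and the per-molecule penalty used as a position score are hypothetical; Chematica's scoring functions and search strategy are far more elaborate and are not reproduced here.

```python
import heapq
from itertools import count
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

# molecule -> [(rule name, step cost, precursors)]; a hypothetical, hand-made table.
Rules = Dict[str, List[Tuple[str, float, Tuple[str, ...]]]]


def plan(target: str, rules: Rules, purchasable: Set[str],
         open_penalty: float = 1.0,
         max_expansions: int = 10_000) -> Optional[List[Tuple[str, str]]]:
    """Best-first search over positions (sets of molecules still to be made).

    Priority = accumulated step cost + open_penalty per unresolved molecule.
    Dead-end branches simply stop producing children, so the search falls back
    to the next most promising position elsewhere in the tree.
    """
    tie = count()  # tie-breaker so heapq never compares frozensets
    start = frozenset({target} - purchasable)
    frontier = [(open_penalty * len(start), next(tie), 0.0, start, ())]
    seen: Set[FrozenSet[str]] = set()
    expansions = 0
    while frontier and expansions < max_expansions:
        _, _, cost, todo, route = heapq.heappop(frontier)
        if not todo:
            return list(route)  # every remaining precursor is purchasable
        if todo in seen:
            continue
        seen.add(todo)
        expansions += 1
        mol = min(todo)  # expand one unresolved molecule (deterministic choice)
        for rule, step_cost, precursors in rules.get(mol, []):
            nxt = (todo - {mol}) | frozenset(p for p in precursors if p not in purchasable)
            new_cost = cost + step_cost
            heapq.heappush(frontier, (new_cost + open_penalty * len(nxt), next(tie),
                                      new_cost, nxt, route + ((mol, rule),)))
    return None
```

In a toy table where target T has a cheap step to X (whose only precursor Y is neither purchasable nor makeable) and a costlier step to A plus purchasable B, the search first follows the cheap branch, hits the dead end at Y, and returns the route through A instead.
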

6:30pm-8:30pm
CINF 44: Retrosynthesis of complex molecules using Chematica

Ewa Gajewska1 , ewa.p.gajewska@gmail.com, Sara Szymkuc1 , Tomasz Klucznik1 , Piotr Dittwald1 , Michal Startek3 , Karol Molga1 , Michal Bajczyk1 , Bartosz Grzybowski21
1 Institute of Organic Chemistry, Polish Academy of Sciences, Warsaw, Poland; 2 Chemistry, UNIST, Ulsan, Korea (the Republic of); 3 Mathematics and Computer Science, University of Warsaw, Warsaw, Poland

After over a decade of development, a family of algorithms collectively known as Chematica is now ready to tackle realistic problems in synthetic design. For the first time in the history of chemistry, a computer program can construct, completely de novo, multistep retrosynthetic pathways leading to complex natural products, drugs and arbitrary user-specified molecules. In doing so, Chematica is unprecedented in that it takes into account complete stereochemistry, regiochemistry, protection chemistry and potential reactivity conflicts. My poster will illustrate multiple retrosynthetic pathways leading to complex targets that make full use of these capabilities. Alongside the poster, a demo of Chematica will be provided, illustrating its real-time performance.

6:30pm-8:30pm
CINF 45: Mining chemical databases to obtain knowledge based information of non-covalent interactions


Mathew Koebel1 , mathew.koebel@stlcop.edu, Suman Sirimulla1
1 Basic Sciences, St. Louis College of Pharmacy, St. Louis, Missouri, United States

Non-covalent interactions between proteins and ligands have recently attracted great attention for their contributions to binding affinity and their possible incorporation into the drug design process. Structural databases, including the Cambridge Structural Database (CSD) and the Protein Data Bank (PDB), were searched for these non-covalent interactions, which included halogen bonds, carbon bonds, cation-π interactions and sulfur bonds. The findings from this research were compiled and will be presented.
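
Database searches of this kind usually rely on geometric criteria. As an example for halogen bonds, the sketch below assumes commonly used cutoffs, an X···Y distance no longer than the sum of Bondi van der Waals radii and a near-linear C-X···Y angle (here at least 140°); the authors' actual criteria are not stated and may differ.

```python
import math
from typing import Sequence

Point = Sequence[float]

# Bondi van der Waals radii (angstrom) for a few donor/acceptor elements.
BONDI_RADII = {"N": 1.55, "O": 1.52, "S": 1.80, "Cl": 1.75, "Br": 1.85, "I": 1.98}


def angle_deg(a: Point, b: Point, c: Point) -> float:
    """Angle a-b-c in degrees, measured at atom b."""
    ba = [p - q for p, q in zip(a, b)]
    bc = [p - q for p, q in zip(c, b)]
    cos_t = sum(p * q for p, q in zip(ba, bc)) / (math.hypot(*ba) * math.hypot(*bc))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def is_halogen_bond(c: Point, x: Point, x_elem: str, y: Point, y_elem: str,
                    min_angle: float = 140.0) -> bool:
    """C-X...Y contact counted as a halogen bond if X...Y is within the sum of
    van der Waals radii and the C-X...Y angle is near linear (assumed cutoffs)."""
    close = math.dist(x, y) <= BONDI_RADII[x_elem] + BONDI_RADII[y_elem]
    return close and angle_deg(c, x, y) >= min_angle
```

For a C-Br donor, an oxygen 3.0 Å from bromine along the C-Br axis qualifies (3.0 Å is below the 3.37 Å radius sum), whereas the same distance at 90° or a collinear contact at 3.6 Å does not.
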

6:30pm-8:30pm
CINF 46: In silico assessment of toxicity endpoints: Case-studies using CORINA Symphony and ChemTunes Studio


Christof Schwab1 , Joerg Marusczyk1 , Aleksey Tarkhov1 , Thomas Kleinoeder1 , Dimitar Hristozov4 , Bruno Bienfait5 , Oliver Sacher1 , James Rathman3,4 , rathman.1@osu.edu, Chihae Yang2,4
1 Molecular Networks GmbH, Erlangen, Germany; 2 Molecular Networks, GmbH, Erlangen, Germany; 3 Ohio State University, Columbus, Ohio, United States; 4 Altamira LLC, Columbus, Ohio, United States; 5 Molecular Networks, Erlangen, Germany

Case studies are presented to demonstrate the use of two new software platforms, CORINA Symphony and ChemTunes Studio, to assess chemical toxicity. CORINA Symphony enables large sets of chemical compounds to be visualized, stored, processed (to generate 3D structures, remove small fragments, identify duplicates, etc.), and characterized with both structural and physicochemical property descriptors. ChemTunes Studio is a knowledgebase comprising experimental in vitro and in vivo toxicity information and in silico models for a series of human health toxicity endpoints, including the key genetic toxicity endpoints: Ames mutagenicity, chromosome aberration, and in vivo micronucleus. ChemTunes Studio consists of multiple components, including chemotype alerts with likelihood prioritization, mechanistically-informed (mode-of-action driven) QSAR models, and comparison of the prediction results to structural analogues. A mathematically rigorous and quantitative weight-of-evidence decision theory approach is used to obtain the final overall assessment, and a quantitative estimation of the uncertainty associated with each prediction is also provided. This comprehensive approach enables the assessment of chemicals in compliance with regulatory requirements. For example, the ICH (International Conference on Harmonisation) M7 guideline stipulates that the computational (in silico) toxicology assessment of genotoxic impurities should be performed using two complementary quantitative structure-activity relationship (QSAR) prediction methodologies, namely a statistical-based method and an expert rule-based method. ChemTunes Studio provides a single platform that seamlessly integrates computational (QSAR and alerts) and experimental (in vivo and in vitro assay) results. The case studies illustrate how statistical methods, expert knowledge, and experimental assay results are combined to develop a robust assessment of chemical toxicity.

6:30pm-8:30pm
CINF 47: Chemogenomics-assisted anti-obesity drug discovery

Rima Hajjo1 , hajjo@email.unc.edu, Alexander Tropsha1 , alex_tropsha@unc.edu
1 Univ of North Carolina, Chapel Hill, North Carolina, United States

Obesity is a complex multifactorial disease. Few drugs are currently approved to treat it, and those that are have limited efficacy and serious side effects. A chemogenomics approach was used to predict novel chemical-gene-disease connections that could help prioritize better pharmacotherapies for obesity. A consensus gene signature for obesity was used to query the Connectivity Map (http://www.lincscloud.org/) and the Drug Pair Seeker program (www.maayanlab.net/DPS) to predict drugs and drug combinations that would reverse the obesity gene signature. Concurrently, QSAR models were developed for the 5-HT2A, 5-HT2B and 5-HT2C receptors implicated in obesity. All generated models were used for virtual screening (VS) of the World Drug Index database to identify putative ligands. Five common hits from the QSAR/VS and cmap studies, and two drug combinations, were prioritized as potential anti-obesity agents. All prioritized compounds will be tested in ligand-binding assays to validate their predicted target affinities.
CINF: Workflow Tools & Data Pipelining in Drug Discovery
8:00am - 10:20am
Monday, August 17

Room 103 - Boston Convention & Exhibition Center
Erin Davis, Tim Dudgeon, Organizing
Erin Davis, Tim Dudgeon, Presiding
8:00am-8:05am Introductory Remarks

8:05am-8:30am
CINF 59: When command line tools meet KNIME: Using the best of the two worlds to support drug discovery teams

Man-Ling Lee1 , man-ling.lee@gmx.net
1 Genentech, San Francisco, California, United States

Genentech’s small molecule drug discovery relies heavily on the outsourcing model. The Genentech model is unique in that we share assay data with contract research organizations to engage these sub-teams in drug discovery. It is not unusual for members of a project team to be located on three continents. To give all team members access to up-to-date project data, the Computational Chemistry and Cheminformatics Group has implemented a framework that meets project-specific needs with a high return on investment.

The talk will showcase the roles that UNIX command line tools and KNIME play in providing flexibility and agility, features that are crucial for a framework supporting geographically distributed teams and their changing needs as projects transition from early- to late-stage discovery. The following three examples will demonstrate how to combine and leverage command line tools and KNIME: (1) Project Vortex Sessions provide project teams with customized views of project data. (2) The DMPK Model Validation application enables teams to check the applicability of the models to new compound series. (3) Organizing fragment hits with the Directed Sphere Exclusion algorithm has the advantage of focusing review teams on the fragment hits most likely to succeed.
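Directed Sphere Exclusion, in its general form, visits compounds in order of a priority score and makes each not-yet-covered compound the centre of a new cluster that absorbs every remaining compound within a similarity threshold. The Python sketch below illustrates that idea only; the set-of-bits fingerprints, Tanimoto similarity, the hypothetical per-hit priority score and the 0.6 threshold are all assumptions, not details of the Genentech framework.

```python
from typing import Dict, List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity of two fingerprints stored as sets of on-bit indices."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def directed_sphere_exclusion(
    fingerprints: Dict[str, Set[int]],
    priority: Dict[str, float],
    threshold: float = 0.6,
) -> Dict[str, List[str]]:
    """Cluster hits by sphere exclusion, visiting them in descending priority.

    Returns {representative: [representative, member, ...]}. The priority
    score and the 0.6 threshold are illustrative assumptions.
    """
    order = sorted(fingerprints, key=lambda cid: priority[cid], reverse=True)
    clusters: Dict[str, List[str]] = {}
    assigned: Set[str] = set()
    for centre in order:
        if centre in assigned:
            continue
        # The highest-priority hit not yet covered becomes a new sphere centre.
        clusters[centre] = [centre]
        assigned.add(centre)
        for other in order:
            if other in assigned:
                continue
            # Everything inside the similarity sphere joins this cluster.
            if tanimoto(fingerprints[centre], fingerprints[other]) >= threshold:
                clusters[centre].append(other)
                assigned.add(other)
    return clusters


# Hypothetical fingerprints and priority scores for four fragment hits.
fps = {"A": {1, 2, 3, 4}, "B": {1, 2, 3, 5}, "C": {7, 8, 9}, "D": {7, 8, 9, 10}}
scores = {"A": 0.2, "B": 0.9, "C": 0.5, "D": 0.4}
print(directed_sphere_exclusion(fps, scores))  # {'B': ['B', 'A'], 'C': ['C', 'D']}
```

Because high-priority hits are chosen as centres first, each cluster is represented by its most promising member, which is what lets reviewers focus on those hits.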

8:30am-8:55am
CINF 60: Pipelining in mind: Compound library preprocessing in an interactive workflow

Matthias Hilbig1 , Matthias Rarey1 , rarey@zbh.uni-hamburg.de
1 University of Hamburg, Hamburg, Germany

Preprocessing of chemical libraries is a mandatory step for many computer-based applications in drug discovery. Compound collections for experimental and virtual screening must be gathered from various sources, inspected and filtered. Similar tasks appear in the first-phase analysis of experimental data. Pipelining tools based on graphical programming have become very popular for these tasks because they allow users to design a standardized work process and apply the required tools in an automated fashion. This is the method of choice especially when individual processing steps are time-consuming for large amounts of data. Due to advances in computer hardware, however, many classical cheminformatics tasks can nowadays be done in the milliseconds-to-seconds range. Moving away from a classical pipeline towards an interactive tool supporting the individual steps brings several advantages. First and foremost, chemists can use all of their knowledge and expertise to individualize the process. Depending on the concrete application scenario, a first look into the data helps to get an impression of the library to be handled. Filter criteria can be individually defined and applied in a what-if fashion. The consequences of clustering and filtering can be analyzed directly, enabling an iterative optimization process. In this talk we present MONA [1], a software tool developed especially for this purpose. MONA handles libraries of up to 1 million compounds, allowing users to browse and manipulate them flexibly and interactively.
[1] Hilbig, M.; Urbaczek, S.; Groth, I.; Heuser, S.; Rarey, M. (2013). MONA - Interactive manipulation of molecule collections. Journal of Cheminformatics, 5 (38)

8:55am-9:20am
CINF 61: New web based collaborative environment for cheminformatics workflows

Tim Dudgeon1 , tdudgeon@informaticsmatters.com
1 Informatics Matters Ltd, Oxford, United Kingdom

We present a new collaborative drug discovery environment that integrates multiple vendor toolkits into a platform that has been designed with an emphasis on high performance and ease of deployment. Being based on widely used enterprise software tools it allows easy deployment to in-house hardware and public and private cloud environments.
Multiple commercial and open source toolkits are integrated as well as several publicly available sources of chemical and biological data, making it easy to build cheminformatics workflows using best of breed components and to visualize, analyze and manipulate results without the user needing to worry about issues like software installation, software interoperability, format conversions or where the data resides.
A web based front end designed specifically for end user scientists allows data to be generated, processed and analyzed in a very simple and intuitive manner. Data and workflows can be shared with specific collaborators or made public.
The initial functionality is based mostly around lead generation and virtual screening workflows with components for structure database search, virtual library generation, physico-chemical property and toxicology prediction, and 2D and 3D screening and clustering techniques. As such this provides an attractive environment for chemists to operate and collaborate.
9:20am-9:30am Intermission

9:30am-9:55am
CINF 62: Workflows supporting drug discovery against malaria

Barry Hardy1 , barry.hardy@douglasconnect.com
1 Douglas Connect, Zeiningen, Switzerland

The goal of Scientists Against Malaria (SAM) is the discovery of novel anti-malarial compounds. SAM supports virtual drug discovery organizational structures collaborating on target selection and modeling, protein expression and assay development, computational drug design, and screening. A combination of interoperable information systems, ontologies and web services was designed and deployed to manage the data, documents, computational and assay results, and activity and toxicology predictions, together with dashboards to track project progress and support decision making. Workflows were developed for consensus virtual screening of candidate malarial kinase inhibitors, including docking, pharmacophore-based screening and free energy-based molecular simulations. The models were applied to the discovery of active ligands against a novel target with no previously known structure or ligands. The workflows were extended to include OpenTox model web services to prioritize drug candidates according to their predicted toxicities, supporting a weight-of-evidence categorization of candidate molecules according to their activity and toxicity profiles.

9:55am-10:20am
CINF 63: Accessing knowledge and design insights from a fully-annotated kinase-focused compound collection

Natasja Brooijmans1 , nbrooijmans@blueprintmedicines.com
1 Blueprint Medicines, Cambridge, Massachusetts, United States

Blueprint Medicines has developed, de novo and in an iterative fashion, a proprietary kinase-focused compound collection. The full collection has been annotated against a kinase assay panel of over 500 assays, resulting in the identification of a large collection of potent and selective kinase inhibitors. The wealth of data available for each compound creates a unique data analytics challenge, which we have solved using the open-source pipelining package KNIME. In this presentation we’ll cover how we assess kinome coverage by our collection and how we determine which of our library scaffolds are the most productive, based on their ability to cover a wide variety of kinases potently and selectively. Matched Pair Analysis (MPA) across all of the assay data simultaneously has also been used to extract unique drivers of selectivity by integrating key binding site structural features into the MPA. A unique bioactivity-based clustering approach has also been developed to enable prioritization of hits and clusters of hits against targets of interest based on multi-kinase target product profiles.
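Matched pair analysis compares compounds that share a common core and differ only at one position, attributing the activity change to that substituent swap. The Python sketch below is a minimal illustration, assuming molecules have already been fragmented into hypothetical (core, substituent) records with one activity value each; it does not reproduce the binding-site integration described in the talk.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean
from typing import Dict, List, NamedTuple, Tuple


class Record(NamedTuple):
    core: str         # constant part of the molecule (pre-fragmented, hypothetical)
    substituent: str  # variable part attached to the core
    activity: float   # e.g. pIC50 against one kinase


def matched_pair_deltas(records: List[Record]) -> Dict[Tuple[str, str], float]:
    """Mean activity change for each substituent transformation (A, B) meaning A>>B.

    Two records form a matched pair when they share a core and differ only
    in the substituent.
    """
    by_core: Dict[str, List[Record]] = defaultdict(list)
    for rec in records:
        by_core[rec.core].append(rec)

    deltas: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for group in by_core.values():
        for a, b in combinations(group, 2):
            if a.substituent == b.substituent:
                continue
            # Store both directions so A>>B and B>>A can be looked up.
            deltas[(a.substituent, b.substituent)].append(b.activity - a.activity)
            deltas[(b.substituent, a.substituent)].append(a.activity - b.activity)
    return {transform: mean(values) for transform, values in deltas.items()}


records = [
    Record("c1ccc([*:1])cc1", "Cl", 6.0),
    Record("c1ccc([*:1])cc1", "F", 7.0),
    Record("c1ccncc1[*:1]", "Cl", 5.0),
    Record("c1ccncc1[*:1]", "F", 5.5),
]
print(matched_pair_deltas(records)[("Cl", "F")])  # 0.75
```

Running the same function on per-kinase activities and comparing the resulting deltas across kinases is one simple way to look for selectivity drivers.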
CINF: Retrosynthesis, Synthesis Planning, Reaction Prediction: When Will Computers Meet the Needs of the Synthetic Chemist?
9:00am - 11:50am
Monday, August 17

Room 104A - Boston Convention & Exhibition Center
David Evans, Wendy Warr, Organizing
David Evans, Wendy Warr, Presiding
9:00am-9:05am Introductory Remarks

9:05am-9:30am
CINF 48: What are the next steps in your synthesis? The Reaxys experience

Juergen Swienty Busch1 , juergen@swienty-busch.de
1 Elsevier Information Systems GmbH, Budingen, Germany

“Making Substances for a better life” is what chemists do, and generations of chemists have worked, and continue to work, on understanding the influence of certain parameters on the direction and mechanism of chemical reactions. Solving this multi-dimensional problem makes up the art of chemistry. In addition to the human dimension, tomorrow’s chemist will require a thorough knowledge and understanding of the existing reaction landscape. Many attempts have been made to solve this problem using purely algorithmic methods, but this work is still in its infancy. At Reaxys we aim to help chemists solve these problems. By combining the vast array and depth of information contained within the Reaxys database with analysis and visualization tools, we provide today’s chemist with just the right information to make the best possible decisions about their next synthesis plans. We will show how recent developments in the Reaxys database, searching technologies, and analysis and decision support tools are enabling chemists to make better decisions today.

9:30am-9:55am
CINF 49: Green chemistry in synthesis planning systems: A role for biocatalysis data and sustainability metrics?

Peter Johnson1 , p.johnson@leeds.ac.uk, Vilmos Valko1 , Anthony Cook2,3
1 School of Chemistry, University of Leeds, Leeds, United Kingdom; 3 GSK plc, Stevenage, United Kingdom

CHEM21 is a large European project concerned with the development of novel sustainable or ‘green chemistry’ protocols for pharmaceutical manufacturing processes. It involves a consortium of several universities as well as many large European pharma companies. A significant part of the project is concerned with the development of a comprehensive reaction database that captures all the relevant details needed to assess the greenness of any new method under development. A unique aspect of the system is that it includes a very detailed treatment of biocatalysed reactions to ensure the inclusion of all the bioinformatics information needed for full characterisation of those reactions. This is important to the project as a whole because much of the consortium’s laboratory-based work is concerned with the development of novel biocatalysed reactions for manufacturing processes. Another key feature of the system is the inclusion of sustainability metrics calculations, which allow quantitative comparisons of the greenness of alternative methods to accomplish a particular transformation.
The data produced by this project should prove a valuable adjunct to that found in current reaction databases and could help to expand the capabilities of knowledge based systems for retrosynthetic analysis and forward reaction prediction. Modified versions of the sustainability metrics calculations could be usefully applied to the evaluation of the relative greenness of alternative synthetic routes proposed by these systems.
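The abstract does not list which sustainability metrics CHEM21 calculates. As an illustration only, the sketch below computes three widely used mass-based metrics (atom economy, E-factor and process mass intensity); treating all non-product input mass as waste is a simplifying assumption.

```python
from typing import Sequence


def atom_economy(product_mw: float, reactant_mws: Sequence[float]) -> float:
    """Atom economy (%) = MW of desired product / sum of reactant MWs x 100."""
    return 100.0 * product_mw / sum(reactant_mws)


def e_factor(total_input_mass: float, product_mass: float) -> float:
    """E-factor = mass of waste / mass of product, with waste = inputs - product."""
    return (total_input_mass - product_mass) / product_mass


def process_mass_intensity(total_input_mass: float, product_mass: float) -> float:
    """PMI = total mass of materials used / mass of product (E-factor + 1 here)."""
    return total_input_mass / product_mass


print(atom_economy(100.0, [60.0, 65.0]))  # 80.0
print(e_factor(500.0, 100.0), process_mass_intensity(500.0, 100.0))  # 4.0 5.0
```

Computing the same numbers for each step of two candidate routes and summing the masses is one straightforward way to compare their relative greenness.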

9:55am-10:20am
CINF 50: Synthetically accessible virtual inventory (SAVI)

Yuri Pevzner1 , Wolf-Dietrich Ihlenfeldt2 , Marc Nicklaus1 , mn1@helix.nih.gov
1 NCI Frederick, Bldg 376 RM 205, Natl Inst Health Ft Detrick, Frederick, Maryland, United States; 2 Xemistry GmbH, Konigstein, Germany

The Synthetically Accessible Virtual Inventory (SAVI) project aims at computationally generating a very large number of reliably and inexpensively synthesizable screening sample structures that have desirable properties for the drug development process. In an international collaboration, it combines a set of transforms with rich chemical context and a set of highly annotated starting materials, tied together with the chemoinformatics toolkit CACTVS with custom developments for this project. A parser has been implemented as a component for CACTVS to read the original LHASA project transforms written in the CHMTRN/PATRAN language, and to handle the reversal of the original transform direction for the forward-synthetic SAVI project. We have also added a number of computed properties regarded as important in current drug design, to be used at the filtering stage of the SAVI product set generation. The project ultimately aims at creating a database of one billion high-quality screening samples, each annotated with a computer-proposed easy synthetic route, made available in a freely accessible GUI with full structure search capabilities. We present the current early stages of the project, ongoing developments, and preliminary results.
10:20am-10:35am Intermission

10:35am-11:00am
CINF 51: Analyzing success rates of supposedly 'easy' reactions

Roger Sayle1 , roger@nextmovesoftware.com
1 NextMove Software, Cambridge, United Kingdom

Chemists, like insects, come in a bewildering number of varieties and specializations. Traditional retrosynthesis tools are aimed at expert synthetic chemists to assist them with challenging total syntheses, or at process chemists searching for optimal routes via obscure reaction mechanisms. In this talk, we instead consider the role of computer software to support non-experts in synthetic chemistry, such as medicinal and computational chemists. Here the challenge is not in choosing the reaction, but instead preventing silly mistakes with the most widely applied classes of named reactions. Anecdotal experience with the content of pharmaceutical ELNs shows that low yield reactions often correlate with the presence of known incompatible functional groups, such as a second halide in Suzuki couplings.

11:00am-11:25am
CINF 52: Computer-inspired organic synthesis: Building on success

Jonathan Goodman1 , jmg11@cam.ac.uk
1 Dept of Chemistry, Cambridge, United Kingdom

There are many reports in the chemical literature of computational analyses of organic reactions, for which insights, both quantitative and qualitative, have been developed for understanding organic synthesis. We have reported such studies, and also an in-silico inspired total synthesis, for which a computational prediction was tested experimentally some time after its original publication. Despite all of this success, it does not seem to be common for scientists to regard themselves as computer-aided synthetic chemists. By examining how our computational successes have been used, and not used, by synthetic chemists, we can try to find out how to make synthetic chemistry even more effective in the future.



11:25am-11:50am
CINF 53: Using reaction driven de novo design as a “retrosynthetic” analysis tool

Brian Masek1 , brian.masek@certara.com, Stephan Nagy1 , David Baker1 , Roman Dorfman1 , Farhad Soltanshahi1 , Karen Dubrucq1
1 Certara, Saint Louis, Missouri, United States

Reaction driven de novo design uses a library of reactions and a database of reactants to perform a stochastic walk through “synthetic chemistry space.” Guided by appropriate drug design “scoring” such as docking or ligand shape similarity, such an approach provides a means to generate novel ideas for drug candidates that includes a proposed synthesis pathway. We will present studies where reaction driven de novo design software simulations are used to “re-invent” known drug molecules or close analogs. When successful, the simulations also provide a proposed synthesis pathway to the compound under study. Examples which compare the “in silico” synthesis with the known synthetic path will be presented. This approach could provide a novel means of suggesting synthesis schemes for synthetic chemists.
CINF: Enabling Machines to 'Read' the Chemical Literature: Techniques, Case Studies & Opportunities
9:30am - 11:55am
Monday, August 17

Room 104B - Boston Convention & Exhibition Center
Daniel Lowe, Organizing
Daniel Lowe, Presiding
9:30am-9:35am Introductory Remarks

9:35am-10:00am
CINF 54: CHEMDNER-Patents: Automatic recognition of chemical and biological entities in patents


Martin Krallinger2 , Florian Leitner3 , Obdulia Rabal1 , orabal@unav.es, Miguel Vazquez2 , Julen Oyarzabal1 , Alfonso Valencia2
1 Small Molecule Discovery Platform, Center for Applied Medical Research (CIMA) - University of Navarra, Pamplona, Spain; 2 Structural Computational Biology Group, Spanish National Cancer Research Center (CNIO), Madrid, Spain; 3 Computational Intelligence Group, Polytechnic University of Madrid, Madrid, Spain

Efficient access to chemical and biological information contained in patents is a pressing need shared by researchers and patent attorneys from different chemical disciplines, especially in the fields of chemical biology and medicinal chemistry. The identification and integration of all information contained in these patents into project-based chemical knowledge bases is becoming an increasingly widespread task with a remarkable impact on drug discovery. Data mining systems can speed up and facilitate the process, making it more systematic and reliable. Automatic information extraction and mining technologies can complement handcrafted annotations and extract chemical entities, primary biological targets (genes or proteins) and therapeutic applications. Despite its importance, academic research in the area of text mining and information extraction using patent data has been minimal.
Based on our previous experience with the CHEMDNER task of BioCreative IV, focused on chemical entity recognition in article abstracts, we moved the scope of the CHEMDNER task of BioCreative V to noisy text data (patents) as well as extended the type of annotated entities to biological targets: genes, gene products, DNA/protein sequence elements and protein families, domains and complexes.
This is the first time that a biomedical text mining community challenge handles patents and could result in software that helps to derive annotations from patents. Results from the participating teams will be discussed, with a major focus on the current challenges for patent annotation compared to articles (http://www.biocreative.org/tasks/biocreative-v/track-2-chemdner/)

10:00am-10:25am
CINF 55: SureChEMBL: An open patent chemistry resource

George Papadatos1 , georgep@ebi.ac.uk, Mark Davies1 , Nathan Dedman1 , Anne Hersey1 , John Overington1
1 EMBL European Bioinformatics Institute, Hinxton, United Kingdom

SureChEMBL (https://www.surechembl.org) is a new resource provided by the European Bioinformatics Institute (EMBL-EBI) that annotates, extracts and indexes chemistry from full text patent documents by means of continuous, automated text and image mining. SureChEMBL is perhaps the only open, freely available, live patent chemistry resource available, in a field that has been traditionally commercial.

Since its launch last September, the SureChEMBL interface provides sophisticated keyword and chemistry-based querying and exporting functionality against a corpus of more than 16 million compounds extracted from 13 million patent documents. Both the interface and the underlying data pipeline leverage a number of technologies for name to structure conversion, as well as compound standardisation, registration and searching.

In addition to providing an overview of the system, recent developments and improvements will be described. These include the introduction of various data interchange and export options, such as flat files and a data feed client. Our future plans for the SureChEMBL system will also be outlined. Currently, these plans include complementing the chemical annotations with biological ones, covering genes, proteins, diseases and indications. We also plan to enrich the chemical annotations with a relevance score indicating their importance in the patent document.

10:25am-10:50am
CINF 56: Deuterogate: Causes and consequences of automated extraction of patent-specified virtual deuterated drugs feeding into PubChem

Christopher Southan1 , cdsouthan@gmail.com
1 IUPHAR/BPS Guide to PHARMACOLOGY, University of Edinburgh, Göteborg, Sweden

The strategy of deuterating drugs to improve clinical profiles via the kinetic isotope effect has been known for over 50 years. However, recent development candidates have been predicated on a surge of opportunistic patent filings between 2008 and 2011. These present particular challenges for automated chemical named entity recognition (CNER), which are investigated in this work by comparing the sources of the 80K deuterated compounds in PubChem. Of these, 45K originate from the patent CNER submissions of SCRIPDB, IBM and SureChEMBL, plus 23K from Thomson Pharma via manual expert curation (MEXC). For CNER there are three options: image extraction, recognition of [2H] in IUPAC text forms, or Complex Work Unit (CWU) molfiles obtained from the USPTO. For images, conversion to structures using OSRA with explicit H and D positions failed. Tests with chemicalize.org and OPSIN established that the text “deuterio” did convert. The SureChEMBL pipeline also handles the “dx” prefix (e.g. methyl-d3). These tests, combined with inspection of SureChEMBL export records, confirmed that the deuterated structures feeding into PubChem from patents were predominantly derived from images only. It was also clear that CWUs had provided the majority of these via molfiles. However, despite conceptually similar CNER pipelines, the three CNER sources showed divergent capture. Importantly, inspection of patents from the three major applicants in the deuteration IP Gold Rush indicated little reduction to practice. The unexpected consequence is that most of the ~25K derivatives of ~500 established drugs in PubChem are virtual (i.e. the structures do not exist). This Achilles heel of CNER will be discussed, since it presents database users with a dilemma between virtual swamping, albeit with possible IP significance, on the one hand, and the permanent absence of linked bioactivity data on the other.
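The textual deuteration notations discussed above (“deuterio” in names, the “-dN” style as in methyl-d3, and [2H] isotope labels) can be flagged with simple pattern matching before any name-to-structure step. The sketch below is a hypothetical helper that only counts implied deuterium atoms; it is not the SureChEMBL, OPSIN or chemicalize.org logic and will miss or misread many real patent names.

```python
import re

# Illustrative, non-exhaustive patterns for textual deuteration markers:
#   "methyl-d3", "phenyl-d5"   -> "-dN" suffix after a substituent name
#   "trideuteriomethyl"        -> "deuterio" with an optional multiplier
#   "[2H]"                     -> explicit isotope label in SMILES-like text
_SUFFIX = re.compile(r"-d(\d+)\b")
_DEUTERIO = re.compile(r"(di|tri|tetra|penta)?deuterio", re.IGNORECASE)
_ISOTOPE = re.compile(r"\[2H\]")
_MULTIPLIER = {None: 1, "di": 2, "tri": 3, "tetra": 4, "penta": 5}


def deuterium_count(name: str) -> int:
    """Rough count of deuterium atoms implied by a chemical name or string."""
    count = sum(int(n) for n in _SUFFIX.findall(name))
    # findall returns "" for the optional multiplier when it is absent.
    count += sum(_MULTIPLIER[m.lower() if m else None] for m in _DEUTERIO.findall(name))
    count += len(_ISOTOPE.findall(name))
    return count


print(deuterium_count("N-(methyl-d3)benzamide"))  # 3
```

A non-zero count only signals that a name claims deuteration; as the abstract stresses, it says nothing about whether the compound was ever made.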
10:50am-11:05am Intermission

11:05am-11:30am
CINF 57: Evaluating US patent full text documents with chemical ontologies

Lutz Weber1 , lutz.weber@ontochem.com
1 IT, OntoChem, Germering, Germany

Chemical ontologies represent abstractions of chemical compounds, providing structural as well as functional and chemical property classifications. There is increasing interest in automatically classifying chemical compounds in databases and in text documents to allow for chemical class-based search indexes.
In this paper we present strategies to automatically classify chemical compounds based on their names and chemical structures, using a chemical ontology derived from the purely lexical MeSH and ChEBI vocabularies but incorporating SMARTS- and chemical calculation-based logic. In the talk we will describe the development of this ontology, which also comprises functional classifications and materials science terms such as alloys and polymers.
Ten years of US patents and patent applications were used to extract mentions of chemical compounds, substances, chemical classes and chemical groups. These chemical terms were then classified by the chemical ontology, and the resulting 2 billion class mentions were transformed into an ontology search index in a Lucene-based browser application. This index was then used to evaluate the frequency of the chemical classes found per time period, giving indications of the focus of general chemical research activities and of recent trends in patenting strategies.
The data set is freely available for further investigations and shall be used to train and develop further the use, quality and interchangeability of chemical ontologies.

11:30am-11:55am
CINF 58: Text-mining to produce large chemistry datasets for community access

Antony Williams2 , tony27587@gmail.com, Daniel Lowe1 , Igor Tetko3 , Carlos Coba4 , Valery Tkachenko2 , Alexey Pshenichnov2 , Ken Karapetyan2
1 NextMove Software, Cambridge, United Kingdom; 2 Technology, Royal Society of Chemistry, Wake Forest, North Carolina, United States; 3 HelmholtzZentrum München, Munich, Germany; 4 Mestrelab Research, Santiago de Compostela, Spain

While in an ideal world all data would be deposited by the producing scientist directly into a database, in the real-world most chemical data is instead presented in a form designed for human rather than machine consumption. Text mining has the potential to extract this data back into a computer understandable form. As all United States patents are available free of charge they make the perfect corpus for extracting a large number of experimental properties of compounds, and chemical reactions.
We report on our text-mining activities to extract millions of textual NMR spectra, hundreds of thousands of physicochemical properties (with their associated compounds) and over a million chemical reactions. All extracted results are to be deposited into online databases allowing the community to benefit from the results of this work.
Using Mestrelab Research’s MNova product we have converted the textual NMR spectra to graphical spectra, and validated each spectrum against its associated chemical structure so as to detect cases where the NMR spectrum could not be produced by the associated structure.
In the case of melting points, the resulting dataset of over a quarter of a million compound/melting-point relationships is the largest public dataset the authors are aware of. We have used this dataset to produce a predictive model with results comparable to those of models built on manually curated datasets. Our experience with modelling this data has demonstrated that we are working at the edge of current algorithmic and computing capabilities for predictive model building, with the resulting matrix containing over 200 billion descriptors. The melting point model and the data it was derived from are freely available from http://www.ochem.eu.
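Extracting melting points from patent text starts with recognising phrases such as “m.p. 123-125 °C”. The regular expression below is a minimal sketch of that first step, assuming values are reported in °C; it is not the authors' extraction pipeline, which would need many more phrasing variants and a way to associate each value with its compound.

```python
import re
from typing import List, Optional, Tuple

# Illustrative pattern for "m.p. 123-125 °C", "mp: 98 °C" or
# "melting point 210–212 °C"; real patent text needs far more variants.
_MP = re.compile(
    r"\b(?:m\.?\s?p\.?|melting\s+point)\s*[:=]?\s*"
    r"(\d+(?:\.\d+)?)\s*(?:[-–]\s*(\d+(?:\.\d+)?))?\s*°\s*C",
    re.IGNORECASE,
)


def extract_melting_points(text: str) -> List[Tuple[float, Optional[float]]]:
    """Return (low, high) melting points in °C; high is None for a single value."""
    results: List[Tuple[float, Optional[float]]] = []
    for low, high in _MP.findall(text):
        results.append((float(low), float(high) if high else None))
    return results


print(extract_melting_points("white solid, m.p. 123-125 °C; mp: 98 °C"))
# [(123.0, 125.0), (98.0, None)]
```

The leading word boundary keeps words such as “temp” from being read as “mp”, one example of the false positives that large-scale text mining has to guard against.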
CINF: CINFlash: Workflow Tools Lightning Round
10:30am - 12:00pm
Monday, August 17

Room 103 - Boston Convention & Exhibition Center
Erin Davis, Organizing
Erin Davis, Presiding
10:30am-10:35am Introductory Remarks

10:35am-12:00pm
CINF 64: CINFlash: Workflow tools lightning round

Erin Davis1 , erinbolstad@gmail.com
1 John McNeil & Co, Seattle, Washington, United States

Join us for a lightning round of talks and demos about workflow tools! PerkinElmer, ChemAxon, CDD, Schrodinger, Optibrium, XEChemistry, OpenPHACTS, John McNeil & Co, academic researchers, and many more will present 5 min talks and demos about their various workflow tools to help make your informatics research more efficient.
CINF: Retrosynthesis, Synthesis Planning, Reaction Prediction: When Will Computers Meet the Needs of the Synthetic Chemist?
1:30pm - 4:40pm
Monday, August 17

Room 104A - Boston Convention & Exhibition Center
David Evans, Wendy Warr, Organizing
David Evans, Wendy Warr, Presiding

1:30pm-1:55pm
CINF 65: SynTree, chemical synthesis on a PC

John Figueras1 , jjfigueras@gmail.com
1 Retired, Orleans, Massachusetts, United States

SynTree is an organic synthesis program based on the idea that a retrosynthetic chemical reaction can be represented as a picture and a series of graphics operations that change one picture (a target structure) into another (a precursor). In this environment, a transform is a combination of a substructure and a set of graphics operations linking the substructure and a precursor. The graphics operations are basic actions such as erase a bond, replace an atom, change bond type, etc. The substructure, mapped onto a goal structure, selects those atoms and bonds in the goal to be altered by the graphics operations to produce a precursor. Using a generated precursor as a new goal allows generation of a synthesis tree.

The SynTree Suite comprises three programs: 1) SynTree accepts user input of a chemical structure as a primary goal and generates precursors; 2) Transform allows the user to add transforms to the database; 3) IPlist creates a list of functional groups that might occur as interferences in a given transform. The programs are written for a Macintosh computer operating under OSX, version 6.0 and above, and will be available as freeware. The intended audience is undergraduate chemistry students.
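SynTree's idea of a transform as a substructure plus a list of graphics operations can be sketched as edits on a simple molecular graph. In this Python sketch the molecule representation, operation names and atom mapping are hypothetical; substructure matching, which would supply the mapping, is omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List


@dataclass
class Molecule:
    atoms: Dict[int, str]  # atom index -> element symbol (hydrogens implicit)
    bonds: Dict[FrozenSet[int], int] = field(default_factory=dict)  # {i, j} -> bond order

    def copy(self) -> "Molecule":
        return Molecule(dict(self.atoms), dict(self.bonds))


def apply_transform(goal: Molecule, mapping: Dict[int, int], ops: List[tuple]) -> Molecule:
    """Apply graphics-style edits to a copy of `goal` to produce a precursor.

    `ops` use pattern atom numbers, translated through `mapping`
    (pattern atom -> goal atom). Supported (hypothetical) operations:
      ("erase_bond", a, b), ("replace_atom", a, element), ("change_bond", a, b, order)
    """
    precursor = goal.copy()
    for op in ops:
        kind = op[0]
        if kind == "erase_bond":
            del precursor.bonds[frozenset((mapping[op[1]], mapping[op[2]]))]
        elif kind == "replace_atom":
            precursor.atoms[mapping[op[1]]] = op[2]
        elif kind == "change_bond":
            precursor.bonds[frozenset((mapping[op[1]], mapping[op[2]]))] = op[3]
        else:
            raise ValueError(f"unknown operation: {kind!r}")
    return precursor


# Goal: acetone, C-C(=O)-C. Atoms 0..2 are carbons, atom 3 is the carbonyl oxygen.
acetone = Molecule({0: "C", 1: "C", 2: "C", 3: "O"},
                   {frozenset((0, 1)): 1, frozenset((1, 2)): 1, frozenset((1, 3)): 2})
# Hypothetical "ketone <= secondary alcohol" transform: pattern atom 0 is the
# carbonyl C, pattern atom 1 is its O; the C=O bond is reduced to C-O.
precursor = apply_transform(acetone, {0: 1, 1: 3}, [("change_bond", 0, 1, 1)])
print(precursor.bonds[frozenset((1, 3))])  # 1 (isopropanol skeleton)
```

Feeding each precursor back in as a new goal, as the abstract describes, would grow the synthesis tree one level at a time.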

1:55pm-2:20pm
CINF 66: Empowering chemists in synthesis planning – lessons from the evolution of ARChem

Orr Ravitz2 , orr.ravitz@gmail.com, Anthony Cook3 , Zsolt Zsoldos1 , Peter Johnson3
2 John Wiley & Sons, Toronto, Ontario, Canada; 3 School of Chemistry, University of Leeds, Leeds, United Kingdom

The question of “when” computers will meet the needs of synthetic chemists has been answered decisively in recent years (the answer is now). This is a result of significant advances in computer-aided synthesis design approaches, as well as a consequence of the increasing role computers play in every aspect of research in general. In the meantime, the focus of work in this domain has already shifted to the question of “how”, or even “how best”, to address those needs. While chemists’ scientific intuition, mechanistic understanding of reactions and strategic perspective will remain unmatched for years to come, computers already empower chemists to consider novel synthetic alternatives that lie beyond their areas of expertise. They assist in looking for shorter routes as well as more efficient and more cost-effective synthetic approaches. The capacity of the machine to run fast, comprehensive and unbiased searches thus complements the ingenuity and judgement of the chemist. ARChem, which derives rules from reaction databases and employs them in retrosynthetic analysis, is now used routinely by chemists in pharmaceutical companies, both in discovery and in process development. In this talk we will share some of the lessons, both positive and negative, learned in the development of this system. Among the topics covered will be the means necessary to achieve sufficient chemical accuracy, primarily in the generation of the ruleset, which requires elaborate chemical perception algorithms as well as manual curation. We will discuss the challenges of prioritizing and displaying solutions and linking them to literature examples, and how users can influence the results both before and after the search. We will provide an overview of current usage scenarios as well as a glimpse into the future of this field as we envision it.

2:20pm-2:45pm
CINF 67: Computer-aided synthesis design (CASD) and forward reaction prediction tools for both idea generation in new synthesis route planning and for de novo molecule design

Valentina Eigner Pitto1 , ve@infochem.de, Fernando Huerta2 , Mike Hutchings1 , Heinz Saller1 , Peter Loew1
1 InfoChem GmbH, Munich, Germany; 2 Chemnotia AB, Södertälje, Sweden

This presentation will describe how methods based on specifically designed transform libraries, automatically generated from reaction databases, can be used in computer-aided synthesis design for retrosynthesis as well as for forward reaction prediction. The precursor or product generation process is based on conceptual chemistry, and the degree of complexity introduced into the suggested structures can be modulated using specific parameters.
New ideas for route design will be described for a series of commercial pharmaceutical targets, as predicted by the CASD tool ICSYNTH and related to the results of earlier brainstorming sessions as well as to literature data. In the conceptually opposite direction, the new ICFRP tool enables de novo design of synthetically feasible molecules and can be integrated into different workflows that calculate or predict other important physicochemical properties of the newly suggested molecules.

2:45pm-3:10pm
CINF 68: Chematica – the Deep Blue of chemistry

Bartosz Grzybowski12 , nanogrzybowski@gmail.com
1 Chemistry, UNIST, Ulsan, Korea (the Republic of); 2 Institute of Organic Chemistry of the Polish Academy of Sciences, Warsaw, Poland


The rise of chess-playing programs has aroused considerable interest, and the victory of Deep Blue over Garry Kasparov hit the headlines all over the world: for the first time, a machine outperformed a human in a game that was considered the pinnacle of human intellect. Yet another “game”, with vastly greater economic and societal impact than chess, has so far proven too hard for computers to tackle in any meaningful way. This “game” is to teach computers how to plan the synthesis of organic chemicals, new drugs, new pigments, or new organic-electronic materials. To illustrate the complexity of the problem, suffice it to say that whereas chess uses six different types of pieces and on the order of ten rules for how they can move, organic chemistry must consider numbers of different molecules comparable to the number of atoms in the universe, and O(10,000) rules (“moves”) that specify how the molecules can react. For over a decade now, our group has been developing a family of algorithms, collectively known as Chematica, that solve this grand challenge of computer-planned organic synthesis. In my talk I will discuss the theoretical basis of our approach and will then run a live demo of Chematica so that the audience can see its performance in real time. Please come and see how a computer can be put to work to solve chemical riddles and help a practicing organic chemist in his/her everyday work.
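The search problem behind computer-planned synthesis can be pictured with a generic sketch. The Python below is plain uniform-cost search over a toy rule table; it is not Chematica's algorithm or scoring (the abstract does not describe them), and the molecule names, costs and `RULES` table are invented placeholders. It also shows why the number of rules matters: every rule that applies to a molecule adds a branch, so the tree grows roughly as (applicable rules per molecule) raised to the route depth.

```python
import heapq
from itertools import count

# Toy rule table (invented): product -> list of (cost, precursors).
RULES = {
    "T": [(2.0, ("A", "B")), (1.0, ("C",))],
    "C": [(5.0, ("D", "E"))],
    "A": [(1.0, ("F",))],
}
PURCHASABLE = frozenset({"B", "D", "E", "F"})


def plan(target, rules, purchasable):
    """Uniform-cost search over sets of molecules that still need a route.

    Returns (total_cost, steps) for the cheapest route found, or None.
    Molecules are tracked as a set, so duplicate intermediates are merged.
    """
    tie = count()  # tie-breaker so heapq never compares frozensets
    start = frozenset([target]) - purchasable
    frontier = [(0.0, next(tie), start, ())]
    best = {start: 0.0}
    while frontier:
        cost, _, unsolved, steps = heapq.heappop(frontier)
        if not unsolved:
            return cost, list(steps)
        if cost > best[unsolved]:
            continue  # stale queue entry
        # Every unsolved molecule must be made eventually, so expanding one
        # of them per state is sufficient; pick one deterministically.
        mol = min(unsolved)
        for step_cost, precursors in rules.get(mol, []):
            nxt = (unsolved - {mol}) | (frozenset(precursors) - purchasable)
            new_cost = cost + step_cost
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(frontier, (new_cost, next(tie), nxt,
                                          steps + ((mol, precursors),)))
    return None


print(plan("T", RULES, PURCHASABLE))
```

The one-step route through `C` looks cheaper at first, but the search correctly prefers the two-step route through `A` once the full cost to purchasable materials is counted.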
3:10pm-3:25pm Intermission

3:25pm-3:50pm
CINF 69: Reaction mining with condensed graphs of reactions: Problems and perspectives


Alexandre Varnek1 , varnek@unistra.fr
1 Chemistry, University of Strasbourg, Strasbourg, France

A chemical reaction is a difficult object to handle because it involves several molecular graphs describing both reactants and products, whereas most chemoinformatics approaches were developed for individual molecules. Our idea is to use the Fujita–Vladutz representation of a reaction as one single graph, the Condensed Graph of Reaction (CGR). In this way, a chemical reaction is represented as a pseudo-molecule (see Figure) for which molecular descriptors can be generated and then used in different chemoinformatics tasks. In this presentation, we discuss applications of the CGR approach to predictive modeling of kinetic and thermodynamic parameters of reactions, optimal reaction conditions, regioselectivity of enzymatic reactions, and reaction mining in large databases.
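A CGR can be sketched in a few lines of Python, assuming the reaction is already atom-mapped and stored as hydrogen-suppressed bond dictionaries (an illustrative encoding, not the authors' software). Real CGRs can also encode dynamic atom properties such as charge changes and use richer fragment descriptors; the `cgr_fragments` counter below is a deliberately simple, hypothetical stand-in.

```python
from collections import Counter

# Illustrative CGR sketch; assumes atom-mapped, hydrogen-suppressed input.


def condensed_graph(reactant_bonds, product_bonds):
    """Superimpose atom-mapped reactant and product graphs into one CGR.

    Both inputs map frozenset({i, j}) of atom-map numbers to bond order.
    Each CGR bond is labeled (order_before, order_after); 0 means "no bond".
    """
    return {pair: (reactant_bonds.get(pair, 0), product_bonds.get(pair, 0))
            for pair in reactant_bonds.keys() | product_bonds.keys()}


def dynamic_bonds(cgr):
    """Bonds that are formed, broken, or change order in the reaction."""
    return {pair: label for pair, label in cgr.items() if label[0] != label[1]}


def cgr_fragments(cgr, elements):
    """Toy descriptor: counts of labeled bonds such as 'CO:0>1'."""
    counts = Counter()
    for pair, (before, after) in cgr.items():
        i, j = sorted(pair)
        a, b = sorted((elements[i], elements[j]))
        counts[f"{a}{b}:{before}>{after}"] += 1
    return counts


# SN2 example with atom maps: C1-C2-Br3 + O4(-)  ->  C1-C2-O4 + Br3(-)
elements = {1: "C", 2: "C", 3: "Br", 4: "O"}
reactants = {frozenset((1, 2)): 1, frozenset((2, 3)): 1}
products = {frozenset((1, 2)): 1, frozenset((2, 4)): 1}
cgr = condensed_graph(reactants, products)
print(dynamic_bonds(cgr))   # the broken C-Br bond and the formed C-O bond
print(cgr_fragments(cgr, elements))
```

Because the result is a single labeled graph, any fragment-counting or fingerprinting routine written for molecules can in principle be pointed at it.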


References
1. A. Varnek, D. Fourches, F. Hoonakker, V. P. Solov’ev J. Computer-Aided Molecular Design, 2005, 19, 693-703
2. F. Hoonakker, N. Lachiche, A. Varnek, A. Wagner Int. J. Artificial Intelligence Tools, 2011, 20, (2), 253-270
3. C. Muller, G. Marcou, D. Horvath, J. Aires-de-Sousa, A. Varnek J. Chem. Inf. Model. 2012, 52 (12), 3116–3122
4. G. Marcou, J. Aires de Sousa, D. Latino, A. Deluca, D. Horvath, V. Rietsch, and A. Varnek J. Chem. Inf. Model., 2015, DOI: 10.1021/ci500698a




3:50pm-4:15pm
CINF 70: Assessment of optimal conditions for selective deprotection reactions resulting from analysis of a large reaction database

Timur Madzhidov1 , tmadzhidov@gmail.com, Arkadii Lin12 , Igor Antipin1 , Olga Klimchuk2 , Alexandre Varnek2
1 A.M. Butlerov Institute of Chemistry, Kazan (Volga region) Federal University, Kazan, Russian Federation; 2 Chemistry, University of Strasbourg, Strasbourg, France

Protection/deprotection reactions play an important role in synthetic organic chemistry. A key problem is to choose optimal experimental conditions (catalyst, solvent, additives, etc.) leading to selective deprotection of a given group in a particular environment. To date, chemists have relied for this purpose on the reactivity charts in Greene’s well-known book [1], which has become a recognized guide to the chemistry of protecting groups. However, these reactivity charts resulted from manual analysis of a relatively small amount of data and may therefore miss important information hidden in large reaction databases.

In this presentation we report a statistical analysis of the stability of protecting groups under hydrogenation conditions, using a large dataset of reactions (142,111 reactions) extracted from the Reaxys database. For this purpose, we built a workflow involving numerous in-house tools for reaction data processing based on the Condensed Graph of Reaction (CGR) approach [2]. In the first step, raw reaction data were curated, normalized and annotated, forming a well-structured database. Its analysis clearly shows some disagreements with Greene’s reactivity charts with respect to (i) the reactivity of particular protecting groups and (ii) the selectivity of deprotection of a given group in the presence of other groups or chemical functions.
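At its core, such a statistical analysis is a contingency count: for each protecting group and set of conditions, how often the group survives. The Python sketch below assumes the curation step has already reduced each reaction to (group, conditions, survived) records; the record layout, the placeholder labels `PG1`/`PG2` and the counts are invented for illustration and are not the Reaxys-derived data.

```python
from collections import defaultdict

# Illustrative sketch; record layout and data are hypothetical.


def stability_table(records, min_count=1):
    """Tally how often each protecting group survives under each condition.

    `records` yields (group, conditions, survived) tuples, one per reaction in
    which the group was present. Returns
    {(group, conditions): (n_survived, n_total, fraction_survived)},
    skipping combinations seen fewer than `min_count` times.
    """
    tally = defaultdict(lambda: [0, 0])
    for group, conditions, survived in records:
        counts = tally[(group, conditions)]
        counts[0] += int(survived)
        counts[1] += 1
    return {key: (s, n, s / n) for key, (s, n) in tally.items() if n >= min_count}


# Made-up records with placeholder group names, for illustration only.
toy = [
    ("PG1", "H2, Pd/C", False),
    ("PG1", "H2, Pd/C", False),
    ("PG1", "H2, Pd/C", True),
    ("PG2", "H2, Pd/C", True),
    ("PG2", "H2, Pd/C", True),
]
for key, (s, n, frac) in sorted(stability_table(toy).items()):
    print(key, f"{s}/{n} survived ({frac:.0%})")
```

The `min_count` filter matters at database scale: fractions computed from one or two reactions are too noisy to contradict an established chart.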

We have developed a prototype of an expert system able to provide chemists with detailed recommendations of experimental conditions leading to desired chemical transformations. This tool performs CGR-based similarity searching in the reaction database produced by the raw data processing, and could easily be implemented in any database system.
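Once each reaction is reduced to a set of CGR features, similarity searching can reuse ordinary molecular similarity machinery. The Python sketch below assumes Tanimoto similarity on feature sets and a toy in-memory database; the feature labels and condition strings are placeholders, and the expert system's actual descriptors and ranking are not specified in the abstract. The conditions attached to the most similar recorded reactions serve as candidate recommendations.

```python
# Illustrative sketch; the database, features and conditions are hypothetical.


def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity of two feature sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def search(query: set, database: dict, top_k: int = 3):
    """Rank reactions by similarity of their CGR feature sets to the query.

    `database` maps reaction id -> (feature_set, conditions). Returns
    [(score, reaction_id, conditions)], best first, ties broken by id.
    """
    hits = [(tanimoto(query, feats), rid, cond)
            for rid, (feats, cond) in database.items()]
    hits.sort(key=lambda h: (-h[0], h[1]))
    return hits[:top_k]


# Features use a "<elements>:<before>><after>" bond label.
db = {
    "r1": ({"CC:1>1", "BrC:1>0", "CO:0>1"}, "conditions of r1"),
    "r2": ({"CC:1>1", "ClC:1>0", "CO:0>1"}, "conditions of r2"),
    "r3": ({"CC:2>1", "HH:1>0", "CH:0>1"}, "conditions of r3"),
}
query = {"CC:1>1", "BrC:1>0", "CO:0>1"}
for score, rid, cond in search(query, db):
    print(f"{rid}: {score:.2f} -> {cond}")
```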

This work was supported by a grant from the Russian Scientific Foundation (No. 14-43-00024).

References:
[1] P. G. M. Wuts, T. W. Greene. Greene’s Protective Groups in Organic Synthesis, 4th ed.; Wiley, 2006.
[2] A. Varnek, D. Fourches, F. Hoonakker, V. P. Solov’ev. J. Comput.-Aided Mol. Des. 2005, 19, 693–703.

4:15pm-4:40pm
CINF 71: Energy refinement of reactive molecular dynamics pathways

Lee-Ping Wang3 , officer.ping@gmail.com, Robert McGibbon4 , Vijay Pande1 , Todd Martinez2
1 Stanford University, Stanford, California, United States; 2 Chemistry Department, Stanford University, Stanford, California, United States; 3 Chemistry, UC Davis, El Cerrito, California, United States; 4 Chemistry, Stanford University, Palo Alto, California, United States

We describe an approach for identifying and characterizing the reaction events in a molecular dynamics (MD) simulation, applicable to large-scale ab initio simulations (hundreds of atoms) containing complex, multi-molecular reaction events. A key aspect of this approach is smoothing of the reactive MD trajectory in internal coordinates to provide an initial guess for the reaction path. The smoothed pathway is used to initiate a search for the minimum energy pathway connecting reactants and products on the potential energy surface using standard approaches, such as the nudged elastic band or the string method. Our approach is applied to analyze the reaction events in an ab initio nanoreactor simulation that discovers new molecules and mechanisms, including a C-C coupling pathway for glycolaldehyde synthesis and pathways for glycine synthesis from simple inorganic compounds.
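The effect of smoothing can be illustrated on a single internal coordinate. The Python sketch below assumes a centered moving average as the smoothing filter and a simple distance cutoff for bond detection; both are stand-ins chosen for brevity, not the authors' procedure, and the trajectory values are invented.

```python
import math

# Illustrative sketch; the filter, cutoff and data are assumptions.


def distance_series(frames, i, j):
    """Interatomic distance i-j in every frame (an internal coordinate).

    `frames` is a list of frames, each a list of (x, y, z) positions.
    """
    return [math.dist(frame[i], frame[j]) for frame in frames]


def moving_average(series, window=5):
    """Centered moving average; the window is truncated at both ends."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for k in range(len(series)):
        lo, hi = max(0, k - half), min(len(series), k + half + 1)
        smoothed.append(sum(series[lo:hi]) / (hi - lo))
    return smoothed


def bond_events(series, cutoff):
    """Frame indices at which the pair switches between bonded and non-bonded."""
    bonded = [d < cutoff for d in series]
    return [k for k in range(1, len(bonded)) if bonded[k] != bonded[k - 1]]


# A raw distance that flickers across a 1.5 A cutoff before the bond finally breaks.
raw = [1.1, 1.2, 1.6, 1.3, 1.4, 1.9, 2.3, 2.6, 2.8]
print(bond_events(raw, 1.5))                     # several spurious events
print(bond_events(moving_average(raw, 3), 1.5))  # fewer after smoothing
```

Suppressing thermal flicker of this kind is one reason a smoothed path is a better starting guess for a nudged elastic band or string-method refinement than the raw trajectory.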



CINF: The Growing Impact of Openness in Chemistry: A Symposium in Honor of JC Bradley
1:00pm - 5:50pm
Monday, August 17

Room 103 - Boston Convention & Exhibition Center
Andrew Lang, Antony Williams, Organizing
Andrew Lang, Antony Williams, Presiding
1:00pm-1:05pm Introductory Remarks

1:05pm-1:25pm
CINF 78: Contributions of Jean-Claude Bradley to the vision and execution of Open Notebook Science

Antony Williams12 , tony27587@gmail.com, Andrew Lang3
1 Cheminformatics, Royal Society of Chemistry, Wake Forest, North Carolina, United States; 2 ChemConnector Inc., Wake Forest, North Carolina, United States; 3 Oral Roberts University, Tulsa, Oklahoma, United States

For the majority of chemists driven to share their data openly, Jean-Claude (JC) Bradley will forever be the father of Open Notebook Science (ONS). Having coined the term in 2006, JC spent the next few years working to educate us in the ONS philosophy and developing tools, systems and even games for the community. This presentation will provide an overview of JC’s contributions to Open Notebook Science by examining some of the various projects that he conducted with his students and with collaborators. The outcome of the work he led includes a number of open datasets that have been used by scientists around the world, online prediction models that are accessible to anyone, games that have been played by thousands of scientists, and an enormous collection of presentations, video tutorials and Open Courseware that will likely serve as references for the Open Notebook Science movement for many years to come.

1:25pm-1:45pm
CINF 79: Making it open: Putting cheminformatics to use against the Ebola virus

Sean Ekins1 , ekinssean@yahoo.com
1 Collaborations In Chemistry, Fuquay Varina, North Carolina, United States

In 2014 Africa faced the most severe Ebola virus epidemic to date. With no treatment in sight and limited knowledge of the virus, and prompted by discussions on Twitter, I set out to see how much could be done without a lab to test compounds. In the past couple of years there have been multiple high-throughput screens in an effort to find compounds active against the Ebola virus; the challenge, however, is that for most molecules the mechanism or target is unknown. I focused initially on two antimalarials (amodiaquine and chloroquine) and two selective estrogen receptor modulators (clomiphene and toremifene). I used several cheminformatics approaches, including a common pharmacophore, receptor-based pharmacophores and docking. The results of several database searches with the pharmacophores were made available on FigShare and eventually led to a paper on the work. With collaborators we also summarized what could be done for future outbreaks and highlighted that FDA-approved drugs were apparently active at physiologically relevant concentrations. In addition, we curated a literature dataset of compounds shown to be active in vitro or in vivo against the Ebola virus and had an experienced medicinal chemist evaluate the molecules. Most recently, two-pore channels were shown to be key for viral entry into host cells, and seven small molecules were tested as inhibitors of infection. These molecules were used to create a common-feature pharmacophore that was compared with the previously published pharmacophore, suggesting that the four initial compounds may also target this channel. Searching the literature dataset with the two-pore channel pharmacophore suggested additional molecules that might share the same mechanism. To date this cheminformatics work has resulted in four manuscripts published in F1000Research, as well as data and models that have been shared openly.
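Pharmacophore searching ultimately asks whether a molecule can place the right feature types at the right mutual distances. A brute-force Python sketch of that test follows; it assumes feature points have already been perceived on each conformer, checks distances only (no directionality or excluded volumes), and the feature labels, coordinates and 1 Å tolerance are illustrative assumptions rather than parameters from the work described.

```python
import math
from itertools import permutations

# Illustrative sketch; feature types, coordinates and tolerance are hypothetical.


def matches(query, features, tol=1.0):
    """Check whether a conformer's features can realize a pharmacophore.

    `query` and `features` are lists of (feature_type, (x, y, z)). A match is an
    assignment of distinct features to query points with the same types and
    every pairwise distance within `tol` angstroms of the query's.
    """
    n = len(query)
    for combo in permutations(range(len(features)), n):
        if any(features[c][0] != query[q][0] for q, c in enumerate(combo)):
            continue
        if all(abs(math.dist(query[a][1], query[b][1])
                   - math.dist(features[combo[a]][1], features[combo[b]][1])) <= tol
               for a in range(n) for b in range(a + 1, n)):
            return True
    return False


# Three-point pharmacophore: acceptor, hydrophobe, positive ionizable.
query = [("HBA", (0, 0, 0)), ("HYD", (5, 0, 0)), ("POS", (0, 4, 0))]
hit = [("HYD", (10, 0, 0)), ("HBA", (5, 0, 0)), ("POS", (5, 4, 0)), ("HBD", (0, 0, 0))]
miss = [("HYD", (10, 0, 0)), ("HBA", (5, 0, 0)), ("POS", (5, 8, 0))]
print(matches(query, hit), matches(query, miss))
```

Comparing distances rather than raw coordinates makes the test independent of how each conformer is translated or rotated.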

1:45pm-2:05pm
CINF 80: Opening up and connecting up antimalarial data: Progress but with caveats


Christopher Southan1 , cdsouthan@gmail.com
1 IUPHAR/BPS Guide to PHARMACOLOGY, University of Edinburgh, Göteborg, Sweden

Among JCB’s achievements, his work on Open Notebook Science (ONS) has perhaps had the largest impact, and its ripple effect continues to broaden. This is particularly the case in Open Source Drug Discovery (OSDD), where ONS is a natural fit. This presentation will review the “findability” of new antimalarial drug discovery data. While antimalarials are very much a poster child for OSDD, the patterns of result disclosure and the practical extent of openness vary widely. A recent blog post (http://cdsouthan.blogspot.se/2014/06/getting-into-box-with-some-recent.html) describes “digging out” 26 antimalarial leads to add to a new MMV pathogen box. The difficulties associated with this task will be outlined. In particular, examples are still emerging from conventional (i.e. closed) drug discovery operations, even to the extent of finding patent-only lead compounds. Even for the academic groups that do publish papers, examples show that the system can be slow and patchy in getting the structures surfaced in database records. This may not happen at all if MeSH curation fails to index the lead compound in PubChem, so curation of the paper is necessary. This slowness contrasts with the Sydney University Open Source Malaria project (OSM, http://opensourcemalaria.org/) and its declared open source principles. OSM thus comes closest to ONS, in that its members and their collaborators endeavour to surface results in close to real time. Technical aspects of extracting the information from open web instantiations will be described, including the use of SMILES, InChI strings and InChIKeys. The latter come close to being a perfect ONS vehicle for chemistry, since an InChIKey makes an explicit chemical structure globally “findable” literally within minutes of being written into a blog post, via a search taking ~0.3 seconds (PMID 23399051). Because JCB’s ideas still need wider implementation, issues around improving connections between papers, patents, database entries, OSM data and potential new box inclusions will be discussed.
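Because an InChIKey has a fixed, recognizable layout, pulling keys out of a blog post or notebook page needs little more than a regular expression. The Python sketch below assumes that approach: it checks the format only (it cannot verify that the hashes correspond to a real structure), and the sample post text is invented; aspirin's standard InChIKey is used as a real example key.

```python
import re

# Illustrative sketch: format check only, no hash validation.
# Standard InChIKey layout: 14 letters (connectivity hash), hyphen, 8 letters
# (remaining layers) + standard/non-standard flag + version letter, hyphen,
# one protonation letter: 27 characters in all.
INCHIKEY = re.compile(r"\b[A-Z]{14}-[A-Z]{8}[SN][A-Z]-[A-Z]\b")


def find_inchikeys(text):
    """Return InChIKeys found in free text, in order of first appearance."""
    found = []
    for key in INCHIKEY.findall(text):
        if key not in found:
            found.append(key)
    return found


post = ("New lead posted today: BSYNRYMUTXBXSQ-UHFFFAOYSA-N (aspirin, used here "
        "as a stand-in). Repeated: BSYNRYMUTXBXSQ-UHFFFAOYSA-N. Not a key: ABC-DEF-G.")
print(find_inchikeys(post))
```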

2:05pm-2:25pm
CINF 81: Context of crowdsourcing: A driver of organizational openness?

David Thompson1 , d.c.thompson.00@gmail.com, Jorg Bentzien2
1 Public Affairs and Communications, Boehringer Ingelheim Pharmaceuticals, Inc., Ridgefield, Connecticut, United States; 2 Medicinal Chemistry, Boehringer Ingelheim Pharmaceuticals, Inc., Ridgefield, Connecticut, United States

The term ‘crowdsourcing’ typically refers to the use of a web-based platform to engage a diverse community of participants in the solving of a problem. A broad array of problems is amenable to this methodology, which, at the highest level, can be grouped into having objective or subjective solutions (Boudreau 2013; Brabham 2008).

The use of crowdsourcing platforms to build new computational models, or to improve upon existing ones, has gained traction within the computational chemistry community, finding use in both academia and industry (Lessl et al. 2011; Bentzien et al. 2013; Bradley et al. 2009; Lee et al. 2014; Lakhani et al. 2013).

Despite its proven success, it is not clear whether this process for internalizing external innovation will become a standard tool in the modeler’s toolkit, and in this short contribution we will specifically explore the role of context (Murcko 2012). By crowdsourcing a solution to a modeling problem, an organization has assigned the solving of its problem to participants working in an independent and different context. This carries an overhead that is not inconsequential and merits consideration.

Specifically, we will examine the important interplay between internal modeling infrastructure and the independent choice of tools used to construct a solution. We will connect this discussion to the continued exhortation for Open Science (Royal Society 2012) and Data Sharing (Warr 2014), and examine what this might mean for the broader computational chemistry community. We will also explore the role of community observation as a solution unfolds, and its connection to organizational learning (“Lessons from the Netflix Prize Challenge” 2014).

Bentzien, J., et al. 2013. Drug Discov. Today 18 (9): 472–78.
Boudreau, K. J., and Lakhani, K. R. 2013. Harvard Bus. Rev. 91 (4): 60–69.
Brabham, D. C. 2008. Convergence 14 (1): 75–90.
Bradley, J.-C., et al. 2009. J. Cheminform. 1 (1): 9.
Lakhani, K. R., et al. 2013. Nat. Biotechnol. 31 (2): 108–11.
Lee, J., et al. 2014. Proc. Natl. Acad. Sci. U.S.A. 111 (6): 2122–27.
Lessl, M., et al. 2011. Nat. Rev. Drug Discov. 10 (4): 241–42.
“Lessons from the Netflix Prize Challenge.” 2014. http://dl.acm.org/citation.cfm?id=1345465.
Murcko, M. A., and Walters, W. P. 2012. J. Comput.-Aided Mol. Des. 26 (1): 97–102.
Royal Society (Great Britain), Policy Studies Unit. 2012. Science as an Open Enterprise.
Warr, W. A. 2014. J. Comput.-Aided Mol. Des. 28 (1): 1–4.
2:25pm-2:35pm Intermission

2:35pm-2:55pm
CINF 82: Promoting, supporting, and incentivizing openness in scientific research

Sara Bowman1 , sed8n@virginia.edu
1 Center for Open Science, Charlottesville, Virginia, United States

The non-profit Center for Open Science (COS) pursues its mission of increasing the openness, integrity, and reproducibility of scientific research through three primary areas of focus: building infrastructure to support the research workflow, supporting communities of researchers and stakeholders invested in open science practices, and conducting metascience research to understand best practices that lead to reproducibility. This talk will touch on initiatives in all three areas.
COS’s flagship infrastructure project, a free and open source web application called the Open Science Framework (OSF), helps researchers manage their entire research workflow and share their process and product. Features like automatic file versioning and logging of actions streamline research workflows and make the research process more transparent. The OSF can be used privately, among collaborators, or opened to the general public with just the click of a button. Every resource, project, and contributor on the OSF is given a persistent, unique identifier, which allows work to be cited and researchers to earn credit for their contributions. The OSF represents a technical solution for researchers wishing to increase the openness of their work.
The community-building efforts of COS support researchers, journal editors and publishers, and funders interested in enhancing the openness and reproducibility of research. In partnership with the Berkeley Initiative for Transparency in the Social Sciences (BITSS) and SCIENCE magazine, COS convened a meeting of stakeholders to write the Transparency and Openness Promotion (TOP) Guidelines. The TOP Guidelines are a set of author guidelines that can be readily adopted by journals across disciplines, with the intent of increasing the transparency of the research process and product. This talk will provide an overview of the guidelines, an update on the adopting journals, and more information on how journals in the chemical sciences can participate to enhance their own transparency standards.
Finally, the metascience work of COS seeks to understand best practices that make science reproducible. This talk will detail some work the Center is undertaking to investigate the effects of incentivizing data and materials sharing on open science and reproducibility.

2:55pm-3:15pm
CINF 83: OpenTox - an open community and framework supporting predictive toxicology and safety assessment

Barry Hardy1 , barry.hardy@douglasconnect.com
1 Douglas Connect, Zeiningen, Switzerland

One important goal of OpenTox is to support the development of an Open Standards-based predictive toxicology framework that provides unified access to toxicological data and models. OpenTox supports the development of tools for the integration of data, for the generation and validation of in silico models for toxic effects, libraries for the development and integration of modelling algorithms, and scientifically sound validation and reporting routines.

The OpenTox Application Programming Interface (API) is an important open-standards development for software developers. It provides a specification against which the broader community can develop globally interoperable toxicology resources. The use of OpenTox API-compliant web services to communicate instructions between linked resources with URI addresses supports a wide variety of commands to carry out operations such as data integration, algorithm use, model building and validation. The OpenTox Framework currently includes, with its APIs, services for compounds, datasets, features, algorithms, models, ontologies, tasks, validation, reporting, investigations, studies, assays, and authentication and authorisation, which may be combined into multiple applications satisfying a variety of different user needs. As OpenTox creates a semantic web for toxicology, it should be an ideal framework for incorporating toxicology data, ontology and modelling developments, thus supporting both a mechanistic framework for toxicology and best practices in statistical analysis and computational modelling.

In this presentation I will review the recent OpenTox-based development of applications, including the ToxBank data infrastructure, which enables integrated analysis across biochemical, functional and omics datasets in support of the safety assessment goals of the SEURAT-1 program to develop alternatives to animal testing.

Finally, I will provide an overview of the working group activities of the newly formed OpenTox Association which aim to progress the development of open source, data, standards and tools in this area.

3:15pm-3:35pm
CINF 84: Topliss batchwise scheme reviewed in the era of Open Data

Lars Richter2 , Gerhard Ecker1 , gerhard.f.ecker@univie.ac.at
1 Dept Medicinal Chemistry, Wien, Austria; 2 Pharmaceutical Chemistry, University of Vienna, Vienna, Austria

In 1977 Topliss [1] introduced a pragmatic procedure that uses the tabulated potency ranking within a series of five compounds to infer general physicochemical potency trends in the series. The trends are uncovered by comparing the potency ranking within the compound series, which varies in its 3,4-phenyl substitution pattern (Fig. 1), with rankings sorted by calculated physicochemical properties (π for hydrophobicity, σ for electronic and Es for steric effects) and combinations of them (Table 1). Underlying physicochemical schemes are proposed for ten potency rankings.

Recent open data initiatives such as ChEMBL and PubChem make it possible to assess the general applicability of the Topliss approach. Querying the ChEMBL database gave 120 compound series compliant with the Topliss substitution pattern and tested in the same bioassay against the same target. Thirty-eight of these series could be assigned to one or more of the ten rankings proposed by Topliss (Table 1). As expected, rankings driven by lipophilicity are highly populated. Surprisingly, not a single series following the −σ ranking could be identified.
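The core comparison, an observed potency order against the orders implied by substituent constants, is easy to automate, which is what makes a database-scale review feasible. The Python sketch below assumes commonly tabulated π and σ values (σ for 3,4-Cl2 taken as σp + σm of Cl), omits Es, uses only a few illustrative parameter combinations rather than Topliss's full table of ten, and scores agreement with Spearman's rank correlation; the matching criterion used in the study is not stated in the abstract, and the "observed" order is a made-up example.

```python
# Commonly tabulated textbook values (Hansch pi, Hammett sigma); treat as
# illustrative and verify against a primary table before real use.
PARAMS = {
    "H":       {"pi": 0.00,  "sigma": 0.00},
    "4-Cl":    {"pi": 0.71,  "sigma": 0.23},
    "3,4-Cl2": {"pi": 1.42,  "sigma": 0.60},
    "4-CH3":   {"pi": 0.56,  "sigma": -0.17},
    "4-OCH3":  {"pi": -0.02, "sigma": -0.27},
}

# A few illustrative parameter combinations (not Topliss's full table of ten).
SCHEMES = {
    "pi": {"pi": 1},
    "sigma": {"sigma": 1},
    "-sigma": {"sigma": -1},
    "pi-sigma": {"pi": 1, "sigma": -1},
}


def predicted_order(weights):
    """Substituents sorted from most to least potent under a linear score."""
    def score(sub):
        return sum(w * PARAMS[sub][p] for p, w in weights.items())
    return sorted(PARAMS, key=score, reverse=True)


def spearman(order_a, order_b):
    """Spearman rank correlation of two orderings of the same items (no ties)."""
    n = len(order_a)
    pos_b = {item: i for i, item in enumerate(order_b)}
    d2 = sum((i - pos_b[item]) ** 2 for i, item in enumerate(order_a))
    return 1 - 6 * d2 / (n * (n * n - 1))


def best_schemes(observed, schemes=SCHEMES):
    """Schemes whose predicted order best agrees with an observed potency order."""
    scores = {name: spearman(observed, predicted_order(w))
              for name, w in schemes.items()}
    top = max(scores.values())
    return sorted(name for name, s in scores.items() if s == top), top


observed = ["3,4-Cl2", "4-Cl", "4-CH3", "H", "4-OCH3"]  # hypothetical assay result
print(best_schemes(observed))
```

Returning every scheme tied at the top score mirrors the abstract's observation that a series can be assigned to more than one ranking.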

Subsequently, querying each of the thirty-eight series for potency data on the follow-up substituents proposed by Topliss showed that the Topliss scheme worked quite well in predicting substituents that are more active than the unsubstituted parent compound (Table 1).

[1] Topliss, J. G. A manual method for applying the Hansch approach to drug design. J. Med. Chem. 1977, 20, 463-469.


3:35pm-3:55pm
CINF 85: Anatomy of a chemical reaction: Dissection by machine learning algorithms

Alex Clark1 , aclark.xyz@gmail.com
1 Independent, Montreal, Quebec, Canada

Lab notebooks for synthetic chemistry are increasingly being incorporated into the corpus of open science, which means that experimental results become available to the entire community, without restriction, with only a short delay after the completion of the work. In many cases the data is being prepared using the same tools as are appropriate for print-ready documents, which are not necessarily well designed for analysis by machine learning algorithms.

The issue is a timely one, since we are at a point when the quantity of openly available data is rapidly increasing. Even with the bottlenecks imposed by peer reviewed publication, it is increasingly necessary to defer to cheminformatics algorithms to sift through an ocean of data, much of which needs to be explicitly curated prior to use. Taking the effort to prepare open notebook science data in a way that is equally readable by humans and machines is a way to greatly magnify the impact of research, since it can be made immediately available to databases designed for searching, as well as algorithms designed for drawing inferences on a scale that is not possible by individual scientists.

This talk will describe the core data structures required to represent a chemical synthesis in a way that makes perfect and complete sense to cheminformatics software and can also be rendered graphically in a way that is consistent with the aesthetics that scientists expect in the literature. It will address the issues that arise when using diagram conventions that do not map to the alphabet of chemical objects, such as atom symbols, bond orders, charges, stereochemistry, component roles, reaction conditions, stoichiometry, etc. Products that adhere to these principles, such as the Green Lab Notebook app, will be described to demonstrate the functionality.
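As a rough illustration of what "machine-readable" can mean here, the Python sketch below records each component as a structure string with an explicit role and stoichiometry, and each step with its conditions. The class and field names are hypothetical and are not the Green Lab Notebook's data model; matching steps by comparing SMILES strings also assumes the strings are canonicalized, which a real system would do with a cheminformatics toolkit.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

# Hypothetical schema for illustration; not any product's actual format.


class Role(Enum):
    REACTANT = "reactant"
    REAGENT = "reagent"
    CATALYST = "catalyst"
    SOLVENT = "solvent"
    PRODUCT = "product"


@dataclass
class Component:
    smiles: str                          # machine-readable structure, not a drawing
    role: Role
    equivalents: Optional[float] = None  # stoichiometry relative to limiting reactant


@dataclass
class Step:
    components: list[Component]
    conditions: dict[str, str] = field(default_factory=dict)

    def by_role(self, role: Role) -> list[str]:
        return [c.smiles for c in self.components if c.role is role]


@dataclass
class Synthesis:
    steps: list[Step]

    def is_chained(self) -> bool:
        """True if every step consumes a product of the step before it."""
        return all(set(prev.by_role(Role.PRODUCT)) & set(nxt.by_role(Role.REACTANT))
                   for prev, nxt in zip(self.steps, self.steps[1:]))


esterification = Step(
    [Component("OC(=O)c1ccccc1", Role.REACTANT, 1.0),
     Component("CO", Role.REACTANT),           # methanol, used in excess
     Component("OS(=O)(=O)O", Role.CATALYST, 0.1),
     Component("COC(=O)c1ccccc1", Role.PRODUCT)],
    {"temperature": "reflux"},
)
reduction = Step(
    [Component("COC(=O)c1ccccc1", Role.REACTANT, 1.0),
     Component("[Li+].[AlH4-]", Role.REAGENT, 1.0),
     Component("C1CCOC1", Role.SOLVENT),
     Component("OCc1ccccc1", Role.PRODUCT)],
    {"temperature": "0 C to rt"},
)
route = Synthesis([esterification, reduction])
print(route.is_chained(), route.steps[0].by_role(Role.CATALYST))
```

Keeping roles and stoichiometry as explicit fields, rather than as positions or annotations in a drawing, is what lets software answer questions such as "which step uses this catalyst" without image analysis.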

3:55pm-4:15pm
CINF 86: Cheminformatics OLCC

Robert Belford4 , rebelford@ualr.edu, David Wild8 , Leah McEwen2 , Antony Williams3 , Stuart Chalk6 , Jennifer Muzyka1 , John Penn7 , Jon Holmes5
1 Chemistry Dept, Centre College, Danville, Kentucky, United States; 2 Clark Library, Cornell University, Ithaca, New York, United States; 3 Cheminformatics, Royal Society of Chemistry, Wake Forest, North Carolina, United States; 4 Univ of Arkansas at Little Rock, Little Rock, Arkansas, United States; 5 Univ of Wisconsin, Madison, Wisconsin, United States; 6 Department of Chemistry, University of North Florida, Jacksonville, Florida, United States; 7 Chem Dept, West Virginia Univ, Morgantown, West Virginia, United States; 8 Informatics and Computing, Indiana University, Bloomington, Indiana, United States

Jean-Claude Bradley was not only a scientist, but an educator who pioneered the use of many online educational technologies that have had an impact far beyond the students in his classes. This presentation will describe the Cheminformatics OLCC, a hybrid online/face-to-face intercollegiate cheminformatics course that will be offered on five campuses in Fall 2015. This introductory course targets undergraduate chemistry majors and will develop open TLOs (Teaching and Learning Objects) that faculty across the world can freely use to integrate cheminformatics into their traditional classes. Jean-Claude was part of this project and was to serve both as an intercollegiate lecturer and as a facilitator for a class at Drexel. This presentation will describe the course from the perspectives of content and implementation.
4:15pm-4:25pm Intermission

4:25pm-4:45pm
CINF 87: PubChem project and annotations

Jian Zhang1 , jiazhang@ncbi.nlm.nih.gov, Paul Thiessen1 , Sunghwan Kim1 , Asta Gindulyte1 , Renata Geer1 , Evan Bolton1
1 NCBI-NLM/NIH, Bethesda, Maryland, United States

PubChem is a public repository for chemical information and related bioassay screening data. First released in 2004, PubChem serves the biomedical and chemistry communities and is heavily used (averaging more than one million web page views per day and more than one million unique users per month). More than 300 contributors provide nearly 200 million chemical substance descriptions, which yield nearly 70 million unique compound structures. PubChem provides basic chemical information, such as structure, formula, SMILES, InChI, etc. In addition, PubChem includes annotation information, including physical and chemical properties, safety and hazard information, toxicity, and drug and related classifications. This presentation discusses the integration of annotation information and the new functions added to PubChem services to support access to it.
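Programmatic access is one way such data are exposed. The Python sketch below uses PubChem's PUG REST interface (the `compound/cid/{cid}/property/{list}/JSON` URL pattern) from the standard library only. It is a minimal illustration, not the new functions discussed in the talk, and it retrieves computed properties rather than the annotation records described above; the helper names are hypothetical.

```python
import json
from urllib.request import urlopen

# Minimal PUG REST client sketch; helper names are hypothetical.
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def property_url(cid: int, properties: list[str]) -> str:
    """Build a PUG REST URL requesting selected properties of one compound."""
    return f"{PUG_REST}/compound/cid/{cid}/property/{','.join(properties)}/JSON"


def fetch_properties(cid: int, properties: list[str], timeout: float = 30.0) -> dict:
    """Download the requested properties (needs network access)."""
    with urlopen(property_url(cid, properties), timeout=timeout) as response:
        data = json.load(response)
    return data["PropertyTable"]["Properties"][0]
```

For example, `fetch_properties(2244, ["MolecularFormula", "InChIKey"])` requests the formula and InChIKey of aspirin (CID 2244). PubChem's usage policy limits request rates, so a batch job should throttle itself.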

4:45pm-5:05pm
CINF 88: Open Spectral Database: Open data, open code, open concept

Stuart Chalk1 , schalk@unf.edu
1 Department of Chemistry, University of North Florida, Jacksonville, Florida, United States

In honor of JC Bradley, and the spirit of openness that he inspired, a new online resource called the Open Spectral Database (OSD, http://www.osdb.info/) is now available. Built with open source tools and open code, and open to community input about design and functionality, the OSD allows anyone to submit spectral data and make it available to the scientific community.

This paper will detail the initial concept and coding, internal architecture, data formats, REST API and options for submission of data.

5:05pm-5:25pm
CINF 89: DeepLit WikiHyperGlossary

Michael Bauer1 , mbauer2@uams.edu, Andrew Cornell2 , Dan Berleant3 , Robert Belford2
1 Myeloma Institute, UAMS, Little Rock, Arkansas, United States; 2 Chemistry, UALR, Little Rock, Arkansas, United States; 3 Information Sciences, UALR, Little Rock, Arkansas, United States


The “Google Age” of instant access has given novice readers expert-to-expert documents and, with them, new information literacy challenges. Shallow browsing of multiple documents is a common outcome when novice readers try to understand expert-level documents. This presentation will describe the DeepLit WikiHyperGlossary, a Deeper Literacy project designed to enhance reading comprehension by connecting documents to data and discourse. This technology grew out of discussions during the 2006 ConfChem, when a paper on the MSDS hyperglossary was discussed concurrently with Jean-Claude Bradley’s paper, “Expanding the role of the organic chemistry teacher through podcasting, screencasting, blogs, wikis and games”. The discussion led to the question: why not create a “WikiHyperGlossary”? This presentation will describe the programming behind this social and semantic web information literacy technology, which can support both open science and open education.
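The basic hyperglossary mechanism, spotting glossary terms in a document and linking each to its entry, can be sketched with a single regular expression. This is a hypothetical Python illustration, not the DeepLit WikiHyperGlossary code; the glossary terms and `/glossary/...` URLs are placeholders, and real chemical names containing punctuation (commas, parentheses, primes) would need more careful tokenization than the word-boundary matching assumed here.

```python
import re
from html import escape

# Hypothetical illustration; glossary entries and URLs are placeholders.


def link_terms(text, glossary):
    """Wrap glossary terms found in plain `text` in HTML links.

    `glossary` maps term -> URL. Matching is case-insensitive and whole-word,
    and longer terms are tried first so "flash point" wins over "point".
    """
    if not glossary:
        return escape(text)
    terms = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, terms)) + r")\b",
                         re.IGNORECASE)
    urls = {term.lower(): url for term, url in glossary.items()}

    def to_link(match):
        word = match.group(0)
        return f'<a href="{escape(urls[word.lower()])}">{escape(word)}</a>'

    # Escape the text between matches so the output is safe HTML.
    out, last = [], 0
    for m in pattern.finditer(text):
        out.append(escape(text[last:m.start()]))
        out.append(to_link(m))
        last = m.end()
    out.append(escape(text[last:]))
    return "".join(out)


glossary = {"flash point": "/glossary/flash-point", "point": "/glossary/point",
            "LD50": "/glossary/ld50"}
print(link_terms("Flash point and LD50 < 5 mg/kg.", glossary))
```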

5:25pm-5:45pm
CINF 90: Changing landscape of scientific publishing: Open access, open data, and more

Charlotte Hollingworth1 , charlotte.hollingworth@springer.com
1 Chemistry, Springer-Verlag London, London, United Kingdom

The support for open access in chemistry has grown tremendously in recent years, with authors benefiting from the increased visibility of their work, expert peer review and retaining the copyright of their work so that it can be redistributed freely. Scientists now also have access to large amounts of data through open access repositories, open lab notebooks and the publication of big data. As one of the largest STM publishers, Springer has several options to support authors in their quest for openness in science. These options and the continuing developments that publishers are implementing will be discussed.
5:45pm-5:50pm Concluding Remarks
CINF: Enabling Machines to 'Read' the Chemical Literature: Techniques, Case Studies & Opportunities
1:30pm - 4:15pm
Monday, August 17

Room 104B - Boston Convention & Exhibition Center
Daniel Lowe, Organizing
Daniel Lowe, Presiding

1:30pm-1:55pm
CINF 72: Identifying chemical species in combustion models

Richard West1 , r.west@neu.edu
1 Department of Chemical Engineering, Northeastern University, Boston, Massachusetts, United States

Detailed kinetic models have become integral to combustion research over the last 40 years. The latest models can explain many complicated combustion phenomena and offer increasingly accurate simulations for novel engines and fuels. These models can be very large (e.g., the LLNL model for 2-methylalkanes has over 7,000 species and 30,000 reactions) and there are now dozens of large published models. Unfortunately, these ever-proliferating detailed kinetic models are usually incompatible and inconsistent, are seldom compared directly, and often contain undetected mistakes.

The usual publication format remains a “Chemkin file” for use in compatible simulation tools. This format, devised in the 1970s when input was limited by the width of 80-column punch cards, forces model builders to abbreviate species names, thereby losing their chemical identity, and to discard other metadata. The main challenge in comparing these models is in recognizing, for example, that the name “C3KET12” in one model represents 1-hydroperoxypropan-2-one, which another research group may have named “CH3COCH2O2H” in a different model.
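To make the problem concrete, here is a minimal sketch that reads the species list out of a Chemkin-style mechanism. It assumes the common layout of a `SPECIES ... END` block of whitespace-separated names with `!` comments (the keyword is sometimes abbreviated to `SPEC`); real files vary, and the function name and sample mechanism are illustrative only. Note that the parser recovers nothing but nicknames: no structure is recorded anywhere.

```python
def read_chemkin_species(text: str) -> list[str]:
    """Collect species nicknames from SPECIES ... END blocks of a Chemkin-style mechanism."""
    species: list[str] = []
    in_block = False
    for raw_line in text.splitlines():
        line = raw_line.split("!", 1)[0].strip()  # '!' starts a comment
        if not line:
            continue
        tokens = line.split()
        if not in_block:
            # The keyword may be written in full (SPECIES) or abbreviated (SPEC).
            if not tokens[0].upper().startswith("SPEC"):
                continue
            in_block = True
            tokens = tokens[1:]  # names may follow the keyword on the same line
        for token in tokens:
            if token.upper() == "END":
                in_block = False
                break
            species.append(token)
    return species


mechanism = """
ELEMENTS C H O END
SPECIES
H2 O2 OH   ! simple names are easy to recognize
C3KET12    ! but what molecule is this?
END
REACTIONS
H2+O2<=>2OH  1.7E13  0.0  47780.
END
"""
print(read_chemkin_species(mechanism))  # ['H2', 'O2', 'OH', 'C3KET12']
```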

The clues available to determine the molecule corresponding to a given nickname are: the name itself (often cryptic and sometimes misleading), thermochemical data (usually estimated), and a list of reactions (usually incomplete) in which the species participates, connecting it to other species in the model (that are also initially unknown).

We have developed tools to facilitate the identification of chemical species in a kinetic model “Chemkin file” and then to allow comparison of the models. The tools are built on top of the open source Python version of the Reaction Mechanism Generator software (RMG-Py), originally designed to create detailed kinetic models of its own. By comparing reported reactions with its own predicted reactions between already-identified species, the software is able to propose identities for new species and eliminate unlikely matches. A web-based user interface allows a team of humans to quickly review the evidence and confirm or block the proposed matches.
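The reaction-based evidence described above can be illustrated with a simple voting scheme. This is a hypothetical sketch under our own simplifying assumptions, not the RMG-Py API: reactions are reduced to sets of species labels (stoichiometry and direction are ignored), identified nicknames map to SMILES strings standing in for real structures, and a candidate identity for an unknown nickname earns a vote whenever substituting it turns a reported reaction into one that the predictor also generates. All names and data below are invented for illustration.

```python
from collections import Counter

# A reaction as (reactants, products); multiplicities are ignored in this sketch.
Reaction = tuple[frozenset[str], frozenset[str]]


def vote_for_candidates(
    unknown: str,
    reported: list[Reaction],
    identified: dict[str, str],
    predicted: set[Reaction],
    candidates: list[str],
) -> Counter:
    """Count how many reported reactions support each candidate identity for `unknown`."""
    votes: Counter = Counter()
    for reactants, products in reported:
        if unknown not in reactants | products:
            continue
        for cand in candidates:
            mapping = {**identified, unknown: cand}
            # Skip reactions that still involve other unidentified nicknames.
            if not all(s in mapping for s in reactants | products):
                continue
            translated = (frozenset(mapping[s] for s in reactants),
                          frozenset(mapping[s] for s in products))
            if translated in predicted:
                votes[cand] += 1
    return votes


identified = {"CH3COCH2O": "CC(=O)C[O]", "OH": "[OH]"}
reported = [(frozenset({"C3KET12"}), frozenset({"CH3COCH2O", "OH"}))]
predicted = {(frozenset({"CC(=O)COO"}), frozenset({"CC(=O)C[O]", "[OH]"}))}
candidates = ["CC(=O)COO", "CCC(=O)OO"]  # two C3H6O3 isomers
print(vote_for_candidates("C3KET12", reported, identified, predicted, candidates))
# Counter({'CC(=O)COO': 1})
```

In practice such votes would be combined with the other clues the abstract lists (the nickname itself and thermochemical data) before a human confirms or blocks a match.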

We will present how the tool works, opportunities for improvement, and some findings from analyzing recent publications in the combustion literature. This material is based upon work supported by the National Science Foundation under Grant No. 1403171.

1:55pm-2:20pm
CINF 73: Text mining the chemical literature to find chemicals in context

Tong-Ying Wu1 , tony.wu@linguamatics.com, Andrew Hinton2 , David Milward2
1 Linguamatics, Westborough, Massachusetts, United States; 2 Linguamatics, Cambridge, United Kingdom

Techniques for extracting chemicals from documents have improved considerably during the last few years. However, we are often interested in knowing the role of the chemicals within the document and the properties asserted for each chemical. To achieve this we need text mining techniques that can find the context around chemicals and link together information from different parts of the same document. In this talk we will present the latest version of the I2E text mining software, and show how it can be used on English and Chinese text. In particular we will show how we can discover relationships between chemicals and biological entities or properties expressed either within free text or from embedded tables.

2:20pm-2:45pm
CINF 74: Unlocking chemical information from tables and legacy articles

Daniel Lowe1 , daniel@nextmovesoftware.com, Roger Sayle1 , Antony Williams2
1 NextMove Software, Cambridge, United Kingdom; 2 Cheminformatics, Royal Society of Chemistry, Wake Forest, North Carolina, United States

Many tools for text-mining are designed to work with unstructured text. Here we present the results of our efforts to apply text-mining to the semi-structured content of tables.
We will cover the difficulties of coping with the many ways in which the contents of a table may be specified and the challenges of resolving references to elsewhere in the document. We report on the extraction of melting points, boiling points, NMR spectra and biological activity data from tabular data.
In collaboration with the Royal Society of Chemistry (RSC) we have also investigated the application of these tools to the RSC’s back archive, both in tables and free text. We cover the difficulties in adapting tools optimized for patents to journal articles and the difficulties in handling the older, less structured, text that dates from as far back as 1841.
The information extracted from this project, both from patents and the RSC’s back archive, will form a key contribution to the RSC’s public data repository. In all cases the evidence text for the extracted information is provided along with a link back to the document from which it was extracted, ensuring that the provenance of the information can be verified.
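One small piece of this task, normalizing a property value found in a table cell, can be sketched with a regular expression. This is a hypothetical illustration, not NextMove's implementation: it assumes a melting point is written as a single temperature or a range such as "123-125 °C" (hyphen or en dash), optionally prefixed by "mp" or "m.p.", and it ignores negative values and the cross-references to other parts of the document mentioned above.

```python
import re

# Matches e.g. "mp 123-125 °C", "m.p. 88 °C", "201\u2013203 °C".
MP_PATTERN = re.compile(
    r"(?:m\.?\s?p\.?\s*)?"                  # optional "mp" / "m.p." prefix
    r"(\d+(?:\.\d+)?)"                      # lower (or only) value
    r"(?:\s*[-\u2013]\s*(\d+(?:\.\d+)?))?"  # optional upper value of a range
    r"\s*°\s*C",
    re.IGNORECASE,
)


def parse_melting_point(cell: str):
    """Return (low, high) in °C for the first melting point in a table cell, or None."""
    m = MP_PATTERN.search(cell)
    if m is None:
        return None
    low = float(m.group(1))
    high = float(m.group(2)) if m.group(2) else low
    return low, high


print(parse_melting_point("mp 123-125 °C"))  # (123.0, 125.0)
```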
2:45pm-3:00pm Intermission

3:00pm-3:25pm
CINF 75: Chemical structure identification and retrieval with OSRA

Igor Filippov2 , igor.v.filippov@gmail.com, Iwona Weidlich1
1 CODDES LLC, Rockville, Maryland, United States; 2 VIF Innovations, LLC, Gaithersburg, Maryland, United States

We present the most recent developments in small molecule, reaction and polymer image recognition in Optical Structure Recognition Application (OSRA). Algorithm enhancements and new features will be discussed.


OSRA recognition result timeline

3:25pm-3:50pm
CINF 76: P-OSRA: Translating polymer images to text using extensions of open source software

Bryn Reinstadler21 , br6@williams.edu, Hans Horn2
1 Williams College, Williamstown, Massachusetts, United States; 2 IBM - Almaden, San Jose, California, United States

Mining the chemical literature for polymeric information is an important task in advancing novel polymer discovery and synthesis. Currently, this task is made difficult by the hard-to-navigate nomenclature rules for polymers, as in many cases the difficulty of polymer nomenclature discourages scientists from giving proper names to new polymers in their papers. The difficulty of searching the literature slows down the progress in this field. P-OSRA, the Polymer Optical Structure Recognition Application, built on OSRA [1], is open-source software that aims to process images of polymers mined from the literature, output structural results in the SMILES format, and then store those results in a database available for user requests. The current implementation takes direct input of polymer images and is able to process the images and store the structural results.

[1] Optical Structure Recognition Software To Recover Chemical Information: OSRA, An Open Source Solution. Igor V. Filippov and Marc C. Nicklaus, Journal of Chemical Information and Modeling 2009 49 (3), 740-743. DOI: 10.1021/ci800067r

3:50pm-4:15pm
CINF 77: Practical case studies of the application of CLiDE for the efficient extraction of chemical structures from documents

Aniko Valko1 , Aniko.Valko@keymodule.co.uk, Peter Johnson2
1 Keymodule Ltd, Leeds, United Kingdom; 2 School of Chemistry, Leeds, United Kingdom

CLiDE is a well-established optical chemical structure recognition (OCSR) tool that is designed to identify chemical structure diagrams rendered in documents as images, and to convert these diagrams into chemical connection tables with auto-detection and correction of errors.

This presentation will provide case studies to showcase the practical use of CLiDE for efficient structure extraction and error correction. Examples from various document types such as patents and journal articles will illustrate the way CLiDE deals with practical issues arising from document degradation and problematic drawing features which may convey complex or ambiguous meanings.
CINF: Sci-Mix
8:00pm - 10:00pm
Monday, August 17

Hall C - Boston Convention & Exhibition Center

8:00pm-10:00pm
CINF 109: Dark chemical matter: Could 'inactive' compounds be good starting points for drug discovery?


Anne Wassermann1 , anne.wassermann@pfizer.com
1 Pfizer Inc, Cambridge, Massachusetts, United States

8:00pm-10:00pm
CINF 118: Chemical Information Sources Wikibook - the open source created by chemical information professionals for chemical information professionals

Charles Huber1 , huber@library.ucsb.edu
1 Davidson Library, University of California, Santa Barbara, California, United States

8:00pm-10:00pm
CINF 128: Scaffold-based analytics: Enabling hit-to-lead decisions by visualizing chemical series linked across large datasets


Deepak Bandyopadhyay1 , Deepak.2.Bandyopadhyay@gsk.com, Constantine Kreatsoulas1 , Pat Brady1 , Genaro Scavello1 , Dac-Trung Nguyen2 , Tyler Peryea2 , Ajit Jadhav2
1 GlaxoSmithKline, Collegeville, Pennsylvania, United States; 2 National Center for Advancing Translational Sciences, Bethesda, Maryland, United States

8:00pm-10:00pm
CINF 138: Linking transporter interaction profiles to in vivo side effects


Eleni Kotsampasakou1 , Sylvia Escher3 , Andreas Jurik2 , Harald Sitte4 , Lukas Pezawas5 , Gerhard Ecker1 , gerhard.f.ecker@univie.ac.at
1 Dept Medicinal Chemistry, Wien, Austria; 2 Dept. of Medicinal Chemistry, University of Vienna, Vienna, Austria; 3 Chemical Risk Assessment, Fraunhofer Institute of Toxicology and Experimental Medicine, Hannover, Germany; 4 Center for Physiology and Pharmacology, Medical University of Vienna, Vienna, Austria; 5 Department of Psychiatry and Psychotherapy, Medical University of Vienna, Vienna, Austria

8:00pm-10:00pm
CINF 13: Analyzing ToxCast data using nebula (neighbor-edges based and unbiased leverage algorithm)


Huixiao Hong1 , Huixiao.Hong@fda.hhs.gov
1 FDA, Jefferson, Arkansas, United States

8:00pm-10:00pm
CINF 149: Data driven multi-object optimization (MOO) in drug design

Shahar Keinan1 , skeinan@cloudpharmaceuticals.com, Elizabeth Hobbs1 , Elizabeth Hatcher-Frush1
1 Cloud Pharmaceuticals, Inc., Durham, North Carolina, United States

8:00pm-10:00pm
CINF 160: From QSAR to big data: Developing mechanism-driven predictive models for animal toxicity


Marlene Kim2 , Hao Zhu1 , hao.zhu99@rutgers.edu
1 Chemistry Department, Rutgers University, Camden, New Jersey, United States; 2 Chemistry & Biochemistry Department, Rutgers, The State University of New Jersey, Pennsauken, New Jersey, United States

8:00pm-10:00pm
CINF 166: CIIPro: An online cheminformatics portal for large scale chemical data analysis


Daniel Russo1 , danrusso@scarletmail.rutgers.edu, Wenyi Wang1 , Marlene Kim1 , Daniel Pinolini1 , Hao Zhu12
1 Center for Computational and Integrative Biology, Rutgers University, Camden, New Jersey, United States; 2 Chemistry Department, Rutgers University, Camden, New Jersey, United States

8:00pm-10:00pm
CINF 168: “Graphical abstracts only”: The changing use of periodicals among early career chemists


Marianne NOEL1 , noel@ifris.org
1 LISIS & IFRIS, Université Paris-Est, Marne-la-Vallée Cedex 2, France

8:00pm-10:00pm
CINF 19: Tools and strategies: Incorporating Wikipedia-based assignments into a course


Eryk Salvaggio1 , eryk@wikiedu.org, Jami Mathewson1 , jami@wikiedu.org
1 Wiki Education Foundation, San Francisco, California, United States

8:00pm-10:00pm
CINF 31: Interactive web-based tools for navigating the biologically relevant chemical space


Obdulia Rabal1 , orabal@unav.es, Julen Oyarzabal1
1 Small Molecule Discovery Platform, Center for Applied Medical Research (CIMA) - University of Navarra, Pamplona, Spain

8:00pm-10:00pm
CINF 34: P-OSRA: Polymer Optical Structure Recognition Application


Bryn Reinstadler21 , br6@williams.edu, Hans Horn2
1 Williams College, Williamstown, Massachusetts, United States; 2 IBM - Almaden, San Jose, California, United States

8:00pm-10:00pm
CINF 35: Withdrawn

8:00pm-10:00pm
CINF 43: Chess-like algorithms behind Chematica's retrosynthetic planning


Sara Szymkuc1 , sara.szymkuc@icho.edu.pl, Ewa Gajewska1 , Tomasz Klucznik1 , Piotr Dittwald1 , Michal Startek3 , Karol Molga1 , Michal Bajczyk1 , Bartosz Grzybowski21
1 Institute of Organic Chemistry, Polish Academy of Sciences, Warsaw, Poland; 2 Chemistry, UNIST, Ulsan, Korea (the Republic of); 3 Mathematics and Computer Science, University of Warsaw, Warsaw, Poland

8:00pm-10:00pm
CINF 45: Mining chemical databases to obtain knowledge based information of non-covalent interactions


Mathew Koebel1 , mathew.koebel@stlcop.edu, Suman Sirimulla1
1 Basic Sciences, St. Louis College of Pharmacy, St. Louis, Missouri, United States

8:00pm-10:00pm
CINF 46: In silico assessment of toxicity endpoints: Case-studies using CORINA Symphony and ChemTunes Studio


Christof Schwab1 , Joerg Marusczyk1 , Aleksey Tarkhov1 , Thomas Kleinoeder1 , Dimitar Hristozov4 , Bruno Bienfait5 , Oliver Sacher1 , James Rathman34 , rathman.1@osu.edu, Chihae Yang24
1 Molecular Networks GmbH, Erlangen, Germany; 2 Molecular Networks, GmbH, Erlangen, Germany; 3 Ohio State University, Columbus, Ohio, United States; 4 Altamira LLC, Columbus, Ohio, United States; 5 Molecular Networks, Erlangen, Germany

8:00pm-10:00pm
CINF 54: CHEMDNER-Patents: Automatic recognition of chemical and biological entities in patents


Martin Krallinger2 , Florian Leitner3 , Obdulia Rabal1 , orabal@unav.es, Miguel Vazquez2 , Julen Oyarzabal1 , Alfonso Valencia2
1 Small Molecule Discovery Platform, Center for Applied Medical Research (CIMA) - University of Navarra, Pamplona, Spain; 2 Structural Computational Biology Group, Spanish National Cancer Research Center (CNIO), Madrid, Spain; 3 Computational Intelligence Group, Polytechnic University of Madrid, Madrid, Spain

8:00pm-10:00pm
CINF 69: Reaction mining with condensed graphs of reactions: Problems and perspectives


Alexandre Varnek1 , varnek@unistra.fr
1 Chemistry, University of Strasbourg, Strasbourg, France

8:00pm-10:00pm
CINF 80: Opening up and connecting up antimalarial data: Progress but with caveats


Christopher Southan1 , cdsouthan@gmail.com
1 IUPHAR/BPS Guide to PHARMACOLOGY, University of Edinburgh, Göteborg, Sweden

8:00pm-10:00pm
CINF 91: Chemistry enabling Chinese, Japanese, and Korean patents

Daniel Lowe1 , daniel@nextmovesoftware.com, Roger Sayle1
1 NextMove Software, Cambridge, United Kingdom

Chinese, Japanese and Korean (CJK) patents account for over half of all national patent filings and hence are of increasing importance to patent informatics. In chemistry, searching for relevant patents relies heavily on the ability to index them by the chemical structures they mention. Chemical names are typically given in the native language of the patent, which significantly complicates their identification and interpretation by conventional chemical text-mining tools. Here we present our approach to the translation of chemical names from CJK text and give examples of the wealth of chemical knowledge that can be unlocked.
Because novel compounds are described using systematic chemical nomenclature, our approach has been developed to be especially adept at translating systematic names. Systematic chemical nomenclature in CJK languages generally follows the rules described by IUPAC [1], meaning that after translation there will exist a corresponding English name that can then be used with conventional chemical text-mining tools.
Strategies for translation vary between languages. In Chinese each morpheme of an English chemical name is represented by one or more Hanzi. The interpretation of a Hanzi may be context dependent which is handled by looking at the environment in which it occurs. Japanese and Korean chemical names, by contrast, are mostly transliterations of English/German chemical nomenclature into Katakana and Hangul respectively.
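The morpheme-by-morpheme strategy for Chinese can be illustrated with greedy longest-match segmentation against a lexicon. This is a toy sketch, not NextMove's implementation: the seven-entry `LEXICON` is an assumption for illustration, whereas a real system needs a large dictionary plus the context rules described above for Hanzi whose reading depends on their environment. The multi-character entry shows why longest match matters: the retained name 乙酸 (acetic acid) must win over 乙 + 酸.

```python
# Toy lexicon: Hanzi (or Hanzi sequences) -> English morphemes (illustrative only).
LEXICON = {
    "二": "di",
    "氯": "chloro",
    "甲": "meth",
    "乙": "eth",
    "烷": "ane",
    "酸": "oic acid",
    "乙酸": "acetic acid",  # retained name must take precedence over 乙 + 酸
}
MAX_KEY = max(len(k) for k in LEXICON)


def translate(name: str) -> str:
    """Greedy longest-match translation of a Chinese chemical name into English morphemes."""
    out = []
    i = 0
    while i < len(name):
        # Try the longest possible lexicon entry first, then shorter ones.
        for length in range(min(MAX_KEY, len(name) - i), 0, -1):
            piece = name[i:i + length]
            if piece in LEXICON:
                out.append(LEXICON[piece])
                i += length
                break
        else:
            raise ValueError(f"no lexicon entry at position {i}: {name[i]!r}")
    return "".join(out)


print(translate("二氯甲烷"))  # dichloromethane
```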
As a case study, we applied our approach to 44 thousand Korean patents (1990-2013) that were likely to contain chemistry and extracted 1.5 million distinct compounds. 177 thousand of these compounds were not found by a comparable analysis of US patents. Of the 759 thousand compounds first disclosed between 2006 and 2013 that appear in both a US and a Korean patent, the Korean patent was published earlier for 362 thousand.

8:00pm-10:00pm
CINF 94: Non-specificity of drug-target interactions: Consequences for drug discovery


Gerald Maggiora12 , gerry.maggiora@gmail.com, Vijay Gokhale3
1 Cancer Biology, Translational Genomics Research Institute, Tucson, Arizona, United States; 2 Bio5 Institute, University of Arizona, Tucson, Arizona, United States
CINF: Herman Skolnik Award Symposium
8:00am - 12:00pm
Tuesday, August 18

Room 104A - Boston Convention & Exhibition Center
Jürgen Bajorath, Veerabahu Shanmugasundaram, Organizing
Veerabahu Shanmugasundaram, Presiding
Cosponsored by: COMP and MEDI
Financially supported by: Pfizer
8:00am-8:05am Introductory Remarks

8:05am-8:45am
CINF 92: Withdrawn

8:45am-9:25am
CINF 93: Paradigm which permits the parsing of information content arising from receptor-independent ligand activity models and receptor-dependent activity models

Anton Hopfinger1 , hopfingr@unm.edu
1 Pharmacy, University of New Mexico, Lake Forest, Illinois, United States

Through the long and ongoing journey of developing predictive methods to construct ligand-receptor binding models, most often to estimate IC50 values in the form of QSAR models, contributions from the receptor have been neglected. In the beginning, receptor information was simply not available. Today, although much ligand-receptor information is available, there is no overarching paradigm that permits the joining of fully equivalent and comparable receptor-independent [RI] and receptor-dependent [RD] modeling methods. Simply put, we cannot 'subtract' an RD model from an equivalent RI model to answer that most hidden, but also most worrisome, question: 'How much crucial design information is lost when we are limited to performing only an RI analysis?' The paradigm of 4D-QSAR analysis does appear to afford identical and comparable model development capabilities for both RI and RD studies. As such, both numeric and actual spatial pharmacophore subtractions of RI and RD QSAR models can be performed for training sets in which receptor information is available. Consequently, a general assessment of the design information lost in an RI study can be made. However, there appear to be only limited qualitative, yet interesting, general trends regarding the particular display of design information loss or misrepresentation in RI models. On a brighter note, differences in receptor geometry, as well as the coupled motions of receptor units uncovered for 'tight' and 'loose' ligand binders over the course of the MD simulations integral to the RD-QSAR study, may provide an additional 'dimension' to explore as part of the benefit of performing an RD-4D-QSAR study.

9:25am-10:05am
CINF 94: Non-specificity of drug-target interactions: Consequences for drug discovery


Gerald Maggiora12 , gerry.maggiora@gmail.com, Vijay Gokhale3
1 Cancer Biology, Translational Genomics Research Institute, Tucson, Arizona, United States; 2 Bio5 Institute, University of Arizona, Tucson, Arizona, United States

Dealing with the complexity of human systems in drug discovery is a daunting task. At best we have an imperfect picture of their underlying physiology and pharmacology, which raises the question of how to identify potential drug compounds from the vast sea of xenobiotics that populate chemical space. The dominant approach is still based on the single-drug, single-target paradigm, which has a number of inherent problems. While newer multi-target approaches address some of them, they are not entirely without problems of their own. Superimposed on all of these difficulties is a surprising lack of compound and target specificity that the growing amount of data clearly shows is a more pervasive problem than has generally been assumed. The talk will explore a number of issues associated with this non-specificity and indicate possible ways for addressing them.

10:05am-10:45am
CINF 95: Molecular similarity approaches in chemoinformatics: Early history and bibliometric analysis

Peter Willett1 , p.willett@sheffield.ac.uk
1 Information School, University of Sheffield, Sheffield, United Kingdom

The first part of this paper will describe the emergence and early history of three important applications of molecular similarity in chemoinformatics: similarity searching, database clustering and molecular diversity analysis. It will then report a bibliometric analysis of the literature of molecular similarity, as reflected in the Web of Science database, focusing on some of the key papers and on their impact as reflected in literature citations.
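Similarity searching, the first application named above, is commonly implemented by ranking database molecules by the Tanimoto coefficient between binary fingerprints. A minimal sketch, assuming fingerprints are already available as sets of "on" bit positions; the toy fingerprints and identifiers below are made up for illustration, and real systems would compute them with a cheminformatics toolkit.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for fingerprints given as sets of on-bits."""
    union = len(a | b)
    # Two empty fingerprints share nothing informative; treat their similarity as 0.
    return len(a & b) / union if union else 0.0


def similarity_search(query: set[int], database: dict[str, set[int]], top_k: int = 3):
    """Return the top_k (id, score) pairs ranked by decreasing Tanimoto similarity to the query."""
    scored = [(mol_id, tanimoto(query, fp)) for mol_id, fp in database.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


database = {"A": {1, 2, 3, 4}, "B": {1, 2, 5}, "C": {7, 8}}
print(similarity_search({1, 2, 3, 4}, database, top_k=2))  # [('A', 1.0), ('B', 0.4)]
```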
10:45am-11:00am Intermission

11:00am-11:30am
CINF 96: Generative topographic mapping: Universal tool for chemical space analysis

Alexandre Varnek1 , varnek@unistra.fr
1 Chemistry, University of Strasbourg, Strasbourg, France

Generative Topographic Mapping (GTM) is a universal approach to visualizing chemical space, predicting activity profiles, conducting virtual screening and comparing databases of chemical compounds. Unlike other popular methods of data visualization (PCA, SOM, etc.), GTM provides, for a given molecule, the data probability distribution functions (PDFs), both in the high-dimensional space defined by molecular descriptors and in the 2D latent space [1, 3]. This makes it possible to build classification or regression models and to define their applicability domains [2]. Numerous tests show that the performance of GTM-based models is similar to that of popular machine-learning methods (SVM, RF, etc.) [4]. Moreover, GTM allows one to build QSAR models not only for individual activities, but for whole pharmacological profiles. GTM also offers a valuable solution to the Big Data problem, since it is able to analyze the chemical spaces of large databases containing millions of compounds (ChEMBL, supplier databases) [5].
Here, we demonstrate an application of GTM to visualization and analysis of “optimal” chemical spaces of biologically active molecules, chemical reactions and protein-ligand interactions.
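The probabilistic projection that distinguishes GTM from PCA or SOM can be sketched for an already-trained map. In standard GTM, each latent grid node x_k is mapped to a point y_k in descriptor space, data are modeled as isotropic Gaussians around those points with inverse variance beta, and a molecule's responsibilities R_k are the normalized likelihoods; the responsibility-weighted mean of the node coordinates gives its 2D position. Training (EM fitting of the mapping and beta) is omitted, and the four-node map below is invented for illustration.

```python
import math


def responsibilities(t, node_centers, beta):
    """Posterior probability of each latent node given descriptor vector t (GTM E-step for one point)."""
    # Log-likelihood up to a constant: the Gaussian normalizer is the same for every node and cancels.
    logs = [-0.5 * beta * sum((ti - yi) ** 2 for ti, yi in zip(t, y)) for y in node_centers]
    shift = max(logs)  # log-sum-exp shift for numerical stability
    weights = [math.exp(lg - shift) for lg in logs]
    total = sum(weights)
    return [w / total for w in weights]


def latent_mean(resp, latent_grid):
    """Responsibility-weighted mean of the 2D latent coordinates (the molecule's position on the map)."""
    return tuple(sum(r * x[d] for r, x in zip(resp, latent_grid)) for d in range(2))


latent_grid = [(-1, -1), (1, -1), (-1, 1), (1, 1)]              # 2D latent nodes x_k
node_centers = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)]     # their images y_k in descriptor space
resp = responsibilities((0.9, 0.1, 0.0), node_centers, beta=10.0)
print(latent_mean(resp, latent_grid))
```

Because every molecule receives a full distribution over nodes rather than a single winning cell, class or activity landscapes can be built by averaging responsibilities, which is what enables the GTM-based models described above.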

References:
1. N. Kireeva, I. I. Baskin, H. A. Gaspar, D. Horvath, G. Marcou and A. Varnek, Molecular Informatics, 2012, 31, 201-312
2. H. A. Gaspar, G. Marcou, A. Arault, S. Lozano, P. Vayer and A. Varnek, J. Chem. Inf. Model., 2013, 53 (4), 763-772
3. V. Chupakhin, G. Marcou, H. Gaspar and A. Varnek, Computational and Structural Biotechnology Journal, 2014, 10 (16), 33-37
4. H. A. Gaspar, I. I. Baskin, G. Marcou, D. Horvath and A. Varnek, Molecular Informatics, 2015, DOI: 10.1002/minf.201400153
5. H. Gaspar, I. I. Baskin, G. Marcou, D. Horvath and A. Varnek, J. Chem. Inf. Model., 2015, 55 (1), 84-94

11:30am-12:00pm
CINF 97: Development of a knowledge-generating platform driven by big data in drug discovery through production processes

Kimito Funatsu1 , funatsu@chemsys.t.u-tokyo.ac.jp
1 Univ Tokyo Dept Chem Sys Eng, Tokyo, Japan

While massive amounts of quantitative data have accumulated across the pipeline of a drug candidate's initial discovery up through its production process, knowledge of and data analysis for each of the discovery and production processes has remained isolated. In this project, we aim to establish a platform which allows us to unify relevant knowledge about the different processes and their associated data, and to advance research into improved and optimized systems that view pharmaceutical development from a comprehensive, correlated, and high-level perspective. The research will be driven by big data, beginning with extraction of patterns for directions in lead molecule development based upon large volumes of compound and protein data. These patterns will be combined with a virtually-generated compound library and identification of targets for such compounds, with candidate compounds further evaluated for their synthetic and production feasibility. Massive quantities of production-related data will also drive the development of methods for assessing safe operation of production plants that are sufficiently equipped for production of the generated drug candidates, leading to the establishment of enhanced models for risk assessment, risk management, and quality control. This research will lead to a new, collective platform for systematic and efficient development of pharmaceuticals from discovery through production.
CINF: Scientific Integrity: Can We Rely on the Published Scientific Literature?
9:00am - 12:25pm
Tuesday, August 18

Room 104B - Boston Convention & Exhibition Center
Judith Currano, William Town, Organizing
William Town, Presiding
Cosponsored by: COMSCI, ETHC and PROF
9:00am-9:05am Introductory Remarks

9:05am-9:30am
CINF 98: Integrity, ethics, and trust in scientific research literature

Christopher Leonard1 , christopher.j.leonard@gmail.com
1 QScience, Qatar Foundation, Doha, Qatar

Ever since the internet became the primary means of disseminating scientific research, research publishing has sometimes appeared to lurch from crisis to crisis, with seemingly increasing rates of fraud, plagiarism, retraction and selective publication. There are many reasons behind this, including pressure on researchers to publish and easier ways for publishers and academics to identify plagiarism. However, the internet also offers many new ways to improve how results are communicated to our peers. The new publishing landscape is often defined in terms of technology and the 'bells and whistles' that can improve many aspects of a manuscript, but it also offers the potential for a new era of ethics, integrity and reproducibility, especially in peer review and in emerging publication areas such as datasets. The price is increased vigilance at the authoring and reviewing stages, but the cost of not paying it is incalculable.

9:30am-9:55am
CINF 99: Policy making at the American Chemical Society: Developing a statement on scientific integrity

Sarah Cooney1 , sarah_cooney@bat.com, Christopher Proctor1 , christopher_proctor@bat.com
1 BAT Group Research Development, Southampton, United Kingdom

Acting with integrity is absolutely critical to the scientific process, so much so that this concept has been embedded in government policies around the world. Leading scientific membership associations have in turn created their own policies to support these initiatives and their members in best practice. This paper will examine, from the perspective of a member of the writing committee, how the American Chemical Society created its statement 'Scientific insight and integrity in public policy'. The second half of this paper will focus on the importance of integrity in controversial areas of science, in particular the case of tobacco harm reduction.

9:55am-10:20am
CINF 100: Publishability

Martin Hicks1 , mhicks@beilstein-institut.de
1 Beilstein Institut, Frankfurt, Germany

Publishing scientific papers has several functions: registration, certification, dissemination, inspiring innovation and archiving. The scientific community is then expected, over time, to verify the ideas and results and ultimately uncover the objective truth. It is assumed that authors make their best efforts to ensure that their submissions are correct, and that the scientific community has the time and resources to concern itself with review, reproduction and validation. Nowadays, most people are in principle able to access the individual articles they need, but the amount of information has grown so much that scientists cannot keep up with the publications in their own area of expertise beyond skimming the tables of contents. To maximize their reputation, scientists are expected to publish a certain number of papers per year in journals with a certain minimum impact factor. Having a significant new idea is no longer sufficient: the numbers are what matter, behaviour adapts to the system and output is adjusted accordingly. It seems likely that this is also contributing to the increase in plagiarism and other unethical behaviour. This presentation will discuss the effects of this increasingly problematic publish-or-perish paradigm and illustrate them with experiences gained in publishing the journals of the Beilstein-Institut.
10:20am-10:35am Intermission

10:35am-11:00am
CINF 101: What is the role of peer review in protecting the integrity of scientific research?

Na Qin1 , qinna@msu.edu
1 Michigan State University, East Lansing, Michigan, United States

Scientific misconduct, such as plagiarism, data manipulation, data fabrication or duplicate publication, occurs with distressing frequency in the scientific community. The peer review system has long been used as a self-regulation approach to maintain standards of quality, improve performance, and provide credibility. However, fraudulent and flawed research has been published even in peer-reviewed journals, and the number of articles retracted for fraud or error has risen dramatically in the last decade. Is detecting scientific misconduct or errors a primary goal of peer review? What is peer review’s role in ensuring the responsible conduct of scientific research?

This presentation addresses the difficulties and limitations of anonymous peer review in detecting irresponsible conduct in scientific research. These include the inherently conflicting requirements of peer review, namely expertise and objectivity, as well as its limited capacity to expose or minimize legal or ethical issues. We also examine two recent, devastating fraud cases. One is the case of Korean stem cell scientist Woo-Suk Hwang, who falsely claimed to have created 11 new human embryonic stem (ES) cell lines, involving fabricated data and other forms of deception. The other is the public disclosure of transgressions by Dong-Pyou Han, who published research on a vaccine exhibiting anti-HIV activity; he was charged with making false statements in June 2014.

This presentation is intended to start a conversation on whether peer review is, by its nature, ill-equipped to detect scientific misconduct. The practice of sciences involves its own self-corrections, and the peer review system does not replace that. Understanding this can reduce the expense to researchers who try to use or replicate fabricated results.

11:00am-11:25am
CINF 102: Open, network-based answer to the reproducibility crisis: The ScienceOpen peer review concept

Stephanie Dawson1 , stephanie.dawson@scienceopen.com
1 ScienceOpen GmbH, Berlin, Germany

Spectacular failures of the anonymous peer review system, even in highly prestigious journals, paired with research demonstrating extremely low levels of reproducibility in landmark studies have called the present system of scientific quality assurance into question. To create a more effective, transparent and fairer system that begins to address the question of reproducibility, we developed the networking and publishing platform ScienceOpen. A researcher network forms the basis for public post-publication peer review, because we believe that a transparent network approach provides more rigorous quality control than two anonymous referees. Articles submitted to ScienceOpen are published rapidly after an editorial check, followed by an open peer review process. A unique versioning concept allows the researcher to continue to improve his/her published work based on comments and reviews by scientists in the field. Papers are not marked as approved because information on the reproducibility of experiments comes later than the first expert check, and thus the status of a paper may change. An article published on ScienceOpen is also placed within the wider context of all Open Access publications in its field as ScienceOpen aggregates content from a variety of sources, opening them up to discussion with the same tools for commenting, sharing and discovery. With this holistic concept ScienceOpen provides high-quality open access publishing services, while redefining publishing as one element in a whole suite of communication tools available to the researcher. Scholarly publishing is not an end in itself, but the beginning of a dialogue to move the whole field forward.

11:25am-11:50am
CINF 103: Managing new threats to the integrity of the scientific literature

Judith Currano1 , currano@pobox.upenn.edu, Kenneth Foster2
1 Chemistry Library, University of Pennsylvania, Jenkintown, Pennsylvania, United States; 2 Department of Bioengineering, University of Pennsylvania, Philadelphia, Pennsylvania, United States

This paper, by a chemistry librarian and a professor who edits an online journal, frames the challenges that scientists at all levels face because of the highly variable quality of the scientific literature, which stems from a deluge of new open-access online journals, many from previously unknown publishers with highly variable standards of peer review. The problems are so pervasive that even papers submitted to well-established, legitimate journals may include citations to questionable or even frankly plagiarized sources. The authors will suggest ways in which science librarians can work with students and researchers to raise their awareness of these new threats to the integrity of the scientific literature and to improve their ability to evaluate the reliability of journals and individual articles. Traditional rules of thumb for assessing the reliability of scientific publications (peer review, publication in a journal with an established Thomson Reuters Impact Factor, a credible publisher) are harder to apply given the highly variable quality of many of the new open-access journals, the appearance of new publishers, and the introduction of new impact metrics, some of which are interesting and useful, while others are based on citation patterns found in poorly described data sets or nonselective databases of articles. The authors suggest that instruction of research students in the Responsible Conduct of Research be extended to include ways to evaluate the reliability of scientific information.
11:50am-11:55am Concluding Remarks
CINF: Herman Skolnik Award Symposium
1:00pm - 5:00pm
Tuesday, August 18

Room 104A - Boston Convention & Exhibition Center
Jürgen Bajorath, Veerabahu Shanmugasundaram, Organizing
Veerabahu Shanmugasundaram, Presiding
Cosponsored by: COMP and MEDI
Financially supported by: Pfizer

1:00pm-1:30pm
CINF 104: Enabling drug discovery by computational molecular design

Gisbert Schneider1 , gisbert@ethz.ch, Petra Schneider1
1 Institute of Pharmaceutical Sciences, ETH, Zurich, Switzerland

Innovative bioactive agents fuel sustained drug discovery and the development of new medicines. Future success in chemical biology and pharmaceutical research will fundamentally rely on combining advanced synthetic and analytical technologies within a theoretical framework that provides a rationale for the interplay between chemical structure and biological effect. Leading-edge concepts in computer-assisted molecular design and engineering play a driving role in this setting by providing real-time access to a virtually infinite source of novel tool compounds and lead structures, and by guiding experimental screening campaigns. We will present concepts and ideas for the representation of molecular structure, predictive models of structure-activity relationships, the de-orphaning of bioactive compounds, and automated molecular design, and we will discuss de novo design approaches that have proven their usefulness and will contribute to future drug discovery by generating innovative bioactive agents. Emphasis will be placed on the reaction-based construction of potent and selective enzyme inhibitors and modulators of G-protein coupled receptors. As we are currently witnessing strong renewed interest in bioactive natural products, we will showcase new methods for natural-product-inspired molecular design and macromolecular target prediction.

Selected references:

Reker, D. et al. (2014) Revealing the macromolecular targets of complex natural products. Nat. Chem. 6, 1072–1078.

Reker, D., Rodrigues, T., Schneider, P., Schneider, G. (2014) Identifying the macromolecular targets of de novo designed chemical entities through self-organizing map consensus. Proc. Natl. Acad. Sci. USA 111, 4067–4072.

Reutlinger, M., Rodrigues, T., Schneider, P., Schneider, G. (2014) Multi-objective molecular de novo design by adaptive fragment prioritization. Angew. Chem. Int. Ed. 53, 4244–4248.

Rodrigues, T., Schneider, P., Schneider, G. (2014) Accessing new chemical entities through microfluidic technology. Angew. Chem. Int. Ed. 53, 5750–5758.

Spänkuch, B. et al. (2013) Drugs by numbers: Reaction-driven de novo design of potent and selective anticancer leads. Angew. Chem. Int. Ed. 52, 4676–4681.

Schneider, G. (2010) Virtual screening: An endless staircase? Nat. Rev. Drug Discov. 9, 273–276.

1:30pm-2:00pm
CINF 105: Integrating public data sources into the drug discovery workflow

Patrick Walters1 , pat_walters@vrtx.com, Alex Aronov1 , Brian Goldman1 , Jun Feng2 , Brian McClain2 , Lidio Meireles2 , Hsin-Pei Shih2 , Jonathan Weiss2
1 Vertex Pharmaceuticals, Boston, Massachusetts, United States; 2 Vertex Pharmaceuticals, Boston, Massachusetts, United States

Over the last ten years, there has been a dramatic increase in the amount of chemical and biological data available on the Internet. Databases such as ChEMBL, PubChem, BindingDB, and the RCSB Protein Data Bank provide valuable information that can influence the direction of a drug discovery project. While these databases provide a wealth of information, the data are often not in a format that is easily accessible to bench scientists. In addition, scientists may be unaware of these resources, or may not know how to access and integrate the data. While it is tempting to simply integrate large amounts of public data into in-house systems, software developers must be careful to inform, without overwhelming, the target audience. This presentation will highlight a number of our efforts to integrate data from Internet sources into our workflow. We will provide examples to illustrate how we are leveraging public data in our drug discovery projects.

2:00pm-2:30pm
CINF 106: Going beyond R-group tables: Close-in analog prioritization using neighborhood information derived from SAR matrices

Liying Zhang1 , Kjell Johnson1 , Jeremy Starr1 , Chris Poss1 , Jared Milbank1 , Max Kuhn1 , Veerabahu Shanmugasundaram1 , Veerabahu.Shanmugasundaram@pfizer.com
1 Pfizer, Groton, Connecticut, United States

SAR matrices (developed by Prof. Jürgen Bajorath and coworkers) provide a novel framework for automatically extracting SAR patterns from data sets and organizing all of the SAR information they contain in an easily interpretable fashion. The approach uses a matched molecular pair-like algorithm to identify and extract groups of structurally related compounds, and it displays the resulting information in a chemically interpretable form. We have extended the original methodology, which prioritizes virtual compounds contained in SAR matrices using nearest neighborhood analysis (NNA), by augmenting that analysis with structural similarity information, performing robust statistical evaluations (ANOVA) to identify privileged core groups and R-groups, and searching for global optima in large virtual compound spaces (using modified GA methods such as SELC). We have also enabled interactive and dynamic SAR matrix mining and analysis within the TIBCO Spotfire DXP environment for enhanced SAR matrix visualization and ease of use within medchem project teams. We will present case studies illustrating how these methods interrogate the wealth of SAR information contained in datasets to prioritize close-in analogs.
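The idea of organizing analogs by shared core and substituent, then scoring the empty cells (virtual compounds) from their neighbors, can be illustrated with a toy Python sketch. It is not the published NNA or the extensions described above. It assumes compounds are already fragmented into hypothetical (core, substituent, potency) tuples, and it scores each virtual compound simply as the mean potency of the real compounds that share its core or its substituent.

```python
from statistics import mean

def build_sar_matrix(compounds):
    """Map (core, substituent) -> potency for every real compound.

    `compounds` is an iterable of (core, substituent, potency) tuples; the
    fragmentation into cores and substituents is assumed to be done already.
    """
    return {(core, sub): potency for core, sub, potency in compounds}

def score_virtual_compounds(matrix):
    """Score empty matrix cells (virtual compounds) from their neighborhood.

    Simplified stand-in for a nearest-neighborhood analysis: the neighbors of a
    virtual compound are the real compounds sharing its core (same row) or its
    substituent (same column), and the score is their mean potency.
    """
    cores = sorted({core for core, _ in matrix})
    subs = sorted({sub for _, sub in matrix})
    scores = {}
    for core in cores:
        for sub in subs:
            if (core, sub) in matrix:
                continue  # cell holds a real compound, not a virtual one
            neighbors = [p for (c, s), p in matrix.items() if c == core or s == sub]
            if neighbors:
                scores[(core, sub)] = mean(neighbors)
    return scores

# Hypothetical analogs: two cores, three substituents, potencies as pIC50.
data = [("A", "Me", 6.0), ("A", "Cl", 7.0), ("B", "Me", 5.0), ("B", "F", 8.0)]
scores = score_virtual_compounds(build_sar_matrix(data))
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # [('A', 'F'), ('B', 'Cl')]
```

A real implementation would weight neighbors by structural similarity and apply statistical tests, as the abstract describes. The unweighted mean here only shows where virtual compounds come from and how a neighborhood can rank them.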
2:30pm-2:45pm Intermission

2:45pm-3:15pm
CINF 107: AnalogExplorer: A new method for graphical analysis of analog series and associated structure-activity relationship information

Ye Hu1 , pauline810805@googlemail.com
1 Department of Life Science Informatics, Bonn-Aachen International Center for Information Technology, University of Bonn, Bonn, Germany

As rapidly increasing amounts of SAR data become available, graphical approaches have been introduced to explore the structure-activity relationships (SARs) contained in compound data sets. We introduce AnalogExplorer, a new computational methodology for the graphical representation and analysis of analog series that is distinct from currently available approaches and data structures. The method is compound-based and systematically explores substitution sites or site combinations in analog series, regardless of the number of substitution sites they contain or the structural diversity of their R-groups. AnalogExplorer consists of three graphical components: a complete graph that provides a hierarchy of all possible substitution site combinations for a given analog series, a reduced graph that displays only the explored substitution site(s), and R-group trees that represent subsets of analogs with given substitution site(s). AnalogExplorer can be applied in different ways. It can be used to analyze individual analog series or all series contained in a given compound activity class. Furthermore, for analog series with activity against multiple targets, AnalogExplorer can be used to compare the SAR information associated with a series across targets. It can also be applied to compare SAR patterns of structurally related analog series with activity against a given target. Taken together, AnalogExplorer extends the current spectrum of methods for the analysis of analog series and adds to the chemoinformatics toolbox for medicinal chemistry applications.
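The relationship between the complete graph and the reduced graph can be sketched in a few lines of Python. This is a simplified data model, not the AnalogExplorer implementation. It assumes each analog is described only by the hypothetical set of sites at which it is substituted. The complete hierarchy enumerates every non-empty site combination and links each combination to those with one additional site. The reduced graph keeps only combinations that occur in the series, and each explored combination collects the analogs that an R-group tree would display.

```python
from itertools import combinations

def complete_hierarchy(sites):
    """All non-empty substitution-site combinations, grouped by size (level)."""
    sites = sorted(sites)
    return {k: [frozenset(c) for c in combinations(sites, k)]
            for k in range(1, len(sites) + 1)}

def hierarchy_edges(levels):
    """Link each combination to every combination with exactly one more site."""
    edges = []
    for k in range(1, len(levels)):
        for parent in levels[k]:
            for child in levels[k + 1]:
                if parent < child:  # proper subset one level up = one extra site
                    edges.append((parent, child))
    return edges

def explored_combinations(analogs):
    """Combinations substituted in at least one analog (reduced-graph nodes)."""
    return {frozenset(s) for s in analogs.values() if s}

def analogs_with_combination(analogs, combo):
    """Analogs substituted at exactly the given sites (members of one R-group tree)."""
    return sorted(name for name, s in analogs.items() if frozenset(s) == combo)

# Hypothetical analog series with three substitution sites.
levels = complete_hierarchy(["R1", "R2", "R3"])
edges = hierarchy_edges(levels)
analogs = {"cpd1": {"R1"}, "cpd2": {"R1", "R2"}, "cpd3": {"R1"}, "cpd4": {"R3"}}
print(sum(len(v) for v in levels.values()), len(edges))  # 7 9
print(len(explored_combinations(analogs)))  # 3
print(analogs_with_combination(analogs, frozenset({"R1"})))  # ['cpd1', 'cpd3']
```

The complete hierarchy grows as 2^n - 1 nodes for n sites. This is one reason a reduced view restricted to explored combinations is useful for larger series.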

3:15pm-3:45pm
CINF 108: How many fingers does a compound have? The various ways to define molecular similarity

Eugen Lounkine1 , lounkine@gmail.com
1 Novartis Institutes for Biomedical Research, Cambridge, Massachusetts, United States

The concepts of molecular fingerprints and molecular similarity have matured and found innumerable applications in academia as well as industry, in particular in drug discovery. Chemical similarity is almost too commonplace for us to notice anymore. Still, this powerful concept, namely that molecules can a) be represented in terms of their interesting properties and b) be similar to each other in one way or another, has grown over the past two decades beyond the confines of chemical space. Today, we don’t just use chemical similarity to find compounds that will behave the same biologically; rather, we build directly on ever-growing biological profiles to assess bio-similarity. In addition, capturing the rich descriptions of compound-induced phenotypes from the literature gives us yet another molecular fingerprint. This brings new challenges and opportunities: How do we define and encode bioactivity and literature profiles in the form of comparable fingerprints? How do we deal with the inherent sparseness of such representations? And, most importantly, how do we use these various ways of defining similarity in concert? Network concepts that have emerged and matured in the social sciences, such as friend-of-a-friend, may be of help; after all, we have been using the concept of “chemical neighborhoods” all along. Here we will present strategies for defining and slicing non-conventional molecular fingerprints, as well as the application of network algorithms to build and navigate heterogeneous similarity networks.
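As a minimal sketch of combining similarity definitions in one network, the Python below treats each fingerprint as a set of "on" features. It applies the same Tanimoto coefficient to two toy fingerprint types: hypothetical structural feature IDs and hypothetical active-assay IDs. It thresholds each similarity into a network, merges the networks, and runs a friend-of-a-friend query. The data, the 0.5 threshold, and the simple edge-union merge are illustrative assumptions, not the strategies presented in the talk.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) coefficient between two sets of 'on' features."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similarity_network(fingerprints, threshold):
    """Undirected graph linking pairs whose Tanimoto similarity >= threshold."""
    names = sorted(fingerprints)
    graph = {name: set() for name in names}
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            if tanimoto(fingerprints[x], fingerprints[y]) >= threshold:
                graph[x].add(y)
                graph[y].add(x)
    return graph

def merge_networks(*graphs):
    """Union of edges from several similarity networks over the same compounds."""
    merged = {}
    for graph in graphs:
        for node, neighbors in graph.items():
            merged.setdefault(node, set()).update(neighbors)
    return merged

def friends_of_friends(graph, node):
    """Compounds two steps away that are not already direct neighbors."""
    direct = graph.get(node, set())
    indirect = set()
    for neighbor in direct:
        indirect |= graph[neighbor]
    return indirect - direct - {node}

# Hypothetical toy fingerprints: structural feature IDs and active-assay IDs.
structure = {"a": {1, 2, 3, 4}, "b": {1, 2, 3, 5}, "c": {7, 8, 9}, "d": {7, 8, 10}}
bioactivity = {"a": {"assay1"}, "b": set(), "c": {"assay1", "assay2"}, "d": {"assay3"}}

merged = merge_networks(similarity_network(structure, 0.5),
                        similarity_network(bioactivity, 0.5))
print(friends_of_friends(merged, "b"))  # {'c'}
```

Compound "b" has an empty bioactivity profile, so it gains no bio-similarity edges, which is a small instance of the sparseness problem. It still reaches "c" through its structural neighbor "a", which shares a bioactivity profile with "c".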

3:45pm-4:15pm
CINF 109: Dark chemical matter: Could 'inactive' compounds be good starting points for drug discovery?


Anne Wassermann1 , anne.wassermann@pfizer.com
1 Pfizer Inc, Cambridge, Massachusetts, United States

Little attention is given to small molecules in (as yet) biologically inactive regions of chemical space. Here we coin the term “dark chemical matter” (DCM) for compounds that have never shown any biological response despite having been tested in numerous biological experiments. We quantify DCM, validate it in quality control experiments, and map it into chemical space. We show that DCM is characterized by defined substructures that go beyond simple physicochemical properties and are found in both corporate and public compound collections. With the help of cancer cell proliferation, reporter gene, gene expression, and yeast chemogenomics assays, we evaluate the potential of DCM to show biological activity in future screens. Our experiments demonstrate that, when tested for the right phenotype or target, DCM can elicit strong biological responses. Consequently, we believe that DCM is not generally biologically inert, and we conclude that the reduced promiscuity of DCM compounds makes them a valuable resource for selective biological probes and starting points for drug discovery programs.
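The operational definition above (never active despite extensive testing) reduces to a simple filter over screening records. The Python below is a minimal sketch under stated assumptions: results are hypothetical (compound, assay, is_active) records, and `min_assays` is an illustrative cutoff, not the threshold used in the study.

```python
from collections import defaultdict

def dark_chemical_matter(results, min_assays):
    """Return compounds tested in at least `min_assays` distinct assays and never active.

    `results` is an iterable of (compound_id, assay_id, is_active) records.
    `min_assays` is an illustrative cutoff, not the study's definition.
    """
    tested = defaultdict(set)
    ever_active = defaultdict(bool)
    for compound, assay, is_active in results:
        tested[compound].add(assay)
        ever_active[compound] |= bool(is_active)
    return sorted(c for c, assays in tested.items()
                  if len(assays) >= min_assays and not ever_active[c])

# Hypothetical screening records.
results = [
    ("X", "as1", False), ("X", "as2", False), ("X", "as3", False),
    ("Y", "as1", False), ("Y", "as2", True), ("Y", "as3", False),
    ("Z", "as1", False),
]
print(dark_chemical_matter(results, min_assays=3))  # ['X']
```

The assay-count requirement matters. Without it, compound "Z", tested only once, would be indistinguishable from a genuinely extensively tested inactive.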

4:15pm-4:45pm
CINF 110: Complexity and heterogeneity of data for chemical information science

Jürgen Bajorath1 , bajorath@bit.uni-bonn.de
1 Life Science Informatics, University of Bonn, B-IT, Bonn, Germany

Similar to the situation in biology a few years ago, we are currently witnessing the advent of the big data era in medicinal chemistry. Rapidly growing compound numbers and volumes of activity data require elaborate infrastructures for deposition, curation, and organization. However, the need for such infrastructures only partly reflects the challenges associated with big data. In particular, the increasing complexity and heterogeneity of compound data challenge computational analysis and knowledge extraction, probably even more than data volume alone. Confidence criteria must be carefully considered when drawing conclusions from compound data mining, for example, in the large-scale exploration of structure-activity relationships. Otherwise, trends in activity progression might be overestimated or, conversely, relevant relationships might be overlooked. Popular concepts such as compound promiscuity and activity cliffs are viewed in light of the increasing volume, complexity, and heterogeneity of compound activity data. It is shown that inconsistent results are often obtained when different criteria are applied. Furthermore, studying compound data growth and characteristics over time often reveals interesting or unexpected trends. It is also evident that visualization techniques such as network representations are of critical importance for handling the activity data deluge, analyzing data mining results, and enabling meaningful interpretation. Despite the difficulties involved in compound data mining and analysis in the imminent big data era, there are exciting opportunities for chemical information science at the interface with pharmaceutical research.
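The point that different confidence criteria can yield inconsistent results can be made concrete with compound promiscuity, counted here as the number of distinct targets per compound. This Python sketch uses hypothetical activity records and two illustrative criteria sets. Neither criteria set is taken from the talk.

```python
def promiscuity_degrees(activities, min_potency, allowed_types):
    """Count distinct targets per compound that pass the given confidence criteria.

    `activities` holds (compound, target, potency, measurement_type) records, with
    potency on a negative log scale (e.g., pKi). Compounds with no qualifying
    activity are omitted from the result.
    """
    targets = {}
    for compound, target, potency, mtype in activities:
        if mtype in allowed_types and potency >= min_potency:
            targets.setdefault(compound, set()).add(target)
    return {compound: len(t) for compound, t in targets.items()}

# Hypothetical activity records for two compounds.
activities = [
    ("c1", "T1", 7.5, "Ki"), ("c1", "T2", 6.2, "IC50"), ("c1", "T3", 5.1, "IC50"),
    ("c2", "T1", 8.0, "Ki"), ("c2", "T4", 6.8, "Ki"),
]
lenient = promiscuity_degrees(activities, 5.0, {"Ki", "IC50"})
strict = promiscuity_degrees(activities, 6.0, {"Ki"})
print(lenient, strict)  # {'c1': 3, 'c2': 2} {'c1': 1, 'c2': 2}
```

Under the lenient criteria, c1 looks like the more promiscuous compound. Under the strict criteria, the ranking reverses, even though the underlying data are identical.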
4:45pm-5:00pm Awards Presentation
CINF: Scientific Integrity: Can We Rely on the Published Scientific Literature?
1:30pm - 5:20pm
Tuesday, August 18

Room 104B - Boston Convention & Exhibition Center
Judith Currano, William Town, Organizing
Judith Currano, Presiding
Cosponsored by: COMSCI, ETHC and PROF
1:30pm-1:35pm Introductory Remarks

1:35pm-2:00pm
CINF 111: Toward a more reproducible corpus of scientific literature

Cesar Berrios1 , cesar.berrios-otero@f1000.com
1 Faculty of 1000, Brooklyn, New York, United States

Several recent reports and high-profile retractions have added to a growing chorus of concern among scientists and laypeople calling for a restructuring of the system to ensure reproducibility in science. However, the problem is a complex one for which there is no single solution. Contributing factors include poor training in proper experimental design, increased emphasis on making outlandish statements, and an overreliance on publishing papers in peer-reviewed journals with high impact factors for purposes of career progression and tenure.

The availability and accessibility of all underlying data necessary to reproduce a study have been identified as integral to solving these issues, yet traditional journals often have limited space available for each paper. Furthermore, there are numerous technical obstacles to making datasets truly accessible. These issues combine to create a scientific culture in which sharing and publishing data end up low on researchers’ lists of priorities, impeding further progress toward reproducible research.
At F1000Research we are working with to begin addressing some of these challenges. We have implemented several initiatives to provide methods and tools to capture the production of scientific data, and to establish this as an important output of research activity in itself. All F1000Research articles include the underlying data to enable others to attempt to reproduce the findings, and even to reuse the data. We also offer authors the option to publish data-only papers, which include just the data together with a detailed description of the protocol used to generate the data. In addition, all articles are openly peer reviewed, post-publication, and previous versions of each article are archived.

We will describe how our data policy and the transparency of our peer review process allow reviewers and readers to carefully scrutinize the data underlying the conclusions and to follow the full provenance of each paper, ultimately leading to a more trustworthy corpus of scientific literature.

2:00pm-2:25pm
CINF 112: Extraordinary public access to scientific evidence in the FDA modified risk tobacco product process

James Solyst1 , jim.solyst@smna.com
1 Swedish Match North America, Severna Park, Maryland, United States

Section 911 of the US Tobacco Control Act (Modified Risk Tobacco Product, MRTP) provides a process for a company to submit scientific evidence demonstrating that a product is of lower risk (modified risk) than another tobacco product. An MRTP application must demonstrate that by switching from one product (cigarettes, for example) to another (Swedish snus, for example), a user reduces his or her individual risk and that the switch benefits the health of the overall population. Reducing harm is generally accepted as a good thing, but tobacco use is widely viewed as something to be eliminated. In addition, the tobacco industry has a challenging history with respect to scientific integrity. Thus, the MRTP process is filled with difficult public health issues, and FDA, which implements the Tobacco Control Act, has had to manage a process that ensures scientific integrity and is consistent with public health goals. One way this has been accomplished is through extraordinary transparency: all information in an MRTP application is made publicly available (except for confidential business information), which is not the case with other FDA product applications. A current MRTP application for Swedish snus provides a case study.

2:25pm-2:50pm
CINF 113: Validation and fraud in small-molecule crystallography

Sean Conway1 , sc@iucr.org
1 IUCr, Chester, United Kingdom

Publishing in crystallography is underpinned by a wealth of structural data. In International Union of Crystallography (IUCr) journals, submitted data are rigorously checked for correctness and consistency. Even so, the journals have experienced cases of fraud in small-molecule reports. In-house validation software is constantly evolving to guard against a growing variety of egregious errors and fraudulent practice.

2:50pm-3:15pm
CINF 114: Scientific integrity: A crystallographic perspective

Ian Bruno1 , bruno@ccdc.cam.ac.uk
1 Cambridge Crystallographic Data Centre, Cambridge, United Kingdom

The Cambridge Structural Database (CSD) contains over 750,000 experimental determinations of small molecule crystal structures, the majority of which are made available by researchers to support the science published in journal articles. The crystallographic community has over the years developed tools that support evaluation of the scientific integrity of crystal structure data and some publishers and journals pay particular attention to this during the peer review process. Once structures are published, the Cambridge Crystallographic Data Centre (CCDC) undertakes further scientific processing of the data before including structures in the CSD.

This presentation, timed to coincide with the 50th anniversary of the Cambridge Structural Database, will offer a perspective on scientific integrity based on crystal structure data collected over the last half century and experiences encountered during this time. It will also look at the role domain-specific data centres such as the CCDC can play now and in the future to help ensure trust in the results of scientific research.
3:15pm-3:30pm Intermission

3:30pm-3:55pm
CINF 115: Ways publishers help, maintain, and support responsible research

Raymond Boucher1 , rboucher@wiley.com
1 John Wiley and Sons Ltd, Chichester, United Kingdom

At all stages of the publishing process, including pre-publication and post-publication, the Publisher is engaged in helping to maintain the integrity of the scientific record. The talk will cover areas in which the publisher is involved, including: publishing workshops and how the next generation is trained; plagiarism-detection and other pre-publication software packages for specific communities; maintaining the quality of peer review; ethics guidelines, bodies such as the Committee on Publication Ethics (COPE), and how issues are dealt with; and retractions, with an analysis of process and practice.
The talk will illustrate how the Publisher supports and promotes the publication of responsible research and how interaction with the community is key to this process.

3:55pm-4:20pm
CINF 116: Integrity, trust, and reproducibility: How scientific publishers can contribute

Guido Herrmann1 , guido.herrmann@thieme.de
1 Georg Thieme Verlag KG, Stuttgart, Germany

Thieme has been a chemistry publisher since 1909. We publish scientific information in various formats: journals, reference works, encyclopaedia, monographs and textbooks.
Scientists have to rely on the validity of the published information.
How does Thieme address this issue? What are our internal procedures and mechanisms to safeguard the quality of our publications? What are Thieme’s experiences with fraud and plagiarism? How do we engage our authors, editors, advisors and readers in this process? Are there differences between original research articles, reference works and textbooks?

The talk will look into these questions, present background information, and highlight some of our key findings and best practices.

4:20pm-4:45pm
CINF 117: The write stuff – scientific integrity and publishing

Jamie Humphrey2 , humphreyj@rsc.org, Richard Kidd1 , kiddr@rsc.org
1 Royal Society of Chemistry, Thomas Graham House, Cambridge, United Kingdom; 2 Royal Society of Chemistry, Cambridge, United Kingdom

What are the responsibilities of a publisher in addressing the questions of scientific integrity, and how is this changing? We will give a view from the Royal Society of Chemistry covering our principles and practices, and how we work with our community worldwide to evolve our approaches. Will the increasing push towards the availability of original data make validation easier, and what does ‘reproducible’ mean exactly?
4:45pm-4:50pm Concluding Remarks
CINF: Computational Toxicology: From QSAR Models to Adverse Outcome Pathways
8:15am - 11:35am
Wednesday, August 19

Room 103 - Boston Convention & Exhibition Center
Mohamed AbdulHameed, Organizing
Mohamed AbdulHameed, Presiding
Cosponsored by: AGRO, COMP, ENVR and MEDI
8:15am-8:20am Introductory Remarks

8:20am-8:40am
CINF 132: Using mode-of-action (MOA) data to guide the development of local quantitative structure-activity relationship (QSAR) models for molecular and early cellular events in an adverse outcome pathway (AOP)

Jay Tunkel1 , tunkel@srcinc.com, Julie Melia1 , Kelly Salinas1 , Laura Morlacci1 , Jennifer Rhoades1 , Mary Kawa1 , Catherine Rudisill1 , Heather Carlson-Lynch1
1 EHA, SRC, Inc., La Fayette, New York, United States

The development of local QSAR models for specific health outcomes within a chemical class holds promise for predicting the toxicity of unstudied compounds in support of Green Chemistry and safer-alternatives efforts. The utility of mode-of-action (MOA) data for developing local QSAR models was evaluated in the following steps: 1) publicly available databases were queried to identify well-studied chemicals (i.e., chemicals with toxicity values from EPA, ATSDR, or CalEPA) with available MOA information; 2) these chemicals were clustered for read-across using a structure-based approach; 3) clusters were selected in which all members are expected to undergo a similar MOA; and 4) the data were evaluated from chemical and toxicological perspectives to identify information-rich QSAR descriptors that specifically describe the MOA rather than reflecting a serendipitous relationship. The clustering exercise, performed on 877 chemicals using our Chemical Assessment Clustering Engine (ChemACE), produced approximately 90 clusters. The candidate cluster selected for evaluation contained seven compounds structurally similar to 2-butoxyethanol, a compound of current interest because of its potential hematotoxicity and its commercial uses. Importantly, the limited structural features present in the cluster members would facilitate discussions between SRC chemists and toxicologists while unraveling the MOA and defining potency trends based on steric, electronic, lipophilic, or other factors. An AOP was described for 2-butoxyethanol-induced hematotoxicity, in which the molecular initiating event (MIE) involves interaction between the oxidized metabolite (2-butoxyacetic acid) and the erythrocyte membrane.
With this additional understanding of the AOP, we used our Analog Identification Method (AIM) software to identify additional chemicals that are expected to behave in a similar fashion and that also have experimental data; this provided 78 additional structural analogs for consideration. The results of the data collection and evaluation process indicated that neither the rate of metabolism of the alcohol to the acid nor their corresponding dissociation constants are sufficient to explain the observed trends. Instead, participation of the beta ether oxygen in an intramolecular five-membered calcium chelate better describes the observed trends.

8:40am-9:00am
CINF 133: QSAR models could replace LLNA test for predicting human skin sensitization potential of chemicals

Vinicius Alves2 , viniciusm.alves@gmail.com, Rodolpho Braga2 , Eugene Muratov4 , murik@email.unc.edu, Denis Fourches5 , Nicole Kleinstreuer6 , Judy Strickland6 , Carolina Andrade1 , Alexander Tropsha3 , alex_tropsha@unc.edu
1 Faculty of Pharmacy, Federal University of Goias, Goiania-GO, Brazil; 2 Faculty of Pharmacy, Federal University of Goiás, Goiania, Goias, Brazil; 3 Univ of North Carolina, Chapel Hill, North Carolina, United States; 4 Medicinal Chem Natural Products, University of North Carolina, Chapel Hill, North Carolina, United States; 5 North Carolina State University, Raleigh, North Carolina, United States; 6 ILS/NICEATM, Raleigh, North Carolina, United States

We have compiled, curated, and integrated the largest dataset of publicly available data on compounds tested in both the mouse rLLNA and the human HMT or HRIPT tests, comprising 135 compounds. The concordance between the respective mouse and human endpoints was only 62%. We therefore built validated binary QSAR models for the human endpoints using SVM with SiRMS, Dragon, Morgan, and CDK descriptors. The consensus model integrating the individual models showed an external balanced accuracy of 70% (at 75% coverage). It outperformed the mouse test as a predictor of the human response in all statistical characteristics other than sensitivity. Virtual screening of the COSMOS cosmetic chemical library using the consensus model identified ~200 putative skin sensitizers. The significant descriptors were interpreted in terms of the structural moieties responsible for skin sensitization. This analysis also helped identify chemotypes with good concordance between rLLNA and human data, as well as those that were stronger predictors of the human response.
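The three figures quoted above (endpoint concordance, balanced accuracy, and coverage) are standard binary-classification quantities. The Python below is a minimal sketch of how they relate. The labels are hypothetical. A model abstention (compound outside the applicability domain) is represented here as `None`, which is an assumption of this sketch rather than the authors' convention.

```python
def concordance(a, b):
    """Fraction of compounds on which two binary endpoints agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = sensitizer)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2

def evaluate_with_coverage(y_true, y_pred):
    """Coverage and balanced accuracy when the model abstains (None) on some compounds."""
    covered = [(t, p) for t, p in zip(y_true, y_pred) if p is not None]
    coverage = len(covered) / len(y_true)
    ts, ps = zip(*covered)
    return coverage, balanced_accuracy(ts, ps)

# Hypothetical labels for eight compounds.
human = [1, 1, 1, 0, 0, 0, 1, 0]
mouse = [1, 1, 0, 0, 1, 0, 1, 1]
consensus = [1, 0, 1, 0, 1, 0, None, None]  # None = outside applicability domain
print(concordance(mouse, human))  # 0.625
print(evaluate_with_coverage(human, consensus))  # (0.75, 0.6666666666666666)
```

Balanced accuracy averages sensitivity and specificity, so it is not inflated when sensitizers and non-sensitizers are unevenly represented. Coverage reports how many compounds the model was willing to predict at all.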

9:00am-9:20am
CINF 134: Assessing skin sensitization potential by combining AOP-informed chemotype alerts, QSAR models, and in vitro biological assay data

James Rathman3,4 , rathman.1@osu.edu, Chihae Yang2,4 , Aleksandra Mostrag-Szlichtyng4 , Bruno Bienfait2 , Joerg Marusczyk2 , Christof Schwab1
1 Molecular Networks GmbH, Erlangen, Germany; 2 Molecular Networks GmbH, Erlangen, Germany; 3 Ohio State University, Columbus, Ohio, United States; 4 Altamira LLC, Columbus, Ohio, United States

We present a workflow for evaluating skin sensitization potential in which evidence from multiple sources is rigorously and quantitatively combined to arrive at a weight-of-evidence prediction with an associated estimate of uncertainty. In silico approaches based on chemical structure and physicochemical properties are combined with experimental results from in vitro assays. Both the computational and experimental strategies are designed to reflect key events in the adverse outcome pathway (AOP) for skin sensitization, in which the molecular initiating event (MIE) is the covalent modification of proteins. ToxPrint chemotypes, structural fragments encoded with physicochemical properties and electronic system information, were used to categorize chemicals into MIE classes, including alkyl halides, carboxylates, Michael acceptors, Schiff base formers, and phenols. Skin metabolic rules, also coded in chemotypes, were developed and added to the workflow to include pre- and prohaptens. Predictive QSAR models were developed from these descriptors, and a number of chemotype alerts were identified. While structure- and property-based in silico methods are suitable for molecular events, they are somewhat limited in their ability to address subsequent biological processes. Experimental data from biological assays relevant to multiple key events in the AOP were therefore included in the workflow: DPRA (direct peptide reactivity assay), KeratinoSens and LuSens (activation of the Keap1/Nrf2 signaling pathway), and h-CLAT (dendritic cell activation). The results of these biological assays enhanced the reliability and reduced the uncertainty of skin sensitization predictions. Cross-validation performance was very good, with sensitivity greater than 90% and specificity higher than 80%. Even for compounds correctly predicted by the in silico approach alone, integration of biological assay data reduced the uncertainty of the prediction.

9:20am-9:40am
CINF 135: Using OpenTox to map toxicity data to adverse outcome pathways

Barry Hardy1 , barry.hardy@douglasconnect.com
1 Douglas Connect, Zeiningen, Switzerland

OpenTox provides a framework where data and model predictions can be retrieved from a federated set of data sources.

In this presentation we will discuss the concept of using relevant information (including its retrieval and organisation from multiple sources) in the development and application of Adverse Outcome Pathways (AOPs).

Analysis and discussion of the information obtained were used in conjunction with a collaborative wiki-based approach to AOP development and interpretation.

We discuss the requirements and prospective solutions for the further computational development of AOPs e.g., using data and metadata to semantically annotate AOP nodes, challenge limitations in an AOP or propose new nodes for the AOP.

We will present meta-analysis examples from the ToxBank data infrastructure project, which supports integrated analysis across biochemical, functional, and omics datasets toward the safety assessment goals of the SEURAT-1 program, whose aim is to develop alternatives to animal testing.

9:40am-10:00am
CINF 136: Cheminformatic tools in support of pharmacokinetics and ADME profiling

Michael Goldsmith1 , r.goldsmith@chemcomp.com, Daniel Chang2
1 Applications Sciences, Chemical Computing Group, Montreal, Quebec, Canada

The AOP framework can effectively link HTS toxicity predictions corresponding to a molecular initiating event (MIE) with adverse outcomes of regulatory concern. In an analogous manner, dosimetry modeling, such as physiologically based pharmacokinetic (PBPK) modeling, can effectively link external exposures to the target tissue doses required to trigger the MIE. However, current HTS approaches can screen thousands of compounds per year, in contrast with the few hundred PBPK models that exist. In this study, a comprehensively curated PBPK-related corpus was developed to provide model predictions based on chemical structure similarity. First, publications that mention PBPK models in the title or abstract were retrieved from the literature. Chemical named-entity recognition (NER) tools were then used to annotate chemical names, which were checked for redundancies and mapped to SMILES strings and CAS numbers through name lookups in several molecular repositories. In addition, 3D molecular geometries, as well as 2D and 3D molecular descriptors, were computed for the entire chemical dataset using the Molecular Operating Environment (MOE 2014.09, Chemical Computing Group Inc., Montreal, Canada). From 1977 through October 30, 2013, a total of 340 unique chemicals were identified for which PBPK models have been developed, corresponding to 798 published models in a PBPK literature corpus of 1707 articles. This compiled molecular/bibliographic dataset provides a chemical structure-centric basis for identifying relevant PBPK modeling literature for new chemical entities that need to be modeled. When a chemical lacks a previous PBPK model, models for its nearest-neighbor chemicals, within suitable distance metrics, can closely mimic its pharmacokinetics.
This tool, when coupled with HTS toxicity and external exposure predictions, can provide a risk-based rather than hazard-based prioritization of chemicals for regulatory purposes.
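The structure-centric lookup described in this abstract can be pictured as a nearest-neighbor search over fingerprints. The minimal sketch below assumes Tanimoto similarity on toy bit-set fingerprints, with a hypothetical similarity cutoff (min_sim) and neighbor count (k); the authors' actual descriptors (MOE 2D/3D) and distance metrics are not reproduced here.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # indices of the "on" bits in a structural fingerprint


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity of two bit sets: |A & B| / |A | B|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def nearest_modeled(query: Fingerprint,
                    modeled: Dict[str, Fingerprint],
                    min_sim: float = 0.7,
                    k: int = 3) -> List[Tuple[str, float]]:
    """Up to k chemicals with an existing PBPK model whose similarity to the
    query is at least min_sim, most similar first."""
    hits = [(name, tanimoto(query, fp)) for name, fp in modeled.items()]
    hits = [hit for hit in hits if hit[1] >= min_sim]
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits[:k]


# Toy fingerprints for hypothetical chemicals that already have PBPK models.
modeled = {
    "chem_A": frozenset({1, 2, 3, 4, 5}),
    "chem_B": frozenset({1, 2, 3, 9}),
    "chem_C": frozenset({10, 11, 12}),
}
query = frozenset({1, 2, 3, 4})  # new chemical lacking a PBPK model
print(nearest_modeled(query, modeled))               # [('chem_A', 0.8)]
print(nearest_modeled(query, modeled, min_sim=0.5))  # [('chem_A', 0.8), ('chem_B', 0.6)]
```

Raising min_sim trades coverage for confidence: fewer new chemicals receive a surrogate model, but the surrogates returned are structurally closer.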
10:00am-10:15am Intermission

10:15am-10:35am
CINF 137: Predicting off target profiles using local 3D QSAR models generated 'on the fly'

Brian Masek1 , brian.masek@certara.com, Alexander Steudle1 , Lei Wang1 , Bernd Wendt1
1 Certara Inc, Saint Louis, Missouri, United States

Traditionally, in silico predictive toxicology has focused on the development of “global” QSAR models. In this talk, we present an alternative strategy in which predictions of off-target pharmacology are based on local 3D QSAR models built “on the fly.” This approach offers a number of advantages. First, all the information available at any given moment is used to make predictions. Second, the 3D QSAR models not only predict biological activity but are also interpretable, so they can guide the design of new compounds that enhance desirable activity or avoid undesirable activity. Several examples will be provided to illustrate the benefits of this approach.

10:35am-10:55am
CINF 138: Linking transporter interaction profiles to in vivo side effects


Eleni Kotsampasakou1 , Sylvia Escher3 , Andreas Jurik2 , Harald Sitte4 , Lukas Pezawas5 , Gerhard Ecker1 , gerhard.f.ecker@univie.ac.at
1 Dept Medicinal Chemistry, Wien, Austria; 2 Dept. of Medicinal Chemistry, University of Vienna, Vienna, Austria; 3 Chemical Risk Assessment, Fraunhofer Institute of Toxicology and Experimental Medicine, Hannover, Germany; 4 Center for Physiology and Pharmacology, Medical University of Vienna, Vienna, Austria; 5 Department of Psychiatry and Psychotherapy, Medical University of Vienna, Vienna, Austria

Transmembrane transporters (TMTs) are increasingly recognized as being involved in the ADMET properties of drugs. In addition, inhibition of TMTs such as P-glycoprotein or the bile salt export pump (BSEP) frequently leads to drug-drug interactions and thus might be linked to in vivo toxicity. In our attempt to link in vitro transporter interaction profiles to in vivo side effects, we followed different approaches:

-) Docking of a set of tiagabine analogs into a protein homology model of the GABA transporter GAT1 identified the molecular basis of ligand-transporter interaction. Translation of the molecular interaction profile into a set of pharmacophoric features followed by in silico screening and experimental testing of top ranked hits identified thyroid hormones as GAT1 inhibitors. [1]

-) For prediction of CNS side effects, the interaction profiles of 40 antidepressant drugs with 20 receptors and transporters expressed in the brain were linked to side effects extracted from clinical meta-studies using a modified PLS algorithm. This mechanism-driven approach yielded a set of significant correlations between selected side effects and distinct interaction profiles. In addition to many well-known pharmacological principles, such as the association of sedation with H1 receptors, novel mechanisms could also be observed, including the potential involvement of 5-HT6 antagonism in weight gain. [2]

-) For prediction of hyperbilirubinemia, a combined statistical/mechanistic approach was used. A small set of compounds extracted from the eTOX database was characterized by a set of physicochemical descriptors. The descriptor matrix was then used as the input for classification algorithms. Furthermore, because inhibition of the transporters OATP1B1 and OATP1B3 is often linked to hyperbilirubinemia, predicted interaction profiles with these two transporters were also included as descriptors. Unfortunately, this did not improve the models, which might be due to the multifactorial and complex nature of hyperbilirubinemia and/or the small and unbalanced data set. Further studies that also include other hepatotoxicity endpoints will allow the general feasibility of this approach to be assessed.

Acknowledgement
We gratefully acknowledge financial support provided by the Austrian Science Fund (grant F03502) and by the Innovative Medicines Initiative (eTOX, grant 115002).

[1] Jurik A, et al. J Med Chem 2015, 2159.
[2] Michl J, et al. Eur Neuropsychopharmacol 2014, 1463.

10:55am-11:15am
CINF 139: Enhancing structural alerts for toxicity with mechanism-based metabolism and reactivity models

S. Joshua Swamidass2 , swamidass@wustl.edu, Tyler Hughes2 , Grover Miller1
1 Dept of Biochem and Mol Biol, Little Rock, Arkansas, United States; 2 Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, Missouri, United States

“Structural alerts” flag molecules that contain specific substructures (like epoxides or alkynes) as likely to form reactive metabolites, and they are used widely within industry, the FDA, and discovery tools (e.g., Derek, SpotRM, ChEMBL, PAINS, Glaxo Wellcome Hard Filters, and others). Structural alerts account for metabolism (inadequately) by flagging substructures of a molecule that, if metabolized, will yield reactive molecules. Structural alerts are limited because they account neither for the likelihood (or unlikelihood) that metabolism bioactivates the alert, nor for the effect of substituents on the bioactivated molecule’s reactivity. Molecules are flagged even if (1) their alert substructure is not metabolized, (2) the reactive metabolite is further metabolized into a benign form, or (3) an alternate, non-activating pathway is responsible for metabolic clearance of the compound. Consequently, alerts are very difficult to interpret; safe molecules are often flagged as toxic, and unsafe molecules slip through the process. To overcome these limitations, we present an approach that combines alerts with metabolism and reactivity models. This approach explicitly models metabolism and reactivity to (1) make mechanistic predictions, (2) identify problematic molecules not associated with alert structures, and (3) discriminate between molecules with bioactivated alerts and those with alert structures that are not bioactivated into the toxic form.

11:15am-11:35am
CINF 140: Toxicity biomarker identification and drug repurposing using gene co-expression modules

Gregory Tawa2 , gtawa@hotmail.com, Mohamed AbdulHameed1 , Danielle Ippolito3 , Kamal Kumar1 , John Lewis3 , Jonathan Stallings3 , Anders Wallqvist1
1 DoD BHSAI, Frederick, Maryland, United States; 2 NCATS division, National Institutes of Health, Doylestown, Pennsylvania, United States; 3 U.S. Army Center for Environmental Health Research, Fort Detrick, Frederick, Maryland, United States

Disease diagnosis and therapy are often ineffective if they target individual proteins. Proteins work together in groups, or modules, and perturbation of these modules can lead to disease. Furthermore, administration of drugs may induce particular patterns of module activation. Identifying module activation patterns, or profiles, associated with diseases or drugs provides important insights for diagnoses and therapies. Module activation profiles linked to disease can be mined for diagnostic biomarkers. Drugs can be repurposed by finding diseases whose module activation profiles are anticorrelated with that of the drug. Here, we used DrugMatrix, a toxicogenomics database containing organ-specific gene expression data matched to dose-dependent chemical exposures and clinical pathology assessments in Sprague Dawley rats, to identify groups of co-expressed genes (modules) specific to injury endpoints in the liver and to drugs that cause these endpoints. We identified 78 gene co-expression modules associated with 25 injury endpoints categorized from clinical pathology, organ weight changes, and histopathology. Using gene expression data associated with an injury or a drug perturbation, we showed that these modules exhibited different patterns of activation characteristic of each injury or drug. We proposed three gene sets characteristic of liver fibrosis, steatosis, and general liver injury, based on genes from the co-expression modules. Putative biomarkers were chosen from these gene sets and validated in animal models of liver fibrosis and steatosis. We then clustered drugs with similar module profiles. For each drug in a cluster, we retrieved curated diseases from the Comparative Toxicogenomics Database to collect all diseases associated with the cluster.
We showed that the cluster's curated disease list overlapped significantly with the curated disease list for each molecule, thereby validating the idea that drugs with similar profiles can be repurposed for each other’s diseases. However, we found many molecules within the clusters that have no prior association with the cluster disease list. Repurposing hypotheses were generated from these cases and await experimental validation.
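The repurposing step above rests on anticorrelation between module activation profiles. A minimal sketch, assuming each drug or disease is summarized as a vector of activation scores over the same modules and using Pearson correlation with a hypothetical cutoff (max_r); the module scores and names are invented for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two module activation profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    denom = math.sqrt(sxx * syy)
    return sxy / denom if denom else 0.0  # constant profiles carry no signal


def repurposing_candidates(drug: Sequence[float],
                           diseases: Dict[str, Sequence[float]],
                           max_r: float = -0.5) -> List[Tuple[str, float]]:
    """Diseases whose module profile is anticorrelated with the drug's
    (r <= max_r), most strongly anticorrelated first."""
    scored = [(name, pearson(drug, prof)) for name, prof in diseases.items()]
    return sorted((s for s in scored if s[1] <= max_r), key=lambda s: s[1])


# Hypothetical activation scores over four co-expression modules.
drug = [1.0, -0.5, 2.0, 0.0]
diseases = {
    "disease_x": [-1.0, 0.5, -2.0, 0.0],  # mirror image of the drug profile
    "disease_y": [1.0, -0.4, 1.8, 0.1],   # similar to the drug profile
}
print(repurposing_candidates(drug, diseases))
```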



CINF: Find the Needle in a Haystack: Mining Data from Large Chemical Spaces
8:30am - 11:50am
Wednesday, August 19

Room 104B - Boston Convention & Exhibition Center
David Deng, Organizing
David Deng, Presiding
8:30am-8:35am Introductory Remarks

8:35am-9:05am
CINF 126: Frequency of activity cliffs and distribution over different potency ranges

Dagmar Stumpfe1 , stumpfe@bit.uni-bonn.de, Dilyana Dimova1 , Jürgen Bajorath1
1 Life Science Informatics, University of Bonn, B-IT, Bonn, Germany

Activity cliffs are defined as pairs or groups of structurally similar compounds with large differences in potency. Most activity cliffs are formed in a coordinated manner involving multiple analogs with large potency variations, as opposed to isolated cliffs (formed by pairs of compounds in the absence of structural neighbors with varying potency). The majority of compound activity data available in medicinal chemistry databases such as ChEMBL fall into the nanomolar range. This data imbalance also affects the distribution of activity cliffs over different potency ranges. In 2012, depending on the similarity methods or criteria applied to assess activity cliffs, 30-40% of all bioactive compounds were found to participate in at least one cliff. When the union of different similarity approaches was considered, this proportion increased to more than 65%. In recent years, the amount of compound activity data has increased substantially. For example, between 2011 and 2015, the high-confidence activity data available in ChEMBL nearly doubled. In this presentation, trends in activity cliff frequency and potency distribution are monitored over time.

References
Stumpfe, D.; Bajorath, J. Frequency of Occurrence and Potency Range Distribution of Activity Cliffs in Bioactive Compounds. J. Chem. Inf. Model. 2012, 52, 2348-2353.
Stumpfe, D.; Hu, Y.; Dimova, D.; Bajorath, J. Recent Progress in Understanding Activity Cliffs and Their Utility in Medicinal Chemistry. J. Med. Chem. 2014, 57, 18-28.
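A minimal sketch of how activity cliffs, and the share of compounds participating in them, might be counted. It assumes Tanimoto similarity on toy fingerprints, potency expressed as pKi, and illustrative thresholds (similarity of at least 0.8 and a potency difference of at least two log units, i.e., 100-fold); the criteria used in the cited studies may differ.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Compound = Tuple[FrozenSet[int], float]  # (fingerprint on-bits, potency as pKi)


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def find_cliffs(compounds: Dict[str, Compound],
                min_sim: float = 0.8,
                min_delta: float = 2.0) -> List[Tuple[str, str]]:
    """Pairs that are structurally similar (Tanimoto >= min_sim) yet differ
    in potency by at least min_delta log units (2.0 = 100-fold)."""
    cliffs = []
    for (n1, (fp1, p1)), (n2, (fp2, p2)) in combinations(compounds.items(), 2):
        if tanimoto(fp1, fp2) >= min_sim and abs(p1 - p2) >= min_delta:
            cliffs.append((n1, n2))
    return cliffs


def participation(compounds: Dict[str, Compound],
                  cliffs: List[Tuple[str, str]]) -> float:
    """Fraction of compounds that take part in at least one cliff."""
    involved = {name for pair in cliffs for name in pair}
    return len(involved) / len(compounds)


# Hypothetical compounds: cpd1-cpd3 are close analogs, cpd4 is unrelated.
compounds = {
    "cpd1": (frozenset(range(1, 11)), 9.0),
    "cpd2": (frozenset(range(1, 10)), 6.5),
    "cpd3": (frozenset(range(1, 10)) | {20}, 8.8),
    "cpd4": (frozenset({30, 31, 32}), 5.0),
}
cliffs = find_cliffs(compounds)
print(cliffs)                            # [('cpd1', 'cpd2'), ('cpd2', 'cpd3')]
print(participation(compounds, cliffs))  # 0.75
```

Here cpd2 sits in two cliffs at once, a tiny instance of the coordinated (rather than isolated) cliffs the abstract describes.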

9:05am-9:35am
CINF 127: Random indexing for comparing path-based chemical fingerprints

Patrick Devaney2 , p.devaney@formatherapeutics.com, David Lancia2 , Jared Milbank2 , Mary Bradley1
1 Informatics, FORMA Therapeutics, Brookline, Massachusetts, United States; 2 Computational Discovery, Forma Therapeutics, Inc, Watertown, Massachusetts, United States

Random indexing (RI) is a high-dimensional representation that was developed for document searching. RI is also a dimensionality reduction (DR) technique. RI can take very sparse, very high-dimensional, path-based chemical fingerprints as inputs and reduce them to a user-selected intermediate dimensionality suitable for input to standard DR embedding algorithms, such as t-distributed Stochastic Neighbor Embedding (t-SNE). RI can accomplish this reduction because, like Locality Sensitive Hashing, it relies upon the Johnson-Lindenstrauss theorem to bound the distortion introduced by DR.

To date, chemical fingerprint inputs to t-SNE have been either Tanimoto Distances (TD) of folded fingerprints or the most significant bits (by information gain measure) of raw fingerprints. Both methods throw away information prior to the t-SNE mapping. RI retains all information. We compare the statistical properties of TD vs RI over a range of parameters; and we evaluate RI/t-SNE maps of various chemical libraries.

The maps have been generated by the latest, Barnes-Hut accelerated version of t-SNE, which is capable of mapping libraries of over one million molecules. We discuss scalability issues that we encountered; and, also, the issue of parametric versus non-parametric mapping.
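To make the random indexing idea concrete, the sketch below assigns each path feature a sparse, reproducible random vector with a few +1/-1 entries and sums these vectors to obtain a fixed, user-selected dimensionality. The feature names, the dimension of 256 and the eight non-zeros per feature are assumptions for illustration, not FORMA's settings; vectors like these are what would then be passed to an embedding such as t-SNE.

```python
import math
import random
from typing import Dict, Iterable, List


def index_vector(feature: str, dim: int, nonzeros: int = 8) -> Dict[int, int]:
    """Sparse ternary random index vector for one fingerprint feature.
    Seeding the generator with the feature string makes it reproducible."""
    rng = random.Random(feature)
    positions = rng.sample(range(dim), nonzeros)
    return {pos: rng.choice((-1, 1)) for pos in positions}


def random_index(features: Iterable[str], dim: int = 256) -> List[float]:
    """Reduce an open-ended set of path features to `dim` values by summing
    the index vectors of all features present."""
    vec = [0.0] * dim
    for feature in set(features):
        for pos, sign in index_vector(feature, dim).items():
            vec[pos] += sign
    return vec


def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical path features (in practice, enumerated atom/bond paths).
mol_a = [f"path_{i}" for i in range(40)]
mol_b = mol_a[:36] + [f"path_{i}" for i in range(100, 104)]  # shares 36 of 40 paths
mol_c = [f"path_{i}" for i in range(200, 240)]               # shares none
va, vb, vc = (random_index(m) for m in (mol_a, mol_b, mol_c))
print(round(cosine(va, vb), 2), round(cosine(va, vc), 2))
```

No fingerprint bit is discarded before projection; instead, every feature contributes to the reduced vector, and the random projection only approximately preserves similarity. Molecules that share most of their paths should stay close in cosine terms, while unrelated ones should score near zero, with the Johnson-Lindenstrauss argument bounding the distortion with high probability.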

9:35am-10:05am
CINF 128: Scaffold-based analytics: Enabling hit-to-lead decisions by visualizing chemical series linked across large datasets


Deepak Bandyopadhyay1 , Deepak.2.Bandyopadhyay@gsk.com, Constantine Kreatsoulas1 , Pat Brady1 , Genaro Scavello1 , Dac-Trung Nguyen2 , Tyler Peryea2 , Ajit Jadhav2
1 GlaxoSmithKline, Collegeville, Pennsylvania, United States; 2 National Center for Advancing Translational Sciences, Bethesda, Maryland, United States

We present a method for visualizing and navigating large and diverse chemical spaces, such as screening datasets, along with their activities and properties. Our approach is to annotate the data with all possible scaffolds contained within each molecule using an exhaustive algorithm developed at NCATS. We have developed a Spotfire visualization that is used to drive the hit triage process. Progression decisions can be made using aggregate scaffold parameters and data from multiple datasets merged at the scaffold level. This visualization easily reveals overlaps that help prioritize hits, highlight tractable series and posit ways to combine aspects of multiple hits. The SAR of a large and complex hit is automatically mapped into all constituent scaffolds, making it possible to navigate, via any shared scaffold, to all related hits. This scaffold “walking” helps address the bias toward a handful of potent and ligand-efficient molecules at the expense of coverage of chemical space. The mapping also automates the laborious process of substructure searches within a dataset, as structures are now linked to pre-processed search results. We compare the NCATS scaffold generation method with published screening triage methods such as nearest-neighbor clustering, data-driven clustering and scaffold networks. We believe that our Spotfire visualization, used in combination with structure annotation, provides a novel view of large and diverse datasets. This allows teams to navigate effortlessly between structurally related molecules and enriches the population of leads considered and progressed in a manner complementary to established approaches.
10:05am-10:20am Intermission

10:20am-10:50am
CINF 129: Resolving cryptic needles to molecular structures: The GtoPdb experience

Christopher Southan1 , cdsouthan@gmail.com, Adam Pawson1 , Joanna Sharman1 , Helen Benson1 , Elena Faccenda1
1 IUPHAR/BPS Guide to PHARMACOLOGY, University of Edinburgh, Göteborg, Sweden

The IUPHAR/BPS Guide to PHARMACOLOGY database (GtoPdb) team has data-mined bioactive chemistry since 2009 (PMID 24234439). Consequently, during the curation of 7586 needles (as ligand entries), we have grappled extensively with the haystack. This work outlines the challenges of mapping company code numbers to structures (n2s) and lead compounds from a haystack of anywhere between one and five million bioactive structures. By the time these are assigned non-proprietary names, data linkages can usually be found. However, other valuable needles are lead compounds approaching clinical development that can also be incisive pharmacological tools. The use of company codes to designate these is often obfuscatory, with some journals even allowing blinding where clinical reports have no n2s or links to primary data. The efforts to resolve the NCATS and MRC repurposing candidates exemplified the problem (PMID 23159359). Notwithstanding this, we have now curated n2s for 50 AZD compounds, including Open Innovation structures. Codes also present back-mapping problems where we need to synonym-chain (a) first filings and early papers, (b) consecutive different codes via mergers, (c) INN or USAN, and (d) an eventual trade name. The mining challenges are compounded by ad hoc variants: hyphen, space, or no space; comma inclusions; dropped leading zeros; appended suffixes; or even ghost codes. In some cases, we curate plausible patent structures pending disclosure. For others, we found vendor-only n2s corroborated via patent matches. Recent reports of potent lead structures are particularly difficult to name-link and synonym-map. To ameliorate the problem, we have recently introduced binding synonyms such as “compound 17d [PMID 23099093]” or “example 98 (WO2011020806)”. This means users can not only immediately locate the exact structures inside documents, including via our PubChem submissions, but often find expanded SAR series.
Broader issues of n2s obfuscation will be discussed, including the inherent contradiction with the trend towards greater clinical trials transparency.
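Several of the variant forms listed above (hyphen, space or no space, commas, dropped leading zeros) can be collapsed to a single normalization key before lookup. The sketch below is a hypothetical illustration using a made-up code prefix, not GtoPdb's curation procedure; it cannot resolve ghost codes or code changes after mergers, which still need curated synonym chains.

```python
import re

# Letters, optional separators (space, comma, hyphen), digits with any
# leading zeros, then an optional letter suffix attached to the digits.
_CODE = re.compile(r"^([A-Z]+)[\s,\-]*0*(\d+)([A-Z]*)$")


def normalize_code(raw: str) -> str:
    """Collapse typographic variants of a company code to one lookup key.
    Strings that do not look like PREFIX+NUMBER codes are only uppercased."""
    s = raw.strip().upper()
    m = _CODE.match(s)
    if not m:
        return s
    prefix, number, suffix = m.groups()
    return f"{prefix}{number}{suffix}"


# "ABC" is a hypothetical prefix used only for illustration.
variants = ["ABC-0123", "abc 123", "ABC0123", "ABC, 0123", "ABC-123"]
print({normalize_code(v) for v in variants})  # {'ABC123'}
```

Stripping leading zeros trades precision for recall: it merges genuine variants, but could in principle also merge two distinct codes that differ only by a leading zero, so matches would still need a curator's check.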

10:50am-11:20am
CINF 130: Current and future developments of Markush technology in drug discovery

David Deng1 , ddeng@chemaxon.com, Árpád Figyelmesi1
1 ChemAxon, Cambridge, Massachusetts, United States


Markush claims are widely used in chemical patents and combinatorial libraries to define large chemical spaces. However, the complexity of Markush structures makes their representation, searching and enumeration very difficult. It is even more challenging to analyze the overlap between two patent Markush structures. All existing Markush technologies have certain limitations. The search systems are proprietary and hence unavailable for in-house integration. Another issue is that the search algorithms have not been well published or further developed for decades, despite high technical demand. This presentation summarizes the current state of Markush technology and its existing challenges, and highlights some promising new developments relevant to today's drug discovery processes.

11:20am-11:45am
CINF 131: GPU-accelerated virtual screening: Rationale, challenges, and case studies

Olexandr Isayev1 , olexandr@olexandrisayev.com, Denis Fourches2
1 UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States; 2 Department of Chemistry, North Carolina State University, Raleigh, North Carolina, United States

With the unprecedented growth of chemical databases incorporating up to several hundred billion synthetically feasible chemicals, modelers have no shortage of chemicals to process. Importantly, such Big Chemical Data offers enormous opportunities for discovering novel bioactive molecules. However, the current generation of cheminformatics software tools is not capable of handling, characterizing, and processing such extremely large chemical libraries. In this presentation, we will discuss the rationale and the main challenges (theoretical and technical) for screening very large repositories of compounds in the current context of drug discovery. Moreover, we will present several proof-of-concept studies regarding the screening of extremely large libraries (>1 billion compounds) using our novel GPU-accelerated cheminformatics platform to identify molecules with defined bioactivity. Overall, we will show that GPU computing represents an effective and inexpensive architecture to develop, employ, and validate a new generation of cheminformatics methods and tools ready to process billions of compounds.
11:45am-11:50am Concluding Remarks
CINF: Chemical Information Skills: The Essential Toolkit for Chemical Research— A Joint CINF-CSA Trust Symposium
9:00am - 12:40pm
Wednesday, August 19

Room 104A - Boston Convention & Exhibition Center
Grace Baysinger, Jonathan Goodman, Organizing
Grace Baysinger, Jonathan Goodman, Presiding
Financially supported by: CSA Trust
9:00am-9:05am Introductory Remarks

9:05am-9:25am
CINF 118: Chemical Information Sources Wikibook - the open source created by chemical information professionals for chemical information professionals

Charles Huber1 , huber@library.ucsb.edu
1 Davidson Library, University of California, Santa Barbara, California, United States

Based on the landmark book by Gary Wiggins, Chemical Information Sources became a wikibook under the leadership of Ben Wagner. Now entering its next stage, the CIS Wikibook is designed to be an open access source of resources for a wide range of chemical information research and teaching. The talk will cover the current content of the CIS Wikibook, plans for its future, and how you can get involved.

9:25am-9:45am
CINF 119: Soft skills of chemical research: Academic integrity and research ethics

Donna Wrublewski1 , dtwrub@caltech.edu, Michelle Leonard2 , Amy Buhler2 , Neelam Bharti2 , neelambh@ufl.edu
1 Caltech Library MC 1-43, California Institute of Technology, Pasadena, California, United States; 2 Marston Science Library, University of Florida, Gainesville, Florida, United States

The foundation of any scientist should be ethical behavior. Like many other so-called 'soft skills', ethics is assumed to be picked up on the fly; instruction is minimal when present, and is more often rolled into nebulous clauses such as 'honor codes'. Although not specific to chemistry, the teaching of these skills by librarians is a growing phenomenon in academic libraries. This talk will look at academic integrity instruction as it underpins many of the skills and tools discussed elsewhere in this session, and will describe specific outreach on academic integrity to chemists.

9:45am-10:05am
CINF 120: Integrating bibliographic management tools in chemical information literacy instruction

Svetla Baykoucheva1 , sbaykouc@umd.edu, Joseph Houck2
1 White Chemistry Library, University of Maryland, College Park, Maryland, United States; 2 Department of Chemistry and Biochemistry, University of Maryland, College Park, Maryland, United States

Even at early stages of their college education, students are often required to write papers, cite literature, and create bibliographies. However, they are not given systematic training to do this efficiently. This paper presents a strategy for integrating a bibliographic management program (EndNote Online) into a chemical information literacy instruction program at the University of Maryland, College Park. While learning how to search chemistry databases for literature and property information, students in several undergraduate chemistry courses also acquired basic knowledge of how to use EndNote to manage citations. Based on the results of an online assignment and feedback from students, this approach proved very successful, not only in teaching students how to use chemistry resources but also in making this instruction an interesting and rewarding experience.

10:05am-10:25am
CINF 121: Replacing the traditional graduate chemistry literature seminar with a chemical information literacy course

Vincent Scalfani3 , vfscalfani@ua.edu, Stephen Woski1 , Patrick Frantom2
1 Dept of Chemistry Box 870336, Univ of Alabama, Tuscaloosa, Alabama, United States; 2 Department of Chemistry, University of Alabama, Tuscaloosa, Alabama, United States; 3 University Libraries, University of Alabama, Tuscaloosa, Alabama, United States

Chemistry graduate students at The University of Alabama (UA) are required to complete a literature seminar in their second year of residence. The literature seminar includes a public presentation and a written research report on a primary literature chemistry topic. Traditionally, students prepared the seminars independently, though seeking advice and help from their advisors, chemistry faculty and peers was encouraged. As there was little formal instruction in place for the literature seminar, the style, quality and content of the seminars varied widely. For the past two years, the University Libraries has partnered with the Department of Chemistry to help advance the literature seminar program. We developed and piloted a new semester-long course, CH584: Literature and Communication in Graduate Chemistry. CH584 replaced the traditional literature seminar program and provided structured, formal instruction to all second-year chemistry graduate students. As a result, students received equal instruction and a consistent experience during the preparation of their literature seminars. The main focus of the course was information literacy, that is, effective retrieval and critical analysis of the chemical literature. Briefly, the course topics included chemistry information resources, critical analysis of the literature, scientific writing and scientific presentations. This presentation will discuss our experiences and challenges with the new chemical information literacy course at UA.
10:25am-10:40am Intermission

10:40am-11:00am
CINF 122: Chemical information skills: A searcher’s perspective

Elaine Cheeseman1 , echeesema@cas.org
1 Science IP, Chemical Abstracts Service, Newtown Square, Pennsylvania, United States

A scientific searcher uses their knowledge of science and the various information systems to retrieve requested information. Generally, the requests received are more challenging than those that can be easily retrieved by less experienced searchers and less sophisticated systems. For a scientific searcher, it is important to have a solid science background and the ability to master various search techniques and tools. Search systems should be well understood by the searcher including knowledge of coverage and currency, as well as search strategy design. The searcher should collaborate with the requestor to understand the information needed and then provide search results that deliver on the expectations.

Understanding the science is important throughout the search project: initially, to collaborate with the requestor and design an effective search strategy, then, to evaluate the results and refine the query and finally, to interpret the information and report the findings.

But the information cannot be obtained without proficiency in the various scientific information systems available to the searcher. General information search products designed for the masses are typically insufficient to fulfill requests for scientific information submitted to search professionals. Systems powerful enough to address such complex requests are comparably sophisticated. Search professionals must be familiar with the information within the system as well as proficient with its functionality to effectively extract what’s needed. Finally, the searcher needs to stay knowledgeable about new features and products in this ever-evolving field.

11:00am-11:20am
CINF 123: Withdrawn

11:20am-11:40am
CINF 124: Patents - the essential multifunctional tool for science, business, and intellectual property information

Edlyn Simmons1 , edlyns@earthlink.net
1 Simmons Patent Information Service, LLC, Fort Mill, South Carolina, United States

Like the Swiss Army knife, patents can perform the functions of many different tools. They provide access not only to scientific information, but also to information about the activities of businesses and research institutions around the world, and, in their historically most important function, they warn against infringement of intellectual property rights.
The patent specification is required to disclose how to make and use scientific and technical innovations, and a single innovation is often described in a family of documents that provide their own translations. Patent claims specify exactly what activities must be avoided to prevent infringement lawsuits. The first page of a patent document gives the names of the individuals who researched and developed the disclosed invention and of the organizations that own the inventions and employ the inventors. That information can be a useful tool for competitive intelligence, for tracking trends in research and product development, and for job hunters and hiring managers looking for insights about where to work or whom to employ.

11:40am-12:00pm
CINF 125: Career information resources for graduate students and postdocs

Grace Baysinger1 , graceb@stanford.edu
1 Swain Chem & Chem Eng Library, Stanford University Libraries, San Jose, California, United States

Graduate students and postdocs are often unfamiliar with career options and what resources are available to help them make a more informed decision. An Individual Development Plan is a tool that can help benchmark skills needed before entering the workforce. Learning about career pathways, about potential employers, about the cost of living and salaries, and about preparing for an interview are a few of the practical information needs students have when considering career options. Complementing career development services on campus or from professional societies, libraries acquire books that include career information. Library guides also include job resources. Librarians can help provide search tips on how to find out more about potential employers in databases. This talk will present a summary of library resources and search strategies to help graduate students and postdocs be more informed when looking for a job.
CINF: Find the Needle in a Haystack: Mining Data from Large Chemical Spaces
1:00pm - 4:30pm
Wednesday, August 19

Room 104B - Boston Convention & Exhibition Center
David Deng, Organizing
David Deng, Presiding
1:00pm-1:05pm Introductory Remarks

1:05pm-1:30pm
CINF 149: Data-driven multi-objective optimization (MOO) in drug design

Shahar Keinan1 , skeinan@cloudpharmaceuticals.com, Elizabeth Hobbs1 , Elizabeth Hatcher-Frush1
1 Cloud Pharmaceuticals, Inc., Durham, North Carolina, United States

A high-quality drug must exhibit a balance of many properties, including potency, ADME and safety. Achieving this balance of often-conflicting requirements is a major challenge in drug discovery. We report here the testing of several multi-objective optimization (MOO) algorithms for optimizing molecular structures in a drug design environment. These included a genetic algorithm (GA), inverse design (ID, using a compound property) and filter cascades. Visualization of multi-parameter data is difficult and in many cases does not represent all data values, or does not consider the relative importance of each property or the uncertainty in the data. The time it takes to calculate specific properties can vary widely, from several hours, as in predicting protein affinity (calculated with QM/MM), to several seconds (as for solubility). Our testing initially compared the algorithms' behavior when optimizing only two properties. Based on the results, we subsequently tested the optimization of multiple properties. We compared both the optimized molecules (found on the Pareto surface) and the number of calculations needed to reach the optimized structure. We concluded that ID MOO should be used when a compound property can be defined. When it cannot, GA MOO should be used, although the number of required calculations can be quite high. Adding a filtering step decreases the number of required calculations while preserving the same Pareto front.
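The Pareto front referred to above is the set of candidates that no other candidate beats on every objective at once. A minimal sketch, assuming all objectives are scaled so that larger is better; the molecule names and scores are hypothetical, and no claim is made that this mirrors the authors' implementation.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b if it is no worse in every objective and strictly better
    in at least one (all objectives are maximized here)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def pareto_front(scores: Dict[str, Sequence[float]]) -> List[str]:
    """Names of candidates not dominated by any other candidate."""
    return [name for name, s in scores.items()
            if not any(dominates(other, s)
                       for other_name, other in scores.items()
                       if other_name != name)]


# Hypothetical normalized scores: (predicted potency, predicted solubility).
scores = {
    "m1": (0.9, 0.2),
    "m2": (0.6, 0.6),
    "m3": (0.5, 0.5),  # dominated by m2
    "m4": (0.2, 0.9),
}
print(pareto_front(scores))  # ['m1', 'm2', 'm4']
```

This pairwise check costs O(n^2) comparisons, which is negligible next to property calculations that can take hours each; that imbalance is why reducing the number of calculations, for example with a filtering step, matters more than the front computation itself.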

1:30pm-1:55pm
CINF 150: Multiobjective transformation based de novo design: A case study of surfactants

Christos Kannas1 , chriskannas@gmail.com, Warren Read2,3 , Noel Ruddock3 , Martyn Fletcher4 , Tom Jackson4 , Robert Stevens2 , Jerry Winter3 , Peter Willett1 , Val Gillet1
1 Information School, University of Sheffield, Sheffield, United Kingdom; 2 School of Computer Science, University of Manchester, Manchester, United Kingdom; 3 Unilever R&D, Port Sunlight, United Kingdom; 4 Cybula Ltd., York, United Kingdom

De novo molecular design involves the computational search of an immense space of feasible molecules to select those with the highest chances of becoming suitable products. One approach, transformation-based de novo design, exploits synthetic chemistry knowledge in the design process. In this presentation we describe a transformation-based de novo algorithm to design low-cost, structurally diverse surfactant molecules.

Surfactants are amphiphilic compounds, i.e., they contain both hydrophobic (oil-soluble) and hydrophilic (water-soluble) components. The starting ingredients for surfactant design include molecules that are typically either hydrophobic or hydrophilic, but not necessarily both. The implementation of the algorithm includes an initial step in which molecules (as SMILES) and transformations (as SMIRKS) are retrieved from an online semantic repository. A prediction model is used to predict the surfactant profile of the molecules, which are then divided into two populations: one of molecules predicted to be surfactants and the other of molecules predicted to be non-surfactants. The solutions of the algorithm are selected from the former, while the latter is a pool of molecules that might be useful in designing new surfactant-like molecules. In subsequent steps, molecules are selected from both populations and combined to form new molecules using the retrieved transformations. The selection of molecules from both populations is based on fitness-proportional Pareto ranking and structural diversity. The surfactant prediction model requires a pharmacophoric analysis of the molecules to determine their hydrophilic and hydrophobic groups. The algorithm's fitness calculation focuses on two properties of the predicted surfactant molecules (air/water surface tension and critical micelle concentration) and on their production cost. Our initial experiments indicate that the algorithm produces interesting surfactant-like solutions that combine substantial structural diversity with reasonably low production costs.
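The fitness-proportional Pareto ranking used for selection can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. All three objectives (air/water surface tension, critical micelle concentration, and production cost) are assumed to be minimized. Ranks are assigned by peeling off successive non-dominated fronts, and fitness is taken as `1 / rank`. The structural-diversity term mentioned in the abstract is omitted, and the function names are hypothetical.

```python
import random
from typing import List, Sequence


def dominates_min(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is no worse than `b` everywhere and strictly better
    somewhere (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_ranks(scores: List[Sequence[float]]) -> List[int]:
    """Rank 1 = non-dominated front; rank 2 = front after removing rank 1; ..."""
    ranks = [0] * len(scores)
    remaining = set(range(len(scores)))
    rank = 1
    while remaining:
        front = {
            i
            for i in remaining
            if not any(dominates_min(scores[j], scores[i]) for j in remaining if j != i)
        }
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks


def select_parents(ranks: List[int], k: int, rng: random.Random) -> List[int]:
    """Roulette-wheel (fitness-proportional) selection with fitness = 1 / rank."""
    weights = [1.0 / r for r in ranks]
    return rng.choices(range(len(ranks)), weights=weights, k=k)


if __name__ == "__main__":
    # Hypothetical (surface tension, log CMC, relative cost) triples.
    population = [(30.0, -3.0, 5.0), (35.0, -2.0, 6.0), (28.0, -1.0, 4.0)]
    ranks = pareto_ranks(population)
    print(ranks)  # [1, 2, 1]
    print(select_parents(ranks, k=4, rng=random.Random(0)))
```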

1:55pm-2:20pm
CINF 151: Mapping chemical data with Diversity Genie

Igor Filippov2 , igor.v.filippov@gmail.com, Iwona Weidlich1
1 CODDES LLC, Rockville, Maryland, United States; 2 VIF Innovations, LLC, Gaithersburg, Maryland, United States

We present new capabilities for data visualization and data mining developed in the most recent version of Diversity Genie, a software tool for chemical dataset analysis and manipulation. Diversity Genie allows users to map molecular properties (calculated or user-supplied) onto a spatial representation of the set of molecules. Similar molecules are co-located without any a priori presumption about the number of clusters or similarity thresholds. Diversity Genie also enables easy data filtering, sorting, merging, and conversion for sets of millions of molecules on a desktop computer. It estimates the diversity of a dataset and facilitates the comparison of different sets.
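The abstract does not say how Diversity Genie measures diversity, so the sketch below swaps in one common choice: the mean pairwise Tanimoto distance over binary fingerprints. It also includes a mean nearest-neighbour similarity for comparing two sets. Fingerprints are represented as Python sets of on-bit indices. That representation is an assumption; any fingerprint generator could supply the bits.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List

Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity of two on-bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def mean_pairwise_distance(fps: List[Fingerprint]) -> float:
    """Average (1 - Tanimoto) over all unordered pairs; 0.0 for fewer than 2 molecules."""
    pairs = list(combinations(fps, 2))
    if not pairs:
        return 0.0
    return sum(1.0 - tanimoto(a, b) for a, b in pairs) / len(pairs)


def mean_nearest_neighbour_similarity(
    query: Iterable[Fingerprint], reference: List[Fingerprint]
) -> float:
    """For each query molecule, take its similarity to the closest reference
    molecule, then average; a simple way to compare one set against another."""
    if not reference:
        raise ValueError("reference set is empty")
    sims = [max(tanimoto(q, r) for r in reference) for q in query]
    return sum(sims) / len(sims) if sims else 0.0
```

The sketch uses exact all-pairs comparison, which is quadratic. For the millions of molecules mentioned above, sampling or an indexed similarity search would be needed.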


A set containing three labeled subsets: fluoroquinolones, cephalosporins, and penicillins.
2:20pm-2:30pm Intermission

2:30pm-2:55pm
CINF 152: Extraction of structure-activity relationship information from activity cliff clusters

Dilyana Dimova1 , dimova@bit.uni-bonn.de, Dagmar Stumpfe1 , Jürgen Bajorath1
1 Life Science Informatics, University of Bonn, B-IT, Bonn, Germany

Activity cliffs, which are formed by structurally similar compounds with large differences in potency, represent the extreme form of structure-activity relationship (SAR) discontinuity and are thought to be rich in SAR information. Although the original definition of activity cliffs focuses on compound pairs, the vast majority of cliffs are formed in a coordinated manner by series of analogs with large potency variations. In network representations, coordinated activity cliffs emerge as clusters. Activity cliff clusters formed by bioactive compounds were systematically identified, and their topologies were analyzed. In addition, different computational methods have recently been developed for extracting SAR information from activity cliff clusters. Using these methods, activity cliff clusters can be prioritized for further exploration, and the SAR information associated with them can be analyzed.
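How coordinated cliffs "emerge as clusters" in a network can be shown with a small sketch. Each pair of compounds is linked if it meets both a similarity criterion and a potency criterion, and the connected components of the resulting graph are reported. The criteria used here are assumptions, not the ones from the talk: Tanimoto similarity of at least 0.7 on fingerprint bit sets (other criteria, such as matched molecular pairs, are also common) and a potency difference of at least two log units (100-fold).

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

# compound ID -> (fingerprint on-bits, potency as -log10 of the molar value)
Compounds = Dict[str, Tuple[FrozenSet[int], float]]


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def activity_cliff_clusters(
    compounds: Compounds, sim_threshold: float = 0.7, min_log_diff: float = 2.0
) -> List[Set[str]]:
    """Build the activity cliff network and return its connected components."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for (i, (fp_i, p_i)), (j, (fp_j, p_j)) in combinations(compounds.items(), 2):
        if tanimoto(fp_i, fp_j) >= sim_threshold and abs(p_i - p_j) >= min_log_diff:
            graph[i].add(j)
            graph[j].add(i)

    seen: Set[str] = set()
    clusters: List[Set[str]] = []
    for start in graph:  # only cliff-forming compounds are nodes
        if start in seen:
            continue
        component: Set[str] = set()
        stack = [start]
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(graph[node] - component)
        seen |= component
        clusters.append(component)
    return clusters
```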

2:55pm-3:25pm
CINF 153: Withdrawn
3:25pm-3:35pm Intermission

3:35pm-4:00pm
CINF 154: Drug discovery tool pipeline - the best of all worlds

Carsten Detering1 , detering@biosolveit.com
1 BioSolveIT Inc, Bellevue, Washington, United States

Drug discovery is like fighting the Hydra: one challenge mastered means 9 more to come. No single software package can handle them all. However, workflow systems like KNIME® offer you a single interface to a myriad of tools and help you to organize, standardize and streamline your approach and be best prepared for the battle.
Large virtual chemical libraries in particular can be easily handled and explored, giving the scientist the opportunity to exploit in-house chemistry at the touch of a button.
The talk will highlight a number of highly efficient strategies implemented this way.

4:00pm-4:25pm
CINF 155: 3D characteristics of efficient protein-protein interactions inhibitors: A big data analysis

Melaine Kuenemann1,2 , melaine.kuenemann@univ-paris-diderot.fr, Laura M. L. Bourbon1,2 , Céline M. Labbé1,2 , Bruno O. Villoutreix1,2 , Olivier Sperandio1,2
1 UMRS 973, Université Paris Diderot, Sorbonne Paris Cité, Paris 75013, France; 2 U973, Inserm, Paris 75013, France

The specific properties of protein–protein interaction (PPI) interfaces (often described as flat, large and hydrophobic) make them harder to tackle with low-molecular-weight compounds. Yet learning from the properties of successful PPI interface inhibitors (iPPI) at earlier stages of development has been pinpointed as a powerful strategy to overcome this difficulty.
To this end, a previous study of iPPI bioactive conformations highlighted four 3D characteristics [1]. These properties describe either the shape of the compounds (globularity) or the 3D distribution of their hydrophobic and hydrophilic interacting regions (IW4, EDmin3, CW2: VolSurf descriptors [2]). More specifically, the most important property revealed by the analysis (EDmin3) illustrates how iPPI manage to bind to the hydrophobic patches often present at the core of PPI targets. Interestingly, the absence of correlation between these properties and either the hydrophobicity or the size of the compounds opens new ways to design potent iPPI with better pharmacokinetic features.
The newly identified properties were further confirmed as characteristic of iPPI on a larger dataset. The “active” dataset (i.e., iPPI) was extracted from the iPPI-DB [3] and TIMBAL [4] databases. The inactive dataset was made up of the complete available commercial dataset (ZINC: 17 million compounds), a dataset of compounds with bioactivity data (ChEMBL20: 1.7 million compounds) and a dataset of compounds in biological testing or in preclinical or clinical phases (MDDR: 200,000 compounds). Statistical and cheminformatics analyses were used to compare all datasets. This “big data” analysis not only confirmed our previous results but also allowed us to estimate the proportion of the purchasable chemical space that is compatible with PPI target modulation. It now also makes it possible to identify existing drugs that could be used as modulators of PPI targets in the context of a drug repositioning program. Finally, it provided us with the chemical tools to determine privileged substructures favorable to PPI target modulation.
1. Kuenemann, M. A. et al. Journal of Chemical Information and Modeling 2014, 54 (11), 3067-3079.
2. Cruciani, G. et al. European Journal of Pharmaceutical Sciences 2000, 11 Suppl 2, S29-S39.
3. Labbé, C. M. et al. Drug Discovery Today 2013, 18 (19-20), 958-968.
4. Higueruelo, A. P. et al. Chemical Biology & Drug Design 2009, 74 (5), 457-467.
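The estimate of how much of a library falls into an iPPI-like region can be illustrated with a simple descriptor-box sketch. For each descriptor, a central percentile range is taken from the active set, and library compounds are counted as inside if they fall within every range. Several parts of this are assumptions, since the abstract does not describe the statistics actually used: the 10th to 90th percentile window, the nearest-rank percentile, and the toy values. Real globularity and EDmin3 values would have to come from VolSurf.

```python
import math
from typing import Dict, List, Tuple

Record = Dict[str, float]  # descriptor name -> value for one compound


def percentile(values: List[float], q: float) -> float:
    """Nearest-rank percentile for q in [0, 100]."""
    ordered = sorted(values)
    k = math.ceil(q / 100 * len(ordered)) - 1
    return ordered[max(0, min(len(ordered) - 1, k))]


def descriptor_box(
    actives: List[Record], names: List[str], lo: float = 10, hi: float = 90
) -> Dict[str, Tuple[float, float]]:
    """Per-descriptor (low, high) window taken from the active set."""
    return {
        n: (percentile([a[n] for a in actives], lo), percentile([a[n] for a in actives], hi))
        for n in names
    }


def fraction_in_box(library: List[Record], box: Dict[str, Tuple[float, float]]) -> float:
    """Fraction of library compounds whose descriptors all fall inside the box."""
    if not library:
        return 0.0
    inside = sum(
        all(low <= c[n] <= high for n, (low, high) in box.items()) for c in library
    )
    return inside / len(library)
```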
4:25pm-4:30pm Concluding Remarks
CINF: Chemical Information Skills: The Essential Toolkit for Chemical Research— A Joint CINF-CSA Trust Symposium
1:30pm - 5:15pm
Wednesday, August 19

Room 104A - Boston Convention & Exhibition Center
Grace Baysinger, Jonathan Goodman, Organizing
Grace Baysinger, Jonathan Goodman, Presiding
Financially supported by: CSA Trust
1:30pm-1:35pm Introductory Remarks

1:35pm-1:55pm
CINF 141: So I have an SD File...what do I do next?

Rajarshi Guha1 , rajarshi.guha@gmail.com, Noel O'Boyle2 , baoilleach@gmail.com
1 NCATS, Manchester, Connecticut, United States; 2 NextMove Software, Cambridge, United Kingdom

Cheminformatics tasks cover a wide range of topics, from manipulating chemical structure file formats to predicting properties of chemical structures. The common theme underlying all these tasks is the handling of chemical structures. Yet key aspects of structural information are frequently lost, altered or ignored during even the most routine processing tasks, through a misunderstanding of how tools work, limitations of the tools used, or unfamiliarity with the features (or lack thereof) of particular chemical file formats.

Here we present a compendium of the “Dos and Don’ts of cheminformatics”. Using examples drawn from over a decade of involvement with open source cheminformatics toolkits [1] [2] and a variety of cheminformatics applications, as well as from recent commentaries on chemical structure databases, we illustrate some misconceptions regarding how chemistry data is stored, propose best practices for keeping chemical information intact, and end with a cautionary suggestion: “don’t trust, but verify”.

References:
[1] Steinbeck, C. et al., J. Chem. Inf. Comput. Sci., 2003, 43, 493-500
[2] O’Boyle, N. M. et al., J. Cheminform., 2011, 3, 33
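In the spirit of “don’t trust, but verify”, here is a minimal sketch that uses only the Python standard library. It splits an SD file into records, cross-checks the V2000 counts line against the atom and bond blocks, and collects the `> <FIELD>` data items. It is deliberately partial: it does not handle V3000 or legacy atom-list blocks, and it only skips `M  ` property lines. It is not taken from the CDK or Open Babel code cited above; in practice one of those toolkits would do the parsing. The function names are hypothetical.

```python
import re
from typing import Dict, Iterator, List


def iter_records(text: str) -> Iterator[List[str]]:
    """Yield each SD record as a list of lines, split on '$$$$' terminators."""
    record: List[str] = []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            yield record
            record = []
        else:
            record.append(line)
    if any(line.strip() for line in record):
        yield record  # tolerate a missing final terminator


def parse_record(lines: List[str]) -> Dict[str, object]:
    """Parse one V2000 record; raise ValueError if the counts line disagrees
    with the connection table actually present."""
    if len(lines) < 4 or "V2000" not in lines[3]:
        raise ValueError("expected a V2000 counts line on line 4")
    n_atoms, n_bonds = int(lines[3][0:3]), int(lines[3][3:6])
    end = next((i for i, l in enumerate(lines) if l.startswith("M  END")), None)
    if end is None:
        raise ValueError("missing 'M  END'")
    ctab = [l for l in lines[4:end] if not l.startswith("M  ")]
    if len(ctab) != n_atoms + n_bonds:
        raise ValueError(
            f"counts line implies {n_atoms + n_bonds} ctab lines, found {len(ctab)}"
        )
    fields: Dict[str, List[str]] = {}
    key = None
    for line in lines[end + 1:]:
        if line.startswith(">"):
            match = re.search(r"<([^>]+)>", line)
            key = match.group(1) if match else None
            if key is not None:
                fields[key] = []
        elif key is not None:
            if line.strip():
                fields[key].append(line)
            else:
                key = None  # a blank line ends the data item
    return {
        "name": lines[0],
        "atoms": n_atoms,
        "bonds": n_bonds,
        "fields": {k: "\n".join(v) for k, v in fields.items()},
    }
```

Raising on a mismatch, rather than silently trusting the counts line, is the point of the exercise. A truncated or hand-edited record then fails loudly instead of losing atoms downstream.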

1:55pm-2:15pm
CINF 142: Chemical literacy for the ages: Essential skills in 2D chemical representation

Leah McEwen1 , lrm1@cornell.edu, Evan Hepler-Smith2
1 Clark Library, Cornell University, Ithaca, New York, United States; 2 Program in History of Science, Princeton, Princeton, New Jersey, United States

2D representation of chemical structures is the lingua franca of chemistry. 2D representations enable scientists to communicate and link data across a broad diversity of fields and chemical questions, to search the literature and other databases for useful chemical information, and to explore research pathways. 2D structural formulas have been around for 150 years, and for nearly as long, chemists have been developing internationally sanctioned standards for the use of graphical formulas, linear notation, nomenclature, and machine representations. Standards like IUPAC rules of nomenclature provide the grammar and dictionary of this lingua franca and the development and curation of such standards is an ongoing process whose trajectory is shaped by decisions made in the context of the chemical knowledge and information technologies of the past. Like the grammar and dictionaries of normal language, if properly understood, they can generate a deeper insight into the underlying ideas that the representations express. When ignored or treated as rules to be memorized and followed thoughtlessly, they can lead to frustration, misunderstanding, and error. Training in critical literacy in 2D representation will help chemists improve their skills in presentation, information retrieval, modeling, and publishing, and connect these activities into laboratory workflow. Incorporating a historical component into this training will help students gain appreciation for how we came to organize chemistry this way and why 2D structural representation and nomenclature are still the most fundamental information literacy skills for chemists to master today. The work presented here is part of the OnLine Chemistry Course in Cheminformatics, organized by the ACS CHED Committee on Computers in Chemical Education (http://olcc.ccce.divched.org/).

2:15pm-2:35pm
CINF 143: From lab to the libraries: A new journey

Neelam Bharti1 , neelambh@ufl.edu
1 Marston Science Library, University of Florida, Gainesville, Florida, United States

Research communities increasingly expect research-focused, subject-specific services from libraries. Chemistry librarians are expected to be thoroughly familiar with their subject and with publisher updates, to take the initiative in purchasing new subject material, and to have a sufficient scientific background to understand technical language, vocabulary and nomenclature. This presentation will discuss how a researcher or scientist can become a library professional and how laboratory experience can be an advantage when serving as a subject librarian. At the same time, it will highlight the professional challenges that a scientist-turned-librarian faces when dealing with the subject community and the academic library system.

2:35pm-2:55pm
CINF 144: Experiments with chemists and information

Jonathan Goodman1 , jmg11@cam.ac.uk
1 Dept of Chemistry, Cambridge, United Kingdom

Courses on chemical information have been run in the chemistry department of the University of Cambridge since the end of the last century. From the beginning, the amount of data available seemed overwhelming. The centre of gravity of the course is gradually shifting from paper journals towards open data. The students are still introduced to unfamiliar ways of handling information, but now have more expertise in handling data in social media than the faculty. The challenge for the future is to direct the students' skills and knowledge in information management towards improving their chemical abilities and enlightening their teachers.



2:55pm-3:10pm Intermission

3:10pm-3:30pm
CINF 145: ChemData: A web application for learning chemical informatics

Stuart Chalk1 , schalk@unf.edu
1 Department of Chemistry, University of North Florida, Jacksonville, Florida, United States

Chemistry students taking a chemical informatics course typically have no background in information science. ChemData is a prebuilt website application framework that students can download, install, and use to learn about and contribute to chemical informatics. Lectures on metadata, datatypes, REST, APIs, unique identifiers, searching, relational databases, XML, JSON, etc. can be taught using the application. The framework can also serve as a final course project to test students' skills. In this project, students are required to find a dataset, download it and import it into Excel, clean up the data, and organize and annotate it prior to ingestion into a MySQL database.
This presentation will discuss the educational and technical approaches to implementing the framework and review its potential to increase the number of students who go into chemical informatics.
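The final-project workflow (clean a downloaded dataset, then load it into a relational database) can be sketched with Python's standard library. The sketch swaps in a CSV string for the Excel step and SQLite (`sqlite3`) for MySQL so that it runs without a database server. The table layout, column names, and sample rows are made up for illustration and are not part of ChemData.

```python
import csv
import io
import sqlite3
from typing import List, Tuple

Row = Tuple[str, float, str]  # (compound, melting point in deg C, source)


def clean_rows(raw_csv: str) -> List[Row]:
    """Trim whitespace in headers and values, drop rows without a melting
    point, and convert the melting point to a float."""
    rows: List[Row] = []
    for record in csv.DictReader(io.StringIO(raw_csv)):
        # Extra unnamed columns end up under the key None; ignore them.
        record = {k.strip(): (v or "").strip() for k, v in record.items() if k is not None}
        if not record.get("mp_C"):
            continue  # incomplete record: skip rather than store a guess
        rows.append((record["compound"], float(record["mp_C"]), record.get("source", "")))
    return rows


def load_rows(rows: List[Row], conn: sqlite3.Connection) -> None:
    """Create the table if needed and insert the cleaned rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS melting_point ("
        " compound TEXT NOT NULL,"
        " mp_c REAL NOT NULL,"
        " source TEXT)"
    )
    conn.executemany("INSERT INTO melting_point VALUES (?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    raw = "compound, mp_C ,source\n benzene ,5.5,CRC\nphenol,,CRC\n"
    conn = sqlite3.connect(":memory:")
    load_rows(clean_rows(raw), conn)
    print(conn.execute("SELECT * FROM melting_point").fetchall())
```

Using parameterized `?` placeholders rather than string formatting is the habit worth teaching here. The same pattern carries over to a MySQL driver, although placeholder syntax differs between drivers.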

3:30pm-3:50pm
CINF 146: Improving geographically distributed research with real time collaboration

Andras Stracz1 , astracz@chemaxon.com, Aurora Costache2
1 ChemAxon, Budapest, Hungary; 2 ChemAxon, Cambridge, Massachusetts, United States

Research and development is becoming increasingly decentralized in all chemistry-related fields, and so are the meetings that are essential for sharing and developing ideas.
In this presentation we look at a biotech company in the Boston area involved in small-molecule research. Teams in this organization are spread across sites, and development projects often involve multiple companies in different locations. Sharing and discussing chemical ideas easily and efficiently among members of these projects is essential for their success.
Marvin Live, a novel application from ChemAxon, is transforming the way chemists approach their brainstorming sessions, project meetings and discussions with external partners. Everyday issues are tackled by automatically capturing novel ideas (ensuring that no important information gets lost as the discussion moves along), by connecting other cheminformatics applications relevant to the project (e.g., calculations, models, databases), by shortening preparation times, and by automating the otherwise time-consuming meeting documentation.
We will demonstrate through examples how this approach changed common collaboration practices, and we will discuss issues that are yet to be addressed.

3:50pm-4:10pm
CINF 147: Chemical research toolkit: An end-to-end solution

Joshua Bishop1 , josh.bishop@perkinelmer.com, Phil McHale1 , Pierre Morieux1
1 Informatics, PerkinElmer, Waltham, Massachusetts, United States

Chemical research, much like all other hard sciences, starts with a question: What if…? What if I could make this compound? What if I could employ this catalyst in a new transformation? What if I could do this process without a transition metal? The What if… questions have one thing in common: they open an avenue to fill a knowledge gap. Understanding the current chemical information relevant to the What if… question is paramount to starting a research endeavor. Second only to that is the ability to capture all of the new information that is generated. A complete research informatics platform is key to generating, understanding and applying new chemical information. This presentation will focus on the tools available to today’s chemical researchers and how they are applied in a modern research facility.

4:10pm-4:30pm
CINF 148: ELN, RegMol and inventory: From synthesis to registration to inventory

Rajeev Hotchandani1 , hotchandani@yahoo.com
1 Scilligence, Watertown, Massachusetts, United States

Scilligence is a leading innovator of cross-platform, mobile cheminformatics and bioinformatics solutions. We supply the pharmaceutical, biotech, and chemical industries with state-of-the-art informatics tools to improve their R&D bottom line. Our web-based, single-sign-on, interconnected knowledge management platform of ELN, RegMol and Inventory lets researchers preserve their workflow efficiently.
Scilligence ELN, an electronic lab notebook for chemistry and biology, helps keep track of experimental details and associated files. The molecular entity can then be registered directly, with one click, into RegMol, a registration system that is also an assay database. The registered entity can then be pushed to the Inventory system with another click; Inventory keeps track of various lots and packages at an intricate level.
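The ELN-to-registration-to-inventory hand-off can be pictured as a small data model. The sketch below is hypothetical and is not Scilligence's API. Registration is keyed on a caller-supplied structure key (for example a standard InChIKey), so re-registering the same structure returns the existing ID. Each inventory lot is attached to a registration ID. The class names, ID formats, and `record_synthesis` helper are all illustrative.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class Lot:
    lot_id: str
    amount_mg: float
    location: str


@dataclass
class Registry:
    """Toy registration system with an attached inventory."""
    ids_by_key: Dict[str, str] = field(default_factory=dict)  # structure key -> reg ID
    lots: Dict[str, List[Lot]] = field(default_factory=dict)  # reg ID -> lots
    _counter: Iterator[int] = field(default_factory=lambda: itertools.count(1), repr=False)

    def register(self, structure_key: str) -> str:
        """Return the registration ID for a structure, creating one if new."""
        if structure_key not in self.ids_by_key:
            reg_id = f"REG-{next(self._counter):06d}"
            self.ids_by_key[structure_key] = reg_id
            self.lots[reg_id] = []
        return self.ids_by_key[structure_key]

    def add_lot(self, reg_id: str, amount_mg: float, location: str) -> Lot:
        """Create a new inventory lot for an already-registered compound."""
        lot_list = self.lots[reg_id]  # KeyError if the compound was never registered
        lot = Lot(f"{reg_id}-L{len(lot_list) + 1:03d}", amount_mg, location)
        lot_list.append(lot)
        return lot


def record_synthesis(registry: Registry, structure_key: str, amount_mg: float, location: str) -> Lot:
    """What a one-click 'register and stock' action from an ELN might do."""
    return registry.add_lot(registry.register(structure_key), amount_mg, location)
```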



4:30pm-4:35pm Concluding Remarks
CINF: Computational Toxicology: From QSAR Models to Adverse Outcome Pathways
1:30pm - 5:10pm
Wednesday, August 19

Room 103 - Boston Convention & Exhibition Center
Mohamed AbdulHameed, Organizing
Mohamed AbdulHameed, Presiding
Cosponsored by: AGRO, COMP, ENVR and MEDI
1:30pm-1:35pm Introductory Remarks

1:35pm-1:55pm
CINF 156: Differential network analysis of chemical-mediated cancer induction

Francesca Mulas1 , fra.mulas@gmail.com, Daniel Gusenleitner1 , gusef@bu.edu, Stefano Monti1,2 , smonti@bu.edu
1 Department of Medicine, Section of Computational Biomedicine, Boston University, Boston, Massachusetts, United States; 2 Broad Institute of Harvard and MIT, Cambridge, Massachusetts, United States

Introduction
Methods for inferring and comparing networks are emerging as powerful tools for identifying groups of tightly connected genes whose activity may be altered during disease progression or by chemical agents. In this work we compared networks obtained from wild-type liver samples and from samples collected after exposure to chemicals with varying carcinogenicity and genotoxicity.
Methods
The following strategy was applied to two rat-based microarray datasets: i) a network is reconstructed for the wild type and for each compound by measuring pairwise correlation among the gene expression profiles in the corresponding samples; ii) the correlation matrix is used to extract clusters (gene modules); iii) modules inferred from a reference network (e.g., from the wild type) are scored in terms of gain and loss of connectivity across different compounds; iv) subgroups of compounds are obtained by measuring the similarity of their network structure; v) differentially connected modules obtained from the wild-type network or from groups of similar compounds are further studied to investigate their biological meaning through pathway enrichment analysis.
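Step iii, scoring a module's gain or loss of connectivity between a reference network and a compound network, can be sketched as follows. The connectivity score used here is the mean absolute Pearson correlation among the module's genes. That choice, the function names, and the toy profiles are assumptions, since the abstract does not give the authors' scoring formula.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Sequence

# gene -> expression values across the samples of one condition
Profiles = Dict[str, Sequence[float]]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either profile is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx and syy else 0.0


def module_connectivity(expr: Profiles, module: List[str]) -> float:
    """Mean |r| over all gene pairs in the module (needs at least 2 genes)."""
    pairs = list(combinations(module, 2))
    return sum(abs(pearson(expr[a], expr[b])) for a, b in pairs) / len(pairs)


def connectivity_change(reference: Profiles, treated: Profiles, module: List[str]) -> float:
    """Positive = gain of connectivity after treatment, negative = loss."""
    return module_connectivity(treated, module) - module_connectivity(reference, module)
```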
Results
In general, the network-based approach provided an intuitive representation of gene interactions, focusing on aggregate differences that would be difficult to capture with standard differential analysis methods. A systematic analysis of networks obtained from the same tissue showed that the method produces reliable modules and scores. When networks from the wild type were compared with those obtained from different compounds, the enrichment analysis highlighted modules with significantly altered connectivity in carcinogenic compounds; these modules are involved in the cell cycle and DNA repair. Remarkably, the comparison of the networks made it possible to identify groups of compounds with the same carcinogenic/genotoxic profiles and with a similar pharmacological action as recorded in DrugBank. Furthermore, the method identified a gain of connectivity in modules related to lipid metabolism as a result of exposure to statin drugs, as well as a loss of connectivity in cell-cycle-related genes when chemotherapeutic drugs are used.

1:55pm-2:15pm
CINF 157: Massively orthogonal search engine for mechanism of action and toxicity studies

Douglas Selinger2 , douglas.selinger@novartis.com, Varun Shivashankar1 , Mustapha Larbaoui3 , Igor Mendelev1 , Michael Steeves1 , Stephen Litster1 , Philippe Marc3
1 NX (NIBR IT), Novartis Institutes for BioMedical Research, Cambridge, Massachusetts, United States; 2 PreClinical Safety Informatics, Novartis Institutes for BioMedical Research, Cambridge, Massachusetts, United States; 3 PreClinical Safety Informatics, Novartis Institutes for BioMedical Research, Basel, Switzerland

Better understanding of the mechanism of action and toxicity (MoA and MoT, respectively) of small molecules would lead to a more rational drug design and development process, and presumably also to safer and more efficacious drugs. Large amounts of data have been collected in service of this goal (biochemical activities, chemical proteomics, chemical genetics, toxicogenomics, etc.); however, it remains difficult to assemble a 'bottom line' conclusion about which mechanism(s) the data most strongly support. To address this need, we designed a search engine called MoA Central. Starting from an initial small-molecule query, we identify structurally and phenotypically related compounds and their putative targets. Compound-compound similarities and compound-target links can come from any number of appropriate data sources, analyses, and in silico approaches. The resulting network, which we call a 'focal graph', can be analyzed by graph-theoretic techniques, including the PageRank algorithm made famous by Google™. Targets that are central to the graph represent putative mechanisms. Tens, hundreds, or even orders of magnitude more data types could theoretically be used to build this graph, making it potentially a 'massively orthogonal' approach. The results are transparent and scientifically interpretable: supporting evidence for particular hypotheses (the edges that connect the query compound to a given target) can be easily read from the graph. Targets and compounds with similar supporting evidence can be grouped together by community-finding algorithms, similar to those used by social networking sites. Targets and compounds, from the whole graph or from individual communities, can be analyzed by standard set enrichment methods to identify overlaps with genes/compounds linked to various levels of biology: from the biophysical (protein families, domains, complexes) to signaling pathways, toxicities, adverse events, and therapeutic uses.
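The centrality step can be illustrated with a plain power-iteration PageRank over an undirected "focal graph". This is a generic sketch, not MoA Central. The node names are invented, edges are unweighted, and the conventional damping factor of 0.85 is assumed. In practice a library routine such as `networkx.pagerank` would typically be used instead.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def pagerank(
    edges: Iterable[Tuple[str, str]],
    damping: float = 0.85,
    tol: float = 1e-12,
    max_iter: int = 500,
) -> Dict[str, float]:
    """PageRank on an undirected graph by power iteration; scores sum to 1."""
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    nodes = sorted(neighbours)
    if not nodes:
        return {}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            # Every node has at least one neighbour, so no rank mass is lost.
            share = damping * rank[v] / len(neighbours[v])
            for w in neighbours[v]:
                new[w] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


def top_targets(rank: Dict[str, float], targets: Set[str]) -> List[str]:
    """Target nodes sorted by centrality, most central (putative mechanism) first."""
    return sorted(targets, key=lambda t: rank[t], reverse=True)
```

For example, with a query compound Q linked to related compounds C1 and C2, both of which link to target T1 while only C1 links to T2, T1 ends up more central than T2.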

2:15pm-2:35pm
CINF 158: Combining predicted biological descriptors with chemical descriptors affords reliable hybrid QSAR models of rodent carcinogenicity

Regina Politi1 , reginap@email.unc.edu, Stephen Capuzzi1 , Sherif Farag1 , Alexander Tropsha1
1 UNC Eshelman School of Pharmacy, UNC Chapel Hill, Chapel Hill, North Carolina, United States

Previously, we developed a hybrid QSAR approach that employs both chemical and biological descriptors of molecules. Chemical descriptors are computed from molecular structures in a conventional way, e.g., using DRAGON software. Biological descriptors represent the results of short-term biological assays, e.g., cytotoxicity. We have shown that such hybrid descriptors afford QSAR models of higher accuracy than either chemical or biological descriptors alone. However, this approach has a natural limitation in that experimental studies are required to enable predictions for new molecules. To overcome this limitation, we employed conventional QSAR modeling to predict biological descriptor values and used those in hybrid QSAR models. We employed a dataset of 312 compounds with known in vivo rodent carcinogenicity that were also tested in six in vitro cell viability assays. Conventional QSAR models were built to predict the outcome of each of the six assays. We have shown that hybrid models combining chemical and predicted biological descriptors afforded carcinogenicity models of at least the same accuracy as those developed with the standard hybrid QSAR modeling approach.
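The idea of replacing measured assay results with predicted ones can be shown with a tiny sketch. One model per in vitro assay predicts that assay's outcome from chemical descriptors, and the predictions are appended to the chemical descriptors to form the hybrid vector. A 1-nearest-neighbour predictor stands in for whatever QSAR method the authors used. The descriptor values are invented and are not DRAGON descriptors.

```python
from math import dist
from typing import Callable, List, Sequence

Vector = Sequence[float]
Model = Callable[[Vector], float]


def nearest_neighbour_model(train_x: List[Vector], train_y: List[float]) -> Model:
    """Return a 1-NN predictor (Euclidean distance) for one assay endpoint."""
    def predict(x: Vector) -> float:
        best = min(range(len(train_x)), key=lambda i: dist(train_x[i], x))
        return train_y[best]
    return predict


def hybrid_descriptors(chem: Vector, assay_models: List[Model]) -> List[float]:
    """Chemical descriptors followed by one predicted biological descriptor per assay."""
    return list(chem) + [model(chem) for model in assay_models]
```

The hybrid vectors would then be fed to the carcinogenicity model in place of experimentally measured assay values. As a result, no new experiments are needed to score a new compound.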

2:35pm-2:55pm
CINF 159: Mining big datasets to create and validate machine learning models

Alex Clark2 , Sean Ekins1 , ekinssean@yahoo.com
1 Collaborations Pharmaceuticals, Fuquay Varina, North Carolina, United States; 2 Independent, Montreal, Quebec, Canada

We have recently described a reference implementation of Bayesian model building using ECFP- and FCFP-type fingerprints. We have now undertaken a large-scale validation study to ensure that the technique generalizes to a broad variety of drug discovery and ADME/Tox datasets. To achieve this we used the ChEMBL (version 20) and ToxCast databases, each of which consists of compounds with assay and activity measurements. To test these datasets with a two-state Bayesian classification, we developed an automated algorithm for detecting a suitable threshold for the active/inactive designation, which we applied to all data. With these datasets we were able to establish that our Bayesian model implementation is effective for the large majority of cases, and to quantify the impact of fingerprint folding on the ROC cross-validation metrics. We were also able to study the impact that the choice of training/testing set partitioning has on the resulting recall rates. The datasets have been made publicly available for download, along with the corresponding model data files, which can be used in conjunction with the CDK toolkit and several mobile apps. The ability to score molecules across thousands of relevant datasets across organisms may also help to assess desirable and undesirable off-target effects.
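A Laplacian-corrected naive Bayesian model over fingerprint features can be sketched in a few lines. This uses the widely used formulation in which each feature f gets the weight log((A_f + 1) / (T_f * P + 1)). Here A_f is the number of actives containing f, T_f is the number of all compounds containing f, and P is the overall fraction of actives. This is a generic illustration, not the authors' reference implementation. Fingerprints are assumed to be sets of integer feature IDs (for example ECFP hashes), and folding is done by taking each ID modulo the folded length.

```python
import math
from collections import Counter
from typing import Dict, FrozenSet, Iterable, List


def fold(features: Iterable[int], size: int) -> FrozenSet[int]:
    """Fold hashed feature IDs into a fixed-length fingerprint of `size` bits."""
    return frozenset(f % size for f in features)


def train_bayes(fps: List[FrozenSet[int]], labels: List[bool]) -> Dict[int, float]:
    """Laplacian-corrected log weight per feature; labels are True for active."""
    p_active = sum(labels) / len(labels)
    total: Counter = Counter()
    active: Counter = Counter()
    for fp, is_active in zip(fps, labels):
        for f in fp:
            total[f] += 1
            if is_active:
                active[f] += 1
    return {f: math.log((active[f] + 1) / (total[f] * p_active + 1)) for f in total}


def score(model: Dict[int, float], fp: Iterable[int]) -> float:
    """Sum of the weights of the features present; unseen features contribute 0."""
    return sum(model.get(f, 0.0) for f in fp)
```

Folding merges distinct features into the same bit, so the weights of colliding features are pooled. This is the effect the abstract reports quantifying through ROC metrics.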

2:55pm-3:15pm
CINF 160: From QSAR to big data: Developing mechanism-driven predictive models for animal toxicity


Marlene Kim2 , Hao Zhu1 , hao.zhu99@rutgers.edu
1 Chemistry Department, Rutgers University, Camden, New Jersey, United States; 2 Chemistry & Biochemistry Department, Rutgers, The State University of New Jersey, Pennsauken, New Jersey, United States

High Throughput Screening (HTS) studies provide the community with rich toxicology information that has the potential to be integrated into toxicity research. The current toxicity data are so large and complex that they are difficult to process using available database management tools or traditional data processing applications. Furthermore, although research on toxicity pathways has revealed some promising receptors as potential targets of chemical toxicants, most of the mechanisms of animal toxicity are still obscure. A general question raised by the current big data scenario is how a toxicity bioassay (which normally refers to a specific binding target) can be used to study more complicated toxicity phenomena (e.g., toxicity pathways and animal toxicity). To address this challenge, the goal of this project was to develop novel computational approaches that use data from a single bioassay as a probe to search for and reveal all the relevant information in the public big data pool. All the relevant toxicity data, including the original assay data, were integrated according to their mechanistic relationships into computational models, which were eventually used to predict and/or prioritize animal toxicants. Specifically, we used the bioassay data from the Antioxidant Response Element (ARE) pathway, which indicates oxidative stress, to generate a mechanism profile for liver toxicants. The other bioassays were mined and selected from the daily updated chemical toxicity big data pool using our in-house automatic Chemical In vitro-In vivo Profiling (CIIPro) technique. We then used the ARE assay together with the most relevant bioassays to evaluate various animal toxicity results (e.g., acute toxicity and hepatotoxicity). The resulting models not only showed superior predictivity to traditional Quantitative Structure-Activity Relationship (QSAR) models but also revealed relevant toxicity mechanisms.
Compared to existing data management and sharing projects (e.g., ToxCast), the approach developed in this study is the first to take advantage of chemical toxicity big data updated on a daily basis and apply it to animal toxicity evaluations.
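The idea of using one bioassay as a probe to pick out related assays from a large pool can be illustrated by ranking candidate assays according to their agreement with the probe on shared compounds. This is a hypothetical stand-in, not the CIIPro algorithm. The active/inactive encoding, the minimum overlap of three compounds, and the 0.8 agreement cut-off are all assumptions.

```python
from typing import Dict, List, Optional, Tuple

Calls = Dict[str, int]  # compound ID -> 1 (active) or 0 (inactive)


def concordance(probe: Calls, other: Calls, min_overlap: int = 3) -> Optional[float]:
    """Fraction of shared compounds with the same call, or None if the overlap is too small."""
    shared = probe.keys() & other.keys()
    if len(shared) < min_overlap:
        return None
    return sum(probe[c] == other[c] for c in shared) / len(shared)


def relevant_assays(
    probe: Calls,
    pool: Dict[str, Calls],
    min_overlap: int = 3,
    min_agreement: float = 0.8,
) -> List[Tuple[str, float]]:
    """Assays that agree with the probe often enough, best first (ties by name)."""
    hits = []
    for name, calls in pool.items():
        agreement = concordance(probe, calls, min_overlap)
        if agreement is not None and agreement >= min_agreement:
            hits.append((name, agreement))
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))
```

Plain agreement is inflated when most compounds are inactive in both assays. A balanced statistic computed on the shared binary calls would be a safer choice on real HTS data.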
3:15pm-3:30pm Intermission

3:30pm-3:50pm
CINF 161: ChEMBL database and its application in toxicity assessment

Patricia Bento1 , patricia@ebi.ac.uk
1 EMBL-European Bioinformatics Institute, Cambridge, United Kingdom

ChEMBL is an Open Data database containing binding, functional, ADME and toxicity information for a large number of drug-like bioactive compounds [1]. Its content includes data extracted primarily from medicinal chemistry literature as well as from public toxicogenomics datasets, such as Open TG-GATEs [2] and DrugMatrix [3], and FDA Drug Approval Packages. The database provides researchers with free access to curated and standardised data that is valuable across a wide range of chemical biology and drug-discovery research activities. Applications of the data range from selection of tool compounds for probing targets or pathways of therapeutic interest to the identification of potential off-target effects which may pose safety concerns. This talk will cover an overview of the ChEMBL data in the context of toxicity assessment and its application as a profiling tool for potential adverse effects of a compound.

[1] Bento, A. P., Gaulton, A., Hersey, A., Bellis, L. J., Chambers, J., Davies, M., et al. (2014). The ChEMBL bioactivity database: an update. Nucleic Acids Research, 42, D1083–D1090.

[2] Uehara, T., Ono, A., Maruyama, T., Kato, I., Yamada, H., Ohno, Y., & Urushidani, T. (2010). The Japanese toxicogenomics project: application of toxicogenomics. Molecular Nutrition & Food Research, 54(2), 218–227.

[3] Ganter, B., Tugendreich, S., Pearson, C. I., Ayanoglu, E., Baumhueter, S., Bostian, K. A., et al. (2005). Development of a large-scale chemogenomics database to improve drug candidate selection and to understand mechanisms of chemical toxicity and action. Journal of Biotechnology, 119(3), 219–244.
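Using ChEMBL as a profiling tool often comes down to filtering a compound's activity records against a panel of safety-relevant targets. The sketch below works on records that have already been retrieved, for instance from the ChEMBL web services. The dictionary keys mirror ChEMBL activity fields (`molecule_chembl_id`, `target_chembl_id`, `pchembl_value`). The target panel, the pChEMBL cut-off of 6 (that is, 1 µM), and the IDs in any example are placeholders, and the function name is hypothetical.

```python
from typing import Dict, Iterable, Mapping, Set, Tuple


def safety_flags(
    activities: Iterable[Mapping[str, object]],
    panel: Set[str],
    min_pchembl: float = 6.0,
) -> Dict[Tuple[str, str], float]:
    """Return {(molecule, target): best pChEMBL} for panel targets hit at or
    above the cut-off. Records without a pChEMBL value are skipped."""
    flags: Dict[Tuple[str, str], float] = {}
    for rec in activities:
        target = rec.get("target_chembl_id")
        value = rec.get("pchembl_value")
        if target not in panel or value is None:
            continue
        value = float(value)  # accept numbers or numeric strings
        if value >= min_pchembl:
            key = (str(rec["molecule_chembl_id"]), str(target))
            flags[key] = max(value, flags.get(key, value))
    return flags
```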

3:50pm-4:10pm
CINF 162: Modeling ABC transporters as potential DILI targets

Matthew Segall1 , matthew.d.segall@gmail.com, Peter Hunt2 , Jon Tyzack2
1 R&D, Optibrium Limited, Cambridgeshire, United Kingdom; 2 Optibrium Ltd, Cambridge, United Kingdom

Predicting the interaction of compounds with targets associated with toxicity can provide inputs to hierarchical models integrating systems toxicology, physiologically-based pharmacokinetic (PBPK) models and organ simulations to predict compound interactions with adverse outcome pathways (AOP).
For example, MRP4 (Multi-drug resistance-associated protein 4 or ABCC4) mediates the transport of signalling molecules (such as cAMP and cGMP), prostaglandins and leukotrienes (PGE1, PGE2, LTB4) and can be inhibited by drugs such as Celecoxib, Probenecid, MK-571 and Sulfinpyrazone [1]. BSEP (Bile salt export pump or ABCB11) is localised in the cholesterol-rich canalicular membranes of hepatocytes, and its function is to eliminate unconjugated/conjugated steroidal acids from the hepatocyte into the bile. Loss of this transporter function is seen in the genetic disease progressive familial intrahepatic cholestasis type 2. Inhibition of both of these transporters, MRP4 and BSEP, has been identified as a risk factor in the development of cholestatic DILI (drug-induced liver injury) [2].
We have used the publicly available data from ChEMBL to build categorical and continuous quantitative structure-activity relationship (QSAR) models in order to determine the molecular properties that contribute to activity at these transporters, and we compare these features with those of known hepatotoxic compounds. We have compared the results from these models with predictions from the Derek Nexus approach for knowledge-based prediction of hepatotoxicity [3].
The resulting QSAR models, along with models of other toxicity-related targets, will form part of a hierarchy of molecular-, systems- and physiologically-based models to identify compounds with an increased risk of toxicity as part of the HeCaToS project [4].
[1] Russel, F. G. et al. Trends Pharmacol. Sci. 2008, 29(4), 200–207.
[2] Kis, E. et al. Toxicol. in Vitro 2012, 26(8), 1294–1299.
[3] Greene, A. et al. SAR QSAR Environ. Res. 1999, 10(2-3), 299–314.
[4] www.hecatos.eu

4:10pm-4:30pm
CINF 163: Addressing a key hurdle in translational research: Predicting mouse liver microsomal stability using machine learning

Alexander Perryman2 , alp168@njms.rutgers.edu, Sean Ekins1,3 , Joel Freundlich4,2
1 Collaborations in Chemistry, Fuquay Varina, North Carolina, United States; 2 Medicine, Div. of Infectious Diseases, Rutgers-NJ Medical School, Newark, New Jersey, United States; 3 Collaborative Drug Discovery, Burlingame, California, United States; 4 Pharmacology & Physiology, Rutgers-NJ Medical School, Newark, New Jersey, United States

Efficacy studies in infected mice are a critical hurdle in the translational research of potential therapeutic compounds for diseases such as tuberculosis, malaria, and cancer, and must be passed before clinical trials in humans can be considered. For the vast majority of targets, a significant fraction of a therapeutic compound must survive first-pass clearance through the liver before it has a chance of reaching its target(s). Although mouse liver microsomal (MLM) stability studies are not a perfect surrogate for in vivo studies of metabolic clearance, they are the initial cell-based model system used to assess metabolic stability. Having a Contract Research Organization perform an MLM assay costs an average of $500 per compound. Consequently, we explored the development of machine learning models that can increase the probability of identifying compounds with MLM stability. Ninety-nine sets of published assays reporting MLM half-life values were identified in PubChem, reformatted, and curated to create a training set of 894 unique small molecules. These data were used to create machine learning models with Bayesian, Support Vector Machine, and Recursive Partitioning-Random Forest approaches. The models were assessed by internal cross-validation, then by external tests with a published set of antitubercular compounds, and finally by independent validation with an additional diverse set of 571 compounds (which we created from a further 78 sets of PubChem assays on the percent metabolism of a compound in the presence of MLM). Our Bayesian models displayed the best predictive power for identifying compounds that have a half-life of ≥1 hour in MLM. These freely available data were thus used to create, test, and validate computational models that can increase the efficiency and reduce the costs of translational research in many therapeutic areas.
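The abstract does not give the details of its Bayesian models. As a minimal sketch of the general idea only, the code below trains a Bernoulli naive Bayes classifier with Laplace smoothing on binary fingerprint bits and labels a compound "stable" when its MLM half-life is at least 60 minutes, matching the ≥1 hour cut-off above. This is a simpler relative of the Laplacian-corrected Bayesian models often used with fingerprints and is not necessarily the authors' method; the fingerprints, bit indices and half-lives are invented.

```python
import math

STABLE_CUTOFF_MIN = 60.0  # half-life >= 1 hour counts as "MLM-stable"


def train(fingerprints, half_lives, n_bits):
    """Fit a Bernoulli naive Bayes model with Laplace (add-one) smoothing.

    fingerprints: list of sets of "on" bit indices; half_lives: MLM t1/2 in minutes.
    """
    labels = [t >= STABLE_CUTOFF_MIN for t in half_lives]
    model = {}
    for cls in (True, False):
        members = [fp for fp, y in zip(fingerprints, labels) if y == cls]
        prior = (len(members) + 1) / (len(labels) + 2)
        # Smoothed P(bit on | class); never exactly 0 or 1, so the logs below are safe.
        p_on = [(sum(b in fp for fp in members) + 1) / (len(members) + 2) for b in range(n_bits)]
        model[cls] = (prior, p_on)
    return model


def log_odds_stable(model, fp, n_bits):
    """Log-odds that a compound with "on" bits `fp` is MLM-stable (> 0 favours stable)."""
    score = {}
    for cls, (prior, p_on) in model.items():
        s = math.log(prior)
        for b in range(n_bits):
            s += math.log(p_on[b] if b in fp else 1.0 - p_on[b])
        score[cls] = s
    return score[True] - score[False]


# Usage with invented 4-bit fingerprints and half-lives (minutes).
fps = [{0, 1}, {0, 2}, {1, 3}, {2, 3}]
t_half = [90.0, 75.0, 12.0, 20.0]
model = train(fps, t_half, n_bits=4)
print(log_odds_stable(model, {0}, 4) > 0)  # True: bit 0 only occurs in stable compounds
```

Working in log space avoids numerical underflow when real fingerprints have hundreds or thousands of bits.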

4:30pm-4:50pm
CINF 164: Using supervised Latent Dirichlet Allocation for structure-activity relation modeling in Tox21 2014 data challenge

Iwona Weidlich1 , iweidlic@gmail.com, Igor Filippov2
1 CODDES LLC, Rockville, Maryland, United States; 2 VIF Innovations, LLC, Gaithersburg, Maryland, United States

Supervised Latent Dirichlet Allocation (SLDA) is a machine learning method more commonly used for text analytics and image classification. When employed in document analysis, SLDA attempts to recover hidden 'topics' and models feature occurrence as a mixture of such topics. To the best of our knowledge, this method has not previously been used for SAR/QSAR modeling, and we demonstrate the approach as applied to the Tox21 machine learning challenge, a dataset based on the Toxicology in the 21st Century initiative.
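To make the "topics" idea concrete, the sketch below samples from the standard sLDA generative process with a Gaussian response. Each molecule is treated as a "document" whose "words" are fingerprint features. That mapping, the vocabulary size and all parameter values are assumptions for illustration. Fitting the model (for example, by variational inference) and the classification variant suited to active/inactive Tox21 labels are not shown.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalising independent Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_molecule(n_features, alpha, beta, eta, sigma, rng):
    """Sample one 'document' (a list of feature ids) and its response under sLDA.

    alpha: Dirichlet prior over the K topics
    beta:  K lists, each a probability distribution over the feature vocabulary
    eta:   K regression weights linking topic usage to the response
    sigma: standard deviation of the Gaussian response
    """
    n_topics = len(alpha)
    theta = sample_dirichlet(alpha, rng)  # this molecule's topic mixture
    topics = rng.choices(range(n_topics), weights=theta, k=n_features)  # one topic per feature
    features = [rng.choices(range(len(beta[z])), weights=beta[z])[0] for z in topics]
    z_bar = [topics.count(j) / n_features for j in range(n_topics)]  # empirical topic proportions
    response = rng.gauss(sum(e * z for e, z in zip(eta, z_bar)), sigma)
    return features, z_bar, response


# Usage: two invented topics over a 4-feature vocabulary.
rng = random.Random(0)
beta = [[0.7, 0.1, 0.1, 0.1], [0.1, 0.1, 0.1, 0.7]]
features, z_bar, y = generate_molecule(20, [0.5, 0.5], beta, [2.0, -2.0], 0.1, rng)
print(features, z_bar, round(y, 2))
```

The supervised part is the last step. The response depends on the empirical topic proportions `z_bar`, so the topics learned during fitting are pushed toward ones that help predict activity.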


ROC plot for the SLDA method on the AR subset of the Tox21 challenge.



Molecular map for Active (red) vs. Inactive (blue) molecules in the training set for AR data in Tox21 challenge. Map created using Diversity Genie software.


4:50pm-5:10pm
CINF 165: Cheminformatics-based signal boosting for predicting drug adverse events

Andrew Fant1 , Andrew.Fant@fda.hhs.gov, Naomi Kruhlak1 , Keith Burkhart1
1 Division of Applied Regulatory Science, Office of Clinical Pharmacology, Office of Translational Science, Center for Drug Evaluation and Research, United States Food and Drug Administration, Silver Spring, Maryland, United States

Stevens-Johnson Syndrome (SJS) is a rare but life-threatening adverse drug event (ADE) that is often identified as a post-market safety signal. Histologically, a key feature is keratinocyte apoptosis and death, resulting in the epidermal necrosis and blistering seen on physical examination. SJS is thought to be an immune-mediated phenomenon, closely linked to the Human Leucocyte Antigens (HLA) of the Major Histocompatibility Complex (MHC). Owing to the low incidence of SJS and the absence of accurate preclinical susceptibility assays, it is desirable to be able to make empirical predictions of a drug’s potential liability for inducing SJS. One promising method has been to combine maximum common substructure searches with two- and three-dimensional chemical descriptors to suggest structural alerts that may point to a common mechanistic basis for the event. This approach has been especially successful in elucidating mechanisms of hepatotoxicity. Over 400 drugs have been identified as potential causative agents of SJS. A few SJS structural alerts have been derived, but not all suspect drugs match any of these alerts.
We have observed the presence of aryl amides among a number of SJS-labeled drugs at the FDA. While this moiety is not particularly uncommon in approved drugs (it is present in roughly 500 of 4000 small-molecule products), current structural alerts taken from the published literature fail to identify a set of twenty potential inducers of SJS (extracted from an archive of adverse event reports) that were administered to patients immediately before a diagnosis of SJS. Because of the extreme imbalance between active and inactive compounds among the 500 aryl amides, we have borrowed techniques from more traditional (Q)SAR methods to improve this balance and to define necessary and sufficient conditions for SJS inducers in terms of molecular structures and properties. These models are being used as part of a prospective screen of real-time pharmacovigilance data and safety-related changes to drug labeling information. Ultimately, we seek to create predictive chemical models for rare adverse events that can be used prophylactically to identify areas of concern that may require additional attention during the review process.
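The abstract does not say which balancing techniques were borrowed. One common option in (Q)SAR work is random undersampling of the majority class. The sketch below assumes that technique and builds several balanced training sets from about 20 actives and 480 inactives, mirroring the counts above. The compound identifiers are placeholders, and this is not necessarily the authors' procedure.

```python
import random


def balanced_subsets(actives, inactives, n_subsets, ratio=1.0, seed=0):
    """Yield labelled training sets that keep every active and a random slice of inactives.

    ratio: inactives sampled per active (1.0 gives a 50:50 set). Each subset draws
    inactives without replacement; different subsets may overlap.
    """
    rng = random.Random(seed)
    n_inactive = min(len(inactives), round(ratio * len(actives)))
    for _ in range(n_subsets):
        sampled = rng.sample(inactives, n_inactive)
        yield [(c, 1) for c in actives] + [(c, 0) for c in sampled]


# Usage with placeholder IDs: about 20 suspect SJS inducers among ~500 aryl amides.
actives = [f"active_{i}" for i in range(20)]
inactives = [f"inactive_{i}" for i in range(480)]
subsets = list(balanced_subsets(actives, inactives, n_subsets=5))
print(len(subsets), len(subsets[0]))  # 5 40
```

A typical follow-up is to train one model per subset and combine their votes. Across subsets this uses more of the inactive data than a single undersampled set would.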
CINF: General Papers
9:00am - 11:30am
Thursday, August 20

Room 104A - Boston Convention & Exhibition Center
Erin Davis, Organizing
Erin Davis, Presiding

9:00am-9:30am
CINF 166: CIIPro: An online cheminformatics portal for large scale chemical data analysis


Daniel Russo1 , danrusso@scarletmail.rutgers.edu, Wenyi Wang1 , Marlene Kim1 , Daniel Pinolini1 , Hao Zhu1,2
1 Center for Computational and Integrative Biology, Rutgers University, Camden, New Jersey, United States; 2 Chemistry Department, Rutgers University, Camden, New Jersey, United States

The massive amount of chemical data available in this big data era is difficult to extract and hard to rely on. To address this, we developed a public Chemical In vitro In vivo Profiling (CIIPro) portal that automatically extracts biological data from public resources (i.e., PubChem) for compounds based on user input. Unlike a typical chemical database query, the portal uses a novel algorithm that allows users to query compounds with a target activity (e.g., specific animal toxicity testing results), extracts biological data based on the in vitro-in vivo correlation, and outputs the data in a format conducive to research. The resulting biological data for target compounds can be used for modeling. For example, the CIIPro portal can identify the chemical and biological similarity between compounds based on their chemical structures and optimized biological profiles. The portal was used to develop multiple novel predictive models for complex biological activities (e.g., complex animal toxicity endpoints). The CIIPro portal is freely accessible online at ciipro.rutgers.edu.
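The abstract does not describe the portal's similarity algorithm. As a hedged sketch of one way to compare biological profiles, the code below encodes each compound's assay outcomes as +1 (active) or -1 (inactive), leaves untested assays out, and scores two compounds by the Tanimoto coefficient of their active calls. Only assays that both compounds were tested in are considered. Focusing on actives avoids inflating similarity through the many shared inactive results. The assay IDs and outcomes are invented, and CIIPro's actual weighting may differ.

```python
def bio_similarity(a, b, min_shared=3):
    """Tanimoto similarity of 'active' calls over assays both compounds were tested in.

    a, b: dicts mapping assay id -> +1 (active) or -1 (inactive); untested assays are absent.
    Returns None when the profiles overlap too little or show no activity at all.
    """
    shared = set(a) & set(b)
    if len(shared) < min_shared:
        return None
    act_a = {aid for aid in shared if a[aid] == 1}
    act_b = {aid for aid in shared if b[aid] == 1}
    union = act_a | act_b
    if not union:
        return None  # inactive everywhere: nothing to compare
    return len(act_a & act_b) / len(union)


# Usage with invented assay IDs and outcomes.
query = {"AID_1": 1, "AID_2": -1, "AID_3": 1, "AID_4": 1}
neighbor = {"AID_1": 1, "AID_2": 1, "AID_3": -1, "AID_5": 1}
print(bio_similarity(query, neighbor))  # 1 shared active out of 3 active assays
```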

9:30am-10:00am
CINF 167: Improving virtual screening performance through identification of molecular descriptor features sensitive to specific biological activities

Martin Vogt1 , martin.vogt@bit.uni-bonn.de, Jürgen Bajorath1
1 Life Science Informatics, University of Bonn, B-IT, Bonn, Germany

Ligand-based virtual screening performance generally depends on the molecular representation used, the search method employed, and the way molecular similarity is assessed. A molecular (descriptor) representation might be optimized for virtual screening by attempting to identify descriptor features that encode activity-specific information. Fingerprints are popular molecular representations for virtual screening, given their intuitive design and computational efficiency.
Here, we present a feature selection method for identifying fingerprint feature combinations that are sensitive to compounds having a specific activity as well as high potency. On the basis of training instances, fingerprint features characteristic of potent compounds sharing a given bioactivity are identified, if such features exist, and used as second-generation descriptors to increase the recall of potent compounds in virtual screening trials. Small feature sets are often found to yield high search performance for a given compound class.
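A simplified sketch of the idea, not the published algorithm: select fingerprint bits whose frequency among potent training compounds exceeds their frequency in a background set by a fixed margin. Then rank database compounds by Tanimoto similarity computed on the selected bits only. The bit sets, the `margin` threshold and the max-fusion over reference compounds are all assumptions for illustration.

```python
def select_features(potent, background, margin=0.3):
    """Bits whose frequency in potent actives exceeds their background frequency by `margin`."""
    def freq(bit, mols):
        return sum(bit in m for m in mols) / len(mols)
    candidates = set().union(*potent)
    return {b for b in candidates if freq(b, potent) - freq(b, background) >= margin}


def tanimoto(a, b):
    """Tanimoto coefficient of two bit sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank(database, references, selected):
    """Order database compounds by best Tanimoto match to any reference, on selected bits only."""
    refs = [r & selected for r in references]
    scores = {name: max(tanimoto(fp & selected, r) for r in refs) for name, fp in database.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Usage with invented bit sets.
potent = [{1, 2, 5}, {1, 2, 7}, {1, 2, 9}]
background = [{5, 7, 9}, {3, 5}, {7, 8}, {2, 9}]
selected = select_features(potent, background)
database = {"x": {1, 2, 8}, "y": {1, 5, 7, 9}, "z": {3, 4}}
print(selected, rank(database, potent, selected))  # {1, 2} ['x', 'y', 'z']
```

Restricting the comparison to a few enriched bits is what makes the search class-directed. Bits common to both potent and background compounds no longer contribute to the score.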

References
(1) Vogt, M.; Bajorath, J. Similarity Searching for Potent Compounds Using Feature Selection. J. Chem. Inf. Model. 2013, 53, 1613-1619.

10:00am-10:30am
CINF 168: “Graphical abstracts only”: The changing use of periodicals among early career chemists


Marianne Noel1 , noel@ifris.org
1 LISIS & IFRIS, Université Paris-Est, Marne-la-Vallée Cedex 2, France

With the creation of the Internet, many of the assumptions that underpinned the established scholarly communication system have been challenged (Borgman 2007). The digital availability of scholarly literature has transformed chemists’ research practices by creating an environment in which they can easily search for journal articles and chemical information. The proposed paper is a follow-up to a 4-year collective study (ANR PrestEnce 2010-2013) in which we studied the organizational construction of academic quality in high-standing chemistry departments (Paradeise and Thoenig 2013, Paradeise et al., 2014).
Specifically, it focuses on early career researchers and is anchored in social studies of science, organization studies and media studies. In this paper, data from 13 interviews are analysed thematically using NVivo qualitative analysis software. Results suggest that reading, writing and, to a lesser extent, publishing feed into a non-linear process in which inputs (publications and results) are constantly revisited to “build up a story”. The PhD candidates and postdocs interviewed described extensive browsing, in some cases going through huge quantities of papers (up to 1000 per week) and focusing on graphical abstracts only. In many interviews, time was considered a crucial aspect. Surprisingly, interviewees did not relate this to the time pressures of the publishing process but rather to the epistemic nature of the objects and techniques they used and studied.
This research aims to outline the changing use of periodicals and their role in defining “value” in scholarly communication. More generally, I question the emergence of a modern press culture and a civilisation of periodicity.

10:30am-11:00am
CINF 169: QSPR/QSAR studies of antifouling/fouling-release surface coatings containing quaternary ammonium salts

Farukh Jabeen3 , farukh.jabeen@ndsu.edu, Bakhtiyor Rasulev2 , Martin Ossowski2 , Bret Chisholm1 , Shane Stafslien1 , Philip Boudjouk4
1 Center for Nanoscale Science and Engineering, North Dakota State University, Fargo, North Dakota, United States; 3 Center for Computationally Assisted Science and Technology, North Dakota State University, Fargo, North Dakota, United States; 4 Chemistry and Biochemistry, North Dakota State University, Fargo, North Dakota, United States

Polysiloxane coatings containing tethered quaternary ammonium salt (QAS) moieties have been reported to be environmentally friendly coatings for controlling marine biofouling [1]. A number of coating materials were synthesized by varying the concentrations of their main compositional components [1]. The surface and biological properties of the 75 compositionally unique coatings were studied and were found to span a range of antifouling (AF) and fouling-release (FR) activities.
Experimental data were obtained for 8 different properties of 75 coating compositions (25 compositions for each PDMS system: 2K, 18K and 49K) [1]. To develop a coating material with optimal biofouling properties, a Quantitative Structure-Property/Activity Relationship (QSPR/QSAR) approach for mixtures was applied, given the complex nature of coating systems. Descriptors were calculated for the individual components of each coating system and then used to calculate mixture descriptors. QSAR/QSPR models were generated for each of the 8 endpoints of the coating systems.
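The abstract does not state how the mixture descriptors were computed. One widely used scheme weights each component's descriptor by that component's fraction in the mixture. The sketch below assumes this additive form, with hypothetical components, amounts and descriptor values.

```python
def mixture_descriptors(components):
    """Fraction-weighted (additive) mixture descriptors.

    components: list of (amount, {descriptor: value}) pairs. Amounts are normalised
    to fractions; every component must provide the same descriptor names.
    """
    total = sum(amount for amount, _ in components)
    names = components[0][1].keys()
    return {n: sum(amount / total * desc[n] for amount, desc in components) for n in names}


# Usage: hypothetical two-component mixture with invented descriptor values.
coating = [
    (0.5, {"logP": 2.0, "MW": 100.0}),  # e.g. a QAS-bearing component
    (1.5, {"logP": 4.0, "MW": 300.0}),  # e.g. a PDMS component
]
print(mixture_descriptors(coating))  # {'logP': 3.5, 'MW': 250.0}
```

Because the result has one value per descriptor regardless of how many components there are, coatings with different formulations can share a single QSPR/QSAR descriptor space.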
A 4-D QSAR methodology was applied to combine in one QSAR model the 3 different systems, with PDMS molecular weights of 2000 g/mol, 18000 g/mol and 49000 g/mol. The developed QSPR/QSAR models showed good predictive ability for all 8 endpoints, particularly for the 18000 g/mol coating systems. The coatings containing the longest alkyl chain (18 carbons) exhibited the highest micro-roughness and were also found to be effective at inhibiting microbial biofilm formation [1]. The descriptors selected in the most effective QSAR models were able to explain the influence of the component features responsible for good fouling-release activity.

Reference

[1] Majumdar, P.; Lee, E.; Patel, N.; Ward, K.; Stafslien, S. J.; Daniels, J.; Chisholm, B. J.; Boudjouk, P.; Callow, M. E.; Callow, J. A.; Thompson, S. E. M. Biofouling 2008, 24, 185–200.


Acknowledgements
The research has been financially supported by the U.S. Department of Energy through Grant No. DE-SC0001717.
CINF: General Papers
1:00pm - 3:00pm
Thursday, August 20

Room 104A - Boston Convention & Exhibition Center
Erin Davis, Organizing
Erin Davis, Presiding

1:00pm-1:30pm
CINF 170: Experimental chemoinformatics study of tautomerism of commercial screening samples

Laura Guasch1 , lguasch@helix.nih.gov, Marc Nicklaus1
1 NCI-Frederick, NIH, Frederick, Maryland, United States

A compound exhibits tautomerism if it can be represented by two or more structures related by a formal intramolecular movement of a hydrogen atom from one heavy-atom position to another. The existence of multiple tautomeric forms of the same molecule is still an important, and in many ways unsolved, issue in drug discovery and chemoinformatics. Enumerating the appropriate tautomers is important for the registration and retrieval of small molecules in chemical databases. Here, we have conducted a tautomerism analysis of a large database of commercially available compounds to investigate how often the same chemical is sold as different products (at possibly different prices), and to test the tautomerism definition of the widely used chemoinformatics toolkit CACTVS. We applied the CACTVS prototropic rules plus our new set of ring-chain rules to the publicly accessible Aldrich Market Select (AMS) database from ChemNavigator/Sigma-Aldrich, which comprises over 8 million unique chemicals available from hundreds of suppliers worldwide. We found thousands of cases where at least two products listed as different compounds in the AMS were identified by the CACTVS-based rules as tautomeric forms of the same compound. A set of 166 tautomeric pairs (or larger tuples) from the AMS, covering a range of tautomeric rules, was purchased. NMR spectra were recorded and compared between the samples of each tautomeric conflict tuple. In the majority of cases, the spectra indicated that the different products were in fact the same compound. We discuss the potential need for better curation of commercial catalogs (and other databases) with respect to the handling of tautomerism.
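Once each catalog entry has been mapped to a tautomer-invariant key, finding "same compound, different product" conflicts reduces to grouping entries by that key. In the study, this mapping came from the CACTVS prototropic and ring-chain rules. In the sketch below it is represented by precomputed placeholder strings, and the product IDs, keys and prices are hypothetical.

```python
from collections import defaultdict


def tautomer_conflicts(products):
    """Group catalog products that share a canonical-tautomer key.

    products: dict product_id -> (canonical_tautomer_key, price)
    Returns {key: [(product_id, price), ...]} only for keys with 2+ products.
    """
    groups = defaultdict(list)
    for pid, (key, price) in products.items():
        groups[key].append((pid, price))
    return {key: sorted(items) for key, items in groups.items() if len(items) > 1}


# Usage: the keys are placeholders for the output of a tautomer-canonicalisation step.
# P1 and P2 could be, e.g., 2-hydroxypyridine (Oc1ccccn1) and 2-pyridone (O=c1cccc[nH]1).
catalog = {
    "P1": ("TAUT_KEY_A", 25.0),
    "P2": ("TAUT_KEY_A", 40.0),
    "P3": ("TAUT_KEY_B", 10.0),
}
print(tautomer_conflicts(catalog))  # {'TAUT_KEY_A': [('P1', 25.0), ('P2', 40.0)]}
```

The quality of the result depends entirely on the tautomer rules behind the key. This is why the study checked purchased samples by NMR rather than relying on the rules alone.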

1:30pm-2:00pm
CINF 171: Which kinase to hit in NCI-60? From a selectivity problem to a multitarget solution

Oscar Méndez Lucio2 , oscarmen@comunidad.unam.mx, Aakash Chavan Ravindranath2 , Qurrat Ul Ain2 , Kristian Birchall3 , Chido Mpamhanga3 , Stefan Knapp1 , Andreas Bender2
1 SGC, Oxford, United Kingdom; 2 Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, United Kingdom; 3 Medical Research Council Technology, London, United Kingdom

The kinase family comprises 518 proteins encoded in the human genome[1], several of which are targeted by current drugs to treat, e.g., cancer, inflammation, and parasitic infections. However, drug promiscuity remains a challenge owing to the high degree of conservation of the ATP binding site targeted by most inhibitors. Currently, 30 kinase inhibitors are approved by the FDA, and it is still debated whether the multitarget activity of some of these compounds could be responsible for their efficacy against different cancers.[2]
To analyse which kinase(s) might be best to target, we used 367 compounds tested in a biochemical assay that determined their percentage inhibition at 1 μM against a panel of 224 kinases. The same compounds were tested in vitro against the National Cancer Institute’s NCI-60 cell line panel to evaluate their potential to inhibit cell growth (pGI50). These data were used to generate proteochemometric (PCM) models, in which biological space and chemical space are analyzed simultaneously. PCM allows the effect of all inhibitors on all cell lines to be studied using a single model.
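In a PCM setup, each training row describes a (compound, cell line) pair, so a single model sees both spaces at once. The sketch below only assembles such rows. It concatenates a compound's kinase-inhibition profile with a cell-line descriptor vector and pairs the result with the measured pGI50. The cell-line descriptors and all values are hypothetical, and fitting the random forest is omitted.

```python
def pcm_rows(compound_profiles, cell_descriptors, pgi50):
    """Build (feature vector, target) rows for every measured compound/cell-line pair.

    compound_profiles: compound -> % inhibition values across the kinase panel
    cell_descriptors:  cell line -> numeric cell-line features
    pgi50:             (compound, cell line) -> measured pGI50
    """
    rows = []
    for (compound, cell), y in sorted(pgi50.items()):
        x = compound_profiles[compound] + cell_descriptors[cell]  # chemical + biological space
        rows.append((x, y))
    return rows


# Usage with a 2-kinase panel and invented cell-line features.
profiles = {"cpd1": [90.0, 10.0], "cpd2": [5.0, 80.0]}
cells = {"lineA": [1.0, 0.2], "lineB": [0.0, 0.7]}
measured = {("cpd1", "lineA"): 6.5, ("cpd2", "lineB"): 5.1, ("cpd1", "lineB"): 5.4}
for x, y in pcm_rows(profiles, cells, measured):
    print(x, y)
```

Only measured pairs become rows, so gaps in the compound-by-cell-line matrix are simply left out rather than imputed.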
Results from this analysis suggest that cell lines can be clustered based on their sensitivity to kinase inhibitors. Random forest models were generated for each of these clusters, along with one general model using all the data. Interpretation of these models suggests that inhibition of NEK9 correlates most strongly with the phenotypic effect of the compounds in the dataset. The model is also able to predict combinations of kinases that show a synergistic effect when inhibited simultaneously.

[1] Manning, G., Whyte, D.B., Martinez, R., Hunter, T., Sudarsanam, S. The Protein Kinase Complement of the Human Genome. Science. 2002, 298, 1912-34.
[2] Morphy, R. Selectively Nonselective Kinase Inhibition: Striking the Right Balance. J. Med. Chem. 2010, 53, 1413-37.


Cell lines were clustered based on their sensitivity to 367 kinase inhibitors (A). Regression models were generated (B) for each cluster in order to predict the cell sensitivity to compounds with a particular kinase inhibition profile.

2:00pm-2:30pm
CINF 172: HackaMol: An object-oriented Modern Perl library for molecular hacking on multiple scales

Demian Riccardi1 , demianriccardi@gmail.com
1 Chemistry, Earlham College, Richmond, Indiana, United States

HackaMol is an open source, object-oriented toolkit written in Modern Perl that organizes atoms within molecules and provides chemically intuitive attributes and methods. The library consists of two components: HackaMol, the core that contains classes for storing and manipulating molecular information, and HackaMol::X, the extensions that use the core. The core is well-tested, well-documented, and easy to install across computational platforms. The goal of the extensions is to provide a more flexible space for researchers to develop and share new methods. In this talk I will describe the core classes and two extensions: HackaMol::X::Calculator, an abstract calculator that uses code references to generalize interfaces with external programs and HackaMol::X::Vina, a structured class that provides an interface with the AutoDock Vina docking program.
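HackaMol itself is written in Perl. Purely as a language-neutral illustration of the design described for HackaMol::X::Calculator, the sketch below reproduces the pattern in Python: code references that generalize how an external program is fed, run and parsed. This is not HackaMol's API. The `Calculator` class and the stand-in "program" are hypothetical, and a real wrapper would launch an executable (for example, through `subprocess.run`) instead of calling a Python function.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Calculator:
    """Generic wrapper: the three callables play the role of Perl code references."""
    write_input: Callable[[Any], str]   # molecule -> input text for the program
    run: Callable[[str], str]           # input text -> raw program output
    read_output: Callable[[str], Any]   # raw output -> parsed result

    def calculate(self, molecule: Any) -> Any:
        return self.read_output(self.run(self.write_input(molecule)))


def to_xyz(atoms):
    """Render (symbol, x, y, z) tuples as XYZ-format text."""
    lines = [str(len(atoms)), "comment"] + [f"{s} {x} {y} {z}" for s, x, y, z in atoms]
    return "\n".join(lines)


def fake_program(text):
    """Stand-in for an external executable: reports the atom count from the XYZ header."""
    return f"NATOMS {text.splitlines()[0]}"


# Usage: swap in different callables to target a different program.
counter = Calculator(to_xyz, fake_program, lambda out: int(out.split()[1]))
water = [("O", 0.0, 0.0, 0.0), ("H", 0.96, 0.0, 0.0), ("H", -0.24, 0.93, 0.0)]
print(counter.calculate(water))  # 3
```

Keeping input writing, execution and parsing as separate pluggable pieces is what lets one abstract calculator front many external programs.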

2:30pm-3:00pm
CINF 173: Programmatic access to chemical information in PubChem

Sunghwan Kim1 , kimsungh@ncbi.nlm.nih.gov, Paul Thiessen1 , Evan Bolton1 , Stephen Bryant1
1 NCBI / NLM / NIH, Warrenton, Virginia, United States

PubChem (http://pubchem.ncbi.nlm.nih.gov) is a public repository for information on chemical substances and their biological activities, developed and maintained by the U.S. National Institutes of Health (NIH). Since its launch in 2004, it has grown rapidly in data size and content and has become a key resource of chemical information for the biomedical research community. Currently, PubChem contains more than 180 million depositor-provided chemical substance descriptions, 60 million unique chemical structures, and 225 million bioactivity assay results, covering more than nine thousand unique protein target sequences. Programmatic access to this vast amount of data is provided by several different systems, including the Entrez Utilities (E-Utilities or E-Utils) of the U.S. National Center for Biotechnology Information (NCBI) and the PubChem Power User Gateway (PUG), a common gateway interface (CGI) that exchanges data through Extensible Markup Language (XML). To further simplify programmatic access, PubChem provides two additional general-purpose web services: PUG-SOAP, which uses the Simple Object Access Protocol (SOAP), and PUG-REST, a Representational State Transfer (REST)-style interface. These interfaces can be used in combination to access the data contained in PubChem, which is integrated with the more than thirty databases available within the NCBI Entrez system. This presentation will provide a brief overview of these programmatic access tools.
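As a small illustration of the REST-style interface, PUG-REST URLs follow an input/operation/output pattern. The sketch below builds a property request by compound name and can optionally fetch it with the Python standard library. It uses the current https host rather than the http address given above. The two property names are ones we believe PUG-REST supports, but they should be checked against the PUG-REST documentation.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def property_url(name, properties=("MolecularFormula", "MolecularWeight")):
    """input = compound/name/<name>, operation = property/<list>, output = JSON."""
    return f"{PUG_REST}/compound/name/{quote(name)}/property/{','.join(properties)}/JSON"


def fetch_properties(name):
    """Network call: return the rows of the PropertyTable that PubChem sends back."""
    with urlopen(property_url(name), timeout=30) as resp:
        return json.load(resp)["PropertyTable"]["Properties"]


# Usage (building the URL needs no network; fetch_properties("aspirin") does).
print(property_url("aspirin"))
```

Because the whole request is encoded in the URL path, the same pattern extends to other inputs (for example, CIDs) and output formats by changing one path segment.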

2015 Officers and Functionaries

Chair
Rachelle Bienstock
National Institute of Environmental
Health Sciences
rachelleb1@gmail.com

Chair-Elect
See Chair

Past-Chair
Judith Currano
University of Pennsylvania
currano@pobox.upenn.edu

Secretary
Leah McEwen
Cornell University
lrm1@cornell.edu

Treasurer
Rob McFarland
Washington University
rmcfarland@wustl.edu

Councilor
Bonnie Lawlor
chescot@aol.com

Councilor
Andrea Twiss-Brooks
University of Chicago
atbrooks@uchicago.edu

Alternate Councilor
Charles Huber
University of California, Santa Barbara
huber@library.ucsb.edu

Alternate Councilor
Guenter Grethe
Scientific Research Consultant
ggrethe@att.net

Archivist/Historian
Bonnie Lawlor
See Councilor

Audit Committee Chair
TBD

Awards Committee Chair
Andrea Twiss-Brooks
See Councilor

Careers Committee Co-Chairs
Pamela Scott
Pfizer
pamela.j.scott@pfizer.com

Sue Cardinal
University of Rochester
scardinal@library.rochester.edu

Communications and Publications
Committee Chair
David Martinsen
American Chemical Society
d_martinsen@acs.org

Constitution, Bylaws & Procedures
Susanne Redalje
University of Washington
curie@u.washington.edu

Education Committee Chair
Grace Baysinger
Stanford University
graceb@stanford.edu

Finance Committee Chair
Rob McFarland
See Treasurer

Fundraising Committee Chair
Phil Heller
Thieme Publishers
phillip.heller@thieme.com

Membership Committee Chair
Donna Wrublewski
Caltech Library
dtwrub@caltech.edu

Nominating Committee Chair
See Past-Chair

Program Committee Chair
Erin Bolstad
Cambridge Crystallographic Data Centre
erinbolstad@gmail.com

Tellers Committee Chair
Susan Cardinal
See Careers Committee Chair

Chemical Information Bulletin Editor
Summer and Winter
Svetlana Korolev
University of Wisconsin, Milwaukee
skorolev@uwm.edu

Chemical Information Bulletin Editor
Fall and Spring

Vincent F. Scalfani
The University of Alabama
vfscalfani@ua.edu

Chemical Information Bulletin Assistant Editors
Teri Vogel
UC San Diego Library
tmvogel@ucsd.edu

David Shobe
Patent Information Agent
avidshobe@yahoo.com

Webmaster
Patti McCall
University of Central Florida
patti.mccall@ucf.edu

Fall 2015 CINF Bulletin Contributors

Articles and Features
Rachelle Bienstock
Bonnie Lawlor
Robert E. Buntrock
Vincent F. Scalfani
Teri Vogel

Sponsor Information
Graham Douglas
Phil Heller

Technical Program
David Martinsen

Production
Vincent F. Scalfani
Teri Vogel
Patti McCall
Erja Kajosalo
David Martinsen
Bonnie Lawlor
Wendy A. Warr

Future ACS Meetings

 

251st: Mar. 13–17, 2016, San Diego, CA (theme: Computers in Chemistry)
252nd: Aug. 21–25, 2016, Philadelphia, PA (theme: Chemistry of the People, by the People, and for the People)
253rd: Apr. 2–6, 2017, San Francisco, CA (theme: TBD)
254th: Aug. 20–24, 2017, Washington, DC (theme: TBD)
255th: Mar. 18–22, 2018, New Orleans, LA (theme: TBD)
256th: Aug. 19–23, 2018, Boston, MA (theme: TBD)
257th: Mar. 31–Apr. 4, 2019, Orlando, FL (theme: TBD)
258th: Aug. 25–29, 2019, San Diego, CA (theme: TBD)
259th: Mar. 22–26, 2020, Philadelphia, PA (theme: TBD)
