Vol. 67, No. 1: Spring 2015

Chemical Information Bulletin

A Publication of the Division of Chemical Information of the ACS

Volume 67 No. 1 (Spring) 2015

Vincent F. Scalfani, Editor
The University of Alabama
vfscalfani@ua.edu

Cover image is courtesy of Vincent F. Scalfani. The picture was taken in Rocky Mountain National Park, CO.

ISSN: 0364-1910
Chemical Information Bulletin,
©Copyright 2015 by the Division of Chemical Information of the American Chemical Society.

Message From the Chair

This is my first chance to address you as chair in the Chemical Information Bulletin.

I hope that I will personally have the opportunity to meet each of you during my term as CINF chair. I would like to thank my predecessor, Judith Currano, for serving our division so well as chair over the past year, and to extend hearty congratulations to her and her husband, Peter, on the birth of their second son, Bill. Congratulations also to Edward on becoming a big brother! Hopefully we will see the family, now with two little boys, at our future meetings!


We are all looking forward to the upcoming spring meeting in Denver. I want to thank some of our newer members for stepping up and for so quickly becoming such active participants in our division. Michael Qiu has done a great deal of work with ACS arranging rooms and refreshments for our meetings and social events in Denver. Donna Wrublewski has agreed to serve as our membership committee chair and has already been working hard on a new membership brochure. We welcome the participation of new members, and there are always opportunities available: assisting with meeting logistics, suggesting technical programming, chairing symposia, and helping with membership and fundraising. Don’t hesitate to email me or any of the other CINF board members if you are interested. If you are in Denver early on Saturday, before the start of the national meeting, please contact me about attending some of our open planning sessions.

I guess you can think of the Denver program as “Small Is Beautiful.” The CINF program has somewhat fewer papers than usual for a national meeting, but it is rich in quality. You can look forward to sessions on:

Research Results Reproducibility, Reporting, Sharing & Plagiarism         

Molecular & Structural 2D & 3D Chemical Fingerprinting: Computational Storing, Searching & Comparing Molecular & Chemical Structures

Getting to the Best Reaction: Tools for Finding a Needle in a Haystack

Development & Use of Data Format Standards for Cheminformatics      

Defining Value in Scholarly Communications: Evolving Ways of Evaluating Impact on Science

Aside from the technical program, please remember our Welcoming Reception on Sunday evening and Harry’s Party on Monday evening. I also hope you will join us on Tuesday for our luncheon. Our speaker will be David Thomas, author of Of Mines & Beer! 150+ Years of Brewing in Colorado and The Craft Maltsters’ Handbook, who will speak on 19th- and 21st-century malting and brewing. I am sure he will tell us who brews the best beer in town! Please purchase tickets in advance through the ACS website or see me for tickets. You can also just email me to let me know you are coming, and I will save a ticket for you at the door. If you are attending a CINF event in Denver for the first time, please make sure to come up and introduce yourself to me.

Also please note that the program for the Fall Boston meeting is already online http://www.acs.org/content/acs/en/meetings/abstract-submissions/acsnm250/division-of-chemical-information.html. Please submit those abstracts by the March deadline!

We welcome suggestions on ways to improve our programming and make division membership more beneficial for you. We are always looking for committee members and for those who would like to be more involved and engaged. Please don’t hesitate to contact me or other CINF officers if you have any suggestions for programming, webinars, or other ways CINF could offer something more to you as a member.

I am looking forward to meeting you and hope to see you at a CINF social and/or technical event in Denver!

Rachelle Bienstock,
Chair, ACS Division of Chemical Information
Rachelleb1@gmail.com

Letter from the Editor

Thank you for reading this issue of the ACS Chemical Information Bulletin (CIB). I would first like to thank all of the contributors for their time and effort in writing great articles for the CIB. We have a variety of feature articles in this issue. In the book review column, Bob Buntrock reviews Computational Organic Chemistry, 2nd edition (Bachrach, S.M.), and Patti McCall reviews The Future of the History of Chemical Information (Eds. McEwen, L.R. and Buntrock, R.E.). We have also included two wonderful tributes to the late Paul von Ragué Schleyer and Frank H. Allen. Bob Buntrock provides a captivating story of his interactions with Schleyer as a student at Princeton University. Wendy Warr contributes a report on the CINF Herman Skolnik Award Symposium for Frank H. Allen, who received the award in 2003. I hope you enjoy these tributes as much as I did and that they bring you some joy at an otherwise sad time. Lastly, in this spring 2015 issue of the CIB, Donna Wrublewski continues her new member profile column with CINF member Martin Walker.

I won’t be able to make it to the ACS meeting in Denver this spring, but I just submitted my abstract for the Fall ACS meeting in Boston. I’ll see y’all in the fall. In the meantime, I’m always looking for interesting content to include in the CIB, so if you have any ideas, please get in touch with me.

 

Vincent F. Scalfani, Editor
The University of Alabama
vfscalfani@ua.edu

 

Social Networking Events

CINF Social Networking Events at the Spring 2015 ACS Meeting


Please join us at these Division of Chemical Information Events!

The ACS Division of Chemical Information is pleased to host the following social networking events at the Spring 2015 ACS National Meeting in Denver, Colorado.

Sunday Welcoming Reception

6:30-8:30pm, Sunday, March 22nd – Embassy Suites Denver, Silverton Ballroom 3
Reception co-sponsored by Bio-Rad Laboratories, Journal of Cheminformatics (Springer), Journal of Chemical Information and Modeling, PerkinElmer, AAAS/Science, CRC Press and Thieme Chemistry.

Harry’s Party

5:00-8:00pm, Monday, March 23rd – Sheraton Denver Downtown, The Presidential Parlor 793
Sponsored exclusively by ACS Publications. (Shuttle Route 2)

Tuesday Luncheon (Ticketed Event – Contact Division Chair, Rachelle Bienstock)
12:00-1:30pm Tuesday, March 24th – Embassy Suites Denver, Silverton Ballroom 3
Sponsored exclusively by the Royal Society of Chemistry.

Speaker: David A. Thomas - 19th & 21st Century Malting & Brewing; Author of: Of Mines & Beer - 150+ Years of Brewing in Colorado


After 32 years, the author retired from Coors Brewing Company as a traveling brewmaster. He is now Brewer Emeritus at Dostal Alley Brewpub in Central City, Colorado, writes for The Brewer & Distiller International (UK), and consults with Ecolab and the Colorado Malting Co. He is also on the Board of Trustees of the Gilpin County Historical Society.

Committee Meetings

Saturday, March 21: 1-3pm

Education Committee: Colorado Convention Center, Room 604

Program Committee: Colorado Convention Center, Room 606

Awards Committee: Colorado Convention Center, Room 608

Saturday, March 21: 3-6 pm

Executive Committee: Colorado Convention Center, Room 604

 

Awards & Scholarships


Chemical Structure Association Trust

Applications Invited for CSA Trust Grant for 2015

The Chemical Structure Association (CSA) Trust is an internationally recognized organization established to promote the critical importance of chemical information to advances in chemical research. In support of its charter, the Trust has created a unique Grant Program and is now inviting the submission of grant applications for 2015.

Purpose of the Grants: 

The Grant Program has been created to provide funding for the career development of young researchers who have demonstrated excellence in their education, research or development activities that are related to the systems and methods used to store, process and retrieve information about chemical structures, reactions and compounds.  One or more Grants will be awarded annually up to a total combined maximum of ten thousand U.S. dollars ($10,000). Grants are awarded for specific purposes, and within one year each grantee is required to submit a brief written report detailing how the grant funds were allocated. Grantees are also requested to recognize the support of the Trust in any paper or presentation that is given as a result of that support.

Who is Eligible?

Applicant(s), age 35 or younger, who have demonstrated excellence in their chemical information related research and who are developing careers that have the potential to have a positive impact on the utility of chemical information relevant to chemical structures, reactions and compounds, are invited to submit applications.  While the primary focus of the Grant Program is the career development of young researchers, additional bursaries may be made available at the discretion of the Trust.  All requests must follow the application procedures noted below and will be weighed against the same criteria.

Which Activities are Eligible?

Grants may be awarded to acquire the experience and education necessary to support research activities, for example, for travel to collaborate with research groups, to attend a conference relevant to one’s area of research, to gain access to special computational facilities, or to acquire unique research techniques in support of one’s research.

Application Requirements: 

1. A letter that details the work upon which the Grant application is to be evaluated as well as details on research recently completed by the applicant;

2. The amount of Grant funds being requested and the details regarding the purpose for which the Grant will be used (e.g., cost of equipment, travel expenses if the request is for financial support of meeting attendance, etc.). The relevance of the above-stated purpose to the Trust’s objectives and the clarity of this statement are essential in the evaluation of the application;

3. A brief biographical sketch, including a statement of academic qualifications;

4. Two reference letters in support of the application. Additional materials may be supplied at the discretion of the applicant only if relevant to the application and if such materials provide information not already included in items 1–3.

Three copies of the complete application document must be supplied for distribution to the Grants Committee.

Deadline for Applications: 

The deadline for applications for the 2015 Grant is March 13, 2015. Successful applicants will be notified no later than May 2, 2015.

Address for Submission of Applications: 

Three copies of the application documentation should be forwarded to: Bonnie Lawlor, CSA Trust Grant Committee Chair, 276 Upper Gulph Road, Radnor, PA 19087, USA. If you wish to submit your application by e-mail, please contact Bonnie Lawlor at chescot@aol.com prior to submission so that she can contact you if the e-mail does not arrive.

Chemical Structure Association Trust:  Previous Grant Awardees

2014 – Dr. Adam Madarasz

Institute of Organic Chemistry, Research Centre for Natural Sciences, Hungarian Academy of Sciences. He was awarded a Grant for travel to study at the University of Oxford with Dr. Robert S. Paton, a 2013 CSA Trust Grant winner. The aim was to increase his experience in developing computational methods that can accurately model realistic and flexible transition states in chemical and biochemical reactions.

2014 – M. José Ojeda Montes

Department of Biochemistry and Biotechnology, University Rovira i Virgili, Spain. She was awarded a Grant for travel expenses to study for four months at the Freie University of Berlin to enhance her experience and knowledge regarding virtual screening workflows for predicting therapeutic uses of natural molecules in the field of functional food design.

2014 – Dr. David Palmer

Department of Chemistry, University of Strathclyde, Scotland. He was awarded a Grant to present a paper at the fall 2014 meeting of the American Chemical Society on a new approach to representing molecular structures in computers, based on ideas from the Integral Equation Theory of Molecular Liquids.

2014 – Sona B. Warrier

Departments of Pharmaceutical Chemistry, Pharmaceutical Biotechnology, and Pharmaceutical Analysis, NMIMS University, Mumbai. She was awarded a Grant to attend the International Conference on Pure and Applied Chemistry to present a poster on her research on inverse virtual screening in drug repositioning.

2013 – Dr. Johannes Hachmann

Department of Chemistry and Chemical Biology at Harvard University, Cambridge, MA. He was awarded the Grant for travel to speak on “Structure-property relationships of molecular precursors to organic electronics” at a workshop sponsored by the Centre Européen de Calcul Atomique et Moléculaire (CECAM) that took place October 22–25, 2013 in Lausanne, Switzerland.

2013 – Dr. Robert S. Paton

University of Oxford, UK.  He was awarded the Grant to speak at the Sixth Asian Pacific Conference of Theoretical and Computational Chemistry in Korea on July 11, 2013. Receiving the invitation for this meeting provided Dr. Paton with an opportunity to further his career as a Principal Investigator.

2013 – Dr. Aaron Thornton

Materials Science and Engineering at CSIRO in Victoria, Australia. He was awarded the Grant to attend the 2014 International Conference on Molecular and Materials Informatics at Iowa State University, with the objective of expanding his knowledge of web semantics, chemical mark-up language, resource description frameworks and other online sharing tools. He also visited Dr. Maciej Haranczyk, a prior CSA Trust Grant recipient and one of the world leaders in virtual screening.

2012 – Tu Le

CSIRO Division of Materials Science & Engineering, Clayton, VIC, Australia. Tu was awarded the Grant for travel to attend a cheminformatics course at Sheffield University and to visit the Membrane Biophysics group of the Department of Chemistry at Imperial College London.

2011 – J. B. Brown

Kyoto University, Kyoto, Japan. J.B. was awarded the Grant for travel to work with Professor Ernst-Walter Knapp at the Freie University of Berlin and Professor Jean-Philippe Vert of MINES ParisTech, to continue his work on the development of atomic partial charge kernels.

2010 – Noel O’Boyle

University College Cork, Ireland. Noel was awarded the Grant to network and to present his work on open source software for pharmacophore discovery and searching at the 2010 German Conference on Cheminformatics.

2009 – Laura Guasch Pamies

University Rovira & Virgili, Catalonia, Spain.  Laura was awarded the Grant to do three months of research at the University of Innsbruck, Austria.

2008 – Maciej Haranczyk

University of Gdansk, Poland. Maciej was awarded the Grant to travel to Sheffield University, Sheffield, UK, for a 6-week visit for research purposes.

2007 – Rajarshi Guha

Indiana University, Bloomington, IN, USA. Rajarshi was awarded the Grant to attend the Gordon Research Conference on Computer-Aided Design in August 2007.

2006 – Krisztina Boda

University of Erlangen, Erlangen, Germany. Krisztina was awarded the Grant to attend the 2006 spring National Meeting of the American Chemical Society in Atlanta, GA, USA.

2005 – Dr. Val Gillet and Professor Peter Willett

University of Sheffield, Sheffield, UK.  They were awarded the Grant for student travel costs to the 2005 Chemical Structures Conference held in Noordwijkerhout, the Netherlands.

2004 – Dr. Sandra Saunders

University of Western Australia, Perth, Australia. Sandra was awarded the Grant to purchase equipment needed for her research.

2003 – Prashant S. Kharkar

Institute of Chemical Technology, University of Mumbai, Matunga, Mumbai. Prashant was awarded the Grant to attend the conference, Bioactive Discovery in the New Millennium, in Lorne, Victoria, Australia (February 2003) to present a paper, “The Docking Analysis of 5-Deazapteridine Inhibitors of Mycobacterium avium complex (MAC) Dihydrofolate reductase (DHFR).”

2001 – Georgios Gkoutos

Imperial College of Science, Technology and Medicine, Department of Chemistry, London, UK. Georgios was awarded the Grant to attend the conference, Computational Methods in Toxicology and Pharmacology Integrating Internet Resources (CMTPI-2001), in Bordeaux, France, to present part of his work on internet-based molecular resource discovery tools.

2015 CINF Officers and Functionaries

Chair
Rachelle Bienstock
National Institute of Environmental
Health Sciences
rachelleb1@gmail.com

Chair Elect
see Chair

Past Chair
Judith Currano
University of Pennsylvania
currano@pobox.upenn.edu

Secretary
Leah McEwen
Cornell University
lrm1@cornell.edu

Treasurer
Rob McFarland
Washington University
rmcfarland@wustl.edu

Councilor
Bonnie Lawlor
chescot@aol.com

Councilor
Andrea Twiss-Brooks
University of Chicago
atbrooks@uchicago.edu

Alternate Councilor
Charles Huber
University of California, Santa Barbara
huber@library.ucsb.edu

Alternate Councilor
Guenter Grethe
Scientific Research Consultant
ggrethe@att.net

Archivist/Historian
Bonnie Lawlor
See Councilor

Audit Committee Chair
TBD

Awards Committee Chair
Andrea Twiss-Brooks
See Councilor

 

Careers Committee Co-Chairs
Pamela Scott
Pfizer
pamela.j.scott@pfizer.com

Sue Cardinal
University of Rochester
scardinal@library.rochester.edu

Communications and Publications Committee Chair
David Martinsen
American Chemical Society
d_martinsen@acs.org

Constitution, Bylaws & Procedures
Susanne Redalje
University of Washington
curie@u.washington.edu

Education Committee Chair
Grace Baysinger
Stanford University
graceb@stanford.edu

Finance Committee Chair
Rob McFarland
See Treasurer

Fundraising Committee Chair
Phil Heller
Thieme Publishers
phillip.heller@thieme.com

Membership Committee Chair
Donna Wrublewski
California Institute of Technology
dtwrub@caltech.edu

Nominating Committee Chair
see Past Chair

Program Committee Chair
Erin Bolstad
John McNeil and Co, Inc.
erinbolstad@gmail.com

Tellers Committee Chair
Susan Cardinal
see Careers Committee Chair

Chemical Information Bulletin Editor Summer and Winter
Svetlana Korolev
University of Wisconsin, Milwaukee
skorolev@uwm.edu

Chemical Information Bulletin Editor Fall and Spring
Vincent F. Scalfani
The University of Alabama
vfscalfani@ua.edu

Chemical Information Bulletin Assistant Editors
Teri Vogel
UC San Diego Library
tmvogel@ucsd.edu

David Shobe
Patent Information Agent
davidshobe@yahoo.com

Webmaster
Patti McCall
University of Central Florida
patti.mccall@ucf.edu

Spring 2015 CINF Bulletin Contributors, Articles and Features
Rachelle Bienstock
Bonnie Lawlor
Patti McCall
Robert E. Buntrock
Donna T. Wrublewski
Wendy A. Warr

Sponsor Information
Graham Douglas
Phil Heller

Technical Program
David Martinsen

Production
Vincent F. Scalfani
Teri Vogel
Patti McCall
Erja Kajosalo
David Martinsen
Bonnie Lawlor
Wendy A. Warr

Book Reviews

The Future of the History of Chemical Information

The Future of the History of Chemical Information (ACS Symposium Series); McEwen, Leah Rae; Buntrock, Robert E., Eds.; American Chemical Society, 2014; ISBN 9780841229457

This book is a must-read for science librarians, information professionals, and researchers who need a primer on the chemical information landscape. Each chapter covers a different area of chemical information and its evolution, including patents, mobile device apps, the evolution of chemical databases (CAS and Reaxys, for example), open access databases and resources, chemical ontology, the semantic web, and even spectral data. Some common themes emerge from many of the chapters. For example, although the delivery of chemical information has shifted from print to electronic resources, knowing where to find such information and how to formulate useful search strategies remains key. Researchers should still spend time keeping current with the literature in their area and beyond. Print may be becoming obsolete, but the need to browse the literature is not. As the saying goes (or words to that effect), a few hours in the library can save a few weeks in the laboratory. Researchers may not walk into a physical library space, but they do need to consult the right information-based tools. These may be databases accessed from the lab or applications used while waiting in the airport.

As a science librarian, I found Judith Currano’s chapter on teaching chemical information especially informative and helpful. I, too, have spent far too much time showing students how to navigate databases without challenging them to think about the actual information they are seeking. She discusses some key principles that students need to learn in order to search effectively for chemical information. These include understanding the scope and organization of resources and realizing that not all information resources are created equal. Some authors in this book hint that librarians in the field may be outliving their usefulness, but after reading Currano’s chapter it is clear that librarians are here to stay.

Another interesting theme is the need for open access articles, data, electronic lab notebooks, and other related resources that can promote collaboration and corroboration of data. It seems to me that much of this talk of open access and open data is quite idealistic in a field that is seemingly resistant to such concepts (at least in my limited experience). Who will put pressure on Elsevier and ACS? Who will share data, and what data will they share? I would love to see a future title that seriously examines the obstacles facing open access information and open data, including resistance from both publishers and researchers themselves.

Patti McCall
University of Central Florida
patti.mccall@ucf.edu

Computational Organic Chemistry

Computational Organic Chemistry, 2nd Edition; Bachrach, Steven M.; John Wiley & Sons, Hoboken, NJ, 2014: pp 1-632 + xiii, ISBN 978-1-118-29192-4 (hardcover), $125.

The author has reported on the field via a blog in the seven years since the first edition was published, and he has now published a welcome second edition. The key concept of both editions is the application of quantum mechanics (QM) to the description of chemical reactions and properties. These principles, aided of course by ever-increasing computing power, are being used to determine details of both the reactivity and the structure of chemical compounds. Much of the material has been updated, and two new chapters, on spectroscopy and enzymes, have been added.

The Preface is probably the best review of both the book and the field of computational organic chemistry, beginning with the history and rapid evolution of the field. The fundamental Schrödinger equation is the key to all molecular properties if it can be solved. For molecules as large and complex as typical organic molecules, however, it cannot be solved exactly, so a number of approximations are necessary. For simplicity, only ab initio methods (from first principles) are considered. Given the continual increase in computing power, these methods are becoming increasingly practical. The ready availability of numerous computer programs also facilitates the growth and practice of the field.

The book is aimed at both existing and potential practitioners of computational chemistry; the latter can include prospective occasional users as well as graduate students seeking entry into the field. According to the author, prior expertise in quantum chemistry is not necessary to read the book; the QM taught in a typical undergraduate physical chemistry course should suffice. Chapter 1 is an introduction to the field, covering its concepts and defining the myriad abbreviations used, and it can be consulted to better understand subsequent chapters. Chapter 2 is on spectroscopy and the ability to use calculated spectra in structure determination. Chapter 3 is a brief introduction to several concepts of organic chemistry that are amenable to computational study, including isomerism, acidity, and aromaticity. Chapter 4 covers pericyclic reactions, including my favorite, the Diels-Alder reaction. Radicals and carbenes are covered in Chapter 5, and carbanions as well as organic catalysis in Chapter 6. Solvent effects are the subject of Chapter 7, and dynamic effects in reactions that of Chapter 8. Many biochemical molecules are too large for effective computation, but the smaller molecules involved are covered in Chapter 9, the other new chapter.

The author emphasizes the personal aspects of the field. The first edition contained six interviews, placed at the end of several chapters, each with a computational chemist working in that particular area; the interviewees included the late Paul Schleyer. These are reprinted in this second edition, and three additional interviews have been added. The book is lavishly illustrated, references are given at the end of each chapter, and the index is extensive. The author maintains an associated website (www.comporgchem.com) to supply supporting information. In addition, the related blog (www.comporgchem.com/blog) will provide updates to the material in the book and is intended to serve a two-way function, with reader comments welcome.

The reviewer’s educational background (50 years ago) and a math background that plateaued with an incomplete grasp of matrices pose additional challenges to full comprehension. Nevertheless, I’ve developed an appreciation for what QM and computational chemistry can do for our understanding of chemistry. I was especially pleased that, in both editions, the author described the controversy generated 30 years ago by Dewar’s claim that the Diels-Alder reaction was not concerted (i.e., that the new bonds formed sequentially rather than simultaneously). However, Dewar was using semiempirical calculations, whereas ab initio methods confirm other studies showing that the reactions are concerted and synchronous. Other pericyclic reactions are also covered, but I’m curious whether any computational work has been done on the related ene reaction.

Why is this book being reviewed in the CIB? It is a treatise in the broader field of chemical information, of value to readers other than practitioners. Also, the number of venues for book reviews in both chemical information and computational chemistry has become increasingly limited. An additional disclaimer: I’ve known the author for 20 years and have found his contributions to our field of chemical information to be valuable. Recommended for the audiences described in the third paragraph.

Robert E. (Bob) Buntrock
Orono, ME
buntrock16@roadrunner.com

CINF Member Profile

Martin Walker
by Donna T. Wrublewski

 


 

Who are you?

I’m a chemist who grew up in northern England and worked in the chemical industry. In 1992 I moved to the US, and since receiving my PhD from Brandeis University in 1998, I’ve worked in higher education. 

What do you do? (Institution, position, job description/duties)

I’m Professor of Chemistry at the State University of New York (SUNY) College at Potsdam. I teach organic chemistry at both the introductory level (lecture and labs) and the advanced level. I also teach an online course on sustainability, and I chair our campus’s distance learning committee.

Why are you in the chemical information field? (Your background, what led you to chemical information, etc.)

My interest began when I worked in industry, where I did the literature searches for the R&D department, both in the paper stacks of Chemical Abstracts and using STN Messenger. My PhD adviser (James B. Hendrickson) was one of the pioneers of computer-aided synthesis design. Although I worked in the “wet lab,” I always took a strong interest in the group’s cheminformatics work, and I attended the CINF national symposium at UVM in 1994, where I first got to experience the World Wide Web!

In 2004 I began working on Wikipedia as a hobby, and I soon realized the site’s potential for disseminating chemical information. As well as writing articles, I also enjoyed working behind the scenes on issues like standards, article validation, and assessment, much as we do in chemical information. I’ve tried to promote Wikipedia in the chemistry community by explaining how it works and encouraging contributions. I’ve also tried to foster good relationships between Wikipedia and other information providers, such as the Royal Society of Chemistry (RSC), Chemical Abstracts Service (CAS), and the International Union of Pure and Applied Chemistry (IUPAC).

I was also involved in setting up the RSC’s Learn Chemistry wiki, which sought to bring the power of ChemSpider into an education website – something I really enjoyed.  I’ve also begun to contribute to a joint CINF and CHAS project on using cheminformatics methods for hazard evaluation in the lab. For the future, I’d like to find ways to use open collaborative tools in chemistry and education, and for us to develop novel ways to share and disseminate scientific data.

What makes CINF valuable to you? (Include anything relevant, but especially any committees or projects in which you are involved.)

I’m not a librarian or a cheminformatics specialist, yet I’ve always felt more at home in CINF than in any other division. I usually feel like an enthusiastic amateur among professionals, but I think I can also contribute to the division by giving the perspective of the working chemist or the chemistry educator.  At the same time, I take away lessons from CINF sessions that I can apply in my teaching.  I’m finding the CINF/CHAS hazard evaluation project to be fun, and it has a lot of potential; in May I will be co-presenting on the project at the SLA DCHE/CINF conference.  In 2011 I organized a symposium to honor James B. Hendrickson, where it was good to bring together synthetic organic chemists with chemical information specialists.  I’m also a co-organizer for a session at this year’s ACS meeting in Boston on “Wikipedia and Chemistry: Collaborations in Science and Education.”     


Fall 2015 ACS CINF Call for Abstracts

Fall 2015 ACS National Meeting Call for Abstracts Division of Chemical Information
Abstracts are due March 13, 2015.

Submit abstracts here: http://tinyurl.com/CINF-Fall-2015abs

Program Chair: E. Davis, 5118 Palatine Ave. North, Seattle, WA 98103, (406) 546-8047, erinbolstad@gmail.com

Applications of Cheminformatics to the Diverse World of Natural Products. R. Schenck, rschenck@cas.org; A. Williams, tony27587@gmail.com

Chemical Information Skills: The Essential Tool Kit for Chemical Research. G. Baysinger, graceb@stanford.edu; J. Goodman, jmg11@cam.ac.uk

Chemogenomics: Cheminformatics in the Genetic World. R. Bienstock, rachelleb1@gmail.com

CINF Scholarships for Scientific Excellence: Student Poster Competition. G. Grethe, ggrethe@att.net

CINFlash: Workflow Tools Lightning Round. E. Davis

Crowdsourcing Public Scientific Communication: Wikipedia Contribution in Chemistry Classrooms. M. Walker, walkerma@potsdam.edu; Y. Li, liye@umich.edu

Current Topics in Chemical Safety Information (Cosponsored with CHED & CHAS). R. Stuart, ralph.stuart@keene.edu; L. McEwen, lrm1@cornell.edu

Enabling Machines To “Read” the Chemical Literature: Techniques, Case Studies & Opportunities. D. Lowe, daniel@nextmovesoftware.com

Find the Needle in the Haystack: Dealing with Large Chemical Spaces. D. Deng, dengw2@gmail.com

General Papers. E. Davis

The Growing Impact of Big Data in the World of Chemical Information. S. Ekins, ekinssean@yahoo.com; R. Potenzone, rudy@ichemlabs.com; A. Williams

The Growing Impact of Openness in Chemistry: A Symposium in Honor of J. C. Bradley. A. Lang, asidlang@gmail.com; A. Williams

Herman Skolnik Award Symposium. L. McEwen; R. Bienstock

Innovations in Clinical Data. A. Twiss-Brooks, atbrooks@uchicago.edu

Retrosynthesis, Synthesis Planning, Reaction Prediction: When Will Computers Meet the Needs of the Synthetic Chemist? D. Evans, david.evans@reedelsevier.ch

Scientific Integrity: Can We Rely on the Published Scientific Literature? W. Town, bill_town@mac.com; J. Currano, currano@pobox.upenn.edu

Substance Identifiers: Addressing the Challenges Presented by Chemically Modified Biologics—The Role of InChI & Related Technologies. K. Taylor, keith.taylor@laderaconsultancy.com; S. Heller, steve@hellers.com

Visualizing Chemistry Data to Guide Optimization. M. Segall, matt@optibrium.com; E. Davis

Workflow Tools & Data Pipelining in Drug Discovery. T. Dudgeon, tdudgeon@informaticsmatters.com; E. Davis

Future ACS Meetings

Meeting | Dates | Location | Theme
250th | Aug. 16–20, 2015 | Boston, MA | Innovation from Discovery to Application
251st | Mar. 13–17, 2016 | San Diego, CA | Computers in Chemistry
252nd | Aug. 21–25, 2016 | Philadelphia, PA | Chemistry of the People, by the People, and for the People
253rd | Apr. 2–6, 2017 | San Francisco, CA | TBD
254th | Aug. 20–24, 2017 | Washington, DC | TBD
255th | Mar. 18–22, 2018 | New Orleans, LA | TBD
256th | Aug. 19–23, 2018 | Boston, MA | TBD
257th | Mar. 31–Apr. 4, 2019 | Orlando, FL | TBD
258th | Aug. 25–29, 2019 | San Diego, CA | TBD
259th | Mar. 22–26, 2020 | Philadelphia, PA | TBD

In Memory of Frank H. Allen

 

Dr Frank H. Allen passed away on November 10, 2014, aged 70. Colin Groom of the Cambridge Crystallographic Data Centre (CCDC) reported: “Frank joined the Chemical Crystallography Group at the University of Cambridge in 1970 and played a pivotal role in the establishment of the Cambridge Structural Database. He went on to become the Scientific Director and then the Executive Director of the Cambridge Crystallographic Data Centre. Following his retirement in 2008, Frank remained with the CCDC as an Emeritus Research Fellow, enabling him to continue to indulge his passion for structural chemistry. Frank’s research involved collaboration with many scientists around the world, resulting in over 200 papers. He was also a wonderful teacher, supervising more than 20 doctoral students and introducing many more to structural chemistry through workshops over many years. His contributions to other influential organizations, his vigorous editorship of Acta Crystallographica, the numerous conferences he organized and presentations he made meant Frank was known to and respected by crystallographers the world over. Frank has long been a leading figure in international crystallography, and was a wonderful colleague, becoming a friend to all those who worked with him. He will be sadly missed.”

An obituary has been published: Taylor, R. Acta Cryst. 2014, B70, 1035-1036 doi:10.1107/S2052520614026201 http://scripts.iucr.org/cgi-bin/paper?S2052520614026201. Frank was the ACS CINF Herman Skolnik Awardee in 2003. A detailed biography, written in that year, and thus rather out of date, appears at http://www.acscinf.org/content/2003-herman-skolnik-award-memoriam-frank-allen. As a tribute to Frank, I have reproduced a section of my report on the relevant ACS meeting for this issue of the Chemical Information Bulletin.

 

Wendy A. Warr

RE-PRINTED EXTRACT FROM
CHEMICAL INFORMATION AND COMPUTATION 2003, NUMBER TWO

226TH ACS NATIONAL MEETING AND EXPOSITION
NEW YORK, NEW YORK, SEPTEMBER 7-11, 2003

A report by Dr. Wendy A. Warr
Wendy Warr & Associates
February 2004

Dr Wendy A. Warr
Wendy Warr & Associates,
6 Berwick Court, Holmes Chapel,
Cheshire CW4 7HZ,
England
Tel/fax +44 (0)1477 533837
wendy@warr.com
http://www.warr.com

American Chemical Society
Division of Chemical Information

Herman Skolnik Award Symposium

Crystallographic Databases
and their Applications

Tuesday 9 September 2003

In recognition of the presentation of the Herman Skolnik Award for 2003 to

Frank H. Allen

Cambridge Crystallographic Data Centre, Cambridge, UK
 


226th ACS National Meeting
Jacob Javits Convention Center, New York, NY

 

The Herman Skolnik Awardee 2003

Dr. Frank H. Allen
Cambridge Crystallographic Data Centre, Cambridge, UK

Frank Allen is Executive Director of the Cambridge Crystallographic Data Centre (CCDC) and is responsible to the Board of Governors for the overall operation of the CCDC. He has been with CCDC since 1970, following undergraduate and graduate studies (BSc, ARCS, DIC, PhD) at Imperial College, London, UK, and postdoctoral work at the University of British Columbia, Vancouver, Canada.

He has been involved in most major developments at the CCDC, including creation of the Cambridge Structural Database (CSD) of organic and metal-organic crystal structures, and software development for structure validation, chemical indexing, database searching and numerical data analysis. A particular interest has been the application of the accumulated CSD data for research purposes. He has published more than 200 papers in crystallography, chemistry and chemical informatics, and has edited 15 reference books and conference proceedings volumes.

Honours and professional activities include: Fellow of the Royal Society of Chemistry (FRSC), 1992; RSC Silver Medal and Prize for Structural Chemistry, 1994; Vice-President, British Crystallographic Association, 1997-2001; Council Member, European Crystallographic Association, 1997-2001; Editor, Acta Crystallographica, Section B, 1994-2002 (IUCr); Chair, IUCr Committee on Crystallographic Databases, 1999-. Editorial Boards: Chemical Communications, Structural Chemistry, Croatica Chimica Acta, Crystallography Reviews. He was appointed Visiting Professor of Chemistry at the University of Bristol in 2002.

Symposium Programme

Crystallographic Databases and their Applications

8:30               Introductory Remarks
Frank H. Allen (CCDC, Cambridge, UK)

8:40               The Cambridge Structural Database (CSD) and its research applications in structural chemistry
Frank H. Allen (CCDC, Cambridge, UK)

9:20               Data mining of crystallographic databases as an aid to drug design

Robin Taylor (CCDC, Cambridge, UK)

10:00              Intermission

10:20             The evolution of the Protein Data Bank
Helen M. Berman, J.D. Westbrook, P.E. Bourne, G.L. Gilliland,
J.L. Flippen-Anderson and the PDB Team (Rutgers U., NJ, and SDSC, San Diego, CA, and NIST, Washington DC, USA)

11:00              The Protein Data Bank (PDB) as a research tool
Philip E. Bourne, J.D. Westbrook, Helen M. Berman, G.L. Gilliland,
J.L. Flippen-Anderson and the PDB Team (SDSC, San Diego, CA, and Rutgers U., NJ, and NIST, Washington DC, USA)

11:40              Lunch

2:00               When can fractional crystallisation be expected to fail? Information from the Cambridge Structural Database
Carolyn P. Brock (University of Kentucky, Lexington, KY, USA)

2:40               Applications of the Cambridge Structural Database to molecular inorganic chemistry
A. Guy Orpen (University of Bristol, Bristol, UK)

3:20               Intermission

4:00               Materials informatics: knowledge acquisition for materials design
John R. Rodgers (Toth Information Systems Inc., Ottawa, Canada)

4:40                First principles calculated databases for the prediction of intermetallic structures
Gerd Ceder, S. Curtarolo, D. Morgan, J.R. Rodgers (MIT, Cambridge, MA, and Toth Information Systems Inc., Ottawa, Canada)

5:20               Close

 


 

Applications. Chemistry, Biology and Drug Design

 

The Cambridge Structural Database (CSD) and its research applications in structural chemistry. Frank H. Allen, Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge CB2 1EZ, United Kingdom, Fax: 44-1223-336033, allen@ccdc.cam.ac.uk

The Cambridge Structural Database (CSD) contains the results of X-ray and neutron diffraction analyses, from single-crystal and refined powder studies, of organic and organometallic compounds. The data come mostly from the open literature, although about 1% are from private communications. Each structure forms a CSD entry, identified by a Reference Code. The contents of CSD are text and numerical data, 2D chemical structures and experimentally determined 3D structures; the 2D structure is mapped onto the 3D one. CSD grew enormously from 1970 to 2000. As of August 14, 2003, it contained 297,507 structures, and the prediction is that it will contain more than 500,000 structures by the end of 2010. Allen gave a diagram of the software supplied with CSD for converting data into knowledge.


 

 

ConQuest searches text and numerical data, structures in 2D and 3D, and intermolecular, non-bonded contacts. It retrieves a database subset and a user-defined set of geometric parameters for each structure located. Structures are visualized in Mercury. ConQuest, Mercury and VISTA allow the structural chemistry in the CSD to be mined from the raw data: this is knowledge mining, not data mining. Crystallographic knowledge, and intramolecular and intermolecular structural knowledge, are mined.

 

Allen discussed intramolecular structural knowledge first. The Cambridge Crystallographic Data Centre (CCDC) published tables of standard bond lengths in J. Chem. Soc. Perkin Trans. 1987, S10-S19 and J. Chem. Soc. Dalton Trans. 1989, S1-S83. These form the standard “bible” of bond lengths. Conformational preferences can be determined by computational methods, but these give energies for model compounds in vacuo, can give multiple minima, and are less well developed for some systems, for example, metal complexes. Condensed-phase crystal data provide no direct energy estimates, but they are quick to use, high-quality experimental data that can be used to validate computational results. There remains, however, the question of whether conformations are affected by crystal packing. Histograms, scattergrams and multivariate methods can be employed to analyze the data.

Allen’s first example was cyclopropyl carbonyls. The cyclopropyl carbonyl fragment is a key one in pyrethroid insecticides. What is the conformational relationship of the carbonyl group to the ring? It is found by searching CSD for the fragment, calculating the O1-C9-C1-X(2,3) torsion to describe the conformer, and displaying the results in VISTA/Mercury. Allen showed a polar histogram in VISTA and a Mercury plot (optionally, ball and stick).
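To make the torsion step concrete, here is a minimal Python/NumPy sketch of a signed torsion angle computed from four atomic positions, plus a binning helper of the kind behind a torsion histogram. It assumes, hypothetically, that X is the midpoint of the ring's C2-C3 bond; ConQuest's own parameter definitions and VISTA's plotting are not reproduced here.

```python
import numpy as np

def torsion(p0, p1, p2, p3):
    """Signed torsion angle p0-p1-p2-p3 in degrees, in the range -180 to 180."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)          # unit vector along the central bond
    v = b0 - np.dot(b0, b1) * b1          # b0 projected perpendicular to the central bond
    w = b2 - np.dot(b2, b1) * b1          # b2 projected perpendicular to the central bond
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))

def torsion_histogram(angles, bin_width=10.0):
    """Counts of |torsion| in bins from 0 to 180 degrees (sign ignored)."""
    edges = np.arange(0.0, 180.0 + bin_width, bin_width)
    counts, _ = np.histogram(np.abs(np.asarray(angles, dtype=float)), bins=edges)
    return counts, edges

# Hypothetical usage for one cyclopropyl carbonyl hit, taking X as the C2-C3 midpoint:
# tau = torsion(O1, C9, C1, (C2 + C3) / 2.0)
```

Collecting `tau` over every hit and passing the list to `torsion_histogram` gives the raw counts behind a histogram like the one Allen displayed.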

 

Conformer mapping of benzophenones was published by Rappaport in J. Am. Chem. Soc. 1990, 112, 7742. The crystal conformers populate low-energy regions of the 2D potential energy hypersurface, and a low-energy valley in the hypersurface shows the conformational preferences of the rings. 1’-Aminoribofuranoside has five torsion angles; the main conformers are C1 endo and C3 endo. A general definition of ring puckering coordinates was published in Cremer, D.; Pople, J. A. J. Am. Chem. Soc. 1975, 97(6), 1354-1358. Mapping of crystal structure data using Cremer-Pople (CP) phase angles for cyclooctane rings (Acta Cryst. 1996, B55, 882-889) showed that the twist-boat-chair is the preferred conformation.
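The Cremer-Pople definition cited above can be sketched in Python/NumPy as follows. This is a minimal reimplementation of the published formulas for illustration, not the CCDC code used in the cyclooctane study; it assumes the ring atoms are supplied in bonding order as an (N, 3) coordinate array.

```python
import numpy as np

def cremer_pople(coords):
    """Cremer-Pople puckering parameters for an N-membered ring.

    coords: (N, 3) array of ring-atom positions listed in bonding order.
    Returns q_m and phi_m (degrees) for 2 <= m <= (N-1)//2, q_{N/2} for even N,
    and the total puckering amplitude Q.
    """
    xyz = np.asarray(coords, dtype=float)
    n = len(xyz)
    if n < 4:
        raise ValueError("a ring needs at least four atoms to pucker")
    xyz = xyz - xyz.mean(axis=0)                                  # origin at the ring centroid
    j = np.arange(n)
    r1 = (xyz * np.sin(2 * np.pi * j / n)[:, None]).sum(axis=0)  # R'
    r2 = (xyz * np.cos(2 * np.pi * j / n)[:, None]).sum(axis=0)  # R''
    normal = np.cross(r1, r2)
    normal /= np.linalg.norm(normal)                              # unit normal of the mean plane
    z = xyz @ normal                                              # displacements from the mean plane
    result = {}
    for m in range(2, (n - 1) // 2 + 1):
        a = np.sqrt(2.0 / n) * np.sum(z * np.cos(2 * np.pi * m * j / n))
        b = -np.sqrt(2.0 / n) * np.sum(z * np.sin(2 * np.pi * m * j / n))
        result[f"q{m}"] = float(np.hypot(a, b))
        result[f"phi{m}"] = float(np.degrees(np.arctan2(b, a)) % 360.0)
    if n % 2 == 0:
        # signed amplitude of the alternating (crown/chair-like) component
        result[f"q{n // 2}"] = float(np.sum(z * (-1.0) ** j) / np.sqrt(n))
    result["Q"] = float(np.sqrt(np.sum(z ** 2)))
    return result
```

For a six-membered ring this returns q2, phi2 and q3; for cyclooctane it returns q2, phi2, q3, phi3 and q4, and the phase angles are the quantities plotted in this kind of conformational map. The sign of q_{N/2} depends on the direction in which the atoms are numbered.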

 

A study on the effects of crystal packing on conformation was reported in J. Comput.-Aided Mol. Des. 1996, 10, 247-254. CSD torsion distributions were generated for 12 common fragments, and energy profiles were calculated at the 6-31G* level. Allen showed the CSD torsion distributions versus potential energy curves and indicated the anti and gauche conformations. It was concluded that torsion angles with strain energies greater than 1 kcal/mole are rare in crystal structures. Taken over many structures, the effects of crystal packing on conformation seem to be the exception rather than the rule. Thus, crystal structure observations are good guides to the conformational preferences of isolated molecules.

 

Next, Allen discussed intermolecular structural knowledge. Such knowledge is useful in supramolecular synthesis, crystal engineering, crystal growth, structure determination, drug design, drug delivery, ab initio crystal structure polymorph prediction, and protein folding. The types of intermolecular interaction that occur can be studied by crystallography or spectroscopy. Their geometric characteristics, including whether they are directional, are studied by crystallography. Ab initio calculations are needed to find out how strong they are.

 

An example is N-H…O (amide) hydrogen bonding. Extended crystal structures are searched in ConQuest for:

O…H distance < sum of van der Waals radii + 0.4 Å
O…H-N angle 90.0-180.0°
crystal R factor < 5%.
The following geometry is calculated:
O…H distance
O…H-N angle
C=O…H angle
Angle between N-H vector and amide plane.
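A minimal Python/NumPy sketch of the geometric test and the four derived parameters follows. The van der Waals radii (1.52 Å for O and 1.20 Å for H, Bondi-style values) and the use of a third acceptor atom X bonded to C to define the amide plane are assumptions for illustration; ConQuest's own radius table and plane definitions may differ.

```python
import numpy as np

# Assumed van der Waals radii in angstroms (illustrative; not ConQuest's own table).
VDW_RADII = {"O": 1.52, "H": 1.20}

def bond_angle(a, b, c):
    """Angle a-b-c in degrees."""
    u = np.asarray(a, float) - np.asarray(b, float)
    v = np.asarray(c, float) - np.asarray(b, float)
    cos_t = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0))))

def amide_hbond_geometry(c, o, x, h, n):
    """Parameters for an acceptor C=O (x: another atom bonded to C) and a donor N-H."""
    c, o, x, h, n = (np.asarray(p, float) for p in (c, o, x, h, n))
    normal = np.cross(o - c, x - c)
    normal /= np.linalg.norm(normal)          # normal to the acceptor amide plane
    nh = (h - n) / np.linalg.norm(h - n)      # unit N-H vector
    sin_elev = np.clip(abs(np.dot(nh, normal)), 0.0, 1.0)
    return {
        "O...H": float(np.linalg.norm(h - o)),
        "N-H...O": bond_angle(n, h, o),
        "C=O...H": bond_angle(c, o, h),
        "NH_vs_plane": float(np.degrees(np.arcsin(sin_elev))),
    }

def meets_search_criteria(geom, r_factor_percent, tolerance=0.4):
    """O...H < sum of vdW radii + tolerance, O...H-N angle in 90-180 deg, R factor < 5%."""
    return (geom["O...H"] < VDW_RADII["O"] + VDW_RADII["H"] + tolerance
            and 90.0 <= geom["N-H...O"] <= 180.0
            and r_factor_percent < 5.0)
```

Applied to every hit, `amide_hbond_geometry` yields the rows of a parameter table like the VISTA spreadsheet, and `meets_search_criteria` mirrors the three search constraints listed above.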

 

Allen showed a typical CSD hit in Mercury and a parameter spreadsheet in VISTA. He also showed VISTA histograms for the O…H distance and a scatterplot of the N-H…O angle versus the O…H distance. Shorter hydrogen bonds correspond to more linear N-H…O angles, and they tend to approach the O acceptor in the plane of the >C=O system. CSD is used in conjunction with ab initio intermolecular perturbation theory (IMPT) calculations; IMPT is described by Hayes and Stone in Mol. Phys. 1984, 53, 84-98. The interaction energy was calculated for fixed mutual orientations of small (model) molecules. CSD was used to indicate the preferred mutual orientations for exploration of the energy hypersurface using 6-31G** basis sets. The total energy was calculated as the sum of the individual components given by the IMPT procedure.

 

Hydrogen bonding at the C=S acceptor is reported in Acta Cryst. 1997, B53, 680. The electronegativities of C and S are both about 2.5, yet the hydrogen bonding of thiourea is dominated by C=S…H bonds. Why is this? The dipole moment of C=S in CH2=S is small and in the opposite direction to that of C=O in CH2=O. However, when H is replaced by NH2, as in thiourea, the dipole moment of C=S is reversed, and S becomes a medium-strength H-bond acceptor. Allen compared energy versus angle curves for IMPT energies of >C=S…H-O and >C=O…H-O.

 

Next, Allen outlined some highlights of hydrogen bond research. CSD has been used in establishing lone pair directionality and in studying resonance-assisted and resonance-induced hydrogen bonding, hydrogen bonding motifs and probabilities of their formation, and weak hydrogen bonds. Weak hydrogen bonds were a new subject in the 1980s. A book on the subject by Desiraju and Steiner was published by Oxford University Press in 1999. Examples of weak hydrogen bonds are:

 

-C-H…O,N,Cl
-C≡C-H, C=C-H and C-C-H as hydrogen bond donors
O,N-H…π and C-H…π bonds.

 

The C-H…O saga is illustrated by Sutor, D. J. J. Chem. Soc. 1963, 1105, which showed that short C-H…O contacts can be described as hydrogen bonds, and by Donohue, J. in Structural Chemistry and Molecular Biology, published by Freeman in 1968, where it is said that the C-H…O “hydrogen bond” is a close contact. The years 1968-1982 were the dark ages of weak hydrogen bonds. In J. Am. Chem. Soc. 1982, 104, 5063, Taylor and Kennard proved that these bonds could be called hydrogen bonds. This is the 60th most highly cited paper in J. Am. Chem. Soc.

 

Allen briefly discussed some interactions not mediated by hydrogen, namely carbonyl-carbonyl (C=O…C=O) interactions. He mentioned the anti-parallel motifs in the structures of BUCHAI and BAGTIM (the codes for two structures in CSD), the two dipoles, and IMPT results for the anti-parallel motifs. It has been shown that, where this interaction can form, it is quite strong, and it can have a significant effect on protein secondary structure.

In conclusion, Allen quoted from Poincaré: “Science is built up of facts, as a house is built of stones; but an accumulation of facts is no more a science than a heap of stones is a house”. Informatics converts an accumulation of facts into fundamental structural knowledge with myriad applications. Every crystal structure is valuable and contributes to the creation of this knowledge. Unfortunately, the ConQuest-VISTA-Mercury process takes time and is not well integrated with the crystallographic or modelling software. In future there will be improved integration. Structural knowledge must be rapidly accessible and readily available to other groups. Many thousands of crystal structures are not being published; something must be done about this.

 

In a separate paper in the technical program of the Division of Inorganic Chemistry, Allen enlarged upon this problem. CCDC is developing rapid routes for placing new crystal structures into the public domain. A single diffractometer can produce 700 structures a year. By the end of 2003 there may be 300,000 structures in CSD, but there should be more than this in the high-throughput era. Increasing numbers of new structures are never published in journals; for some laboratories the proportion may be as high as 75%.

 

The situation can only get worse. The logjam has moved to the publication process. The scientific community is losing valuable data resources. The accountability of crystallographers is compromised. The instruments are provided with public money and the data should be made public. Allen gave a 2003 variation on a quotation of Bernal’s: the growing abundance of crystal structure data and the time required to place them into the public domain act as a brake, or an element of friction, to the progress of science.

 

He listed some of the “brakes”. First there is the pressure of time: the process is labor-intensive. Who owns the data: the chemist who made the compound or the crystallographer who determined the structure? Who is responsible for publication? When the structure turns out to be not as expected there is often a lack of interest in publication. If the chemistry is rejected by referees some good crystal structures that go with it may never get published. There is a need for academic recognition for publication of structures.

 

Submission of pre-publication electronic data to CCDC is now required by an increasing number of journals: about 70 to date. The CCDC deposition number is printed in the published paper. CCDC has an archive of about 120,000 Crystallographic Information Files (CIFs), which are freely available at the CCDC Web site. CSD has been opened up to private communications; there are 2054 so far, 1648 of them (80%) submitted since 1997. Electronic journals are another possible solution to the problem of unpublished data: Acta Cryst. E was started in 2001 and published about 1000 structures in 2003.

 

Allen suggests that for high speed publication, structures should be directly deposited with CCDC for immediate entry to the distributed version of CSD, or for holding in the secure archive of CIFs with automatic publication to be allowed after, say, three years. In the future, Allen foresees automatic data harvesting by CCDC via a GRID route, accessing files placed in specific locations. He closed with some questions. What is a “publication”? Do CSD entries constitute sufficient recognition in their own right? Can data be separated from words and pictures? What is a database? Is it a secondary source of information or is it a primary one? What about quality control and refereeing, and adding a validation report to the CIF?

 

Data mining of crystallographic databases as an aid to drug design
Robin Taylor, Cambridge Crystallographic Data Centre, 12, Union Road, Cambridge CB2 1EZ, United Kingdom, Fax: 44 1223 336033, taylor@ccdc.cam.ac.uk

 

The requirements for crystallographic databases have changed over the last 5-10 years. User expectations have vastly increased: people expect easy answers. A ConQuest search is not a one-step process, but people want the information more easily, and this is a challenge. Virtual high throughput screening has raised the bar: many more molecules are being studied. Crystal data continue to be valuable, because of the exquisite knowledge they give us, but they must be instantly accessible. Taylor considered four levels of sophistication: processing raw data to make it easier to assimilate; coupling processed data to an application program; processing raw data into a knowledge base that can be coupled to any third-party application; and processing raw data into objects for manipulation in an object-oriented scripting language.

 

The IsoStar database of intermolecular interactions can be used at the first level. It has information about non-bonded contacts, coming mainly from the Cambridge Structural Database (CSD) but to some extent from the Protein Data Bank (PDB). CSD (or PDB) is searched for structures containing the desired contact and the hits are superimposed. Taylor used the example of a least-squares overlay of ketone groups. The results can be displayed as a scatter plot; Taylor showed such plots for ketone…OH and ether…OH. Because the ketone group has mm symmetry, all the contacts are shown in one quadrant. The plot tells the user how frequently the contact occurs and what geometry it has. Ketones are common and form many contacts to OH, implying (of course) that the contact is energetically favorable. There are few hydrogen bonds along the C=O direction but more in the sp2 lone-pair directions. Taylor showed how the plots can be rotated in 3D, indicating the hydrogen bonds in the lone-pair plane for ether…OH.
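The least-squares overlay step can be sketched with the Kabsch algorithm, one standard way to superimpose one set of atoms on another; the talk did not say which method IsoStar uses. The reference frame (carbonyl C at the origin, C=O along z, ketone plane xz) is an illustrative assumption, and it is what lets the hypothetical `fold_mm2` helper apply the mm symmetry simply by taking absolute values of x and y.

```python
import numpy as np

def kabsch(mobile, reference):
    """Rotation R and translation t minimising the RMSD of mobile @ R.T + t to reference."""
    P = np.asarray(mobile, float)
    Q = np.asarray(reference, float)
    p0, q0 = P.mean(axis=0), Q.mean(axis=0)
    H = (P - p0).T @ (Q - q0)                # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against an improper rotation (reflection)
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q0 - p0 @ R.T
    return R, t

def overlay_contact(hit_fragment, hit_contact, reference_fragment):
    """Move a contact atom from a hit into the reference frame of the central group."""
    R, t = kabsch(hit_fragment, reference_fragment)
    return np.asarray(hit_contact, float) @ R.T + t

def fold_mm2(point):
    """Fold a point into one quadrant, assuming mirror planes x = 0 and y = 0."""
    x, y, z = point
    return np.array([abs(x), abs(y), z])

# Hypothetical reference ketone: carbonyl C, O, and the two alpha carbons.
REFERENCE_KETONE = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 1.21],
                             [1.27, 0.0, -0.73], [-1.27, 0.0, -0.73]])
```

Running `overlay_contact` over every hit and folding each result with `fold_mm2` gives the point cloud that a scatter plot of this kind displays.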

 

He gave three examples of contacts to phenyl rings: Cl-, O and CH. He showed that the electronegative chloride ions and oxygen atoms tend to cluster around the edges of the ring, with the weakly electropositive CH groups sitting above the pi electron density. IsoStar can be used to give quick answers to straightforward questions; to identify which groups will hydrogen bond and what directional preferences they show; to establish precedents for an interaction; and to generate ideas, suggesting novel ways to achieve non-covalent bonding. An example of this last use is suggesting ways in which bonding might be achieved to the indole ring system of tryptophan. Hydrogen bonding to an NH is obvious, but there are other strategies, for example the NH…π hydrogen bond in the CSD structure coded FIZWOA01. This could be used in a ligand design strategy.

 

At the second level of sophistication, the program SuperStar (the name of which indicates “IsoStar plus superimposition”) is used to find binding points on proteins using IsoStar data. Goodford’s GRID program does this based on molecular mechanics calculations but the SuperStar approach is based on experimental information. SuperStar calculates maps that depict the propensity for a functional group (probe group) to bind at different positions around a protein binding site or small molecule. SuperStar allows users to calculate interaction maps as 3D distributions (contour surfaces). IsoStar plots can be congested. The scatter plots are embedded in a grid, and after counting and contouring, an interaction surface is produced. Dividing the observed density by the expected density puts the plots on a meaningful scale.

 

For example, the stoichiometry of the crystal structure is examined and this information is used to see how many OH groups are expected at random. The actual number observed at a specific volume element can be divided by the random expectation value to give a propensity for contacts to occur at that position in space. A propensity less than one means a non-favorable interaction; a propensity greater than, or equal to, one means a favorable interaction. Taylor gave an example of ionized carboxylate with amino groups around it, contoured at a contour level of six, i.e., six times more than expected.

 

The procedure is as follows. Prepare the template molecule (e.g., protein binding site). Select a probe atom. The probe atom is a specific atom in the probe group for which the propensity will be calculated. It is usually an atom in an IsoStar contact group, e.g. carbonyl oxygen. Place the template molecule on a suitable three-dimensional grid. Analyze the template molecule and break it into fragments for which data are available in IsoStar. (The fragments correspond to IsoStar central groups). Overlay the IsoStar scatter plots onto the corresponding parts of the template molecule. In this way, all IsoStar information is projected onto the template molecule. Convert each transformed scatter plot to a density map, and scale the density to propensity; all maps are on the same propensity scale after performing this step. Combine overlapping maps by multiplication. Contour the final map and display.
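The propensity scaling and multiplicative combination described above can be illustrated with a short NumPy sketch. It is not SuperStar's implementation: it uses plain voxel counting without smoothing, takes the random expectation per voxel as a user-supplied number, and assumes, hypothetically, that voxels outside a fragment's map are neutral (propensity 1) when maps are multiplied.

```python
import numpy as np

def count_grid(points, origin, spacing, shape):
    """Number of scatter-plot points falling in each voxel of a regular 3D grid."""
    pts = np.asarray(points, float).reshape(-1, 3)
    idx = np.floor((pts - np.asarray(origin, float)) / spacing).astype(int)
    inside = np.all((idx >= 0) & (idx < np.asarray(shape)), axis=1)
    grid = np.zeros(shape)
    np.add.at(grid, tuple(idx[inside].T), 1.0)   # unbuffered add handles repeated voxels
    return grid

def propensity_map(points, expected_per_voxel, origin, spacing, shape):
    """Observed density divided by the random expectation (1.0 = chance level)."""
    return count_grid(points, origin, spacing, shape) / expected_per_voxel

def combine_maps(maps, coverage):
    """Multiply overlapping maps; voxels a map does not cover are treated as neutral (1.0)."""
    result = np.ones_like(maps[0])
    for m, covered in zip(maps, coverage):
        result *= np.where(covered, m, 1.0)
    return result

def hotspots(combined, level=6.0):
    """Voxel indices at or above a contour level, e.g. six times the random expectation."""
    return np.argwhere(combined >= level)
```

In a real template each map would first be transformed onto its fragment of the binding site, as in the overlay step; here the maps are assumed to share one grid already.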

 

Taylor displayed a SuperStar map for glutathione transferase (1glp) and results for CSD and PDB; the answers were much the same. Intra- and intermolecular data can be combined. Taylor showed an IsoStar display of the distribution of carbonyl oxygens around an OH group, and a histogram of the H-C(C)-O-H torsion showing that the OH in a secondary alcohol spins. These two collections of information can be combined to indicate the preferred positions of carbonyl oxygen around a secondary alcohol, taking into account that the hydroxyl group can rotate to optimize its hydrogen-bonding interaction.

 

At the third level of sophistication, such distributions can be made available to other programs. Mogul provides intramolecular geometry data to people or client programs. It gives extremely rapid access to information on the preferred values of bond lengths, valence angles and acyclic torsion angles, using data derived from the CSD. Input to Mogul is a complete molecule, not a substructure. Given the instruction to retrieve data for a particular feature in that structure, e.g. a valence angle, Mogul will automatically derive a search query and use it to find the relevant CSD entries. The resulting statistics, such as the mean and median valence-angle values, can then be passed via an ASCII file interface to other programs.

 

There are three libraries under Mogul, containing bond length, valence angle and torsion angle fragments generated from every entry in CSD. Every fragment is classified by evaluation of keys. Fragments are grouped together so that all fragments with the same set of key values are assigned to the same distribution. The distributions are accessible by searching a tree indexed on key values. Thus, evaluation of key values for a query fragment, followed by traversal of the tree, will find the distribution containing CSD fragments with the same key values. Taylor showed the keys used (about 20 of them) for a typical angle fragment.
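The key-and-tree scheme just described can be illustrated with a minimal Python sketch using nested dictionaries. The key tuple used here (three elements, hybridization, ring membership) is a hypothetical stand-in for Mogul's roughly 20 keys, and the z-score helper is only one plausible way to flag an unusual value; neither is Mogul's actual interface.

```python
from statistics import mean, median, stdev

_LEAF = object()  # sentinel key under which each leaf stores its distribution

class KeyTree:
    """Nested-dictionary tree: one level per key value, a distribution at each leaf."""

    def __init__(self):
        self.root = {}

    def add(self, keys, value):
        node = self.root
        for key in keys:
            node = node.setdefault(key, {})
        node.setdefault(_LEAF, []).append(value)

    def lookup(self, keys):
        """Traverse the tree by key values; return the matching distribution or []."""
        node = self.root
        for key in keys:
            node = node.get(key)
            if node is None:
                return []
        return node.get(_LEAF, [])

def summarize(values):
    return {"n": len(values), "mean": mean(values), "median": median(values),
            "sd": stdev(values) if len(values) > 1 else 0.0}

def z_score(observed, values):
    """How many standard deviations an observed value lies from the distribution mean."""
    s = summarize(values)
    if s["sd"] == 0.0:
        raise ValueError("z-score undefined for a distribution with zero spread")
    return (observed - s["mean"]) / s["sd"]

# Hypothetical keys: (element 1, central element, element 3, hybridization, in ring)
tree = KeyTree()
for angle in (109.5, 111.0, 112.5):
    tree.add(("C", "C", "C", "sp3", False), angle)
tree.add(("C", "C", "C", "sp2", True), 120.1)
```

Here a C-C-C angle of 114.0° in an acyclic sp3 environment would give a z-score of 2.0 against the three stored values, the kind of check that could flag a questionable docking pose or refinement.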


Tree search (traversing the tree based on keys) takes less than 1 second. Mogul is interactive and easy to use. For example, the user drags in a structure, clicks on three atoms and displays a histogram for that type of bond angle. The human element can even be removed and third party software can interact directly with Mogul. The program can be used in modelling for conformation validation (e.g., for filtering docking solutions) or in conformation generation. It can be used in crystallography for geometry validation and creation of restraint data and ligand dictionaries.

 

Finally, Taylor considered the fourth level of sophistication and Reliscript for manipulation of protein and ligand objects in the Python scripting language. Taylor presented a diagram in which all the items are objects.


He showed a dubious docking where carboxylate formed some ugly looking contacts. Suppose that the user wants to access information from PDB to see if this carboxylate contact is likely. The procedure is as follows. Find all ligand carboxylates. Find all carboxylate-protein contacts. Determine the percentage of hydrophobic and polar contacts in PDB which are in the same or a worse environment than in the docking solution. Taylor showed some sample code with a SMILES search object in it. The result was that about 6% of carboxylates were classified as in unfavorable environments. The docking pose was in the worst 3%. This docking solution could thus be excluded. There are many ways in which the script could be extended, e.g., resolution limit, including crystal packing etc.
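The statistical step of that procedure can be sketched in plain Python. Reliscript's own object model and SMILES search objects are not reproduced; the `CarboxylateSite` class and the use of the hydrophobic-contact fraction as the "environment" measure are hypothetical simplifications, and the 3% cut-off simply mirrors the figure quoted above.

```python
from dataclasses import dataclass

@dataclass
class CarboxylateSite:
    """Hypothetical summary of one carboxylate's protein contacts."""
    n_polar: int
    n_hydrophobic: int

    @property
    def hydrophobic_fraction(self) -> float:
        total = self.n_polar + self.n_hydrophobic
        return self.n_hydrophobic / total if total else 0.0

def fraction_as_bad_or_worse(reference_sites, pose_site):
    """Fraction of reference carboxylates at least as hydrophobic as the docked one."""
    threshold = pose_site.hydrophobic_fraction
    worse = sum(1 for s in reference_sites if s.hydrophobic_fraction >= threshold)
    return worse / len(reference_sites)

def reject_pose(reference_sites, pose_site, cutoff=0.03):
    """Flag a pose whose carboxylate environment is among the worst `cutoff` fraction."""
    return fraction_as_bad_or_worse(reference_sites, pose_site) <= cutoff
```

In a real script the reference list would be built from PDB ligand carboxylates and their contacts, with extra filters such as a resolution limit applied before scoring.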

 

Crystallographic database providers must respond to the changing needs of users and they are doing so. This involves a change away from traditional structure search towards pre-processed data, application programming interfaces and the use of scripting languages.

 

The evolution of the Protein Data Bank
Helen M. Berman (1), John D. Westbrook (1), Philip E. Bourne (2), Gary L. Gilliland (3), Judith L. Flippen-Anderson (1), and PDB Team (4). (1) Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, 610 Taylor Road, Piscataway, NJ 08854, berman@rcsb.rutgers.edu, (2) San Diego Supercomputer Center, University of California, San Diego, (3) National Institute of Standards and Technology, Center for Advanced Research in Biotechnology, (4) Rutgers, SDSC/UCSD, CARB/NIST

 

Community discussions leading up to the formation of the Protein Data Bank (PDB) began in the late 1960s and early 1970s. A meeting of protein crystallographers at Cold Spring Harbor Laboratory preceded the establishment of the PDB at Brookhaven in October 1971. At that date, it contained seven structures. During the 1980s the number of structures increased. There were discussions about requiring depositions, following which the International Union of Crystallography (IUCr) guidelines were established. Thereafter, the number of structures deposited increased. Independent biological databases such as the Nucleic Acid Database (NDB) were established.

 

In the 1990s, the macromolecular Crystallographic Information File (mmCIF) project (another IUCr initiative) was completed. The mmCIF format expands the CIF dictionary by including data items relevant to the macromolecular crystallographic experiment. Eight years of work culminated in the establishment of a data dictionary and an ontology. During the same period, structural genomics was born. The PDB moved to RCSB (Research Collaboratory for Structural Bioinformatics) and is now managed by Rutgers, the State University of New Jersey, the San Diego Supercomputer Center at the University of California, San Diego, and the Center for Advanced Research in Biotechnology of the National Institute of Standards and Technology. International participants in data deposition and processing include the European Bioinformatics Institute Macromolecular Structure Database group (UK) and the Institute for Protein Research at Osaka University (Japan).

 

The mission of the PDB is to provide the most accurate, well-annotated data, in the most timely and efficient way possible, to facilitate new discoveries and advances in science. Challenges are the growth in the number of structures and the increase in their complexity. There are new methods for structure determination, such as NMR and cryoelectron microscopy. Users are demanding more complex queries: they do not just request coordinates but expect analysis. They are also requiring more annotation and integration with other genomic and proteomic information. The community of users is much larger and more diverse.

David Goodsell at The Scripps Research Institute has developed some lovely graphics for the types of structures in PDB. The database has a rich assortment of molecules. In 1995 there were about 5000 structures; now there are more than 24,000. Berman showed a growth curve. Types range from myoglobin in 1972 to the ribosome in the 1990s. Berman displayed a cityscape showing the growth in complexity. Structures had 1-2 chains in the 1970s; in 2003, some structures had 30-50 chains. Berman also illustrated the change in the number of new folds as a percentage of total PDB depositions. In 1980, 60% were unique folds; the percentage was less than 10% in 2001. Only 14% used to have less than 30% similarity; this number is now 5%. Berman tabulated some statistics, and then gave a data processing workflow diagram.

 

 

Year                                        1993      1998      2003
Total structures                           1,727     8,942    23,792
Number of structures deposited per year      792     2,178     4,831
Average number of Web hits per day           N/A    57,000   188,000
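As a quick check on the growth these statistics imply, the short Python sketch below computes two figures for each row: the overall growth factor between the first and last years with data, and the compound annual growth rate (CAGR). The CAGR formula is a standard one applied here only for illustration. Berman did not present it.

```python
# Figures transcribed from the PDB statistics table (first and last years only).
stats = {
    "total_structures": {1993: 1727, 2003: 23792},
    "deposited_per_year": {1993: 792, 2003: 4831},
    "web_hits_per_day": {1998: 57000, 2003: 188000},  # no 1993 figure (N/A)
}


def growth(series: dict[int, int]) -> tuple[float, float]:
    """Return (overall growth factor, compound annual growth rate)."""
    first, last = min(series), max(series)
    factor = series[last] / series[first]
    cagr = factor ** (1 / (last - first)) - 1
    return factor, cagr


for name, series in stats.items():
    factor, cagr = growth(series)
    print(f"{name}: x{factor:.1f} overall, about {cagr:.0%} per year")
```

For total structures, this gives roughly a 14-fold increase over the decade, or about 30% per year.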


The data processing system is based upon the CIF editor ADIT.  Different dictionaries can be put underneath ADIT without any software changes. Both functionality and content of ADIT can be simply customized. The data processing system automatically scales with changes in content. The data can be distributed to multiple deposition sites. There were many more items of data content in the 1990s than there were in the 1970s. Nowadays there are 350 data items per structure on average; in the early days there were only 200.
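The remark that different dictionaries can be put underneath ADIT without software changes describes a dictionary-driven design. In that design the rules live in data, and one generic program interprets them. The Python sketch below illustrates only this idea; it is not ADIT's code. The dictionary layout, the item names (loosely modeled on mmCIF naming) and the `validate` function are hypothetical simplifications.

```python
from typing import Any

# Hypothetical, heavily simplified dictionaries: each maps a data item name
# to its expected Python type and whether it is mandatory. Real mmCIF
# dictionaries are far richer (categories, enumerations, ranges, links).
DICT_V1 = {
    "_entry.id": {"type": str, "mandatory": True},
    "_cell.length_a": {"type": float, "mandatory": True},
}
DICT_V2 = {
    **DICT_V1,
    "_exptl.method": {"type": str, "mandatory": True},
}


def validate(entry: dict[str, Any], dictionary: dict[str, dict]) -> list[str]:
    """Check an entry against whichever dictionary is supplied.

    The validator never changes; swapping the dictionary changes which
    items are required and what type their values must have.
    """
    errors = []
    for item, rule in dictionary.items():
        if item not in entry:
            if rule["mandatory"]:
                errors.append(f"missing mandatory item {item}")
            continue
        if not isinstance(entry[item], rule["type"]):
            errors.append(f"{item}: expected {rule['type'].__name__}")
    # Items the dictionary does not define are reported, not silently kept.
    for item in entry:
        if item not in dictionary:
            errors.append(f"unknown item {item}")
    return errors


entry = {"_entry.id": "EXAMPLE", "_cell.length_a": 52.3}
print(validate(entry, DICT_V1))  # []
print(validate(entry, DICT_V2))  # ['missing mandatory item _exptl.method']
```

The same entry passes under one dictionary and fails under the other, and the code did not change between the two runs.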

 

Berman gave a schematic diagram of the current query system and showed the Structure Explorer page from the Web site. There are links to CATH (structure classification), PDBsum (a summary of the PDB structure), and SCOP (structural classification). The site acts as a portal to other databases.

 

In order to achieve data uniformity, all the data files had to be reprocessed, and the data had to be validated and corrected, before integrated mmCIF files could be produced and loaded into a relational database management system. High-quality data are needed for reliable query results. The processing has led to a greatly enhanced search capability extending from the biological assemblies down to atom level, and improved portability to other database efforts. Work is well advanced on the design of a new PDB with a three-tier architecture. Berman gave a diagram of the new query functionality, and a data flow diagram.


(See also Greer, D. S.; Westbrook, J. D.; Bourne, P. E. An Ontology Driven Architecture for Derived Representations of Macromolecular Structure. Bioinformatics 2002, 18, 1280-1281.) A more recent publication is Bourne, P. E.; Addess, K. J.; Bluhm, W. F.; Chen, L.; Deshpande, N.; Feng, Z.; Kramer Green, R.; Merino-Ott, J. C.; Townsend-Merino, W.; Weissig, H.; Westbrook, J.; Berman, H. M. The Distribution and Query Systems of the RCSB Protein Data Bank. Nucleic Acids Research 2004, 32, D223-225. In future the structure should be the user interface so the user can find information such as what other structures contain the same ligand, or what other structures have chains with >90% sequence identity directly from looking at a particular entry.
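The idea of the structure as the user interface depends on precomputed relationships that a relational database can answer directly once the data are uniform. The sketch below uses Python's built-in sqlite3 module. The two-table schema and the placeholder entry and ligand IDs are hypothetical; they are not real PDB entries and not the RCSB schema. Chains are assumed to be pre-clustered at 90% sequence identity.

```python
import sqlite3

# Hypothetical schema and placeholder data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entry_ligand (entry_id TEXT, ligand_id TEXT);
CREATE TABLE chain_cluster (entry_id TEXT, chain_id TEXT, cluster90 INTEGER);
INSERT INTO entry_ligand VALUES
  ('ENTRY_A', 'LIG1'), ('ENTRY_A', 'LIG2'),
  ('ENTRY_B', 'LIG1'), ('ENTRY_C', 'LIG3');
-- cluster90: chains assumed pre-clustered at 90% sequence identity
INSERT INTO chain_cluster VALUES
  ('ENTRY_A', 'A', 1), ('ENTRY_B', 'A', 2), ('ENTRY_C', 'B', 1);
""")


def entries_sharing_ligand(entry_id: str) -> list[str]:
    """Other entries containing at least one ligand found in entry_id."""
    rows = conn.execute("""
        SELECT DISTINCT hit.entry_id
        FROM entry_ligand AS q
        JOIN entry_ligand AS hit ON hit.ligand_id = q.ligand_id
        WHERE q.entry_id = ? AND hit.entry_id <> q.entry_id
        ORDER BY hit.entry_id""", (entry_id,))
    return [r[0] for r in rows]


def entries_with_similar_chains(entry_id: str) -> list[str]:
    """Other entries with a chain in the same 90%-identity cluster."""
    rows = conn.execute("""
        SELECT DISTINCT hit.entry_id
        FROM chain_cluster AS q
        JOIN chain_cluster AS hit ON hit.cluster90 = q.cluster90
        WHERE q.entry_id = ? AND hit.entry_id <> q.entry_id
        ORDER BY hit.entry_id""", (entry_id,))
    return [r[0] for r in rows]


print(entries_sharing_ligand("ENTRY_A"))       # ['ENTRY_B']
print(entries_with_similar_chains("ENTRY_A"))  # ['ENTRY_C']
```

Both questions reduce to a self-join on a shared key. This shows why reprocessing the archive into uniform, loadable data comes before any such query feature.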

 

Berman next talked about the new challenges of structural genomics. She showed a flowchart:

 

Target selection → crystallomics → data collection → structure solution → structure refinement → functional annotation → publication

 

A target registration database has been constructed (http://targetdb.pdb.org), containing 49,000 sequences, all downloadable in XML. The scope of TargetDB is to provide timely status and tracking information on the progress of the production and solution of structures. The targets are downloaded from 16 centers weekly. PDB entry sequences have been integrated. Targets can be searched by sequence (with FASTA), project target ID, project site, status (selected, cloned, expressed, ... in PDB etc.), update date, protein name and source organism. Reports of results can be constructed in HTML, FASTA and XML formats. Almost 600 structures in the PDB are from the structural genomics projects. The next stage beyond this will be a protein expression, purification and crystallization database, PEPCdb. It will include all the information about targets including the protocols for protein production.
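The TargetDB search fields and report formats described above can be pictured as a filter over target records followed by a formatter. The sketch below is illustrative only and does not call the real TargetDB service. The `Target` record, its field names and the sample records are hypothetical. The status values "cloned" and "in PDB" are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class Target:
    # Hypothetical record layout; field names are illustrative only.
    target_id: str
    site: str
    status: str
    protein_name: str
    organism: str
    sequence: str


def search(targets, **criteria):
    """Return targets whose fields equal every given criterion."""
    return [t for t in targets
            if all(getattr(t, field) == value
                   for field, value in criteria.items())]


def to_fasta(targets, width=60):
    """Format targets as a FASTA report, wrapping sequences at `width`."""
    lines = []
    for t in targets:
        lines.append(f">{t.target_id} {t.site} | {t.status} | "
                     f"{t.protein_name} [{t.organism}]")
        lines.extend(t.sequence[i:i + width]
                     for i in range(0, len(t.sequence), width))
    return "\n".join(lines)


# Placeholder records, not real targets.
targets = [
    Target("T001", "SITE1", "cloned", "hypothetical protein", "E. coli", "MKT" * 30),
    Target("T002", "SITE2", "in PDB", "kinase", "H. sapiens", "MSE" * 10),
]
print(to_fasta(search(targets, status="cloned")))
```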

 

 

The PDB (Berman, H. M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, N.; Weissig, H.; Shindyalov, I. N.; Bourne, P. E. The Protein Data Bank. Nucleic Acids Research, 2000, 28, 235-242) has world-wide mirror sites, which means that users get the same structural information from anywhere in the world.

 

The Protein Data Bank (PDB) as a research tool

Philip E. Bourne (1), John D. Westbrook (2), Helen M. Berman (2), Gary L. Gilliland (3), Judith L. Flippen-Anderson (2), and PDB Team (4). (1) San Diego Supercomputer Center, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, bourne@sdsc.edu, (2) Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, (3) National Institute of Standards and Technology, Center for Advanced Research in Biotechnology, (4) Rutgers, SDSC/UCSD, CARB/NIST

 

One of the PDB’s stated goals is “through timely distribution to enable complete analysis of macromolecular structure data”. One way of doing this is to ensure Web delivery, which gives the user access regardless of geographic location. Integration of crystallographic databases with applications is now critical. There are three classes of users: educators and students; structural biologists and chemists; and computational chemists and biologists. The PDB is already an important research tool. As of September 2, 2003, it contained 22,333 structures, comprising 765 discrete folds and 2,164 protein families. Of these, 20,907 structures contain proteins, and 6,647 of those are distinct at the level of 90% sequence identity. Many structures have post-translational modifications. Access 24 hours a day, seven days a week, is critical for a worldwide audience.

 

In future there will be further structure diversification. Bourne gave a graph illustrating the impact of structural genomics.


He also showed a cityscape of fold distribution: the number of folds versus SCOP fold ID. The graph illustrated the major types of folds as found in the PDB, as predicted for all structural genomics targets, and as predicted for certain model organisms, including H. sapiens and E. coli. Some folds are over-represented in the PDB, reflecting a bias, but this is compensated by an under-representation in the targets, that is, the structures being attempted. The PDB is the primary source of information from which secondary and tertiary sources, and value-added services, are made. Secondary sources (protein families, genomics, protein-protein interactions, dynamics, modeling) are made by applying reductionism to the PDB and then processing the results further. How is the PDB facilitating this? In 1998 there was little software; in 2003 there were standard toolkits; in 2008 there will be extensive software. In 2000, the PDB offered links; in 2003, Web services. In 2008, it plans to offer “PDB-in-a-box” and “MyPDB”.

New query and related features added recently are a sequence homology filter, and XML (enabled by the macromolecular Crystallographic Information File, mmCIF). The “biological unit” is now handled. Features in alpha test include: query on PubMed abstracts; integration of data from other sources (e.g., SwissProt); improved ligand descriptions; review of biologically active molecules; further experimental detail; relationships to disease; better classification of structures (compound/chain/ligand); and relationship to cellular location, molecular function and biochemical process.

 

Bourne commented on the importance of usability. Biology suffers from the “high noon syndrome”. This is like the “12:00” symbol flashing on a video recorder because people cannot be bothered to set its clock: the barrier to entry is too high. People will only input data if the system is easy to use. This has prompted the PDB to offer better navigation of site content; more dynamic and intuitive access; better keyword access; query by example; better molecular visualization using a new toolkit; “MyPDB”; and use of Web services and CORBA.

 

Bourne gave an example of the research impact of all this. A search for “apoptosis” on the current site recently gave 103 hits, but the same search on the new site gives 168 hits (SwissProt data have been added), and the annotation is better. Searching and displaying results takes less than half the time it used to. A new visualization toolkit is being developed: see http://mbt.sdsc.edu. Local file and remote data loaders for PDB, mmCIF, and FASTA have been developed. 2D and 3D views are coupled. A rich and extensible API is being developed in portable, Web-deployable Java. Bourne also mentioned the molecular biology toolkit (MBT) structure, sequence and tree viewers. MBT is a flexible toolkit from which a variety of applications can be built. Applications are delivered via the Web as Java applets or run as stand-alone programs. They allow integration and visualization of a variety of biological data types, most notably sequence and structure.
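MBT's loaders are written in Java, and their API is not reproduced here. As a neutral illustration of what a FASTA loader does, here is a minimal Python parser. The function name `read_fasta` is hypothetical. The sketch skips blank lines and anything before the first header. It does not handle older conventions such as ';' comment lines.

```python
import io
from typing import Iterator, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA text.

    Header lines start with '>'. A record's sequence is the concatenation
    of the following non-blank lines up to the next header. Lines before
    the first header are ignored.
    """
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


example = io.StringIO(">seq1 example\nMKTAYIAK\nQRQISFVK\n>seq2\nGSHM\n")
for name, seq in read_fasta(example):
    print(name, len(seq))
```

Because it is a generator, large files can be processed one record at a time instead of being loaded whole.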

 

Thus, the human user has been considered, but applications must also be enhanced. There has been a paradigm shift in the way that people work, and Web services are finding favor. Nowadays people download the data and do the analysis locally. Web services will overcome this problem and allow use of up-to-date data, with applications described as “even I can do it”. Bourne gave an example of a Perl Web services client: code from a small Perl program that accesses all PubMed abstracts containing the word “ferritin”. The query is updated automatically each month. This is also easy to do in Java.
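Bourne's Perl client called a PDB Web service whose interface is not given here. As a stand-in showing the same pattern, the Python sketch below queries NCBI's public E-utilities esearch endpoint for PubMed. The pattern is a small program that re-runs a keyword query against live data instead of a downloaded copy. The network call is kept separate so that the URL building and JSON parsing can be checked offline. Scheduling, such as a monthly cron job, is left out.

```python
import json
import urllib.parse
import urllib.request

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_query_url(term: str, retmax: int = 100) -> str:
    """Build an esearch URL that returns PubMed IDs matching `term` as JSON."""
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    return f"{ESEARCH}?{urllib.parse.urlencode(params)}"


def parse_ids(payload: str) -> tuple[int, list[str]]:
    """Return (total hit count, list of PubMed IDs) from an esearch JSON reply."""
    result = json.loads(payload)["esearchresult"]
    return int(result["count"]), result["idlist"]


def fetch_ids(term: str) -> tuple[int, list[str]]:
    """Run the query over the network (requires Internet access)."""
    with urllib.request.urlopen(build_query_url(term), timeout=30) as resp:
        return parse_ids(resp.read().decode("utf-8"))
```

Calling `fetch_ids("ferritin")` runs the live query. Each run reflects whatever PubMed holds at that moment, which is the point Bourne was making about up-to-date data.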

 

Bourne discussed two scenarios, drawn from two of the three classes of users: educators and students, and structural biologists and chemists. The first concerned an educator searching for “ferritin”. The aim is to offer the educator new keyword search techniques: a “PDB Google”; new navigation tools; and the facility to search both the database and Web content. Bourne showed literature references with a hyperlink to the display, a four-helix bundle as the basic unit, and then the biological molecule, consisting of 24 such units.

 

The second scenario concerned the structural biologist or chemist asking the following questions. What are the components of this quaternary complex of protein kinase A? What else can I learn about protein kinase A? With what diseases is it associated? In the new system there is a ligand viewer. Components of the structure are now well described. Users can tabulate and search by ligand, chain, or residue. Again, this is an effort to tackle the “high noon syndrome”. Current links are maintained to over 60 Web sites. In the disease browser, the user can browse by disease name, view the numbers of associated PDB structures, search for structures, and search for a disease name.

 

In five years’ time it is hoped that the PDB will offer detailed descriptions of macromolecule-ligand interactions; a better description of stereochemistry; a better description of the overall contents of the database; relationships to genomic sequence descriptions; query by user type; and visual queries, for example, by molecule.

 

 

 

 

 

 

In Memory of Paul von Ragué Schleyer

In the early ‘60s, Paul von Ragué Schleyer was already making a name for himself. While still at Harvard, he had serendipitously discovered the Lewis acid reagent used to perform rearrangements of cyclic hydrocarbons, and his Princeton group continued to explore the possibilities. Specifically, rearrangement of hydrogenated dicyclopentadiene with aluminum chloride produced adamantane in yields of 30-40%, far greater than yields of previous methods. This reaction and subsequent improvements made adamantane readily available. The research was extended to additional polycyclic (diamondoid) compounds. Because most of Schleyer’s research involved cyclic hydrocarbons, he purchased one of the first Varian A-60 NMR spectrometers. Use was not limited to the Schleyer group: researchers from other groups who needed an NMR spectrum could run their own spectra, with instruction from Ray Fort, one of Schleyer’s grad students. My research, both as an undergraduate and as a graduate student, involved hydrogen-poor heterocycles, so I rarely needed NMR. However, the A-60 was a great improvement over the huge vacuum-tube A-40 we had at the University of Minnesota. Both Schleyer’s research and adjuncts such as NMR were often discussed in lectures and seminars. The entire Department became well versed in the history, theory, and uses of this research, even those who were not working in those areas or using that equipment.

Another feature often encountered when visiting the Schleyer labs was the camphoraceous aroma of adamantane if a preparation had been run recently. A mound of glistening white adamantane crystals would be air drying on brown paper on a lab bench. I once asked Ray Fort why he processed the run that way, since adamantane readily sublimes. He answered that it was the best way to dry it and that not much was lost. The Schleyer labs were also filled with the sound of classical music, much appreciated by many of us who also kept radios in our labs for listening to classical music. In Princeton we had access to at least three stations broadcasting the classics from New York and Philadelphia, so we were never without our favorites.

Schleyer was a Princeton undergraduate and proud of it (he had been on the swimming team). That, coupled with his general demeanor, led to his nickname, not necessarily used in his presence, of “Big Gruff Tiger”. He rarely smiled, and when he did it was more of a patented leer, but he was fair in all of his dealings and was one of the best teachers I ever had.

Princeton, through geography and reputation, was able to attract a large number and wide variety of guest speakers for seminars. The more senior professors typically sat in the front row and made cogent comments both during the presentation and in the Q&A period at the end. The venerable Hugh Stott Taylor, largely responsible for the excellent reputation of Princeton in Physical Chemistry, would make incisive comments and politely ask pointed questions, especially on Physical Chemistry subjects. Schleyer would also ask probing questions, but on occasion he would literally assault the blackboard, grabbing the chalk and shouting, “No, that’s not correct.” He would proceed to write what he thought was correct, citing, “chapter and verse,” the references for the basis of his argument. He was more civil to the more senior speakers, but he demanded intellectual honesty from everyone. He had a near photographic memory for references on a wide variety of chemistry topics.

Courses weren’t a specific requirement for Princeton chemistry graduate students. We were strongly “encouraged” to take some, often for credit, but always advised to audit any course. In our first semester, we “organikers” were encouraged to take courses in organic chemistry and instrumental methods. The latter was team taught, and its subjects included NMR, taught by Schleyer, and other spectroscopic methods. References and texts included Bellamy, The Infrared Spectra of Complex Molecules; Silverstein and Bassler, Spectrometric Identification of Organic Compounds; and Jackman, Nuclear Magnetic Resonance Spectroscopy.

The second semester featured Special Topics in Physical Organic Chemistry, also taught by Schleyer.  The topic was reactive intermediates.  We covered the gamut of physical organic.  A single text was not required but we used at least three: Gould, Mechanism and Structure in Organic Chemistry; Hine, Physical Organic Chemistry; and Alexander, Ionic Organic Reactions.  We also used those texts in a subsequent course on Physical Organic Chemistry.  I still have those texts and use them on occasion.  Most of what I ever learned about physical organic stemmed from that course (and the next) with Schleyer as a great teacher.

We spent about a week on free radicals and another week or so on carbanions, then got to the meat of the course: carbonium ions. The non-classical carbonium ion (NCCI) controversy was warming up. Featured in most of the controversy was the norbornyl cation (see Nonclassical Ions by P. D. Bartlett for an excellent collection of commentary and reference reprints). NCCIs were postulated as intermediates in the solvolysis of norbornyl halides. They were thought to have an indefinite rather than a discrete structure, with the positive charge smeared over more than one carbon atom. The warring camps were exemplified by Saul Winstein et al. in favor of NCCIs and H. C. Brown against. Schleyer was somewhere in the middle, employing and favoring rigorous investigation of the intermediates in these solvolysis reactions, both in his own research and in critiquing the work of others. In addition to our physical organic texts, Schleyer taught the course out of the current literature and correspondence. During a break, he paid a visit to Winstein’s lab at UCLA and talked to one of Winstein’s post-docs, Chris Foote. He came back with a preprint of Foote’s research correlating the solvolysis rates of a wide variety of cyclic hydrocarbons with the IR carbonyl stretching frequency of the corresponding ketones. This allowed prediction of solvolysis rates and of the stability of the carbonium ions, and therefore of the involvement of NCCIs.

Only two of us were taking the course for credit. At the end of the course, Schleyer said he would send word on the final, which would be a take-home exam. A couple of days later, a note appeared in our mailboxes: “See me – Paul.” When we saw him, we were given a copy of a two-page letter to Schleyer from H. C. Brown. The last paragraph was circled: “Anyone postulating the existence of non-classical carbonium ions should provide valid evidence for their existence.” The assignment: “answer the letter,” specifically the last paragraph. We were given a week. Since we in the course were collectively convinced that NCCIs did exist, I wrote an essay essentially regurgitating the NCCI contents of the course in detail, including references. I turned in the paper and waited for the results. A day or so later a similar note appeared in my mailbox. He had marked my paper with a B+ but then discussed the merits of what I had done. Any demerits were minor and he said I did a good job, but he then justified the grade as a B+ rather than an A because “you should have answered the letter in the form of a letter which you did not.” That was accompanied by one of his grins. I had to agree with him. Ever since, whenever I tell this story, people usually say, “that wasn’t fair.” I still disagree with that: it was fair.

Besides adamantane and related compounds, NCCIs, and other research, Schleyer was also an expert in concepts like stereochemistry, hyperconjugation, hydrogen bonding, and aromaticity, the last being a primary concern later in his computational chemistry era. Early on, NMR was a favored analytical tool for aromaticity studies. Hydrogen bonding cropped up a year or so later when Schleyer gave a presentation at the monthly Princeton Section ACS meeting on “The Ubiquitous Hydrogen Bond.” However, some of the publicity went out as “The Ubiquitous Hydrogen Bomb.” Some undergraduate must have picked up on that and scrawled “Ban the Bomb” on the upper corner of the blackboard in the main lecture hall. Schleyer, always the humorist, began by observing that slogan and then launched into his presentation (as usual, without notes). After hearing him advocate hydrogen bonding in studies of properties and reaction mechanisms, some of us were surprised to hear that, once again, Schleyer was in the middle and proposed that hydrogen bonding was often postulated incorrectly. At the end, he walked over to the slogan, crossed out “Bomb” and substituted “Bond,” saying maybe we should often ban the bond.

I received my PhD in February 1967 and Schleyer was on my orals board. We tapped a keg for my Orals party that night of course, and I still remember him energetically discussing some chemistry topic with another faculty member, beer in hand. He left for Germany soon after but we kept crossing paths via the literature.  After 5 years in two lab jobs in pesticide synthesis I switched to my second love: chemical information.  In the course of navigating the online information age as a user, I and others discovered the intricacies of searching for authors by name.  As I recall, one could get by with searching for just “Schleyer” in chemistry databases but that often wasn’t sufficient or accurate. Not many people realized that Schleyer’s middle name was “von Ragué” and instead it was often linked to his last name as a compound name with regal implications. Various inaccurate listings include von R Schleyer, v R Schleyer, von Schleyer, etc.  Similar confusion probably also exists for William von Eggers Doering but I’ve never researched that.  Due to inaccurate editing by journal editorial staffs and general misunderstanding, Schleyer had trouble with this early on.  I still remember the posting on his bulletin board (the de facto newsletter of the Department with news, chemistry, and humor thrown in) of a copy of a paper from J. Chem. Soc. with a citation to one of Schleyer’s papers where the names of all five authors were totally garbled.  This was followed by Schleyer’s letter to the offending authors (all Brits) addressing them with garbled names.  Appended to that was their reply acknowledging receipt “of your particularly obnoxious letter” and that a correction had been sent to the journal editor.

When we were teaching chemists and others to do effective online searching, many of us information specialists used Schleyer as the prime example of how to and how not to search for authors. Schleyer’s name exists in about a dozen forms in abstracts of his papers in Chemical Abstracts, even after their editing process.  Journal reference citations, as in the Science Citation Index and Web of Science, are in even worse shape.  As far as we know, in the CA file at least, all variants are listed under Schleyer in inverted form, last name first.  Since there seems to be only one Schleyer, P publishing in chemistry, searching for that phrase in the Author field with a truncation symbol after the P should suffice to retrieve all of his >1000 publications.  I’m told that SciFinder does some automatic grouping of author name variants but since I don’t use that service I don’t know if the retrieval is complete.
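The truncation strategy works because, in an inverted-name index, every correctly inverted variant of one author's name begins with the same string. The Python sketch below mimics a right-truncated search for "Schleyer, P" followed by a truncation symbol. The name strings are made up for illustration from the forms mentioned above; they are not actual Chemical Abstracts records. Real systems use their own truncation symbols and matching rules.

```python
def truncated_search(index: list[str], stem: str) -> list[str]:
    """Right-truncated match: return names that begin with `stem`.

    Case-insensitive, mimicking a search such as 'Schleyer, P?' where
    '?' stands for the database's truncation symbol.
    """
    stem = stem.casefold()
    return [name for name in index if name.casefold().startswith(stem)]


# Illustrative name strings only (not actual database records).
author_index = [
    "Schleyer, Paul von R.",
    "Schleyer, P. v. R.",
    "Schleyer, P. von Rague",
    "Schleyer, Paul",
    "von R. Schleyer, P.",  # compound surname mis-indexed under 'v'
    "Schleyer, H.",         # a different author
]

print(truncated_search(author_index, "Schleyer, P"))
```

The four correctly inverted forms are retrieved. The "von R. Schleyer" form is missed because it does not begin with the stem. This is why the inverted listing in CA matters, and why citation indexes with messier forms are harder to search.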

Both Schleyer and I attended the dedication of the new Frick Chemical Laboratory three years ago and I was able to have one last conversation with him.  He will be profoundly missed by all who had contact with him.

Robert E. Buntrock
Orono, ME
buntrock16@roadrunner.com

 

Notes

An incentive for me (Robert E. Buntrock) to write a memorial of my contact with Paul Schleyer was that the majority of the obituaries did not cover the Princeton years to any extent. The exception appeared later: an obituary by Henry Schaefer, a colleague of Schleyer’s at the University of Georgia (Nature, 517, p. 22, Jan. 1, 2015, doi:10.1038/517022a). In the online comments to that obituary, links are given to three posts in Chemiotics II by former Princeton undergraduates who did research under Schleyer.

The Wikipedia biography of Schleyer has been updated with a death notice. Additional obituaries include those in ChemViews (Nov. 24, 2014) and C&ENews (92, 49, p. 50, Dec. 8, 2014). See also Martin Saunders’s letter to the editor of C&ENews (“Missing a Colleague and a Friend,” 93, 1, p. 5-6, Jan. 5, 2015) and Jorgensen, W. L. A Reflection on Paul von Ragué Schleyer. J. Chem. Theory Comput. 2015, 11, 1.

 

Notes From Our Sponsors


Division of Chemical Information Sponsors Spring 2015

The American Chemical Society Division of Chemical Information is very fortunate to receive generous financial support from our sponsors. Their support allows us to maintain the high quality of the Division’s programming and to promote communication between members at social functions at the ACS spring 2015 National Meeting in Denver, CO. It also supports other divisional activities during the year, including scholarships to graduate students in chemical information.

The Division gratefully acknowledges contributions from the following sponsors:

Gold: ACS Publications

Silver: Bio-Rad Laboratories
  Royal Society of Chemistry

Bronze: Journal of Chemical Information and Modeling
  Journal of Cheminformatics (Springer)
  PerkinElmer

Contributors: AAAS/Science
  CRC Press
  Thieme Chemistry

Opportunities are available to sponsor Division of Chemical Information events, speakers, and material. Our sponsors are acknowledged on the CINF web site, in the Chemical Information Bulletin, on printed meeting materials, and at any events for which we use your contribution. For more information please review the sponsorship brochure at http://www.acscinf.org/PDF/CINF_Sponsorship_Brochure.pdf. Please feel free to contact me if you would like more information about supporting CINF.

Phil Heller
Chair, Fundraising Committee  
Email: sponsorship@acscinf.org
Tel: 917-450-4591
The ACS CINF Division is a non-profit tax-exempt organization with taxpayer ID no. 52-6054220.

 

PerkinElmer News

PerkinElmer is excited to be a part of the 249th American Chemical Society (ACS) National Meeting, March 22-26, in Denver, Colorado! Scientists who design and synthesize compounds and measure and analyze their chemical and biological properties are an integral part of making our world a healthier and safer place. The informatics division of PerkinElmer has a host of products that help scientists easily record data in an electronic lab notebook, analyze and visualize their data in Spotfire, draw chemical reactions, analyze data with statistics, collaborate with colleagues in the cloud, and work with outside collaborators like CROs. We are looking forward to showcasing our new products at booth #1106, as well as during a talk. Our talk, “Mining Electronic Lab Notebooks for Synthetic Needles”, will be presented by Josh Bishop, Ph.D., on Sunday, March 22, 2015, from 11:20 AM to 11:45 AM in Room 110, Colorado Convention Center.

Here are a few of the products that PerkinElmer will be showcasing:

ChemDraw + SciFinder

This year, ChemDraw celebrates its 30th anniversary!  It is the drawing tool of choice for chemists and biologists to help create publication-ready drawings for use in electronic lab notebooks, databases and publications.  Now you can search the SciFinder database directly from ChemDraw without worrying about cutting and pasting.  As well as the desktop version we will also be showcasing ChemDraw for iPad and the new JavaScript sketcher ChemDraw Direct.

Collaborating in the Cloud: Elements

Come by booth #1106 to preview Elements, our cloud-based collaboration platform. Join us as we demonstrate how to plan and capture experiments, draw chemical reactions with ChemDraw, upload images, organize data and share research with colleagues.


 

Spotfire for Scientists

Come and see how the TIBCO Spotfire data visualization and analysis platform can help you achieve insights faster. Spotfire is a dynamic, collaborative tool that assimilates data such as chemical structures, text, numbers, images, chemical properties and biological assays from multiple sources. It empowers scientists to perform complex analyses and create easy-to-use visual dashboards. Josh Bishop, Ph.D. will present a talk about optimizing reaction conditions by using E-Notebook-based screening tools, followed by queries of results with Spotfire, Datalytix and Lead Discovery. The talk is on Sunday, March 22, 2015, from 11:20 AM to 11:45 AM in Room 110, Colorado Convention Center.

PerkinElmer E-Notebook

Come by our booth #1106 to see how our E-Notebook has helped chemists and biologists across multiple industries. Learn how to quickly record experiments, draw chemical reactions with ChemDraw, upload documents, integrate with Microsoft Office, protect intellectual property, organize data, and share research with colleagues or external collaborators such as CROs.

Text mining for chemistry and the CHEMDNER track

A recently published supplement of Journal of Cheminformatics (http://bit.ly/CHEMDNER) describes the outcome of the first community challenge task on chemical natural language processing, carried out as part of the BioCreative initiative (www.biocreative.org). The supplement contains an overview of the results obtained for indexing articles with chemical compounds and for automatically identifying the exact mentions of chemicals in the text. It describes the methods, resources and features used, as well as details of the manually labeled text corpus. Thirteen system description articles from the 27 participating teams provide additional characteristics of the cutting-edge methodology used for recognizing chemical entities in text, as well as further improvements over the initially implemented strategies.
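Recognizing the exact mentions of chemicals is commonly scored by comparing system output with the manually labeled corpus. The Python sketch below computes micro-averaged precision, recall and F-score under exact matching of (document, start, end) character offsets. It is a generic illustration of exact-match scoring, not the official BioCreative evaluation script. The document IDs and offsets are toy values.

```python
def prf(gold: set[tuple[str, int, int]],
        predicted: set[tuple[str, int, int]]) -> tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 for exact-offset matches.

    Each mention is (document_id, start_offset, end_offset). A prediction
    is a true positive only if all three values match a gold mention.
    """
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


gold = {("doc1", 0, 7), ("doc1", 25, 33), ("doc2", 10, 18)}
pred = {("doc1", 0, 7), ("doc1", 25, 31), ("doc2", 10, 18), ("doc2", 40, 45)}
print(prf(gold, pred))
```

The prediction ("doc1", 25, 31) overlaps a gold mention but ends two characters early. Under exact matching it counts as a false positive, and the gold mention it overlaps counts as missed. Lenient, overlap-based scoring would treat that case differently.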


ChemTexts: First articles published

The first textbook journal worldwide recently published its first articles. All articles are free to access in 2015 and 2016. ChemTexts (http://bit.ly/chemtexts) imparts contemporary knowledge in all subdivisions of chemistry to students at an exceptionally high didactic level. ChemTexts can be used by students for learning, by lecturers for teaching, or by researchers and professionals as a recap of essential knowledge. On the pedagogical level, the journal primarily supports bachelor and master programs, but material at a higher level is also considered. Typically, each text consists of a self-consistent treatment of a topic which could be part of a textbook. Beyond informative illustrations, the texts may also include supplementary material such as animated presentations or videos.


Chemistry@Springer: LinkedIn group welcomed 2000th member

This group was started in January 2012 (http://bit.ly/SpringerChem) and has now grown to more than 2,000 scientists and professionals. Members can talk to Springer chemistry editors and to other members about their work and find out everything they want to know about scientific publishing. In addition, the group provides information about conferences, latest publications and everything else of interest about chemistry at Springer.

OnChemistry: re-launched blog for Chemistry Central

Springer’s platform for open access content in chemistry has recently relaunched its blog (http://bit.ly/OnChemistry). The OnChemistry blog showcases the high-quality and thought-provoking open access research published across Springer’s chemistry portfolio. As well as writing about the published research, the blog is also a place to read about open access developments, conferences, awards and highlights from the field. Most of the posts are written by staff, but readers also hear from Chemistry Central Editorial Board Members and a variety of guest bloggers.

Steffen Pauly, Editorial Director Chemistry

ORCID iD: http://orcid.org/0000-0001-9768-9315

 

New look for chemistry journals SYNTHESIS and SYNLETT

The international chemistry journals SYNTHESIS and SYNLETT, published by Thieme Publishing Group, feature a brand new design. Both journals  appear with a new cover design and a more reader-friendly, clear layout.

The new design updates the look of SYNTHESIS and SYNLETT, making them visually more appealing. Both the printed journals and the online versions feature color-coded sections to help readers find articles and topics of interest even more quickly. The tables of contents include larger graphical abstracts which are repeated at the beginning of the respective articles. Graphics and tables within the articles are highlighted more clearly. Both journals also receive modern covers with new and distinctive title logos. Each issue highlights one scientific article with an associated graphic on the cover.

“We want to deliver our content in a better, more consistent and easy-to-navigate way to improve the reading experience for our users,” says Susanne Haak, Managing Editor of the Thieme Chemistry journals. “With the redesign we are also taking into account that nowadays most of our subscribers prefer reading our journals electronically. The new layout greatly improves online readability, for example through a header with the journal logo at the top of each article for better orientation.”

The first printed issues of SYNTHESIS and SYNLETT to feature the new design were published in January 2015.  

SYNTHESIS and SYNLETT, both published in English, report the latest scientific developments in synthetic chemistry. The articles are selected, reviewed, and edited by an internationally renowned editorial board to guarantee chemists access to the latest research insights in their respective fields. The subscription journals are published semimonthly in printed and electronic format by Thieme Publishing Group. Articles and primary data can be accessed online at http://www.thieme-connect.de/ejournals, most of them prior to print publication. For more information, and to register for institutional trial access, visit: www.thieme.com/chemistry-journals.

About Thieme

Thieme Publishing Group is a medical and scientific publishing house employing more than 900 staff and maintaining offices in seven cities, including New York, Delhi, Rio and Stuttgart. Founded in 1886, the Thieme name has become synonymous with high quality and excellence in online and print publishing. Thieme publishes 150 peer-reviewed journals and more than 450 new books annually. The company also has a rapidly growing array of web-based products in medicine and science. Popular online products include Thieme E-Journals and the Thieme Electronic Book Library, which are accessible via www.thieme-connect.com, Thieme’s platform for electronic products.


Technical Program

Symposia

ACS Chemical Information Division (CINF)
249th ACS National Meeting, Spring 2015
Denver, CO (March 22-26, 2015)

CINF Symposia

E. Bolstad, Program Chair

Day(s)        Session title
Sun A         Getting to the Best Reaction: Tools for Finding a Needle in a Haystack
Sun P         Defining 'Value' in Scholarly Communications: Evolving Ways of Evaluating Impact on Science
Mon D, Tue A  Research Results: Reproducibility, Reporting, Sharing & Plagiarism
Mon E         Sci-Mix
Tue P         Molecular & Structural 2D & 3D Chemical Fingerprinting: Computational Storing, Searching, & Comparing Molecular & Chemical Structures
Wed D         Development & Use of Data Format Standards for Cheminformatics

Legend:
A = AM, P = PM, D = AM/PM, E = Evening
*Cosponsored symposium with primary organizer shown in parentheses; located with primary organizer.
**Primary organizer of cosponsored symposium.

See also: Complete Program

Technical Program Listing

ACS Chemical Information Division (CINF)
249th ACS National Meeting, Spring 2015
Denver, CO (March 22-26, 2015)

CINF Symposia

Erin Bolstad, Program Chair

[Created Sat Mar 21 2015, Subject to Change]

CINF: Getting to the Best Reaction: Tools for Finding a Needle in a Haystack
10:00am - 11:50am
Sunday, March 22

Room 110 - Colorado Convention Center
Roger Schenck, Organizing
Roger Schenck, Presiding
10:00am-10:05am Introductory Remarks

10:05am-10:30am
CINF 1: Automated design of realistic organometallic complexes and catalysts

*Vidar Jensen, Vidar.Jensen@kj.uib.no


Marco Foscato1 , Giovanni Occhipinti1 , Vishwesh Venkatraman2 , Bjørn Alsberg2 , Vidar Jensen1

10:30am-10:55am
CINF 2: Different needles for different tailors: How specialized reaction search algorithms support scientists working in various research areas

*Valentina Eigner Pitto, ve@infochem.de


Valentina Eigner Pitto1 , Josef Eiblmaier1 , Hans Kraut1 , Heinz Saller1 , Peter Loew1

10:55am-11:20am
CINF 3: Classification of scientific journal articles for the NIST Thermodynamic Research Center

*Alden Dima, alden.dima@nist.gov


Alden Dima2 , Yuanyuan Feng3 , Sharief Youssef2 , Kenneth Kroenlein1

11:20am-11:45am
CINF 4: Mining electronic lab notebooks for synthetic needles (or gems)

*Philip McHale, phil.mchale@perkinelmer.com


Philip McHale2
2 PerkinElmer, Menlo Park, California, United States


11:45am-11:50am Concluding Remarks
CINF: Defining 'Value' in Scholarly Communications: Evolving Ways of Evaluating Impact on Science
1:00pm - 4:35pm
Sunday, March 22

Room 110 - Colorado Convention Center
Sara Rouhi, Teri Vogel, Organizing
Sara Rouhi, Teri Vogel, Presiding

1:00pm-1:25pm
CINF 5: Withdrawn

1:25pm-1:50pm
CINF 6: Dynamic evaluation of impact for scholarly communications in the field of thermophysical properties

*Robert Chirico, robchirico@comcast.net


Robert Chirico1 , Vladimir Diky1 , Joseph Magee1 , Ala Bazyleva1 , Chris Muzny1 , Kenneth Kroenlein1

1:50pm-2:15pm
CINF 7: Impact of crystal structures over the last, and next, 50 years

*Suzanna Ward, ward@ccdc.cam.ac.uk


Suzanna Ward1 , Ian Bruno1 , Colin Groom1

2:15pm-2:40pm
CINF 8: Give me kudos for taking responsibility for self-marketing my scientific publications and increase impact

*Antony Williams, tony27587@gmail.com


Antony Williams1,2 , Will Russell3 , Melinda Kenneway4 , Louise Peck4

2:40pm-2:55pm Intermission

2:55pm-3:20pm
CINF 9: How do you define the value of something if it’s free? Observations on Caltech’s Institutional Repository

*Donna Wrublewski, dtwrub@caltech.edu


Donna Wrublewski1 , George Porter1

3:20pm-4:10pm
CINF 10: Redefining value: Alternative metrics and research outputs

*Kiyomi Deards, kiyomideards@gmail.com


Kiyomi Deards2 , Raychelle Burks1 , Sara Rouhi3 , William Gunn4

CINF: Research Results: Reproducibility, Reporting, Sharing & Plagiarism
8:30am - 11:50am
Monday, March 23

Room 110 - Colorado Convention Center
Martin Hicks, Organizing
Martin Hicks, Presiding
8:30am-8:35am Introductory Remarks

8:35am-9:05am
CINF 11: Addressing researcher incentives for publishability over accuracy

*Sara Davis Bowman, sed8n@virginia.edu


Sara Davis Bowman1 , Brian Nosek1,2

9:05am-9:35am
CINF 12: Ethics in publishing: Editorial and related experiences

*Paul Weiss, psw@cnsi.ucla.edu


Paul Weiss1

9:35am-10:05am
CINF 13: Data management and the research record in research misconduct investigations

*Kenneth Busch, kbusch@nsf.gov


Kenneth Busch1

10:05am-10:20am Intermission

10:20am-10:50am
CINF 14: Irreproducibility in the scientific literature or: How often do scientists tell the truth, the whole truth and nothing but the truth?

*Robert Bergman, rbergman@berkeley.edu


Robert Bergman1,2

10:50am-11:20am
CINF 15: Interplay of prior information and new data in high-throughput small-molecule studies

*Paul Clemons, pclemons@broadinstitute.org


Paul Clemons1

11:20am-11:50am
CINF 16: STRENDA – proposing minimum information for reporting functional enzymology data

*Carsten Kettner, ckettner@beilstein-institut.de


Carsten Kettner1 , Martin Hicks1

CINF: Research Results: Reproducibility, Reporting, Sharing & Plagiarism
1:30pm - 4:45pm
Monday, March 23

Room 110 - Colorado Convention Center
Martin Hicks, Organizing
Carsten Kettner, Presiding

1:30pm-2:00pm
CINF 17: Reproducibility in organic synthesis

*Rick Danheiser, danheisr@mit.edu


Rick Danheiser1

2:00pm-2:30pm
CINF 18: Data and models, models and data

*Timothy Clark, Tim.Clark@fau.de


Timothy Clark1 , Christian Kramer2

2:30pm-3:00pm
CINF 19: Reproducibility and the quality of chemical probes

*Aled Edwards, aled.edwards@utoronto.ca


Aled Edwards1

3:00pm-3:15pm Intermission

3:15pm-3:45pm
CINF 20: MIRAGE – the minimum information required for a glycomics experiment: Rationale and progress

*William York, will@ccrc.uga.edu


William York2 , Carsten Kettner1 , Rene Ranzinger2

3:45pm-4:15pm
CINF 21: Reporting and reuse of crystal structure data and knowledge

*Ian Bruno, bruno@ccdc.cam.ac.uk


Ian Bruno1 , Suzanna Ward1 , Colin Groom1

4:15pm-4:45pm
CINF 22: Reproducibility and variance of literature compound structure and bioassay data

*John Overington, jpo@ebi.ac.uk


John Overington1

CINF: Sci-Mix
8:00pm - 10:00pm
Monday, March 23

Hall C - Colorado Convention Center

8:00pm-10:00pm
CINF 11: Addressing researcher incentives for publishability over accuracy

*Sara Davis Bowman, sed8n@virginia.edu


Sara Davis Bowman1 , Brian Nosek1,2

8:00pm-10:00pm
CINF 1: Automated design of realistic organometallic complexes and catalysts

8:00pm-10:00pm
CINF 20: MIRAGE – the minimum information required for a glycomics experiment: Rationale and progress

*William York, will@ccrc.uga.edu


William York2 , Carsten Kettner1 , Rene Ranzinger2

8:00pm-10:00pm
CINF 23: Chemical literature: A comparison of most important databases for searching the chemical literature from an undergraduate perspective

*Neelam Bharti, neelambh@ufl.edu


Neelam Bharti1

8:00pm-10:00pm
CINF 24: From lab to the libraries: A new route for chemistry librarianship

*Neelam Bharti, neelambh@ufl.edu


Neelam Bharti1

8:00pm-10:00pm
CINF 25: 3Dmol.js: Simple visualization and sharing of 3D molecular data

*David Koes, dkoes@pitt.edu


David Koes1 , Nicholas Rego1,2

8:00pm-10:00pm
CINF 26: Sharing and reproducibility/replication: An NIH view

*Philip Bourne, pebourne@gmail.com


Philip Bourne1

8:00pm-10:00pm
CINF 34: Highly visual representation methods for comparison of chemical structures and related properties

*Jess Sager, jess.sager@yahoo.com


Jess Sager1 , Philip Mounteney1 , Curtis Snyder1 , Tamsin Mansley2

8:00pm-10:00pm
CINF 35: Overview of the Analytical Information Markup Language

*Stuart Chalk, schalk@unf.edu


Stuart Chalk1

8:00pm-10:00pm
CINF 36: Thermophysical property dissemination utilizing an XML-based standard

*Kenneth Kroenlein, kenneth.kroenlein@nist.gov


Kenneth Kroenlein1 , Robert Chirico1 , Vladimir Diky1 , Ala Bazyleva1 , Joseph Magee1 , Chris Muzny1

8:00pm-10:00pm
CINF 37: Standard data format for computational chemistry: CSX

*Stuart Chalk, schalk@unf.edu


Stuart Chalk1 , Neil Ostlund2 , Mirek Sopek3 , Bing Wang2

8:00pm-10:00pm
CINF 3: Classification of scientific journal articles for the NIST Thermodynamic Research Center

8:00pm-10:00pm
CINF 44: Building a standard for standards: The ChAMP project

*Stuart Chalk, schalk@unf.edu


Stuart Chalk1 , Antony Williams2

8:00pm-10:00pm
CINF 4: Mining electronic lab notebooks for synthetic needles (or gems)

8:00pm-10:00pm
CINF 5: Withdrawn

8:00pm-10:00pm
CINF 6: Dynamic evaluation of impact for scholarly communications in the field of thermophysical properties

*Robert Chirico, robchirico@comcast.net


Robert Chirico1 , Vladimir Diky1 , Joseph Magee1 , Ala Bazyleva1 , Chris Muzny1 , Kenneth Kroenlein1

CINF: Research Results: Reproducibility, Reporting, Sharing & Plagiarism
8:30am - 11:50am
Tuesday, March 24

Room 110 - Colorado Convention Center
Martin Hicks, Organizing
Martin Hicks, Presiding
8:30am-8:35am Introductory Remarks

8:35am-9:05am
CINF 26: Sharing and reproducibility/replication: An NIH view

*Philip Bourne, pebourne@gmail.com


Philip Bourne1

9:05am-9:35am
CINF 27: Globalization of Big Data: Access, integration, and quality control issues

*Stephen Boyer, skboyer@gmail.com


Stephen Boyer1 , Evan Bolton2 , Richard Martin3 , Eric Louie3 , Thomas Griffin1 , Gang Fu2 , Bo Yu2

9:35am-10:05am
CINF 28: Flagging and curating erroneous chemical and biological records using cheminformatics to ensure data reproducibility

*Denis Fourches, fourches@email.unc.edu


Denis Fourches1

10:05am-10:20am Intermission

10:20am-10:50am
CINF 29: Increasing open communication to facilitate reproducibility

*Courtney Soderberg, courtney@cos.io


Courtney Soderberg1

CINF: Molecular & Structural 2D & 3D Chemical Fingerprinting: Computational Storing, Searching, & Comparing Molecular & Chemical Structures
1:30pm - 4:25pm
Tuesday, March 24

Room 110 - Colorado Convention Center
Rachelle Bienstock, Organizing
Rachelle Bienstock, Presiding
1:30pm-1:35pm Introductory Remarks

1:35pm-2:00pm
CINF 30: Insights into molecular similarity from crystal structures

*Colin Groom, groom@ccdc.cam.ac.uk


Colin Groom2 , Suzanna Ward2 , Ian Bruno2 , Shyam Vyas1 , Neil Feeder2

2:00pm-2:25pm
CINF 31: Do chiral fingerprints and descriptors work?

*S Joshua Swamidass, swamidass@gmail.com


S Joshua Swamidass2 , Grover Miller1 , Tyler Hughes2 , Jessica Hartman1 , Steven Cothren1

2:25pm-2:40pm Intermission

2:40pm-3:05pm
CINF 32: Similarity to SAR - interactive navigation of similarity relationships to guide optimization

*Matthew Segall, matthew.d.segall@gmail.com


Matthew Segall1 , Edmund Champness1 , James Chisholm1 , Chris Leeding1 , Peter Hunt1 , Alex Elliott1 , Samuel Dowling1 , Hector Garcia1

3:05pm-3:30pm
CINF 33: Database fingerprint clustering methods using KNIME

*Rachelle Bienstock, rachelleb1@gmail.com


Rachelle Bienstock1

3:30pm-3:55pm
CINF 34: Highly visual representation methods for comparison of chemical structures and related properties

*Jess Sager, jess.sager@yahoo.com


Jess Sager1 , Philip Mounteney1 , Curtis Snyder1 , Tamsin Mansley2

3:55pm-4:00pm Concluding Remarks
CINF: Development & Use of Data Format Standards for Cheminformatics
9:00am - 11:55am
Wednesday, March 25

Room 110 - Colorado Convention Center
David Martinsen, Organizing
David Martinsen, Presiding
9:00am-9:05am Introductory Remarks

9:05am-9:35am
CINF 35: Overview of the Analytical Information Markup Language

*Stuart Chalk, schalk@unf.edu


Stuart Chalk1

9:35am-10:05am
CINF 36: Thermophysical property dissemination utilizing an XML-based standard

*Kenneth Kroenlein, kenneth.kroenlein@nist.gov


Kenneth Kroenlein1 , Robert Chirico1 , Vladimir Diky1 , Ala Bazyleva1 , Joseph Magee1 , Chris Muzny1

10:05am-10:35am
CINF 37: Standard data format for computational chemistry: CSX

*Stuart Chalk, schalk@unf.edu


Stuart Chalk1 , Neil Ostlund2 , Mirek Sopek3 , Bing Wang2

10:35am-10:50am Intermission

10:50am-11:20am
CINF 38: Development of an ontology specific to computational chemistry

*Mirek Sopek, sopekmir@makolab.pl


Mirek Sopek1 , Stuart Chalk1 , Bing Wang1 , Louis Nardozi1 , Neil Ostlund1

11:20am-11:50am
CINF 39: Importance of data standards for large scale data integration in chemistry

*Antony Williams, tony27587@gmail.com


Antony Williams1 , Valery Tkachenko2 , Alexey Pshenichnov3 , Ken Karapetyan1 , Carlos Coba4

11:50am-11:55am Concluding Remarks
CINF: Development & Use of Data Format Standards for Cheminformatics
1:30pm - 4:25pm
Wednesday, March 25

Room 110 - Colorado Convention Center
David Martinsen, Organizing
David Martinsen, Presiding
1:30pm-1:35pm Introductory Remarks

1:35pm-2:05pm
CINF 40: InChI as the chemical data format standard for cheminformatics

*Stephen Heller, steve@hellers.com


Stephen Heller1

2:05pm-2:35pm
CINF 41: Withdrawn
2:35pm-2:50pm Intermission

2:50pm-3:20pm
CINF 42: JCAMP-MOL: A JCAMP-DX extension to allow integrated delivery of structural models and correlated spectral data

*Robert Hanson, hansonr@stolaf.edu


Robert Hanson1 , Robert Lancashire2

3:20pm-3:50pm
CINF 43: Communicating crystal structures: Successes, challenges, and opportunities

*Ian Bruno, bruno@ccdc.cam.ac.uk


Ian Bruno1 , Colin Groom1 , Suzanna Ward1

3:50pm-4:20pm
CINF 44: Building a standard for standards: The ChAMP project

*Stuart Chalk, schalk@unf.edu


Stuart Chalk1 , Antony Williams2

4:20pm-4:25pm Concluding Remarks

Cosponsored Symposia

COMP: Drug Discovery
1:30pm - 4:45pm
Monday, March 23

Mile High Ballroom, 4E - Colorado Convention Center
Y. Jane Tseng, Scott Wildman, Organizing
Y. Jane Tseng, Scott Wildman, Presiding
Cosponsored by: CINF and MEDI

1:30pm-2:00pm
COMP 104: Methodology for machine learning in chemical design

*Seifu Chonde, sjc294@psu.edu


Seifu Chonde1 , Joey Storer2 , Karl Mueller3 , Soundar Kumara1

2:00pm-2:30pm
COMP 105: PubChem and big data

*Sunghwan Kim, kimsungh@ncbi.nlm.nih.gov


Sunghwan Kim1 , Gang Fu1 , Lianyi Han1 , Bo Yu1 , Lewis Geer1 , Asta Gindulyte1 , Siqian He1 , Paul Thiessen1 , Evan Bolton1 , Stephen Bryant1

2:30pm-3:00pm
COMP 106: Toward the ubiquitous use of cheminformatics: From the development of reliability boosters for structure-based molecular docking to the analysis and modeling of hyperdimensional HTS data

*Denis Fourches, fourches@email.unc.edu


Denis Fourches1
3:00pm-3:15pm Intermission

3:15pm-3:45pm
COMP 107: Evaluating structural toxicity alerts with metabolism and reactivity models

*Tyler Hughes, tyler@wustl.edu


Tyler Hughes3 , Grover Miller1 , S Joshua Swamidass2

3:45pm-4:15pm
COMP 108: Predicting regioselectivity and lability of cytochrome P450 metabolism using quantum mechanical simulations

*Jonathan Tyzack, jon@optibrium.com


Matthew Segall1 , Jonathan Tyzack1 , Peter Hunt1

4:15pm-4:45pm
COMP 109: In silico approaches to CYP P450 site-of-metabolism (SOM) and microsomal stability prediction

*Johannes Voigt, johannes.voigt@gilead.com


Johannes Voigt1 , Uli Schmitz1
COMP: Drug Discovery
8:30am - 11:45am
Tuesday, March 24

Mile High Ballroom, 4E - Colorado Convention Center
Y. Jane Tseng, Scott Wildman, Organizing
Y. Jane Tseng, Scott Wildman, Presiding
Cosponsored by: CINF and MEDI

8:30am-9:00am
COMP 134: Flexible CDOCKER: Development and application of a docking method incorporating non-rigid receptors within CHARMM

*Jessica Gagnon, gagnonj@umich.edu


Jessica Gagnon1 , Sean Law1 , Charles Brooks2

9:00am-9:30am
COMP 135: Small molecule design using single step free energy perturbation (SSFEP): Blinded validation against the relative binding affinities of inhibitors of p38 and ACK1 kinases

*Rajiah Denny, aldrin.denny@gmail.com


E. Prabhu Raman1 , Alexander Mackerell1,2 , Rajiah Denny3

9:30am-10:00am
COMP 136: Expert system for predicting different local structure-activity relationship environments using the concept of emerging chemical patterns

*Vigneshwaran Namasivayam, vnamasiv@uni-bonn.de


Vigneshwaran Namasivayam1 , Disha Gupta-Ostermann2 , Jenny Balfer2 , Juergen Bajorath2
10:00am-10:15am Intermission

10:15am-10:45am
COMP 137: Scoring doesn't work -- or does it?

*Carsten Detering, detering@biosolveit.com


Carsten Detering1

10:45am-11:15am
COMP 138: ProBiS-ligands: A web server for prediction of ligands by examination of protein binding sites

*Dusanka Janezic, dusanka.janezic@gmail.com


Dusanka Janezic1 , Janez Konc2

11:15am-11:45am
COMP 139: Movable type method applied to the biomolecules study

*Zheng Zheng, laozhengzz@gmail.com


Zheng Zheng2 , Melek Ucisik3 , Kenneth Merz1
COMP: Drug Discovery
1:30pm - 4:45pm
Tuesday, March 24

Mile High Ballroom, 4E - Colorado Convention Center
Y. Jane Tseng, Scott Wildman, Organizing
Y. Jane Tseng, Scott Wildman, Presiding
Cosponsored by: CINF and MEDI

1:30pm-2:00pm
COMP 164: Predicting melting points of drug-like molecules using free energy perturbation

*Rajiah Denny, aldrin.denny@gmail.com


Rajiah Denny1 , Saivenkataraman Jayaraman2 , Rayomand Unwalla3 , Mark Bunnage3

2:00pm-2:30pm
COMP 165: Biasing potential replica exchange multisite λ-dynamics: Toward scalable and simultaneous free energy calculations of more than 1000 compounds

*Garrett Goh, gbgoh@umich.edu


Garrett Goh1 , Kira Armacost1 , Charles Brooks III1,2

2:30pm-3:00pm
COMP 166: Thermodynamics of ligand binding - consequences of local environment for drug optimization

*Johan Ulander, ljaulander@yahoo.com


Johan Ulander1
3:00pm-3:15pm Intermission

3:15pm-3:45pm
COMP 167: Toward a complete, fully knowledge-driven pseudo force field for protein-ligand interactions

*Marcel Verdonk, marcel.verdonk@astx.com


Marcel Verdonk1

3:45pm-4:15pm
COMP 168: Hierarchy of density functional theory-based benchmarks for a chemical space relevant to drug discovery applications

*Art Bochevarov, art.bochevarov@schrodinger.com


Art Bochevarov1

4:15pm-4:45pm
COMP 169: Rational design of potent factor VIIa inhibitors using quantum chemical methods

*Daniel Cheney, cheneyd@bms.com


Daniel Cheney1 , Indawati Delucca1 , Peter Glunz1 , Wen Jiang1 , Vladimir Ladziata1 , Brandon Parkhurst1 , Yanfeng Zhang1 , Yan Zou1 , Jeffrey Bozarth1 , Joseph Luettgen1 , Alan Rendina1 , Luciano Mueller2 , Anzhi Wei2 , John Newitt2 , James Tamura2 , Dietmar Seiffert1 , Pancras Wong1 , Ruth Wexler1 , E. Priestley1
COMP: Drug Discovery
8:30am - 11:45am
Wednesday, March 25

Mile High Ballroom, 4E - Colorado Convention Center
Y. Jane Tseng, Scott Wildman, Organizing
Y. Jane Tseng, Scott Wildman, Presiding
Cosponsored by: CINF and MEDI

8:30am-9:00am
COMP 330: Halogen bonds in drug design

*Suman Sirimulla, ssirimulla@miners.utep.edu


Suman Sirimulla1

9:00am-9:30am
COMP 331: Highly visual workflow for designing, selecting and enumerating new compounds for assay

*Jess Sager, jess.sager@yahoo.com


Jess Sager1 , Tamsin Mansley2 , Philip Mounteney1
9:30am-9:45am Intermission

9:45am-10:15am
COMP 332: Discovery of new and diverse TLR9 receptor antagonists for regulating innate immune reactions

*Amiram Goldblum, amiram@vms.huji.ac.il


Amiram Goldblum1 , Anke Burger-Kentischer2 , Angela Mattes2 , Maria Zatsepin1

10:15am-10:45am
COMP 333: In silico design, synthesis, and assays of specific substrates and peptidomimetic inhibitors for proteinase 3

*Nathalie Reuter, nathalie.reuter@mbi.uib.no


Shailesh Narawane1,3 , Cedric Grauffel1,3 , Anne-Sophie Schillinger1,3 , Bengt Erik Haug2 , Nathalie Reuter1,3

10:45am-11:15am
COMP 334: Ligand based drug design of novel pyrimidine derivatives as Tankyrase inhibitors for the treatment of colorectal cancer

*Abhishek Patel, 13mph601@nirmauni.ac.in


Abhishek Patel1 , Hardik Bhatt2
COMP: Drug Discovery
1:30pm - 5:15pm
Wednesday, March 25

Mile High Ballroom, 4E - Colorado Convention Center
Y. Jane Tseng, Scott Wildman, Organizing
Y. Jane Tseng, Scott Wildman, Presiding
Cosponsored by: CINF and MEDI

1:30pm-2:00pm
COMP 360: Small molecule crystal structures in drug discovery and development

*Colin Groom, groom@ccdc.cam.ac.uk


Colin Groom2 , Suzanna Ward2 , Shyam Vyas1 , Ian Bruno2

2:00pm-2:30pm
COMP 361: QSAR modeling independent of input tautomers

*Robert Fraczkiewicz, robert@simulations-plus.com


Marvin Waldman1 , Robert Fraczkiewicz1 , Robert Clark1

2:30pm-3:00pm
COMP 362: General applicability of template CoMFA to prospective bioactivity prediction

*Richard Cramer, cramer@tripos.com


Richard Cramer1

3:00pm-3:30pm
COMP 363: Exploring conformational search protocols for ligand-based virtual screening and 3D QSAR modeling

*Woody Sherman, woody.sherman@schrodinger.com


Daniel Cappel1 , Steve Dixon1 , Woody Sherman1 , Jianxin Duan1
3:30pm-3:45pm Intermission

3:45pm-4:15pm
COMP 364: Experimentally derived interaction fields as a basis for ligand-based virtual screening

*Colin Groom, groom@ccdc.cam.ac.uk


Colin Groom1 , Jason Cole1 , Ilenia Giangreco1 , Oliver Korb1 , Scott Gothe2 , Ian Bruno1

4:15pm-4:45pm
COMP 365: Alignment of diverse ligands for a protein: a solved problem?

*Tim Cheeseright, tim@cresset-bmd.com


Tim Cheeseright1 , Paolo Tosco1 , Mark Mackey1

4:45pm-5:15pm
COMP 366: Exhaustive pairwise overlays: the gold standard for molecular alignment?

*Paul Hawkins, phawkins@eyesopen.com


Paul Hawkins1 , Robert Tolbert1

Technical Program with Abstracts

ACS Chemical Information Division (CINF)
249th ACS National Meeting, Spring 2015
Denver, CO (March 22-26, 2015)

CINF Symposia

Erin Bolstad, Program Chair

[Created Sat Mar 21 2015, Subject to Change]

CINF: Getting to the Best Reaction: Tools for Finding a Needle in a Haystack
10:00am - 11:50am
Sunday, March 22

Room 110 - Colorado Convention Center
Roger Schenck, Organizing
Roger Schenck, Presiding
10:00am-10:05am Introductory Remarks

10:05am-10:30am
CINF 1: Automated design of realistic organometallic complexes and catalysts

*Vidar Jensen, Vidar.Jensen@kj.uib.no


Marco Foscato1 , Giovanni Occhipinti1 , Vishwesh Venkatraman2 , Bjørn Alsberg2 , Vidar Jensen1
1 Department of Chemistry, University of Bergen, Bergen, Norway; 2 Department of Chemistry, Norwegian University of Science and Technology, Trondheim, Norway

Automation of molecular modeling allows for in silico evaluation of huge numbers of candidate compounds without manual intervention. While this strategy is standard in drug design, the more complex nature of metal-containing compounds, e.g., with respect to coordination and oxidation numbers, has so far limited the application of such methods to organometallic chemistry and catalysis. To overcome the limitations of existing tools, which were primarily developed for organic, drug-like molecules, and to boost the application of in silico screening and de novo design to organometallic and transition metal chemistry, we have developed a new method for designing such compounds [1-3].

New, realistic organometallic compounds are built by assembling molecular fragments that are largely taken from crystallographic data of existing compounds [2]. The method enables precise control of the kind of connections formed between fragments, thus offering ample structural variation while restricting the combination of fragments to realistic, synthesizable molecules only. Three-dimensional (3D) molecular fragments are used directly to prepare complete 3D molecular models of new candidates, thus bypassing the otherwise fault-prone conversion from low-dimensional representations (e.g., SMILES strings) to 3D prior to calculation of fitness (or scoring) functions [3]. Building directly in 3D is seen to give accurate control of the stereochemistry and also enables the construction of molecules, such as transient intermediates, with geometrical features that may otherwise be very challenging to achieve in an automated fashion [3].
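The abstract does not say how connection control is implemented. As a minimal sketch of the general idea, assuming that each fragment exposes typed attachment points and that a bond is allowed only between points whose class pair appears in a rule table, the code below enumerates the permitted bonds between two fragments. The class labels, fragment names, and helper functions are all hypothetical and are not taken from the authors' software.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class AttachmentPoint:
    fragment: str  # name of the owning fragment (hypothetical)
    index: int     # position of the point within its fragment
    apclass: str   # hypothetical attachment-point class label


# Hypothetical rule table: only these class pairs may be joined.
# Pairs are stored sorted so that (a, b) and (b, a) are treated alike.
ALLOWED = {
    tuple(sorted(pair))
    for pair in [
        ("metal:dative", "ligand:donor"),
        ("ligand:Csp3", "substituent:Csp3"),
    ]
}


def can_connect(a: AttachmentPoint, b: AttachmentPoint) -> bool:
    """True if a and b lie on different fragments and their classes are compatible."""
    if a.fragment == b.fragment:
        return False
    return tuple(sorted((a.apclass, b.apclass))) in ALLOWED


def candidate_bonds(points_a, points_b):
    """Every compatible pairing between two fragments' free attachment points."""
    return [(a, b) for a, b in product(points_a, points_b) if can_connect(a, b)]


# Usage with two hypothetical fragments: a metal core and a ligand.
metal = [AttachmentPoint("RuCl2", 0, "metal:dative"),
         AttachmentPoint("RuCl2", 1, "metal:dative")]
ligand = [AttachmentPoint("NHC", 0, "ligand:donor"),
          AttachmentPoint("NHC", 1, "ligand:Csp3")]

bonds = candidate_bonds(metal, ligand)
for a, b in bonds:
    print(f"{a.fragment}[{a.index}] -- {b.fragment}[{b.index}]")
```

Here each metal site may bond only to the ligand's donor point, so two candidate bonds are printed; the Csp3 point is rejected because no rule pairs it with a metal class. A rule table of this kind keeps enumeration limited to chemically sensible combinations, which mirrors the stated goal without claiming to reproduce the method.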

Applications of the de novo method to design of catalysts and other functional transition metal compounds will be presented.

References
(1) Chu, Y.; Heyndrickx, W.; Occhipinti, G.; Jensen, V. R.; Alsberg, B. K. J. Am. Chem. Soc. 2012, 134, 8885.
(2) Foscato, M.; Occhipinti, G.; Venkatraman, V.; Alsberg, B. K.; Jensen, V. R. J. Chem. Inf. Model. 2014, 54, 767.
(3) Foscato, M.; Venkatraman, V.; Occhipinti, G.; Alsberg, B. K.; Jensen, V. R. J. Chem. Inf. Model. 2014, 54, 1919.




10:30am-10:55am
CINF 2: Different needles for different tailors: How specialized reaction search algorithms support scientists working in various research areas

*Valentina Eigner Pitto, ve@infochem.de


Valentina Eigner Pitto1 , Josef Eiblmaier1 , Hans Kraut1 , Heinz Saller1 , Peter Loew1
1 InfoChem GmbH, Munich, Germany

What is the best reaction? The answer depends on the application area. Researchers working in process chemistry, medicinal chemistry, or custom synthesis have very different requirements for the outcome of a reaction search. For example, medicinal chemists look for synthesis routes to possible new drug candidates on a milligram scale, whereas process chemists need new synthesis pathways to known molecules that are optimized for efficiency (yield, reduced production costs, etc.). Various search tools and data repositories are needed to satisfy these highly diverse requirements.
In this talk we will give an overview and specific examples of how distinct reaction search approaches and algorithms can be applied to the different research fields, helping each scientist group find its best reaction.

10:55am-11:20am
CINF 3: Classification of scientific journal articles for the NIST Thermodynamic Research Center

*Alden Dima, alden.dima@nist.gov


Alden Dima2 , Yuanyuan Feng3 , Sharief Youssef2 , Kenneth Kroenlein1
1 National Institute of Standards and Technology, Boulder, Colorado, United States; 2 Information Technology Laboratory, National Institute of Standards and Technology, Gaithersburg, Maryland, United States; 3 University of Maryland Baltimore County, Baltimore, Maryland, United States

Scientific literature is an important source of scientific data for research in many domains. Because reviewing and curating such data from journal articles is a tedious and time-consuming task, efforts exist to curate community databases of scientific reference data and so reduce the burden on individual researchers. For example, the NIST Thermodynamic Research Center (TRC) in Boulder, Colorado curates scientific data obtained from journal articles to provide industry and academia with thermodynamic and thermophysical property tables and data in the form of electronic databases. The current classification process is manual and requires a reviewer to read every article in each new journal issue in order to identify the relevant ones. Despite the benefits of such efforts, identifying documents that contain relevant data for curation remains difficult.
Automated document classification promises a classifier that can greatly reduce the burden of identifying documents relevant to a data curation effort, provided that the classifier performs sufficiently well. We will present the results of an effort to explore the feasibility of using document classification techniques, implemented with open-source software, to classify scientific journal articles for data curation, and to identify effective classification techniques for such a corpus.
For this work, we used text extracted from articles in several volumes of the Journal of Chemical and Engineering Data that were curated by the TRC. For the articles chosen for curation, we have XML files containing the curated data. The presence of these XML files indicates that the article is relevant to the TRC and, as a result, we have ground truth for this collection of articles.
A key issue is the generation of features from the text that can serve as the input to a classifier. Based on previous experience, we chose the document topic mixtures generated from topic modeling of the articles. These vectors of numeric features were used to train the classifiers. Initial results suggest that automated document classification of scientific journal articles, built on open-source software, is a useful tool for groups such as the TRC.
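The abstract says per-document topic mixtures were used as classifier features but names neither the topic model nor the classifiers. As a minimal sketch, assuming the topic mixtures have already been computed (for example by a topic model such as LDA), the code below trains a nearest-centroid classifier using cosine similarity. This is a simple stand-in chosen for brevity, not necessarily what the TRC study used. Labels follow the abstract's ground-truth rule (True if the article has curated XML files); the toy vectors are invented for illustration.

```python
import math
from collections import defaultdict


def centroid(vectors):
    """Component-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def train(mixtures, labels):
    """Return one centroid per label, computed from the training topic mixtures."""
    groups = defaultdict(list)
    for vec, label in zip(mixtures, labels):
        groups[label].append(vec)
    return {label: centroid(vecs) for label, vecs in groups.items()}


def predict(model, mixture):
    """Assign the label whose centroid is most similar to the article's mixture."""
    return max(model, key=lambda label: cosine(model[label], mixture))


# Hypothetical 3-topic mixtures (each row sums to 1).
# True = article has curated XML (relevant to TRC); False = not curated.
train_x = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.1, 0.2, 0.7], [0.2, 0.1, 0.7]]
train_y = [True, True, False, False]

model = train(train_x, train_y)
print(predict(model, [0.65, 0.25, 0.10]))
print(predict(model, [0.10, 0.10, 0.80]))
```

With real data the mixtures would come from the extracted article text, and performance would be measured against the held-out ground-truth labels rather than judged on a toy example.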

11:20am-11:45am
CINF 4: Mining electronic lab notebooks for synthetic needles (or gems)

*Philip McHale, phil.mchale@perkinelmer.com


Philip McHale2
2 PerkinElmer, Menlo Park, California, United States

The widespread acceptance and use of electronic lab notebooks (ELNs) in industry and increasingly in academia has led to increased efficiency, accuracy and data capture of valuable information about synthetic efforts – both successful and unsuccessful. This in turn has created large collections of data on syntheses, but exploiting this mass of data has been difficult with traditional data search, extraction, organization and analysis tools. In this paper we will describe methods using modern data analysis and visualization tools to rapidly sift through the accumulated data to quickly discern optimal synthetic routes and reaction conditions, thereby saving chemists time that can be more productively spent in the lab.
11:45am-11:50am Concluding Remarks
CINF: Defining 'Value' in Scholarly Communications: Evolving Ways of Evaluating Impact on Science
1:00pm - 4:35pm
Sunday, March 22

Room 110 - Colorado Convention Center
Sara Rouhi, Teri Vogel, Organizing
Sara Rouhi, Teri Vogel, Presiding

1:00pm-1:25pm
CINF 5: Withdrawn

1:25pm-1:50pm
CINF 6: Dynamic evaluation of impact for scholarly communications in the field of thermophysical properties

*Robert Chirico, robchirico@comcast.net


Robert Chirico1 , Vladimir Diky1 , Joseph Magee1 , Ala Bazyleva1 , Chris Muzny1 , Kenneth Kroenlein1
1 National Institute of Standards and Technology, Boulder, Colorado, United States

The Thermodynamics Research Center (TRC) at the National Institute of Standards and Technology (NIST) maintains an extensive database of published experimental thermophysical properties for pure compounds, binary and ternary mixtures, and chemical reactions. This database, in combination with expert system software (ThermoData Engine, TDE), allows on-demand (i.e., dynamic) critical evaluation of thermophysical property data. This combined system is at the core of a cooperation between NIST and key journals in this field, including the Journal of Chemical and Engineering Data, The Journal of Chemical Thermodynamics, and Fluid Phase Equilibria, for rapid and substantive review of new experimental data in advance of publication. A measure of the immediate impact of new publications on the status of the field is a direct outcome of this analysis. Following publication, new experimental data are posted online in a structured format (ThermoML) for free download. The subsequent impact of new results can be assessed, in part, through access statistics, which provide a measure of interest outside the traditional channel of citations in the academic literature. The overall system will be described with emphasis on impact assessment.

1:50pm-2:15pm
CINF 7: Impact of crystal structures over the last, and next, 50 years

*Suzanna Ward, ward@ccdc.cam.ac.uk


Suzanna Ward1 , Ian Bruno1 , Colin Groom1
1 Cambridge Crystallographic Data Centre, Cambridge, United Kingdom

Measuring the impact of scientific research is an important guide for the distribution of funding, for individual researchers, and for those engaged in communicating scientific output. Traditional metrics use journal article citations and impact scores; however, there is increasing interest in alternative metrics that draw on other sources of data. How can scientific data repositories and databases help in understanding impact, and what are the challenges associated with this?

This presentation, timed to coincide with the 50th anniversary of the Cambridge Structural Database, will explore how we can measure the impact of 750,000 published crystal structures. We will soon be receiving 100,000 structures per year. How can we measure the impact of structures individually and as a whole? What mechanisms can we put in place to better track the reuse of data? And to what degree should we be concerned about quality versus quantity?

2:15pm-2:40pm
CINF 8: Give me kudos for taking responsibility for self-marketing my scientific publications and increase impact

*Antony Williams, tony27587@gmail.com


Antony Williams1,2 , Will Russell3 , Melinda Kenneway4 , Louise Peck4
1 Cheminformatics, Royal Society of Chemistry, Wake Forest, North Carolina, United States; 2 ChemConnector, Wake Forest, North Carolina, United States; 3 Strategic Innovation Department, Royal Society of Chemistry, Cambridge, United Kingdom; 4 Kudos, Oxford, United Kingdom

The authoring of a scientific publication can represent the culmination of many tens, if not hundreds, of hours of data collection and analysis. The authoring and peer-review process itself is often a major undertaking in terms of assembling the publication and passing through review. Considering the amount of work invested in producing a scientific article, it is therefore quite surprising that authors, post-publication, invest very little effort in communicating the value and potential impact of their article to the community. Social networking has clearly demonstrated the ability to self-market and drive attention. At the same time, the increasing volume of literature (over a million new articles are published every year) requires authors to take on a more direct role in ensuring that their work gets read and cited. This requirement may grow with the emergence of a range of article-level metrics, shifting attention away from where a researcher publishes to the performance of their individual articles. Therefore, a separate platform that facilitates social networking and other discovery tools to communicate the value of published science to the community would be of value. In parallel, the ability to enhance an article by linking to additional information (presentations, videos, blog posts, etc.) allows for enrichment of the article post-publication, a capability not available via the publisher's platform. This presentation will provide a personal overview of my experiences using the Kudos platform and how it ultimately benefits my ability to communicate an integrated view of my research to the community.
2:40pm-2:55pm Intermission

2:55pm-3:20pm
CINF 9: How do you define the value of something if it’s free? Observations on Caltech’s Institutional Repository

*Donna Wrublewski, dtwrub@caltech.edu


Donna Wrublewski1 , George Porter1
1 Caltech Library MC 1-43, California Institute of Technology, Pasadena, California, United States

Caltech’s Institutional Repository, the Caltech Collection of Open Digital Archives (Caltech CODA), recently surpassed its 40,000th submission and has recorded over five million downloads since July 2008. CODA comprises several different repositories, including CaltechAUTHORS and CaltechTHESIS. CaltechAUTHORS encompasses the scholarly output of Caltech researchers, with items made freely available when licensing permits. Recently, the Caltech Media Relations department has been coordinating with the Library when articles of note generate press releases; the Library works to acquire permissions (when possible) to make the article available upon issuance of the press release. Doctoral graduates are required to deposit their theses in CaltechTHESIS in order to graduate, with items made freely available when permitted or embargoed for a finite period. Requests for embargoed theses are regularly received by the Library, and permissions are sought (and often obtained) wherever possible. This talk will describe the contents of Caltech CODA, examine usage statistics and trends in terms of item type and coordinated publicity, describe case studies of how IRs can facilitate compliance with funder requirements for reporting and data sharing, and offer some observations and insights on how making information freely available may (or may not) translate into “impact”.

3:20pm-4:10pm
CINF 10: Redefining value: Alternative metrics and research outputs

*Kiyomi Deards, kiyomideards@gmail.com


Kiyomi Deards2 , Raychelle Burks1 , Sara Rouhi3 , William Gunn4
1 Doane College, Crete, Nebraska, United States; 2 Research and Instructional Services, University of Nebraska-Lincoln, Lincoln, Nebraska, United States; 3 Altmetric LLP, Washington, District of Columbia, United States; 4 Academic Outreach, Mendeley Ltd., Menlo Park, California, United States

Researchers and educators are constantly asked to demonstrate the value of their work and their professional reputation. In a world of web hits and download statistics, how can the impact and value of a scholarly article be measured? To accurately reflect the total impact of a researcher's work, the value of all outputs must be assessed. Outputs and metrics to be considered include article usage, freely available online courses and handouts, educational materials, websites, videos, podcasts, blogs, social media accounts, workshops, and outreach events, all of which must be assessed and described. This panel will focus on the use of alternative metrics to describe the value of traditional and alternative research outputs. Panelists will include a tenure-track faculty member in the process of compiling a tenure packet, a postdoctoral research associate with a national reputation in science outreach, a member of the NISO Altmetrics Standards Project, and a representative from Altmetric LLP, a company devoted to measuring the attention received by articles and data sets and delivering article-level metrics for those outputs. Each panelist will speak briefly about their experiences demonstrating value using alternative metrics, followed by a moderated discussion with the audience.
CINF: Research Results: Reproducibility, Reporting, Sharing & Plagiarism
8:30am - 11:50am
Monday, March 23

Room 110 - Colorado Convention Center
Martin Hicks, Organizing
Martin Hicks, Presiding
8:30am-8:35am Introductory Remarks

8:35am-9:05am
CINF 11: Addressing researcher incentives for publishability over accuracy

*Sara Davis Bowman, sed8n@virginia.edu


Sara Davis Bowman1 , Brian Nosek1,2
1 Center for Open Science, Charlottesville, Virginia, United States; 2 University of Virginia, Charlottesville, Virginia, United States

The climate of academic scientific research requires publication for professional success and sustainability. Publication of novel, positive, and “exciting” research is emphasized over negative results and replications. Thus, researchers are incentivized to make design, analysis, and reporting decisions that promote positive results and ignore negative results. This results in a biased body of literature. When the incentive structure emphasizes novelty over replication, false results persist in the literature because they remain unchallenged. The accumulation of scientific knowledge is slowed by both of these factors. This talk touches on the evidence and challenges for reproducibility in scientific research, then delves deeper into initiatives to nudge incentives and norms toward practices that can improve reproducibility.

9:05am-9:35am
CINF 12: Ethics in publishing: Editorial and related experiences

*Paul Weiss, psw@cnsi.ucla.edu


Paul Weiss1
1 MC 722710, California NanoSystems Inst. UCLA, Los Angeles, California, United States

At ACS Nano, as editors, we have encountered a variety of ethical issues. These came not only from inexperienced authors, but from all corners. Sitting at the crossroads of many fields gives us an interesting perspective on the approaches taken across fields and borders. A selection of these will be discussed.

9:35am-10:05am
CINF 13: Data management and the research record in research misconduct investigations

*Kenneth Busch, kbusch@nsf.gov


Kenneth Busch1
1 National Science Foundation, Arlington, Virginia, United States

In a January 2011 revision of its proposal preparation instructions, the National Science Foundation (NSF) added a requirement that a data management plan be submitted with each NSF proposal. The widely used federal definition of data is “…the recorded factual material commonly accepted in the scientific community as necessary to validate research findings.” Read broadly, data includes not only measurements and observations, but also experimental protocols, methods, and the software or computer code required for replication. Data management, including processes for archiving and sharing data, is an integral part of the proper conduct of research. Examination of the research record is central to research misconduct (RM) investigations conducted by the NSF Office of Inspector General (OIG). Definitions of fabrication and falsification under the NSF RM regulation (45 C.F.R. part 689) specifically reference 'data or results,' and the allied definition of plagiarism describes it as the appropriation of another person's 'results' without providing appropriate credit. Although there is general agreement about the need for proper data management, it has been reported that about 25% of RM investigations encounter problems with research records. These problems require extra work to resolve during the investigation or, in some cases, reportedly cannot be resolved. Examples of data management issues encountered during NSF OIG RM investigations will be presented, including investigations that involve detailed examination of laboratory notebooks, instrument records, manuscript preparation and revision, and the composition of NSF proposals. Several recent case examples highlight growing areas of concern about data sharing, including sharing data across international borders.
10:05am-10:20am Intermission

10:20am-10:50am
CINF 14: Irreproducibility in the scientific literature or: How often do scientists tell the truth, the whole truth and nothing but the truth?

*Robert Bergman, rbergman@berkeley.edu


Robert Bergman1,2
1 Chemistry, University of California, Berkeley, Berkeley, California, United States; 2 Chemical Sciences, Lawrence Berkeley National Laboratory, Berkeley, California, United States

This lecture will address the apparent increase in scientific misconduct reported in recent years from the point of view of an active worker in organic and inorganic chemistry. After a discussion of primary scientific fraud, such as plagiarism and data fabrication, the general question of scientific data reproducibility will be considered. The talk will assess the level of reproducibility of most research in the synthetic chemical literature, and then focus on two journals, Organic Syntheses and Inorganic Syntheses, which are among the few publications that provide direct information about the reproducibility of submitted experiments. The talk will consider whether trying to ensure reproducibility necessarily acts as an effective means of protecting science from fraud, and will discuss “semi-fraud”: the conscious and unconscious manipulation of data.

10:50am-11:20am
CINF 15: Interplay of prior information and new data in high-throughput small-molecule studies

*Paul Clemons, pclemons@broadinstitute.org


Paul Clemons1
1 Broad Institute, Cambridge, Massachusetts, United States

Recently, the reproducibility of scientific results has come under scrutiny, though some assumptions made in such critiques are not necessarily well founded. For example, some observers conflate a lack of consilience with expected results, or a lack of extensibility between model systems (e.g., species), with a true lack of reproducibility. Further, some observers expect unrealistically simple interpretations to emerge from high-throughput datasets, rather than embracing the potential complexities that such experiments are uniquely poised to uncover. Of course, the relationship of new results to expected ones is important epistemologically, since requiring too few connections to expectations could result in poorly controlled science, while requiring too many could stifle truly novel discoveries. In our work using high-throughput small-molecule screening and profiling data, we combine prior information, new experimental data, and computational methods to generate and prioritize novel and specific hypotheses for testing. Using enrichment-based methods, we allow known connections to strengthen our hypotheses without requiring that each expected connection be precisely rediscovered. We present real examples focused on target identification, structure-activity relationships, and uncovering new pathway dependencies in cancer cells.
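The abstract does not specify which enrichment statistics are used. One common over-representation test, shown here as an assumption rather than the authors' method, asks whether compounds carrying a prior annotation (for example, known activity against a target) appear among screening hits more often than chance would predict, using the hypergeometric tail probability. The function name and all numbers are hypothetical.

```python
from math import comb

def enrichment_p_value(k, n, K, N):
    """One-sided hypergeometric tail probability P(X >= k).

    N: compounds screened; K: compounds with a prior annotation;
    n: compounds scored as hits; k: hits that carry the annotation.
    """
    top = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, top + 1))
    return tail / comb(N, n)

# Hypothetical screen: 1,000 compounds, 50 annotated, 20 hits, 5 annotated hits.
# About one annotated hit (20 * 50 / 1000) would be expected by chance.
p = enrichment_p_value(k=5, n=20, K=50, N=1000)
print(f"{p:.4f}")
```

Because the test scores over-representation rather than demanding that every annotated compound be a hit, it fits the idea of letting known connections strengthen a hypothesis without requiring each one to be rediscovered.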

11:20am-11:50am
CINF 16: STRENDA – proposing minimum information for reporting functional enzymology data

*Carsten Kettner, ckettner@beilstein-institut.de


Carsten Kettner1 , Martin Hicks1
1 Beilstein Institut, Frankfurt, Germany

Large amounts of published data are available on the behaviour of individual enzymes, but even a cursory examination of the literature reveals that these data were often collected under quite disparate conditions of pH, temperature, ionic strength, etc. Furthermore, full details of the assay conditions used are often lacking. This causes difficulties when researchers exchange data supplied by laboratories that use different methods and can, in the worst cases, lead to misinterpretation of laboratory findings.
The STRENDA (Standards for Reporting Enzymology Data) Commission, made up of experts from the enzyme chemistry community and supported by the Beilstein-Institut, addresses the improvement of functional enzyme data quality (www.strenda.org) and has drawn up the STRENDA guidelines after many rounds of extensive consultation. Today, more than 30 biochemical journals recommend that authors refer to these guidelines when reporting enzyme kinetics data.
To enable scientists to easily prepare data for manuscripts, the STRENDA Commission has developed a web-based portal for the direct electronic submission of data by authors prior to publication. This portal, called STRENDA DB, provides an assessment tool with which authors, journal editors, and reviewers can check whether the reporting of experimental data is compliant with the STRENDA guidelines and thus matches the journals' instructions for authors. The data entered are stored in STRENDA DB and will be made publicly accessible after they have been published in a journal.
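The compliance check described above amounts to testing whether a submitted record supplies every item on a minimum-information checklist. The sketch below is a hypothetical illustration of that idea only: its field names are drawn from the conditions named in this abstract (pH, temperature, ionic strength, assay details), not from the actual STRENDA guidelines, and it does not reflect STRENDA DB's real schema or interface.

```python
# Hypothetical checklist: only conditions named in the abstract. The real
# STRENDA guidelines define a much longer list; nothing here mirrors
# STRENDA DB's actual schema.
REQUIRED_FIELDS = ("enzyme", "pH", "temperature", "ionic_strength", "assay_method")

def missing_fields(record, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or left empty in a record."""
    return [field for field in required if record.get(field) in (None, "")]

def is_compliant(record, required=REQUIRED_FIELDS):
    """A record passes only if no required field is missing."""
    return not missing_fields(record, required)

entry = {
    "enzyme": "hexokinase",
    "pH": 7.4,
    "temperature": 30.0,  # degrees Celsius
    "assay_method": "coupled spectrophotometric assay",
}
print(missing_fields(entry))  # ['ionic_strength']
```

Reporting which items are missing, rather than a bare pass/fail, is what lets authors fix a submission before editors and reviewers see it.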
CINF: Research Results: Reproducibility, Reporting, Sharing & Plagiarism
1:30pm - 4:45pm
Monday, March 23

Room 110 - Colorado Convention Center
Martin Hicks, Organizing
Carsten Kettner, Presiding

1:30pm-2:00pm
CINF 17: Reproducibility in organic synthesis

*Rick Danheiser, danheisr@mit.edu


Rick Danheiser1
1 Massachusetts Inst of Tech, Cambridge, Massachusetts, United States

A fundamental principle in the field of synthetic organic chemistry states that a synthetic chemist skilled in the art should be able to repeat a synthetic transformation with the same results as those described in published work from another laboratory. Unfortunately, all too often this is not the case. Why do many procedures prove not to be reproducible? Why do even experienced researchers encounter problems when attempting to repeat reactions described in the literature? This talk will focus on the most common causes of problems involving reproducibility in organic synthesis. The specific examples discussed will be based on experiences from my own laboratory, examples taken from the literature, and examples from procedures submitted to Organic Syntheses that I am familiar with from my service as Editor in Chief of that journal.

2:00pm-2:30pm
CINF 18: Data and models, models and data

*Timothy Clark, Tim.Clark@fau.de


Timothy Clark1 , Christian Kramer2
1 Computer Chemie Centrum, Erlangen, Germany; 2 Center for Molecular Biosciences, Leopold-Franzens-University Innsbruck, Innsbruck, Austria

Clearly, models can be no better than the data on which they are based. Considering the limitations of the data is an important step in constructing robust and reliable Quantitative Structure-Property Relationships (QSPR). Using ever better interpolation techniques may lead to models that are superficially more accurate than the data on which they are based. This is clearly undesirable but depressingly common.

To turn the question around: what do the models tell us about the data? Can we gain additional information about the quality and applicability domain of the training data by analyzing the performance of the model in detail? Real examples will be given in addition to simulated models using artificial noise in the training data.
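The abstract mentions simulated models built from training data with artificial noise. The sketch below illustrates that idea under assumptions of our own (a hypothetical linear "true" property, Gaussian noise of standard deviation sigma, ordinary least squares), not the authors' protocol. It shows the opening point of the abstract: scored against noisy test measurements, even a correctly specified model's RMSE settles near sigma, although its error against the hidden true values is far smaller.

```python
import random
from math import sqrt

def true_property(x):
    """Hypothetical noise-free structure-property relationship."""
    return 2.0 * x + 1.0

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def rmse(predicted, observed):
    return sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))

def simulate(sigma, n=2000, seed=0):
    """Train on noisy 'measurements'; return test RMSE vs measured and vs true values."""
    rng = random.Random(seed)

    def sample(size):
        xs = [rng.uniform(0.0, 10.0) for _ in range(size)]
        truth = [true_property(x) for x in xs]
        measured = [y + rng.gauss(0.0, sigma) for y in truth]  # artificial noise
        return xs, truth, measured

    train_x, _, train_y = sample(n)
    test_x, test_truth, test_measured = sample(n)
    slope, intercept = fit_line(train_x, train_y)
    predicted = [slope * x + intercept for x in test_x]
    return rmse(predicted, test_measured), rmse(predicted, test_truth)

for sigma in (0.25, 0.5, 1.0):
    vs_measured, vs_truth = simulate(sigma)
    print(f"noise {sigma}: RMSE vs measured {vs_measured:.3f}, vs truth {vs_truth:.3f}")
```

A reported error well below the experimental noise of the underlying data is therefore a warning sign of overfitting rather than evidence of a better model.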

2:30pm-3:00pm
CINF 19: Reproducibility and the quality of chemical probes

*Aled Edwards, aled.edwards@utoronto.ca


Aled Edwards1
1 Structural Genomics Consortium, Toronto, Ontario, Canada

Chemical and pharmacological inhibitors are in common use within biomedicine. However, these inhibitors are often poorly characterized, and the decision to use them is often based on a vendor's marketing material rather than on any objective criteria. The Structural Genomics Consortium and its network of pharmaceutical companies and academics are using structure-guided methods to produce well-characterized inhibitors of proteins implicated in epigenetic signalling, as well as of protein kinases. These inhibitors meet predefined quality criteria, as assessed by the group and by an external Scientific Committee. The high-quality inhibitors, termed chemical probes, are now being made available to the community without restriction.
3:00pm-3:15pm Intermission

3:15pm-3:45pm
CINF 20: MIRAGE – the minimum information required for a glycomics experiment: Rationale and progress

*William York, will@ccrc.uga.edu


William York2 , Carsten Kettner1 , Rene Ranzinger2
1 Beilstein Institut, Frankfurt/Main, Germany; 2 Complex Carbohydrate Research Center, University of Georgia, Athens, Georgia, United States

Scientists have not fully appreciated the biological importance of complex glycans until recently, in part because these molecules are considerably more difficult to characterize than nucleic acids (DNA and RNA) and proteins. Furthermore, glycans are synthesized by a complex process in which each residue is added by a specialized enzyme. Nevertheless, recently developed glycomics technologies make it possible to identify and quantify the glycans present in complex mixtures prepared from diverse cells and tissues. Although the need to evaluate the quality of such glycoanalytic results before publishing them in scientific journals or storing them in databases is obvious, assessing the validity of such analyses is not trivial, requiring specific information regarding the methods used to generate and interpret the data. This realization led to the initiation of an international working group to establish guidelines for the Minimum Information Required for A Glycomics Experiment (MIRAGE), now funded by the Beilstein Institute. The working group is developing guidelines for reporting the results of the most commonly used glycoanalytic methods, including mass spectrometry and glycan arrays. By adopting these guidelines, journals that publish glycoanalytic studies will increase the reliability of the manuscripts they publish. Such guidelines are critical if glycoanalysis is to be recognized as a mature discipline that stands up to rigorous scientific scrutiny. The philosophical and practical issues that must be addressed to establish effective MIRAGE guidelines will be discussed, as will the progress made by the MIRAGE working group.

3:45pm-4:15pm
CINF 21: Reporting and reuse of crystal structure data and knowledge

*Ian Bruno, bruno@ccdc.cam.ac.uk


Ian Bruno1 , Suzanna Ward1 , Colin Groom1
1 Cambridge Crystallographic Data Centre, Cambridge, United Kingdom

For half a century the Cambridge Crystallographic Data Centre (CCDC) has provided services that facilitate the reporting and reuse of small molecule crystal structure data. In recent years, we have evolved in order to manage the deposition of almost 100,000 structures per year and respond to the changing demands and expectations around scientific communication. These changes build on CCDC's history of providing access to datasets that support claims made in the scientific literature and of providing software applications that unlock knowledge captured in around 750,000 crystal structures for reuse across chemistry domains. This presentation, timed to coincide with the 50th anniversary of the Cambridge Structural Database, will describe recent developments specifically targeted at improving the validation, discoverability and reuse of crystal structure data and knowledge for current and future generations of researchers.

4:15pm-4:45pm
CINF 22: Reproducibility and variance of literature compound structure and bioassay data

*John Overington, jpo@ebi.ac.uk


John Overington1
1 EMBL European Bioinformatics Institute, Hinxton, United Kingdom

The literature, both patent and peer-reviewed, is a valuable source of background knowledge and prior art in medicinal chemistry. Mining it supports activities such as library design, lead optimisation, and analysis of safety liabilities. Central to our own research is the construction of two chemistry-centered resources: ChEMBL (https://www.ebi.ac.uk/chembl) for the published literature and SureChEMBL (https://www.surechembl.org) for the patent literature. By the nature of their construction, there are ambiguities and errors in both resources, although because their construction methods differ (manual vs. automated), so does their error structure. A further complexity is the propagation of errors from the original published material; these errors can easily propagate from one resource to another. A final class of error arises where the structure of the compound or the original bioassay is not as reported. We will review ambiguities from all these error sources and present some approaches to curate and flag inconsistent data. When data are filtered appropriately, it is then possible to address the intra-lab/protocol variability for a particular class of assay; we have so far investigated variance in biochemical and cell-based assays.
CINF: Sci-Mix
8:00pm - 10:00pm
Monday, March 23

Hall C - Colorado Convention Center

8:00pm-10:00pm
CINF 11: Addressing researcher incentives for publishability over accuracy

*Sara Davis Bowman, sed8n@virginia.edu


Sara Davis Bowman1 , Brian Nosek1,2
1 Center for Open Science, Charlottesville, Virginia, United States; 2 University of Virginia, Charlottesville, Virginia, United States

The climate of academic scientific research requires publication for professional success and sustainability. Publication of novel, positive, and “exciting” research is emphasized over negative results and replications. Thus, researchers are incentivized to make design, analysis, and reporting decisions that promote positive results and ignore negative results. This results in a biased body of literature. When the incentive structure emphasizes novelty over replication, false results persist in the literature because they remain unchallenged. The accumulation of scientific knowledge is slowed by both of these factors. This talk touches on the evidence and challenges for reproducibility in scientific research, then delves deeper into initiatives to nudge incentives and norms toward practices that can improve reproducibility.

8:00pm-10:00pm
CINF 1: Automated design of realistic organometallic complexes and catalysts

8:00pm-10:00pm
CINF 20: MIRAGE – the minimum information required for a glycomics experiment: Rationale and progress

*William York, will@ccrc.uga.edu


William York2 , Carsten Kettner1 , Rene Ranzinger2
1 Beilstein Institut, Frankfurt/Main, Germany; 2 Complex Carbohydrate Research Center, University of Georgia, Athens, Georgia, United States

Scientists have not fully appreciated the biological importance of complex glycans until recently, in part because these molecules are considerably more difficult to characterize than nucleic acids (DNA and RNA) and proteins. Furthermore, glycans are synthesized by a complex process in which each residue is added by a specialized enzyme. Nevertheless, recently developed glycomics technologies make it possible to identify and quantify the glycans present in complex mixtures prepared from diverse cells and tissues. Although the need to evaluate the quality of such glycoanalytic results before publishing them in scientific journals or storing them in databases is obvious, assessing the validity of such analyses is not trivial, requiring specific information regarding the methods used to generate and interpret the data. This realization led to the initiation of an international working group to establish guidelines for the Minimum Information Required for A Glycomics Experiment (MIRAGE), now funded by the Beilstein Institute. The working group is developing guidelines for reporting the results of the most commonly used glycoanalytic methods, including mass spectrometry and glycan arrays. By adopting these guidelines, journals that publish glycoanalytic studies will increase the reliability of the manuscripts they publish. Such guidelines are critical if glycoanalysis is to be recognized as a mature discipline that stands up to rigorous scientific scrutiny. The philosophical and practical issues that must be addressed to establish effective MIRAGE guidelines will be discussed, as will the progress made by the MIRAGE working group.

8:00pm-10:00pm
CINF 23: Chemical literature: A comparison of most important databases for searching the chemical literature from an undergraduate perspective

*Neelam Bharti, neelambh@ufl.edu


Neelam Bharti1
1 Marston Science Library, University of Florida, Gainesville, Florida, United States

Scientific databases collect and process scientific information from research papers, review articles, conference proceedings, and case reports published in professional journals, as well as from patents, conference reports, and other platforms. They contain information about the topic, authors, abstract, and other research-specific details. A chemical literature search is a mandatory exercise for undergraduate students during their lab classes. When it comes to the chemistry literature, there is always a dilemma: what are the best resources for undergraduates who are not chemistry majors? The three major databases that play the most significant role in this exercise appear to be SciFinder, Reaxys, and Web of Science, each with its own characteristics. In my presentation, I will compare these databases and their search results from an undergraduate perspective.

8:00pm-10:00pm
CINF 24: From lab to the libraries: A new route for chemistry librarianship

*Neelam Bharti, neelambh@ufl.edu


Neelam Bharti1
1 Marston Science Library, University of Florida, Gainesville, Florida, United States

Science and libraries have a long and tangible relationship that has changed over time, but one thing has stayed the same: their connection to knowledge. The roles of scientists and librarians have also changed with time, and today both share the same quest: how to manage, access, and use scientific information in the best possible way. The requirements that research communities place on libraries and their subject librarians are increasingly research-focused and subject-specific. Chemistry librarians are expected to be thoroughly familiar with the subject and with publisher updates, to take the initiative in purchasing new subject material, and to have sufficient scientific background to know the technical language, vocabulary, and nomenclature. This presentation will articulate how a researcher or scientist can emerge as a library professional and how laboratory experience can be an advantage in serving as a subject librarian. At the same time, it will highlight the professional challenges that arise when a scientist-turned-librarian deals with the subject community and the academic library system.

8:00pm-10:00pm
CINF 25: 3Dmol.js: Simple visualization and sharing of 3D molecular data

*David Koes, dkoes@pitt.edu


David Koes1 , Nicholas Rego1,2
1 Computational and Systems Biology, University of Pittsburgh, Pittsburgh, Pennsylvania, United States; 2 Department of Biochemistry and Molecular Biophysics, University of Pennsylvania, Philadelphia, Pennsylvania, United States

3Dmol.js is an object-oriented, open-source JavaScript library that uses the latest web technologies to provide online hardware-accelerated molecular visualization. 3Dmol.js does not require Java or a browser plugin. 3Dmol.js provides a full-featured API for developers, a declarative syntax for embedding viewers in HTML, and a hosted viewer that allows users to share complex scenes with a single URL.
Source code and documentation can be found at http://3Dmol.csb.pitt.edu.
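To make the declarative embedding syntax more concrete, the Python sketch below writes out an HTML fragment that relies on it. The helper `embed_3dmol_viewer` is hypothetical, and the class name `viewer_3Dmoljs`, the `data-pdb` and `data-style` attributes, and the script path under http://3Dmol.csb.pitt.edu are assumptions based on our reading of the 3Dmol.js documentation; they should be checked against the current docs.

```python
from html import escape

# Assumed script location, derived from the project URL given in the abstract.
DEFAULT_SCRIPT = "http://3Dmol.csb.pitt.edu/build/3Dmol-min.js"


def embed_3dmol_viewer(pdb_id, style="stick", height_px=400, width_px=400,
                       script_url=DEFAULT_SCRIPT):
    """Return HTML that loads 3Dmol.js and declares one embedded viewer.

    Hypothetical helper: the viewer_3Dmoljs class and data-* attribute names are
    assumptions to be checked against the 3Dmol.js documentation.
    """
    script = f'<script src="{escape(script_url)}"></script>'
    viewer = (
        f'<div class="viewer_3Dmoljs" data-pdb="{escape(pdb_id)}" '
        f'data-style="{escape(style)}" '
        f'style="height: {int(height_px)}px; width: {int(width_px)}px; position: relative;"></div>'
    )
    return script + "\n" + viewer
```

For example, `embed_3dmol_viewer("1UBQ", style="cartoon")` returns markup that, placed in a web page, should display that PDB entry as a cartoon with no plugin, assuming the attribute names above are current.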




8:00pm-10:00pm
CINF 26: Sharing and reproducibility/replication: An NIH view

*Philip Bourne, pebourne@gmail.com


Philip Bourne1
1 UCSD, Bethesda, Maryland, United States


The NIH is in the business of accelerating biomedical discoveries so as to improve the human condition. That acceleration, like any scientific endeavor, requires building on what has gone before. That process of accumulation and reuse is not optimal, and in the digital world there is even less excuse for it to fall short. Improving this situation requires policy, infrastructure, changes to the incentives and reward structure, and, more broadly, a cultural shift. I will describe policies in place or planned, infrastructure development through The Commons, and what we are doing to encourage cultural change, while acknowledging that none of this is easy.

8:00pm-10:00pm
CINF 34: Highly visual representation methods for comparison of chemical structures and related properties

*Jess Sager, jess.sager@yahoo.com


Jess Sager1 , Philip Mounteney1 , Curtis Snyder1 , Tamsin Mansley2
1 Dotmatics, Inc, San Diego, California, United States; 2 Dotmatics, Inc, Woburn, Massachusetts, United States

Modern tools provide many opportunities for comparing structures and their properties, though the output of the comparison is sometimes hard to read or interpret. A highly visual output from computational tools facilitates structural searching and comparison. We will present case studies using this approach. Examples include color-coding structural components, depicting changing fragments, and presenting the results of structural comparison calculations graphically. These presentation methods are easily shared with colleagues and ease the interpretation of large numbers of results.

8:00pm-10:00pm
CINF 35: Overview of the Analytical Information Markup Language

*Stuart Chalk, schalk@unf.edu


Stuart Chalk1
1 Department of Chemistry, University of North Florida, Jacksonville, Florida, United States

The Analytical Information Markup Language (AnIML – http://animl.sourceforge.net) is an XML data standard that has been in development since 2004; it was initiated as a replacement for the existing JCAMP data standard. Like JCAMP, AnIML is designed to store the analytical data from an instrument along with appropriate metadata describing the sample and the instrumental setup.

This presentation will go through the format of AnIML, show example files, and discuss implementation options. It will also discuss the current status of the AnIML specification, highlighting the challenges imposed by the project's original goal: 'Develop an analytical data standard that can be used to store data from any analytical instrument'.

8:00pm-10:00pm
CINF 36: Thermophysical property dissemination utilizing an XML-based standard

*Kenneth Kroenlein, kenneth.kroenlein@nist.gov


Kenneth Kroenlein1 , Robert Chirico1 , Vladimir Diky1 , Ala Bazyleva1 , Joseph Magee1 , Chris Muzny1
1 National Institute of Standards and Technology, Boulder, Colorado, United States

Exponential growth in publication rates and data generation in thermophysical properties has yielded tremendous challenges as well as potential rewards for data analysis groups. Data volumes have grown to such a degree that many traditional data collection and interpretation approaches cannot scale sufficiently to remain comprehensive and current, or to effectively track shifting interests within research and industrial communities. It is thus necessary to rely far more heavily on digital archives, automated analysis, and machine learning approaches.

The approach adopted at the Thermodynamics Research Center (TRC) at the National Institute of Standards and Technology (NIST) is dynamic data evaluation, whereby a reliable and comprehensive underlying data archive is used in conjunction with an algorithmically encoded expert analysis to generate up-to-date data recommendations. These efforts have facilitated a decade-long collaboration with five major journals that report thermophysical and thermochemical property information, in which reported data are vetted for consistency by TRC before being made freely and openly available. These data are disseminated via ThermoML [1], an XML-based file format and IUPAC standard that was developed in close collaboration with representatives from TRC, industry, and academia, including journal editors. Schema development was largely informed by real data sets culled from the open literature, ensuring compatibility with a broad range of target information. Impacts from this collaboration and lessons learned from the development effort will be discussed.

[1] Frenkel, M.; Chirico, R.D.; Diky, V.V.; Marsh, K.N.; Dymond, J.H.; Wakeham, W.A.; Stein, S.E.; Königsberger, E.; Goodwin, A.R.H. “XML-based IUPAC standard for experimental, predicted, and critically evaluated thermodynamic property data storage and capture (ThermoML): IUPAC recommendations 2006.” Pure Appl. Chem. 2006, 78, 541−612.

8:00pm-10:00pm
CINF 37: Standard data format for computational chemistry: CSX

*Stuart Chalk, schalk@unf.edu


Stuart Chalk1 , Neil Ostlund2 , Mirek Sopek3 , Bing Wang2
1 Department of Chemistry, University of North Florida, Jacksonville, Florida, United States; 2 Chemical Semantics Inc., Gainesville, Florida, United States; 3 Makolab S. A., Lodz, Poland

Computational chemistry is becoming increasingly important in the scientific process because of its ability to predict molecular properties, reaction pathways, energetics, and kinetics. To address this need, Chemical Semantics, with grant support from the U.S. Department of Energy, is in the second phase of developing a semantically enabled portal for the publication, searching, and archiving of computational chemistry calculations.

This presentation will discuss the Chemical Semantics Markup Language (CSX), the file format used to transmit calculation results to the portal. CSX is designed to capture metadata about the published data, the chemical system under study, the type of calculation and input parameters, and the calculation results. Copies of the calculation input and output files can also be stored within a CSX file, making it well suited to archiving calculation results. In addition to the layout of CSX files, the presentation will discuss how CSX metadata is transformed into linked semantic data and the current status of efforts to promote CSX as a new standard output format for computational chemistry software.

8:00pm-10:00pm
CINF 3: Classification of scientific journal articles for the NIST Thermodynamic Research Center

8:00pm-10:00pm
CINF 44: Building a standard for standards: The ChAMP project

*Stuart Chalk, schalk@unf.edu


Stuart Chalk1 , Antony Williams2
1 Department of Chemistry, University of North Florida, Jacksonville, Florida, United States; 2 Royal Society of Chemistry, Cambridge, Cambridgeshire, United Kingdom

Development of the Chemical Analysis Metadata Platform (ChAMP) is a joint project between the University of North Florida and the Royal Society of Chemistry (RSC). In this project, we focused on the development of guidelines for organization, representation, and annotation of analytical science information, based around the definition of a standard set of metadata that can be used to annotate reports of chemical analysis methodologies.

Rather than defining a standard to be implemented across the board (the one-size-fits-all approach), the platform describes metadata and data types, unique identifiers, controlled vocabularies (where appropriate), an ontology, and implementation guidelines, so that any group, society, or company can use ChAMP as the basis for developing a standard that fits its needs. In this way, the same metadata items are identified uniformly across different standards, allowing records from any and all ChAMP-based standards to be searched and compared. This paper reports on progress to date in ChAMP development.
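One way to picture how shared metadata identifiers allow searching across standards is a simple crosswalk. In the minimal Python sketch below, two hypothetical local standards name the same items differently, and each field is mapped to one shared identifier; all field and identifier names are invented for illustration and are not ChAMP terms.

```python
# Hypothetical crosswalk: local field name -> shared identifier, per local standard.
CROSSWALK = {
    "standard_a": {"instr": "instrument_model", "wl_nm": "detection_wavelength_nm"},
    "standard_b": {"InstrumentName": "instrument_model",
                   "DetectorWavelength": "detection_wavelength_nm"},
}


def to_shared(record, standard):
    """Rename a record's fields to shared identifiers; unmapped fields are dropped."""
    mapping = CROSSWALK[standard]
    return {mapping[field]: value for field, value in record.items() if field in mapping}


def search(records, identifier, value):
    """Find records originating from any standard by a shared identifier."""
    return [r for r in records if r.get(identifier) == value]
```

Because both records end up keyed by the same identifiers, a single query such as `search(records, "detection_wavelength_nm", 254)` matches records from either standard.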

8:00pm-10:00pm
CINF 4: Mining electronic lab notebooks for synthetic needles (or gems)

8:00pm-10:00pm
CINF 5: Withdrawn

8:00pm-10:00pm
CINF 6: Dynamic evaluation of impact for scholarly communications in the field of thermophysical properties

*Robert Chirico, robchirico@comcast.net


Robert Chirico1 , Vladimir Diky1 , Joseph Magee1 , Ala Bazyleva1 , Chris Muzny1 , Kenneth Kroenlein1
1 National Institute of Standards and Technology, Boulder, Colorado, United States

The Thermodynamics Research Center (TRC) at the National Institute of Standards and Technology (NIST) maintains an extensive database of published experimental thermophysical properties for pure compounds, binary and ternary mixtures, and chemical reactions. This database, in combination with expert system software (ThermoData Engine, TDE), allows on-demand (i.e., dynamic) critical evaluation of thermophysical property data. This combined system is at the core of a cooperation between NIST and key journals in this field, including the Journal of Chemical and Engineering Data, The Journal of Chemical Thermodynamics, and Fluid Phase Equilibria, for rapid and substantive review of new experimental data in advance of publication. A measure of the immediate impact of new publications on the status of the field is a direct outcome of this analysis. Following publication, new experimental data are posted online in a structured format (ThermoML) for free download. Subsequent impact of new results can be assessed, in part, through access statistics, which provide a measure of interest outside the traditional channel of citations in the academic literature. The overall system will be described with emphasis on impact assessment.
CINF: Research Results: Reproducibility, Reporting, Sharing & Plagiarism
8:30am - 11:50am
Tuesday, March 24

Room 110 - Colorado Convention Center
Martin Hicks, Organizing
Martin Hicks, Presiding
8:30am-8:35am Introductory Remarks

8:35am-9:05am
CINF 26: Sharing and reproducibility/replication: An NIH view

*Philip Bourne, pebourne@gmail.com


Philip Bourne1
1 UCSD, Bethesda, Maryland, United States


The NIH is in the business of accelerating biomedical discoveries so as to improve the human condition. That acceleration, like any scientific endeavor, requires building on what has gone before. That process of accumulation and reuse is not optimal, and in the digital world there is even less excuse for it to fall short. Improving this situation requires policy, infrastructure, changes to the incentives and reward structure, and, more broadly, a cultural shift. I will describe policies in place or planned, infrastructure development through The Commons, and what we are doing to encourage cultural change, while acknowledging that none of this is easy.

9:05am-9:35am
CINF 27: Globalization of Big Data: Access, integration, and quality control issues

*Stephen Boyer, skboyer@gmail.com


Stephen Boyer1 , Evan Bolton2 , Richard Martin3 , Eric Louie3 , Thomas Griffin1 , Gang Fu2 , Bo Yu2
1 IBM Research / Watson Division, San Jose, California, United States; 2 NCBI / NLM / NIH, Warrenton, Virginia, United States; 3 IBM Watson, San Jose, California, United States

Numerous open-source initiatives have made a significant amount of scientific information readily available on a global scale. To take advantage of this growing resource, several “big data” initiatives seek synergy among varied scientific pursuits. They strive to add value to the data by employing computer curation of text and images, data mining, and analytics. Challenges in this pursuit include integrating data from disparate sources and controlling quality. Further complicating big data management is the wide variety of standards, nomenclatures, and reference codes for entities such as chemicals, genes, and ailments that are used globally, across industries and disciplines.

Furthermore, as the business of science becomes increasingly globalized, the importance of open access to worldwide intellectual property (IP) increases. Quality standards need to be upheld for international IP with global participation and without obfuscation.

We will discuss our collaborative efforts to identify and address some of these issues as part of our respective Watson and PubChem initiatives, and we welcome discussion of our proposed solutions as a route to global participation and cooperation.

9:35am-10:05am
CINF 28: Flagging and curating erroneous chemical and biological records using cheminformatics to ensure data reproducibility

*Denis Fourches, fourches@email.unc.edu


Denis Fourches1
1 Department of Chemistry, Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, United States

With the growing availability of new data streams (e.g., literature-extracted assertions, online chemogenomics repositories, high-throughput screening) that involve very large sets of diverse compounds, the preprocessing of chemical and biological data has become a mandatory step. Additionally, concerns about the presence of significant fractions of irreproducible data entries recently culminated in a series of community-wide alerts [1, 2]. In this presentation, we discuss how cheminformatics approaches can be used to flag and potentially correct erroneous records in very large chemogenomics datasets. First, we discuss the overall problem of data irreproducibility in the context of Big Chemical Data and present several examples of non-reproducible data points. Then, we describe the impact of using non-curated chemical and biological data to build QSAR models and conduct virtual screening. Building upon best practices for chemical standardization [3], we present a novel set of guidelines for processing and cleaning both chemical structures and the target endpoint(s). In the second part of the presentation, we discuss the challenging cases of (i) integrating CYP450 inhibition profiles from literature-extracted assertions and HTS results, (ii) experimental variability in multi-run HTS campaigns, and (iii) detecting false positives and false negatives using baseline correction factors. The cheminformatics approaches described in this study represent another opportunity to engage the community in curating chemogenomics records and thus ensuring their reproducibility.

[1] Collins FS, Tabak LA. NIH plans to enhance reproducibility. Nature, 2014, 505, 612-613.
[2] Prinz F, Schlange T, Asadullah K. Believe it or not: how much can we rely on published data on potential drug targets? Nat Rev Drug Discov, 2011, 10, 712-713.
[3] Fourches D, Muratov E, Tropsha A. Trust, but verify: on the importance of chemical structure curation in cheminformatics and QSAR modeling research. J Chem Inf Model, 2010, 50, 1189-1204.
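The core idea of flagging records whose replicate measurements disagree can be sketched in a few lines of Python. This is a minimal illustration, not the authors' workflow: it assumes structures have already been standardized to a shared key (for example a standard InChIKey), that activities are on a log scale such as pIC50, and that a spread above a hypothetical tolerance of 1 log unit warrants review.

```python
from collections import defaultdict
from statistics import mean


def flag_inconsistent_records(records, max_spread=1.0):
    """Group (structure_key, activity) pairs by key and flag inconsistent groups.

    structure_key is assumed to be an already-standardized identifier; activity is
    assumed to be log-scaled. max_spread is a hypothetical tolerance in log units.
    Returns {structure_key: (mean_activity, spread, flagged)}.
    """
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    report = {}
    for key, values in groups.items():
        spread = max(values) - min(values)  # range of replicate values
        report[key] = (mean(values), spread, spread > max_spread)
    return report
```

Flagged groups are candidates for manual review or exclusion before modeling; keeping the mean and spread alongside the flag makes the decision auditable.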
10:05am-10:20am Intermission

10:20am-10:50am
CINF 29: Increasing open communication to facilitate reproducibility

*Courtney Soderberg, courtney@cos.io


Courtney Soderberg1
1 Center for Open Science, Charlottesville, Virginia, United States

Transparency and reproducibility are two core values of scientific research. However, scientists often lack practical tools to integrate these practices into their workflow efficiently and effectively. The development of such tools would improve scientific communication and increase the efficiency of knowledge accumulation. This presentation will introduce two such tools, the Open Science Framework (http://osf.io/) and the SHARE notification system, which can help researchers increase the documentation of their workflow, easily make their work publicly accessible, and keep abreast of new publicly available research material.
CINF: Molecular & Structural 2D & 3D Chemical Fingerprinting: Computational Storing, Searching, & Comparing Molecular & Chemical Structures
1:30pm - 4:25pm
Tuesday, March 24

Room 110 - Colorado Convention Center
Rachelle Bienstock, Organizing
Rachelle Bienstock, Presiding
1:30pm-1:35pm Introductory Remarks

1:35pm-2:00pm
CINF 30: Insights into molecular similarity from crystal structures

*Colin Groom, groom@ccdc.cam.ac.uk


Colin Groom2 , Suzanna Ward2 , Ian Bruno2 , Shyam Vyas1 , Neil Feeder2
1 Cambridge Crystallographic Data Centre, Piscataway, New Jersey, United States; 2 Cambridge Crystallographic Data Centre, Cambridge, United Kingdom

The output of every published crystal structure determination of a small organic molecule is available to all. Data from these experiments not only teach us about the structure of molecules but also inform us about their interactions.

When crystallizing, a molecule creates its own most preferred environment. This has given us invaluable insights into molecular recognition. But comparing these environments allows us to do something more: compare molecules based on their interaction preferences.

This presentation, timed to coincide with the 50th anniversary of the Cambridge Structural Database, will highlight how Full Interaction Maps describe the environment around a molecule in a lattice and can be used to compare molecules – from the perspective of one molecule looking at another. We will also see how experimentally derived interaction fields can be used in virtual screening approaches.

2:00pm-2:25pm
CINF 31: Do chiral fingerprints and descriptors work?

*S Joshua Swamidass, swamidass@gmail.com


S Joshua Swamidass2 , Grover Miller1 , Tyler Hughes2 , Jessica Hartman1 , Steven Cothren1
1 Dept of Biochem and Mol Biol, Little Rock, Arkansas, United States; 2 Washington University in St. Louis, St. Louis, Missouri, United States

Chirality is a fundamental aspect of chemical structure, with specific importance in biological systems. Otherwise identical molecules with different chirality often exhibit large differences in biological activity. Unfortunately, the 2D methods most widely used to understand and study chemical structures, such as path-based fingerprints and topological descriptors, do not capture a molecule's chiral features. Nor do many 3D methods, such as many pharmacophore implementations. There are, however, ways of modifying fingerprints and pharmacophores to encode molecular chirality, though the effectiveness of these approaches has been little studied. Here, we explore whether these approaches encode chemical structures accurately enough to improve chemical property prediction on several datasets containing chiral molecules. Results of these experiments will be presented, with special attention to their implications for chemical modeling and to the value of 3D-based modeling that explicitly represents all aspects of structure, including chirality.
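The point that path-based fingerprints ignore chirality can be shown with a toy example. The Python sketch below enumerates linear atom paths on a heavy-atom graph of alanine: without stereo tags, L- and D-alanine yield identical features, while appending an assumed, precomputed R/S tag to the stereocentre's atom label separates them. This label-augmentation trick is one simple encoding chosen for illustration, not necessarily one of the approaches evaluated in the talk.

```python
import hashlib


def path_features(atoms, bonds, max_len=3, use_chirality=False):
    """Collect linear-path features from a labeled molecular graph (toy version).

    atoms: list of (element, chirality_tag) pairs; chirality_tag is "R", "S" or "".
    bonds: list of (i, j) atom-index pairs; bond orders are ignored for brevity.
    Each simple path of up to max_len bonds becomes a string of atom labels.
    """
    neighbours = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)

    def label(i):
        element, tag = atoms[i]
        # The stereo tag becomes part of the label only when use_chirality is True.
        return f"{element}@{tag}" if use_chirality and tag else element

    features = set()

    def walk(path):
        labels = [label(i) for i in path]
        # Keep the lexicographically smaller direction so A-B and B-A give one feature.
        features.add("-".join(min(labels, labels[::-1])))
        if len(path) - 1 < max_len:
            for nxt in neighbours[path[-1]]:
                if nxt not in path:
                    walk(path + [nxt])

    for start in range(len(atoms)):
        walk([start])
    return features


def fold(features, n_bits=1024):
    """Hash each feature string into a bit position (deterministic via SHA-1)."""
    return {int.from_bytes(hashlib.sha1(f.encode()).digest()[:4], "big") % n_bits
            for f in features}


# Heavy atoms of alanine: 0 = alpha carbon (stereocentre), 1 = N, 2 = methyl C,
# 3 = carboxyl C, 4 and 5 = carboxyl O. Only the tag on atom 0 differs.
ALA_BONDS = [(0, 1), (0, 2), (0, 3), (3, 4), (3, 5)]
L_ALA = [("C", "S"), ("N", ""), ("C", ""), ("C", ""), ("O", ""), ("O", "")]
D_ALA = [("C", "R"), ("N", ""), ("C", ""), ("C", ""), ("O", ""), ("O", "")]
```

Real toolkits assign stereo descriptors from 3D coordinates or stereo-annotated input; here the tags are simply given, which is the main simplifying assumption.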
2:25pm-2:40pm Intermission

2:40pm-3:05pm
CINF 32: Similarity to SAR - interactive navigation of similarity relationships to guide optimization

*Matthew Segall, matthew.d.segall@gmail.com


Matthew Segall1 , Edmund Champness1 , James Chisholm1 , Chris Leeding1 , Peter Hunt1 , Alex Elliott1 , Samuel Dowling1 , Hector Garcia1
1 Optibrium Ltd, Cambridge, United Kingdom

Many fingerprinting methods may be used to compare compounds and identify those that are similar in terms of structure or properties. These are used in a wide range of analyses, such as clustering, activity landscapes and matched molecular pairs (MMP), to find structure-activity relationships (SAR) that can guide the further optimisation of compounds and series. However, the interpretation of similarities and the resulting analyses can be challenging, presenting a barrier to the effective application of these methods. We will present a flexible and intuitive framework in which similarity relationships can be interactively navigated to quickly interpret the results and identify important SAR. We will illustrate this with applications of clustering in early lead identification and activity cliff detection and MMP analysis in lead optimisation.


An activity neighbourhood helps to quickly identify activity cliffs, i.e. small changes in structure that lead to large changes in activity, indicating important SAR.
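Activity cliffs are often scored with the structure-activity landscape index (SALI): the activity difference of a compound pair divided by one minus their similarity. The Python sketch below ranks pairs by SALI using Tanimoto similarity on fingerprints stored as sets of on-bit indices. SALI is a common measure swapped in here for illustration, not necessarily the one behind the activity neighbourhood view, and the 0.7 similarity cut-off is an assumption.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def activity_cliffs(compounds, min_similarity=0.7, top_n=5):
    """Rank similar compound pairs by SALI = |A_i - A_j| / (1 - sim(i, j)).

    compounds: dict of name -> (fingerprint_bits, activity), activity on a log scale.
    Pairs below min_similarity are ignored; identical fingerprints with different
    activities get SALI = inf.
    """
    names = sorted(compounds)
    scored = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            (fp_a, act_a), (fp_b, act_b) = compounds[a], compounds[b]
            sim = tanimoto(fp_a, fp_b)
            if sim < min_similarity:
                continue
            diff = abs(act_a - act_b)
            if sim == 1.0:
                sali = float("inf") if diff > 0 else 0.0
            else:
                sali = diff / (1.0 - sim)
            scored.append((sali, a, b, sim, diff))
    scored.sort(reverse=True)  # largest SALI (steepest cliff) first
    return scored[:top_n]
```

The top-ranked pairs are the small structural changes with the largest activity jumps, which is where SAR interpretation usually starts.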


3:05pm-3:30pm
CINF 33: Database fingerprint clustering methods using KNIME

*Rachelle Bienstock, rachelleb1@gmail.com


Rachelle Bienstock1
1 Independent Consultant, Chapel Hill, North Carolina, United States

Molecular fingerprints, or string representations of chemical molecules and structures, are widely used in chemoinformatics to computationally store and compare structures and identify structural similarities. KNIME is a workflow processing tool that offers facilities for applying clustering and fingerprinting methods to databases. This presentation will illustrate how KNIME can be used to compare and cluster similar compounds in databases using fingerprinting methods.
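KNIME assembles such analyses graphically from nodes, so the sketch below is not KNIME code. It is a plain-Python stand-in for the fingerprint, similarity, and clustering steps: a greedy Taylor-Butina-style clustering on fingerprints stored as sets of on-bit indices. It is a variant that recounts neighbours among still-unassigned compounds at each step, and the 0.5 Tanimoto threshold is an assumption.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def butina_cluster(fps, threshold=0.5):
    """Greedy Butina-style clustering; returns clusters as sorted index lists.

    Neighbours are compounds with Tanimoto >= threshold. The unassigned compound
    with the most unassigned neighbours becomes a centroid, and it plus those
    neighbours form a cluster; repeat until every compound is assigned.
    """
    n = len(fps)
    neighbours = [
        {j for j in range(n) if j != i and tanimoto(fps[i], fps[j]) >= threshold}
        for i in range(n)
    ]
    unassigned = set(range(n))
    clusters = []
    while unassigned:
        # Ties go to the lowest index because max() keeps the first maximum.
        centroid = max(sorted(unassigned), key=lambda i: len(neighbours[i] & unassigned))
        members = ({centroid} | neighbours[centroid]) & unassigned
        clusters.append(sorted(members))
        unassigned -= members
    return clusters
```

Singletons (compounds with no neighbours above the threshold) end up as one-member clusters, which is often how structurally unusual database entries show up.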

3:30pm-3:55pm
CINF 34: Highly visual representation methods for comparison of chemical structures and related properties

*Jess Sager, jess.sager@yahoo.com


Jess Sager1 , Philip Mounteney1 , Curtis Snyder1 , Tamsin Mansley2
1 Dotmatics, Inc, San Diego, California, United States; 2 Dotmatics, Inc, Woburn, Massachusetts, United States

Modern tools provide many opportunities for comparing structures and their properties, though the output of the comparison is sometimes hard to read or interpret. A highly visual output from computational tools facilitates structural searching and comparison. We will present case studies using this approach. Examples include color-coding structural components, depicting changing fragments, and presenting the results of structural comparison calculations graphically. These presentation methods are easily shared with colleagues and ease the interpretation of large numbers of results.
3:55pm-4:00pm Concluding Remarks
CINF: Development & Use of Data Format Standards for Cheminformatics
9:00am - 11:55am
Wednesday, March 25

Room 110 - Colorado Convention Center
David Martinsen, Organizing
David Martinsen, Presiding
9:00am-9:05am Introductory Remarks

9:05am-9:35am
CINF 35: Overview of the Analytical Information Markup Language

*Stuart Chalk, schalk@unf.edu


Stuart Chalk1
1 Department of Chemistry, University of North Florida, Jacksonville, Florida, United States

The Analytical Information Markup Language (AnIML – http://animl.sourceforge.net) is an XML data standard that has been in development since 2004; it was initiated as a replacement for the existing JCAMP data standard. Like JCAMP, AnIML is designed to store the analytical data from an instrument along with appropriate metadata describing the sample and the instrumental setup.

This presentation will go through the format of AnIML, show example files, and discuss implementation options. It will also discuss the current status of the AnIML specification, highlighting the challenges imposed by the project's original goal: 'Develop an analytical data standard that can be used to store data from any analytical instrument'.
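To illustrate keeping instrument data together with its sample metadata in one XML document, the Python sketch below reads a small document with the standard library's ElementTree. The document is hypothetical and not schema-valid AnIML: element names are borrowed loosely from AnIML (SampleSet, ExperimentStep, Series), values are stored as whitespace-separated text for brevity (a simplification of how AnIML actually encodes values), and the numbers are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified document; values are made up for illustration.
DOC = """<AnIML>
  <SampleSet>
    <Sample name="Standard A" sampleID="S1"/>
  </SampleSet>
  <ExperimentStepSet>
    <ExperimentStep name="Scan 1">
      <Series name="Wavelength" unit="nm">250 260 270</Series>
      <Series name="Absorbance" unit="AU">0.12 0.15 0.11</Series>
    </ExperimentStep>
  </ExperimentStepSet>
</AnIML>"""


def read_steps(xml_text):
    """Return ({sampleID: name}, [(step_name, {series_name: (unit, values)})])."""
    root = ET.fromstring(xml_text)
    samples = {s.get("sampleID"): s.get("name") for s in root.iter("Sample")}
    steps = []
    for step in root.iter("ExperimentStep"):
        series = {
            s.get("name"): (s.get("unit"), [float(v) for v in s.text.split()])
            for s in step.iter("Series")
        }
        steps.append((step.get("name"), series))
    return samples, steps
```

The design point the sketch mirrors is that sample description, experiment step, units, and data values travel in one self-describing file rather than in separate header and data conventions.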

9:35am-10:05am
CINF 36: Thermophysical property dissemination utilizing an XML-based standard

*Kenneth Kroenlein, kenneth.kroenlein@nist.gov


Kenneth Kroenlein1 , Robert Chirico1 , Vladimir Diky1 , Ala Bazyleva1 , Joseph Magee1 , Chris Muzny1
1 National Institute of Standards and Technology, Boulder, Colorado, United States

Exponential growth in publication rates and data generation in thermophysical properties has yielded tremendous challenges as well as potential rewards for data analysis groups. Data volumes have grown to such a degree that many traditional data collection and interpretation approaches cannot scale sufficiently to remain comprehensive and current, or to effectively track shifting interests within research and industrial communities. It is thus necessary to rely far more heavily on digital archives, automated analysis, and machine learning approaches.

The approach adopted at the Thermodynamics Research Center (TRC) at the National Institute of Standards and Technology (NIST) is dynamic data evaluation, whereby a reliable and comprehensive underlying data archive is used in conjunction with an algorithmically encoded expert analysis to generate up-to-date data recommendations. These efforts have facilitated a decade-long collaboration with five major journals that report thermophysical and thermochemical property information, in which reported data are vetted for consistency by TRC before being made freely and openly available. These data are disseminated via ThermoML [1], an XML-based file format and IUPAC standard that was developed in close collaboration with representatives from TRC, industry, and academia, including journal editors. Schema development was largely informed by real data sets culled from the open literature, ensuring compatibility with a broad range of target information. Impacts from this collaboration and lessons learned from the development effort will be discussed.

[1] Frenkel, M.; Chirico, R.D.; Diky, V.V.; Marsh, K.N.; Dymond, J.H.; Wakeham, W.A.; Stein, S.E.; Königsberger, E.; Goodwin, A.R.H. “XML-based IUPAC standard for experimental, predicted, and critically evaluated thermodynamic property data storage and capture (ThermoML): IUPAC recommendations 2006.” Pure Appl. Chem. 2006, 78, 541−612.
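Vetting reported data for consistency can be pictured as a residual check against a recommended correlation. The Python sketch below flags reported points that deviate from a reference value by more than k times their combined standard uncertainty. This generic check is an assumption for illustration, not the algorithm used by TRC or ThermoData Engine; the linear reference function and all numbers are made up.

```python
def vet_new_data(new_points, reference, k=3.0):
    """Flag reported points that disagree with a reference correlation.

    new_points: list of (T_kelvin, value, reported_standard_uncertainty).
    reference: callable T -> (recommended_value, recommended_standard_uncertainty).
    A point is flagged when |value - recommended| > k * combined uncertainty,
    where the two uncertainties are combined in quadrature.
    """
    flagged = []
    for T, value, u_rep in new_points:
        rec, u_rec = reference(T)
        u_comb = (u_rep ** 2 + u_rec ** 2) ** 0.5
        if abs(value - rec) > k * u_comb:
            flagged.append((T, value, rec))
    return flagged


def linear_reference(T):
    """Hypothetical recommended correlation: value and uncertainty at temperature T."""
    return 1000.0 - 0.5 * (T - 300.0), 0.5
```

In this scheme a flagged point is not automatically wrong; it is a prompt to query the authors before publication, which matches the pre-publication review role described in the abstract.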

10:05am-10:35am
CINF 37: Standard data format for computational chemistry: CSX

*Stuart Chalk, schalk@unf.edu


Stuart Chalk1 , Neil Ostlund2 , Mirek Sopek3 , Bing Wang2
1 Department of Chemistry, University of North Florida, Jacksonville, Florida, United States; 2 Chemical Semantics Inc., Gainesville, Florida, United States; 3 Makolab S. A., Lodz, Poland

Computational chemistry is becoming increasingly important in the scientific process because of its ability to predict molecular properties, reaction pathways, energetics, and kinetics. To address this need, Chemical Semantics, with grant support from the U.S. Department of Energy, is in the second phase of developing a semantically enabled portal for the publication, searching, and archiving of computational chemistry calculations.

This presentation will discuss the Chemical Semantics Markup Language (CSX), the file format used to transmit calculation results to the portal. CSX is designed to capture metadata about the published data, the chemical system under study, the type of calculation and input parameters, and the calculation results. Copies of the calculation input and output files can also be stored within a CSX file, making it well suited to archiving calculation results. In addition to the layout of CSX files, the presentation will discuss how CSX metadata is transformed into linked semantic data and the current status of efforts to promote CSX as a new standard output format for computational chemistry software.
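The step of turning CSX metadata into linked data can be pictured as emitting RDF triples. The minimal Python sketch below serializes a flat metadata dictionary as N-Triples; the subject URI, the vocabulary namespace, and the key names are hypothetical (they are not Gainesville Core terms), and only minimal literal escaping is done.

```python
def to_ntriples(subject_uri, metadata, vocab_base):
    """Serialize a flat metadata dict as N-Triples lines, sorted by key.

    subject_uri and vocab_base are hypothetical URIs; each key becomes a predicate
    under vocab_base and each value a plain string literal.
    """
    lines = []
    for key, value in sorted(metadata.items()):
        # Minimal escaping of backslashes, double quotes and newlines in literals.
        literal = str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        lines.append(f'<{subject_uri}> <{vocab_base}{key}> "{literal}" .')
    return "\n".join(lines)
```

For example, `to_ntriples("http://example.org/calc/1", {"method": "B3LYP", "basisSet": "6-31G*"}, "http://example.org/vocab#")` yields two triples about the same calculation, each of which a machine can dereference or query once the URIs resolve.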
10:35am-10:50am Intermission

10:50am-11:20am
CINF 38: Development of an ontology specific to computational chemistry

*Mirek Sopek, sopekmir@makolab.pl


Mirek Sopek1 , Stuart Chalk1 , Bing Wang1 , Louis Nardozi1 , Neil Ostlund1
1 Chemical Semantics Inc., Gainesville, Florida, United States

Chemical Semantics is a new start-up devoted to bridging the gap between computational chemistry and the Semantic Web, with the future goal of covering chemical data of any kind, including experimental data. The company is building portals that currently enable chemists to publish the results of computational experiments. Publications on these portals are uniquely addressable by Linked Data URIs and are accessible to both humans (through an HTML representation) and machines (through RDF – the lingua franca of the Semantic Web).
The company developed a structured XML file format (called CSX) that brings results from a variety of standard computational packages to the portals in a consistent and elegant way.
It has also been developing an ontology (called Gainesville Core (GC)) that defines vocabulary for semantic representation of computational chemistry results.
This presentation details the Gainesville Core ontology. It presents the theoretical foundations of the ontology and its relation to other chemical ontologies, and demonstrates its practical role in the portals being built by Chemical Semantics, Inc. It also sketches the path to other applications of the ontology in the broader context of more general chemical knowledge representation.

11:20am-11:50am
CINF 39: Importance of data standards for large scale data integration in chemistry

*Antony Williams, tony27587@gmail.com


Antony Williams1 , Valery Tkachenko2 , Alexey Pshenichnov3 , Ken Karapetyan1 , Carlos Coba4
1 Cheminformatics, Royal Society of Chemistry, Wake Forest, North Carolina, United States; 2 Cheminformatics, Royal Society of Chemistry, Rockville, Maryland, United States; 3 Cheminformatics, Royal Society of Chemistry, Rockville, Maryland, United States; 4 Mestrelab Research, Santiago de Compostela, Spain

The Royal Society of Chemistry (RSC) hosts large-scale data collections and provides the chemistry community with access to them. The largest RSC data set of wide interest to the community offers access to tens of millions of compounds. Its host platform, ChemSpider, is limited in that it is only a structure-centric hub. A new architecture, the RSC data repository, has been developed that extends support to reactions, spectral data, crystallography data, and related property data. It is also the architecture underlying a series of exemplar projects for managing data for a number of diverse laboratories. The adoption of data standards for the integration and distribution of data has been essential. Specific standards include molecular structure formats such as molfiles and InChIs, and spectral data formats such as JCAMP. This presentation will report on our development of the data repository, the importance of using standards for data integration, the flexibility of the architecture in delivering solutions for various laboratories, and our efforts to develop new large data collections, including text-mining efforts to extract large spectrum-structure collections from large text corpora.
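Part of what makes a format like the molfile useful for integration is that its layout is fixed. As a small illustration (a sketch, not the RSC's code), the V2000 counts line, which is the fourth line of a molfile, stores the atom count in columns 1-3, the bond count in columns 4-6, and a version tag in columns 34-39. The example header below is constructed for illustration only.

```python
from dataclasses import dataclass


@dataclass
class CountsLine:
    atoms: int
    bonds: int
    version: str


def parse_counts_line(molfile: str) -> CountsLine:
    """Read atom count, bond count, and version tag from a V2000 molfile header."""
    lines = molfile.splitlines()
    if len(lines) < 4:
        raise ValueError("molfile header needs at least four lines")
    counts = lines[3]  # lines 1-3 are the name, program, and comment lines
    version = counts[33:39].strip()  # vvvvvv field, columns 34-39
    if version != "V2000":
        # V3000 files keep their real counts inside the CTAB block instead.
        raise ValueError(f"unsupported molfile version: {version!r}")
    # Fixed-width fields: aaa = atoms (columns 1-3), bbb = bonds (columns 4-6).
    return CountsLine(atoms=int(counts[0:3]), bonds=int(counts[3:6]), version=version)


# Illustrative header for ethanol's three heavy atoms and two bonds.
header = "\n".join([
    "ethanol",
    "  example",
    "",
    "".join(f"{n:3d}" for n in (3, 2, 0, 0, 0, 0, 0, 0, 0, 0, 999)) + " V2000",
])
print(parse_counts_line(header))
```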
11:50am-11:55am Concluding Remarks
CINF: Development & Use of Data Format Standards for Cheminformatics
1:30pm - 4:25pm
Wednesday, March 25

Room 110 - Colorado Convention Center
David Martinsen, Organizing
David Martinsen, Presiding
1:30pm-1:35pm Introductory Remarks

1:35pm-2:05pm
CINF 40: InChI as the chemical data format standard for cheminformatics

*Stephen Heller, steve@hellers.com


Stephen Heller1
1 Retired, Silver Spring, Maryland, United States

The development and use of the InChI algorithm will be presented. The background, history, and use of the IUPAC-endorsed InChI chemical structure standard will be described, along with the current status and direction of the project.
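An InChI string is organized into layers separated by `/`: a version (`1S` marks a standard InChI), the molecular formula, and then layers introduced by one-letter prefixes such as `c` (atom connections) and `h` (hydrogen atoms). The minimal sketch below splits a string into those parts; it does not validate or interpret the layers, and it keeps only the first occurrence of a prefix that recurs in later sublayers.

```python
def split_inchi(inchi: str) -> dict:
    """Split an InChI string into version, formula, and prefixed layers."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    parts = inchi[len(prefix):].split("/")
    layers = {"version": parts[0], "formula": parts[1] if len(parts) > 1 else ""}
    for part in parts[2:]:
        if part:
            # Each later layer starts with a one-letter prefix, e.g. c or h.
            # Some prefixes recur in later sublayers; keep the first occurrence.
            layers.setdefault(part[0], part[1:])
    return layers


ethanol = split_inchi("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3")
print(ethanol["formula"], ethanol["c"], ethanol["h"])
```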

2:05pm-2:35pm
CINF 41: Withdrawn
2:35pm-2:50pm Intermission

2:50pm-3:20pm
CINF 42: JCAMP-MOL: A JCAMP-DX extension to allow integrated delivery of structural models and correlated spectral data

*Robert Hanson, hansonr@stolaf.edu


Robert Hanson1 , Robert Lancashire2
1 Chemistry, St Olaf College, Northfield, Minnesota, United States; 2 Department of Chemistry, University of the West Indies, Kingston, Jamaica

JCAMP-MOL is a simple extension to the JCAMP-DX format that allows 3D Jmol-readable models to be added to a JCAMP file, so that specific IR and Raman spectral bands can be associated with vibration models, MS peaks with molecular fragment models, and NMR signals with specific atoms in a structure. The advantage of the format is that it provides a single file that can be easily constructed from data formats readily available from other sources (structure servers for 3D models; instruments and simulation programs for spectral data). The file format can be read either by the standalone Jmol application (version 13+, incorporating JSpecView) or by twin Jmol and JSpecView applets on a web page, which can be either Java-based or pure HTML5/JavaScript. From a teaching perspective, interactive web-based displays that highlight and hyperlink areas of a spectrum or molecular display are powerful visual aids: they can significantly improve discussion of key spectral features and simplify the process of learning how to interpret IR, MS, and NMR spectra. For example, clicking on an atom or selecting an IR/Raman vibration in Jmol highlights a band, peak, or fragment in the spectrum. Clicking on the spectrum highlights one or more atoms, animates an IR vibration, or displays an MS fragment in Jmol. Examples will be shown during the presentation.
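Extensions like this are possible because JCAMP-DX is built from labeled data records (`##LABEL=value`), where labels beginning with `$` are user-defined and `$$` starts a comment. The sketch below reads a file into an ordered list of records, normalizing labels by uppercasing and dropping spaces, dashes, slashes, and underscores. The `##$MY BLOCK` label is a hypothetical placeholder; the actual JCAMP-MOL labels are not given in this abstract.

```python
def normalize_label(label: str) -> str:
    """Normalize a JCAMP-DX label: uppercase, drop spaces, dashes, slashes, underscores."""
    return "".join(ch for ch in label.upper() if ch not in " -/_")


def parse_ldrs(text: str) -> list:
    """Split JCAMP-DX text into an ordered list of (label, value) records."""
    records = []
    for raw in text.splitlines():
        line = raw.split("$$", 1)[0].rstrip()  # drop $$ comments
        if line.startswith("##"):
            label, _, value = line[2:].partition("=")
            records.append((normalize_label(label), value.strip()))
        elif records and line.strip():
            # Continuation line: append it to the previous record's value.
            label, value = records[-1]
            joined = f"{value}\n{line.strip()}" if value else line.strip()
            records[-1] = (label, joined)
    return records


sample = """##TITLE=example spectrum
##JCAMP-DX=5.01
##DATA TYPE=INFRARED SPECTRUM
##$MY BLOCK=first line
second line  $$ a comment
##END="""

for label, value in parse_ldrs(sample):
    print(label, repr(value))
```

A list rather than a dictionary is returned because labels such as `##TITLE` and `##END` can legitimately repeat in compound files.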

3:20pm-3:50pm
CINF 43: Communicating crystal structures: Successes, challenges, and opportunities

*Ian Bruno, bruno@ccdc.cam.ac.uk


Ian Bruno1 , Colin Groom1 , Suzanna Ward1
1 Cambridge Crystallographic Data Centre, Cambridge, United Kingdom

The Crystallographic Information Framework (CIF) has had a transformative effect on the communication and publication of crystal structure determinations since the crystallographic community widely adopted it in the 1990s. CIF combines both data and textual information relating to an experiment and has enabled streamlined workflows for the validation, publication, and archiving of crystal structure determinations. This presentation, timed to coincide with the 50th anniversary of the Cambridge Structural Database, will reflect on the reasons for the successful adoption of CIF, future opportunities for building on the framework, and lessons that may be applicable to the communication of data from other domains. It will also describe the benefits and challenges of linking the results of a crystal structure determination to a reliable representation of the chemical substance studied.
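One small example of how CIF carries data in a machine-readable yet compact way is its convention for numeric values with standard uncertainties: `10.234(5)` means 10.234 with an uncertainty of 0.005 in the last digits shown, while `?` and `.` mark unknown and inapplicable values. The minimal sketch below handles only that convention and does not cover exponents, quoted strings, or loops.

```python
import re
from typing import Optional, Tuple

_NUMBER_SU = re.compile(r"([+-]?\d*\.?\d*)(?:\((\d+)\))?")


def parse_cif_number(token: str) -> Tuple[Optional[float], Optional[float]]:
    """Return (value, standard uncertainty) for a CIF numeric token like '10.234(5)'."""
    token = token.strip()
    if token in ("?", "."):  # CIF markers for unknown / inapplicable values
        return None, None
    match = _NUMBER_SU.fullmatch(token)
    if match is None or not any(ch.isdigit() for ch in match.group(1)):
        raise ValueError(f"not a simple CIF number: {token!r}")
    number, su_digits = match.groups()
    value = float(number)
    if su_digits is None:
        return value, None
    # The uncertainty digits apply to the last decimal places of the value.
    decimals = len(number.split(".", 1)[1]) if "." in number else 0
    return value, int(su_digits) / 10 ** decimals


print(parse_cif_number("10.234(5)"))
print(parse_cif_number("123(4)"))
```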

3:50pm-4:20pm
CINF 44: Building a standard for standards: The ChAMP project

*Stuart Chalk, schalk@unf.edu


Stuart Chalk1 , Antony Williams2
1 Department of Chemistry, University of North Florida, Jacksonville, Florida, United States; 2 Royal Society of Chemistry, Cambridge, Cambridgeshire, United Kingdom

Development of the Chemical Analysis Metadata Platform (ChAMP) is a joint project between the University of North Florida and the Royal Society of Chemistry (RSC). In this project, we focused on the development of guidelines for organization, representation, and annotation of analytical science information, based around the definition of a standard set of metadata that can be used to annotate reports of chemical analysis methodologies.

Rather than defining a single standard to be implemented across the board (the one-size-fits-all approach), the platform describes metadata and data types, unique identifiers, controlled vocabularies (where appropriate), an ontology, and implementation guidelines, so that any group, society, or company can use ChAMP as the basis for developing a standard that fits its needs. In this way, the same metadata items are identified uniformly across different standards, allowing records from any and all standards based on ChAMP to be searched and compared. This paper reports on the progress of ChAMP development to date.
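The benefit of identifying metadata items uniformly can be sketched in a few lines. Below, two hypothetical standards built on ChAMP use their own field names but map them onto the same shared item identifiers, so a single query finds matching records from both. All identifiers and field names here (`champ:analyte`, `TargetSubstance`, and so on) are illustrative assumptions, not actual ChAMP terms.

```python
from typing import Dict, List

# Hypothetical shared identifiers standing in for ChAMP metadata items.
SHARED_INSTRUMENT = "champ:instrumentType"
SHARED_ANALYTE = "champ:analyte"

# Two hypothetical standards, each mapping its own field names onto shared items.
STANDARD_A = {"instrument": SHARED_INSTRUMENT, "analyte": SHARED_ANALYTE}
STANDARD_B = {"InstrumentKind": SHARED_INSTRUMENT, "TargetSubstance": SHARED_ANALYTE}


def to_shared(record: Dict[str, str], mapping: Dict[str, str]) -> Dict[str, str]:
    """Re-key a record from a local standard onto the shared metadata items."""
    return {mapping[k]: v for k, v in record.items() if k in mapping}


def search(records: List[Dict[str, str]], item: str, value: str) -> List[Dict[str, str]]:
    """Find records, whatever their source standard, whose shared item has this value."""
    return [r for r in records if r.get(item) == value]


records = [
    to_shared({"instrument": "HPLC-UV", "analyte": "caffeine"}, STANDARD_A),
    to_shared({"InstrumentKind": "GC-MS", "TargetSubstance": "caffeine"}, STANDARD_B),
    to_shared({"InstrumentKind": "GC-MS", "TargetSubstance": "ethanol"}, STANDARD_B),
]
print(len(search(records, SHARED_ANALYTE, "caffeine")))
```

Each standard keeps its own vocabulary for its users, while searching and comparison happen on the shared identifiers.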
4:20pm-4:25pm Concluding Remarks