Vol. 68, No. 4: Winter 2016

Chemical Information Bulletin

A Publication of the Division of Chemical Information of the ACS

Philadelphia, PA

 

David Shobe, Editor,
Patent Information Agent
avidshobe@yahoo.com

ISSN: 0364-1910
Chemical Information Bulletin,
© Copyright 2016 by the Division of Chemical Information of the American Chemical Society.
 

Message from the Chair

Rachelle Bienstock

This will be my final message to you as CINF chair, as I hand over the reins of leadership to Erin. Looking back on the past two years in which I have had the privilege of serving the CINF community, I cannot adequately express how rewarding and enriching an experience it has been. I have had the opportunity to meet and work with leading chemoinformaticians, information chemists, and chemical librarians, and have had the responsibility of ensuring that CINF continues to grow as an ACS division.

CINF has seen its programming at national meetings grow in both quality and quantity. Attendance at our sessions has been substantial; in fact, one of our recurring complaints is being assigned rooms too small to accommodate our audiences! We have continued to cosponsor programming with several other divisions, notably COMP, CHED, CHAL, ANYL, and MEDI. We will keep working with other divisions, in particular to develop a closer relationship with COMP on complementary programming. We have worked closely with CHED on the Online Chemistry Course (OLCC) and on developing educational training in chemical information. I hope these types of programs and Innovative Project Grants (IPGs) will continue and grow from a well-established base. I also hope that ACS will reward division cosponsorship as a true cosponsorship and give credit to both divisions.

We have blazed new trails with our “big data” summit, and CINF will continue to offer multiple-day, big-topic programming at our national meetings when appropriate. Our Herman Skolnik Award symposia have recognized the best innovation in chemical information. We have had wonderful symposia honoring the work of Dr. Jürgen Bajorath in the areas of QSAR, scaffold definition, and Matched Molecular Pairs (MMP), and of Drs. Evan Bolton and Stephen Bryant for their work on PubChem.

Attendance at the meeting in Philadelphia was somewhat smaller. ACS is evaluating the national meeting format and looking for ways to improve and innovate in the future. Options under discussion include shortening the meeting, making more content available through streaming or other virtual formats, and finding other novel ways to reach more members or to make participation in national meetings more convenient and affordable.

Our sessions at the Philadelphia meeting covered the following topics:

Education and chemical information: “Bringing Cheminformatics Into the College Chemistry Classroom”, cosponsored with CHED.

Rational compound design: a Genentech-sponsored symposium organized by Dr. Dan Ortwine and cosponsored by MEDI, “Effectively Harnessing the World’s Literature to Inform Rational Compound Design”.

Identifying druggable targets from databases: “Shedding Light on the Dark Genome: Methods, Tools & Case Studies”, cosponsored with BIOT, COMP, and MEDI.

Safety: “Using Public Information to Support a Chemical Safety Culture”.

Chemometrics: “New Directions in Chemometrics: Making Sense of Big & Small Chemical Data Sets”, our first joint symposium with the Analytical Division (ANYL).

I leave CINF leadership in good hands, and I am hopeful that we will continue to grow and to offer quality benefits and programming to our members. We still need to grow our division in total membership and also in active members willing to volunteer. Those volunteers help expand our programming and activities and improve our division and member benefits and offerings. There is always room for improvement!

Your chair,

Rachelle J. Bienstock, Rachelleb1@gmail.com

Letter from the Editor

David Shobe

This issue of the Chemical Information Bulletin contains not only Rachelle Bienstock’s last message as chair of CINF, but also my first message as editor of the Bulletin. But enough about me: without the input of numerous contributors, this publication would not be possible.

One such contribution is Svetlana Korolev’s interview with David Evans, Scientific Affairs Director for RELX (formerly Reed Elsevier) and chair of the CINF Awards Committee. In the interview, David Evans discusses the Reaxys Inspiring Chemistry program, the evolving InChI standard, and his advice for young chemists eyeing alternative careers.

The bulk of this issue covers the Fall ACS National Meeting in Philadelphia, PA. For those of us who were unable to attend (and even those who did), summaries of several symposia from the technical program are included. These include Wendy Warr’s report on the 2016 Herman Skolnik Award Symposium as well as reports on “New Directions in Chemometrics”, “Shedding Light on the Dark Genome”, and “Bringing Cheminformatics Into the College Chemistry Classroom”. A set of 119 photographs of the meeting, most by Wendy Warr and Phil Adler, is available in seven albums at https://www.flickr.com/photos/cinf/albums.

Also included in this issue are reports from several CINF and ACS committees, including the ACS Council Meeting, at which Bonnie Lawlor was formally recognized for her 25 years of service on the ACS Council. Winners of CINF awards and scholarships are announced in the next section of the Bulletin, along with calls for nominations for upcoming awards. In addition, be sure to read the sponsor announcements near the end of the Bulletin.

Yours,
David Shobe
avidshobe@yahoo.com

Awards and Scholarships

2016 CINF Scholarship for Scientific Excellence Presented

CINF Scholarship for Scientific Excellence Winners Fall 2016

The scholarship program of the Division of Chemical Information (CINF) of the American Chemical Society (ACS) is designed to reward students and postdoctoral fellows in chemical information and related sciences for scientific excellence, and to foster their involvement in CINF. Since 2005 the program has awarded scholarships at each ACS National Meeting, 64 scholarships in total. The awards at the 252nd National Meeting in Philadelphia were sponsored by ACS Publications.

Applicants presented their posters at the CINF Welcoming Reception and the Sci-Mix session, and the three winners received scholarships at the CINF Luncheon during the same meeting. Three full scholarships valued at $1,000 each were awarded to George Van Den Driessche, Mojtaba Haghighatlari, and Nathanael Kazmierczak.

The names of the recipients and the titles of their posters are (listed from left to right on the photo):

Mojtaba Haghighatlari, Department of Chemical and Biological Engineering, University at Buffalo: “ChemML: A Machine Learning and Informatics Program Suite for the Chemical and Materials Sciences”

George Van Den Driessche, Department of Chemistry, Bioinformatics Research Center, North Carolina State University: “Forecasting Adverse Drug Reactions Triggered by the Common HLA-B*57:01 Variant”

Nathanael Kazmierczak, Department of Chemistry & Biochemistry, Calvin College: “Modeling spectrophotometric titration data: tracking error from the measurement, through the model, and to the targeted output parameters”

Stuart Chalk, Coordinator, CINF Scholarships for Scientific Excellence

2017 Herman Skolnik Award Announced

The American Chemical Society Division of Chemical Information is pleased to announce that David Winkler, CSIRO, Australia, has been selected to receive the 2017 Herman Skolnik Award for his seminal contributions to chemical information in the development of optimally sparse, robust machine learning methods for QSAR and in leading the application of cheminformatics methods to biomaterials, nanomaterials, and regenerative medicine. The award recognizes outstanding contributions to and achievements in the theory and practice of chemical information science and related disciplines. The prize consists of a $3,000 honorarium and a plaque. Prof. Winkler will also be invited to present an award symposium at the fall 2017 ACS National Meeting to be held in Washington, D.C.

Dr. David Winkler

Prof. David Winkler is a Senior Principal Research Scientist at CSIRO Manufacturing in Clayton and Adjunct Professor at Monash, Latrobe, Flinders, and Nottingham Universities. During his thirty years at CSIRO he has worked on a variety of projects involving the discovery of bioactive agents and materials. He has also been active in developing improved modelling methods for QSAR. He has substantial experience working with industry clients including AMRAD, Du Pont, Schering Plough, Bio-RAD, Sirtex Medical, and Air Liquide. His work has also contributed to several biotechnology startup companies: Starpharma, Asymmetrex, and Betabiotics. He also licensed his Bayesian machine learning modelling methods to the Bio-RAD Corporation. His current research interests include molecular design, computational chemistry, QSAR, complex systems, stem cell modelling and simulation, computational nanotoxicology, design of materials, tissue engineering, and biomaterials. Current projects include:

- application of novel mathematical techniques to drug and materials design;
- design of drugs for myelofibrosis;
- design and optimization of materials for medical applications and for directing the fate of stem cells;
- modelling interactions of nanomaterials with biology.

He is past Director of Science and Technology Australia, past Board Chair of the Royal Australian Chemical Institute (RACI), and current President-Elect of the Federation of Asian Chemical Societies (FACS). He is a Fellow of the RACI and the Asian Federation for Medicinal Chemistry (AFMC) and a Board member of the QSAR and Modelling Society and the Chemical Structure Association Trust. He represents the RACI on the Pacifichem international organizing committee.

The awarding of the 2017 Herman Skolnik Award to Winkler recognizes his significant contributions to the fields of QSAR and cheminformatics methods. His contributions to chemical information are novel and diverse. They range from seminal work on the use of cheminformatics methods to model the biological impacts of nanomaterials to the application of informatics and QSAR methods to biomaterials and regenerative medicine.

Winkler’s early research focused on the design and properties of drugs, and he recently developed the most comprehensive model to predict aqueous solubility of small molecule drug candidates. He has designed several drugs that are progressing to the clinic for hypertension, fibrosis, and radioprotection. He has also designed the first disease-modifying drug candidates for incurable myeloproliferative neoplasms. Over the last decade Winkler has shifted his research focus to predicting the properties of materials. He has applied cheminformatics methods to a broad range of materials, including nanomaterials, biomedical materials, ceramics, ionic and supercritical liquids, and catalysts. His work is documented in 190 publications, 22 book chapters, and 25 patents, many with his long-term collaborator Frank Burden. He has given over 300 presentations at international conferences.

Winkler is also cited for his general contributions to our field. He has served on the editorial boards of journals such as Molecular Informatics, ChemMedChem, and Perspectives in Drug Discovery. He has also reviewed for many journals, including The Journal of the American Chemical Society, The Journal of Medicinal Chemistry, Australian Journal of Chemistry, and several Nature journals. He has also served on numerous committees such as the Australian Academy of Science’s National Committee for Chemistry, the Molecular Graphics and Modelling Society (MGMS), the Chemical Structure Association Trust, and an IUPAC committee on QSAR nomenclature. He has also been very active in promoting our field in the Asia–Pacific region, as a long-term director of the Chemical Information Network project of the FACS and as past President of the AFMC.

David Evans, Chair, CINF Awards Committee

2017 CINF Scholarship for Scientific Excellence: Call for Applications

The international scholarship program of the Division of Chemical Information (CINF) of the American Chemical Society (ACS), sponsored by ACS Publications (http://pubs.acs.org), is designed to reward students in chemical information and related sciences for scientific excellence and to foster their involvement in CINF.

Up to three scholarships valued at $1,000 each will be awarded at the 253rd ACS National Meeting in San Francisco, CA, April 2-6, 2017. Student applicants must be enrolled at a certified college or university; postdoctoral fellows are also invited to apply. The applicants will present a poster during the welcoming reception of the Division on Sunday evening at the national meeting. Additionally, they will have an option to show their posters at the Sci-Mix session on Monday night. Abstracts for the poster must be submitted through MAPS, the abstract submission system of ACS.

To apply, please inform the chair of the selection committee, Stuart Chalk, at schalk@unf.edu that you are applying for a scholarship. Submit your abstract at http://maps.acs.org using your ACS ID. If you do not have an ACS ID, follow the registration instructions. Submit your abstract in the CINF program in the session “CINF Scholarship for Scientific Excellence. Student Poster Competition.” MAPS is now open and submissions are due by October 31, 2016. Additionally, please send a 2,000-word abstract describing the work to be presented to schalk@unf.edu by February 28, 2017. Any questions related to applying for one of the scholarships should be directed to the same e-mail address.

Winners will be chosen based on the content, presentation, and relevance of the poster, and their names will be announced during the Sunday reception. The content should reflect the student’s own work and describe research in the field of cheminformatics and related sciences.

Stuart Chalk, Coordinator, CINF Scholarship for Scientific Excellence

2018 Herman Skolnik Award: Call for Nominations

The ACS Division of Chemical Information established this Award to recognize outstanding contributions to and achievements in the theory and practice of chemical information science. The Award is named in honor of the first recipient, Herman Skolnik.

By this Award, the Division of Chemical Information is committed to encouraging the continuing preparation, dissemination, and advancement of chemical information science and related disciplines through individual and team efforts. Examples of such advancement include, but are not limited to, the following:

  • Design of new and unique computerized information systems;
  • Preparation and dissemination of chemical information;
  • Editorial innovations;
  • Design of new indexing, classification, and notation systems;
  • Chemical nomenclature;
  • Structure-activity relationships;
  • Numerical data correlation and evaluation;
  • Advancement of knowledge in the field.

The Award consists of a $3,000 honorarium and a plaque. The recipient is expected to give an address at the time of the Award presentation. In recent years, an Award Symposium has been organized by the recipient.

Nominations for the Herman Skolnik Award should describe the nominee’s contributions to the field of chemical information and should include supportive materials such as a biographical sketch and a list of publications and presentations. Three seconding letters are also required. Nominations and supporting material should be sent by email to awards@acscinf.org. Paper submissions will not be accepted. The deadline for nominations for the 2018 Herman Skolnik Award is June 1, 2017.

David Evans, Chair, CINF Awards Committee

2017 Lucille M. Wert Scholarship: Call for Applications

Designed to help persons with an interest in the fields of chemistry and information to pursue graduate study in library, information, or computer science, the scholarship consists of a $1,500 honorarium. This scholarship is given annually by the Division of Chemical Information of the American Chemical Society.

The applicant must have a bachelor’s degree with a major in chemistry or related disciplines (e.g., biochemistry or chemical informatics). The applicant must have been accepted into (or be currently enrolled in) a graduate library, information, or computer science program at an accredited institution. Work experience in library, information or computer science is preferred.

The deadline to apply for the 2017 Lucille M. Wert Scholarship is February 1, 2017. Details on the application procedures can be found at: http://www.acscinf.org/content/lucille-m-wert-student-scholarship.

Applications should be sent by email to: marge.matthews@outlook.com.

Marge Matthews, Coordinator, Lucille M. Wert Scholarship

Interview with David Evans

Behind-the-scenes conversation with new CINF Awards Chair David Evans

In this bulletin we continue the “Meet your new CINF functionary” series of interviews. David Evans took over the helm of the CINF Awards Committee from Andrea Twiss-Brooks in January 2016. Mindful of the confidential nature of the committee work, Dr. Evans has graciously agreed to discuss his passion for giving professional prizes as well as his leadership of the Reaxys Inspiring Chemistry program and chairmanship of the InChI Trust.

David Evans

Bio: Dr. David Evans is a Scientific Affairs Director for RELX Intellectual Properties SA. He has been with RELX Group (parent company of Elsevier) in a variety of roles, including journals and books publishing, and software product management, for over 15 years. Dr. Evans has led Reaxys Inspiring Chemistry efforts since the program’s inception in 2009. His previous work experience includes positions as Executive Publisher at Elsevier (2004–2009), Senior Product Manager at MDL Information Systems, Inc. (1999–2004), Applications Scientist at Oxford Molecular Group (1998–1999), and a Research Fellow at New York University (1996–1998). David Evans has earned BSc and PhD degrees in chemistry at the University of Bath in the United Kingdom.

Svetlana Korolev: Greetings David! Let’s begin our conversation with an overview of your career path. When did you first realize you have an interest in chemistry? Have you considered specializing in other professions? Who or what influenced your transition to the field of scientific publishing?

David Evans: From an early age I have been (and remain) fascinated by how things work. I remember successfully pulling several radios apart as a child and then rather unsuccessfully trying to put them back together again afterwards. At school I was good at and enjoyed science and mathematics, and, despite occasional diversions into the arts, I have remained true to that calling. What pushed me into a chemistry degree was the passion, energy, and eloquence of my chemistry teachers when I was at school. One thing that motivates me today is communicating about science and chemistry. I am very lucky even today to be surrounded by people who have a passion and desire to learn, to explore, and to communicate about science.

I started out after my post-doc working for a contract research organization. After a short stint there I moved to MDL. The company had been recently purchased by Elsevier. After about five years with MDL in San Leandro, CA, I moved to Elsevier’s headquarters in Amsterdam for a publishing role managing the toxicology portfolio. Then after time in Amsterdam, New York, and Paris for Elsevier, I moved to Switzerland for RELX.

SK: You have been with RELX Group for over 15 years. How has your job evolved in that time? Can you describe the main activities of the “Scientific Affairs Director”? Which scientific, technical, and medical solutions do you oversee? Can you share with us more details about the Reaxys Inspiring Chemistry program and its team?

DE: I have been very lucky to be a part of RELX Group (the new name for Reed Elsevier). The business transformation in that time has been extraordinary (not a surprise for members of the ACS Division of Chemical Information), with the change from predominantly print and advertising to predominantly digital products. In the future, I think, we will see even more changes as RELX Group continues to develop information-based analytics and decision-support tools. It is an exciting time to be involved in this area.

My role is a combination of communication, awareness, networking, and engagement. I work across a number of areas mostly in the life sciences. Product teams I work closely with are Reaxys, PharmaPendium, and Embase. I also work with other groups within Elsevier and other RELX Group companies.

The Reaxys Inspiring Chemistry program has three main aspects: 1) the Reaxys PhD Prize, 2) the Reaxys Prize Club, and 3) the Reaxys Advisory Board. The prize is intended to celebrate some of the very best chemistry being performed around the world by some of today’s brightest young chemists. We wanted to create something that was aimed squarely at those folks who are the future of science and of the world! The Prize Club is a networking alumni club for finalists and winners of the Reaxys PhD Prize. Finally, the Reaxys Advisory Board provides the Reaxys team with insights and advice on future directions.

The Reaxys Inspiring Chemistry program is one piece of the action here. There is a lot more that goes on besides. There are a bunch of really great people here who all make this a great place to work and live. I am very lucky to work with Anna, Coralie, Fabian, Fred, Ingrid, Ivana, Laure, and Robert, who are all part of the RELX team here in Switzerland. Some people who are involved in the program and known to the readers of this bulletin are: Pieder Caduff, Thibault Géoui, Tim Hoctor, and, of course, Jürgen Swienty-Busch.

SK: The name “Reaxys Inspiring Chemistry” sounds fascinating. May I ask you here a personal question of who or what inspires David Evans in life?

DE: When we were creating the program we liked the nuance the word “inspiring” had in that phrase. Who inspires me? Well gosh, a difficult question! I take inspiration from many different places. I said that I am passionate about how we communicate our science and the role science plays in our current and future lives. Not communicating properly does our science a disservice. There is an art to storytelling. I am blown away by those people who can capture an audience and hold their rapt attention while they talk, and by those who can transform your mind through a single written phrase. They inspire me to go for it again every day. For example, the United Nations’ 17 Sustainable Development Goals are “a call to action to end poverty, protect the planet, and ensure that all people enjoy peace and prosperity”. Science, and chemistry in its many forms, will be key to the future of our planet and us on it. We need great scientists and we need great communicators to secure our future.

SK: In collaboration with Alexander Lawson, Jürgen Swienty-Busch, and Thibault Géoui, you wrote a chapter, “The Making of Reaxys - Towards Unobstructed Access to Relevant Chemistry” (http://pubs.acs.org/doi/abs/10.1021/bk-2014-1164.ch008), for the 2014 ACS CINF symposium series book “The Future of the History of Chemical Information”. The chapter described the history leading to the launch of Reaxys in 2009 and its subsequent development until 2014. It concluded with the next steps:

- better text-based searching through additional indexing supported by a chemical dictionary and searchable structures;
- natural language queries;
- integration of the Reaxys repository into commercially available electronic lab notebooks;
- linking to other information products (Scopus, ScienceDirect, PubChem, and eMolecules).

Can you highlight some of the developments in Reaxys since 2014? What steps are planned looking forward to 2017?

DE: Reaxys is evolving all the time. There is an amazing group of people in the product team in Frankfurt. I guess Jürgen Swienty-Busch is the best known in the Division of Chemical Information, but there are a number of other people who all make this happen. A lot has happened since we wrote that chapter. We regularly perform “market research” studies in which we try to understand how people go about searching for information. Across all types of chemistry, about 70% of a researcher’s searches are text-based and 30% are structure-based. We also know that generic search engines (Google) are the first place people go. This kind of information is helping us decide which critical new features to include in Reaxys. The “Ask Reaxys” query box is an example. It involved an awful lot of work in the background and the backend: structuring the data and ReaxysTree, understanding the query correctly, and then, of course, returning the results. All of that work is what makes the answers you get from “Ask Reaxys” meaningful. We are continuing to add experimental procedures from articles so that users can see the “cooking instructions” when they see their results.

We’re also developing the medicinal chemistry features of Reaxys. I think the new data we’ve collected are outstanding, and the “heat map” capabilities really enable users to see their search results and to quickly find the answers they need.

I could go on (and on and on and on) but… well, a great deal of work is going on for future developments. One thing that I hope people are aware of is the work on the new Reaxys user interface. We’ve been working with a number of development partners to create something really special. Over the past few months we’ve been in beta testing, and (even if I say so myself) it is looking really nice. I can’t wait for the release and to be able to show it off!

SK: Let’s move along from the evolution of Reaxys to the Reaxys PhD Prize, which has been awarded to the three most original and innovative researchers every year since 2010. There was a recent announcement of the ten shortlisted candidates for oral presentations at the 2016 Reaxys PhD Prize Symposium to be held on September 22-23, in London, U.K. All 45 finalists are invited to join the Reaxys PhD Prize Club. Can you reveal more information about the Club, its benefits, the communication channels of its network, and the symposium? Was it possible to observe an influence of the Reaxys PhD Prize on the careers of the “rising chemistry stars”?

DE: We wanted the Prize to be special. It has become acknowledged as the premier chemistry prize for PhD students, and we work hard to maintain and develop it. We imagined the Club back in 2009 when we launched a call for the Prize for 2010, the year of the first finalists and winners. We thought back then that inviting the finalists to join an alumni club for the Prize would be a great way for them to keep in touch with their fellow finalists and also to grow a network with finalists from other Prize years over time. We felt that networking and knowledge-sharing was something that we could offer to young scientists as they start their careers. We’ve set up a dedicated Internet site, which enables the Club members to find contact details for each other. We’ve set up groups on Facebook and LinkedIn too. In addition to providing travel bursaries for conferences, we’ve also supported travel amongst Club members to learn new chemistry and techniques in a couple of instances. We really want to help them to build a network and communicate with each other.

The symposium is the culmination of each year’s Prize. It is a special time for the finalists to come together and meet each other for the first time, meet members of the Reaxys Advisory Board, and usually a few members of the Club. It is always a harrowing time for me and the folks on the organizing team: we are running around making sure it all flows smoothly (or at least appears to!). We’ve been to some great places: from Nuremberg (Germany) to Bangkok (Thailand), to Philadelphia (USA), Grindelwald (Switzerland), Hong Kong, and this year to London, where we are being hosted by New Scientist Live (the U.K.’s biggest festival of science, technology, ideas, and discovery). It should be a great experience. And next year? Well, I know but I can’t say just yet …

We now have 315 Club members going back to 2010. We try to keep in contact and to know where people are and how their careers have evolved. We have over 60 people who are now in their first independent research positions in academia and close to 80 people who are in industrial research. And, of course, we have a number of people finishing their PhD work and in post-doc positions. I hope that the recognition of being a finalist or a winner of the Reaxys PhD Prize helps people to stand out from the crowd!

SK: David, let’s make a connection from your leadership of the Reaxys PhD Prize to becoming chair of the CINF Awards Committee in 2016. What attracted you to this committee? Please highlight some non-confidential aspects of the CINF awards and scholarships. CINF has had only intermittent success in getting Herman Skolnik Award announcements published in Chemical & Engineering News. Is there a cost or other obstacle for a division to raise its prestigious award to ACS national-level recognition? Who is the 2017 winner of the Herman Skolnik Award?

DE: I was invited to join the Awards Committee by one of the former chairs, Phil McHale. Phil had been my boss at MDL, and it was difficult to refuse when he asked! I thought that working on the Awards Committee would be a good way for me to contribute something back to the Division of Chemical Information. After Phil, Andrea Twiss-Brooks became the chair, and then she twisted my arm into becoming chair. I must admit that I now know that she and Phil made the job of being chair seem effortless, and it is not! I am doing my best to live up to expectations. I am very lucky to have people on the committee who are doing an excellent job and who really ensure that the work of the committee gets done. We meet at every ACS national meeting, on the Saturday afternoon, immediately prior to the CINF Executive Committee meeting. We make decisions on the various Division awards, such as the Lucille M. Wert Scholarship and the Val Metanomski Meritorious Service Award. The Herman Skolnik Award is handled differently, as it is judged by a jury made up of the Awards Committee Chair, the CINF Division Chair, and the Division Chair-Elect.

I must admit I am still working on how best to promote or pitch the Herman Skolnik Award to Chemical & Engineering News. I’d be happy for any thoughts and ideas from readers!

Am I allowed to announce the 2017 Skolnik Winner? I guess I am as there should be an announcement somewhere else in this bulletin. The 2017 recipient is Prof. Dave Winkler from CSIRO, in Australia, for his seminal contributions to chemical information in the development of optimally sparse, robust machine learning methods for QSAR, and in leading the application of cheminformatics methods to biomaterials, nanomaterials, and regenerative medicine.

SK: In addition to your involvement in recognizing excellence of division members and supporting students, you have been steadily contributing to the CINF technical program over many years. At the Fall 2015 ACS National Meeting you collaborated with Wendy Warr in organizing a full-day CINF symposium, “Retrosynthesis, Synthesis Planning, Reaction Prediction: When Will Computers Meet the Needs of the Synthetic Chemist?”, and then wrote a summary of it (http://bulletin.acscinf.org/node/812) for the Chemical Information Bulletin. Reviewing your presentations,¹ one may notice that the recent titles imply a sense of direction. Examples include “navigating the sea of scientific information,” “from publishing to recognition,” “moving the standard ever onwards,” “digital transformation: the long and winding road,” “bridging worlds,” “from searching to finding,” and “enabling information workflow”. Can you comment on this commonality and on the scope of your presentations overall?

DE: I grew up watching Star Trek. It seemed so natural how Kirk et al. could ask the computer a question, and the answer was immediately forthcoming. Sandy Lawson, in his Skolnik Award lecture, described some of his ideas for an intelligent search (maybe support) assistant. It is not just you asking for answers: the assistant actually knows what is going on. Maybe some analytical results have come in overnight and, based upon the calculated results, something is wrong. Your assistant is able to provide you with some background information, compare the results with others, and assist you to sort out what is going on!

We are a long way from being there, but I hope we’re getting there. A theme that runs through these and other presentations is all the hard work that goes into making that vision a reality. An awful lot of hard work has gone, and continues to go, into understanding customers’ needs, understanding data and its structure, and understanding technology (and its limitations), and then there are some really clever people who are working to pull all of this together and make the magic happen.

SK: Let’s continue our conversation about your presentations with a focus on “moving the standard ever onwards” devoted to the InChI, the IUPAC International Chemical Identifier, and the InChI Trust, a U.K.-based charity founded in 2009 in support of the standard’s continued development. You are the current Chairman (2012-2017) elected to the Board of Directors (2010-2017) of the InChI Trust. The ACS Division of Chemical Information became a (non-dues-paying) supporter of the InChI Trust promptly after the trust’s establishment and has organized several symposia at ACS national meetings, with several reports in the Chemical Information Bulletin written by Keith Taylor (Winter 2015), Carmen Nitsche (Winter 2014; Winter 2011), and Alex Tropsha and Antony Williams (Summer 2012). Can I ask you the same question as in the title of Carmen’s report, “What’s Up InChI?” Please discuss some features of the current InChI projects. What are the main activities conducted by the Board of Directors? How do supporting organizations, including yours, help in moving the InChI standard forward?

DE: The InChI Trust was set up in order to provide support for the ongoing maintenance and development of the InChI algorithm. In conjunction with IUPAC, we are also involved in extending the definition of the standard. The Board of Directors oversees the work of the Trust and we try to step in and ensure that things are moving ahead continually. The Trust is supported by some of the largest chemistry publishers, and some of the world’s largest government research institutions, as well as by an amazing group of associate members and supporters: it really is a great group of people to be part of. The InChI is a crucial part of “Internet plumbing” and it is our duty to ensure that it is fit for purpose, helping link together chemistry information all around the Internet world.

There is a lot going on at the moment. The IUPAC working groups are defining InChI standards for large molecules, mixtures, some aspects of complex coordination chemistry and organometallics. There is a group working on creating QR codes for InChIs: you can imagine how this and the mixtures group can really help make a difference in lab safety, where easy retrieval is vital.

A shameless plug here…next year immediately before the Fall ACS National Meeting in Washington, D.C., there will be a three-day meeting focusing on the future directions for the InChI. The NIH is graciously hosting us. Evan Bolton, Steve Heller, and Alan McNaught are developing the agenda and themes as we speak. So, if anyone has any ideas or wants to get involved please let me, Evan, Steve, or Alan know!

SK: Going a few years back in time, you co-authored a CINF talk titled “Beyond the Journal: Innovation in 21st Century Publishing” presented by Martin Tanke at the 2011 Herman Skolnik Award Symposium honoring Alexander Lawson. (A symposium report by Wendy Warr is at: http://bulletin.acscinf.org/node/256/.) Are you currently involved in any part of journal publishing? Can you point to some of the prominent technological advances in the last five years for “smarter content”, such as added-value links with Reaxys, or other enhancements at the article level?

DE: I still regularly work with my colleagues in Elsevier’s journals publishing group, but I am not directly involved in publishing. I think over the last few years we’ve all seen some leaps and bounds in terms of what we can read and do online with journal articles. Some of the molecular viewers, and some of the interlinking between articles and other resources (including Reaxys) are creating a new environment for the reader. There is a great deal of work going on behind the scenes at Elsevier in the areas of smarter content and content enrichment, some of which I and others have spoken about at ACS national meetings. This work to enhance the reader journey is happening not only in the chemistry arena, but also across all of STM. And, of course, Elsevier is not alone in this regard. I think all publishers realize that enabling readers to find more relevant information is crucial. Here is also where initiatives like the InChI, and the work of the InChI Trust and IUPAC, are so important by providing standards.

SK: David, let me conclude our conversation on a personal note. You have lived in many countries: the United Kingdom, United States of America, and Switzerland, and travelled around the world. Are there any favorite places you liked living? Please tell us something about yourself beyond your professional life. Do you have hobbies?

DE: Wow. I am lucky that my job enables me to travel around the world. I get to visit some great places and to meet some great people. I always try to live where I live. I’ve lived in New York, Amsterdam, and Paris, and now I live in a small village in Switzerland. I loved those cities and the city life, but now I live in the countryside, and I love living here. I try to do all the things one can do here and now.

About four years ago my wife and I put all hobbies on hold, and started a new project. The project codename is William. It has pretty much taken over our lives. The project provides us with a sense of purpose, and an awful lot of headaches, but a lot of happiness and joy. We recently released William 4.0 into the local environment, enabling interactions with some other local projects. So far it seems to be going OK.

In addition to school, our son enjoys taking us to the local lake for swimming and ice-cream, cycling around the countryside, and running and running and running. Last winter, we put him on skis for the first time. It is now early September and he is asking when we can go to mountains to ski! I guess you get the idea, we are two very proud parents, who have a wonderful little boy!

SK: Thank you for your great sense of humor and time for this interview. What would be your final words of advice for young scientists wishing to explore alternative careers in chemistry like yours?

DE: Go for it! Science has so many different aspects to it. Take all the opportunities you can, explore many different paths until you find what makes you tick! Don’t be afraid to go for it.

David Evans’s presentations at recent ACS national meetings:

  1. Navigating the sea of scientific information. David Evans, Pieder Caduff, Thibault Géoui, Jürgen Swienty-Busch. 144-CINF, spring 2016. Slides http://bulletin.acscinf.org/PDFs/251nm/2016_spring_CINF_144.pdf
  2. From publishing to recognition - indexing literature for natural products. David Evans, Pieder Caduff, Jürgen Swienty-Busch. 2-CINF, fall 2014.
  3. Moving the standard ever onwards: The role of the InChI Trust in supporting and developing the InChI. David Evans. 31-CINF, fall 2014.
  4. Digital transformation - the long and winding road. David Evans, Pieder Caduff, Jürgen Swienty-Busch. 90-CINF, fall 2014.
  5. Bridging worlds: Speaking multiple scientific languages. Jessica Peterson, Pieder Caduff, David Evans, Jürgen Swienty-Busch. 1-CINF, spring 2014.
  6. From searching to finding: New developments for managing large data sets. Jürgen Swienty-Busch, David Evans. 67-CINF, spring 2014.
  7. Enabling the translational medicine and drug discovery information workflow. David Evans, Timothy Hoctor, Jacqui Mason, Pieder Caduff. 61-CINF, fall 2013.
  8. Reaxys as an information resource for food chemistry. David Evans, Jürgen Swienty-Busch. 49-CINF, spring 2013.
  9. Chemical science that underpins the Reaxys database. Jürgen Swienty-Busch, Pieder Caduff, David Evans. 92-CINF, spring 2013.
  10. Helping you make the right choices for your next synthetic route! Jürgen Swienty-Busch, David Evans. 92-CINF, spring 2012.
  11. InChI here, InChI there, InChIs everywhere. Jürgen Swienty-Busch, David Evans. 105-CINF, spring 2012.
  12. Useful and fun chemistry on the go. David Evans, Pieder Caduff. 14-CINF, fall 2011.
  13. Beyond the journal: Innovation in 21st century publishing. Martin Tanke, Rafael Sidi, David Evans, Philippe Terheggen. 22-CINF, fall 2011.

Technical Program

Herman Skolnik Award Symposium 2016

Honoring Stephen Bryant and Evan Bolton

A report by Wendy Warr (wendy@warr.com) for
the ACS CINF Chemical Information Bulletin

Introduction

Stephen Bryant and Evan Bolton were selected to receive the 2016 Herman Skolnik Award for their work on developing, maintaining, and expanding the Web-based National Center for Biotechnology Information (NCBI) PubChem database, and related software capabilities and analytical tools, to enhance the scientific discovery process. NCBI is part of the United States National Library of Medicine (NLM), a branch of the National Institutes of Health (NIH). A summary of Steve and Evan’s achievements has been published in the Chemical Information Bulletin. They were invited to present an award symposium at the Fall 2016 ACS National Meeting in Philadelphia, PA. They invited twelve speakers.

Herman Skolnik Symposium Presenters

After the award address, Rachelle Bienstock, chair of the ACS Division of Chemical Information, formally presented the Herman Skolnik Award to Evan and Steve:

Evan Bolton and Steve Bryant: The 2016 Herman Skolnik Awardees

Left to Right: Rachelle Bienstock, Evan Bolton, Steve Bryant

Developing databases and standards in chemistry

Steve Heller

Steve Heller was the first speaker, with an amusing scene-setting talk. He admitted that his secret in getting to where he is now was “luck, luck, luck”. He disliked chemistry lab work; he was at the right place at the right time with the right people; he worked with supportive people; and he planned for who would take over the work next. If the problem were just technology, someone would have solved it already. The real problem is always cultural and political, not technical. Steve had the good luck to be at NIH to collaborate with Hank Fales and Bill Milne; at the Environmental Protection Agency (EPA) with Morris Yaguda, when EPA started using mass spectrometry to identify pollutants; at the National Institute of Standards and Technology (NIST) with Steve Stein, when CAS stopped providing Registry Numbers to the NIST Mass Spectrometry database; and to be retiring just when Ted Becker and Alan McNaught thought that the International Union of Pure and Applied Chemistry (IUPAC) needed to move into the 21st century of chemical structure representation.

The NIH/EPA/NIST mass spectrometry database1,2 originated at MIT (with Klaus Biemann), and was run at NIH in the 1970s using a modification of Richard Feldmann’s search software. Control moved to EPA, and eventually to NIST in the 1980s. NIST was the right home for the database: NIST now collects a few million dollars a year in mass spectrometry database royalties. The NIH/EPA Chemical Information System (CIS)3 was a collection of chemical structures with links to various databases supporting environmental and scientific needs. It also had a number of analysis and prediction programs. All the databases had CAS Registry Numbers4 as their link. The CIS worked for a number of years, but never had the full support of the government or of ACS. It died in the mid-1980s; it was a bit ahead of its time.

Steve’s next example of luck dates back to November 1999 when he and Steve Stein seeded the idea of a chemical identifier. The right people in this case were the IUPAC International Chemical Identifier (InChI) team: Steve himself, Alan McNaught, Igor Pletnev, Steve Stein, and Dmitrii Tchekhovskoi. InChI5 was developed as a freely available, non-proprietary identifier for chemical substances that can be used in printed and electronic data sources, thus enabling easier linking of data compilations, and unambiguous identification of chemical substances. It is a machine-readable string of symbols which enables a computer to represent a compound in a completely unequivocal manner. The InChI algorithm normalizes chemical structures and includes a “standard” InChI, and the hashed form called the InChIKey. InChI is easy to generate, expressive, unambiguous, and unique, and it does not require a centralized operation. The InChIKey enables structures to be searched by Internet search engines.

InChI is not a replacement for any existing internal structure representations, but an addition to them. Its value is in finding and linking information. The proof of its success is in its widespread adoption.6 All the major structure drawing programs have incorporated the InChI algorithm in their products. There are millions of InChIs in large chemical databases. Regardless of controversies and differing opinions, InChI has been more widely adopted than SMILES. Currently, the InChI algorithm can handle neutral and ionic organic molecules, radicals, and some inorganic, organometallic, and coordination compounds. Steps to expand it to handle more complex chemical structures are underway, under the auspices of the InChI Trust.

Finally, Steve had the luck to join the PubChem Advisory Board, and worked with the right people, Steve Bryant and Evan Bolton. The database now contains nearly 92 million compounds, 223 million substances, and 1.2 million bioassays, plus related data and publications. More than 100,000 searches are carried out every day, and there are 1.6 million unique users a month. The success of PubChem, like that of InChI, is measured by its widespread use.

Two decades of open chemical data at the Developmental Therapeutics Program (DTP) at the National Cancer Institute (NCI)

Daniel Zaharevitz

The talk by Daniel Zaharevitz of NCI also covered freely available chemical and biological data. A history of DTP/NCI was posted on the Web on the 50th anniversary of the Cancer Chemotherapy National Service Center (CCNSC), which was set up in 1955. Until 1990, transplantable mouse tumors were used and gram quantities of test substances were needed. After that, human tumor cell lines in culture (the “NCI-60” cell lines) were used and only milligram quantities of test substances were needed.

The philosophy behind the National Chemotherapy Program7 was one of hundreds of independent investigators who were not required to collaborate. Indeed, over the last ten years, 42,301 compounds have been submitted from 1,477 different groups. Consequently data and decision making have been compartmentalized, and data systems development has reflected this compartmentalization. There was little pressure to apply any standardization.

From the 1970s until 2000 the Drug Information System was part of the CIS Structure and Nomenclature Search System (SANSS). Since 2000 there has been a Web interface for compound submission, accepting structures only in molfile format. Before 1994 there was no policy for making chemical structures publicly accessible. Data release was avoided if possible because of the costs and difficulties involved, and because there was no perceived advantage. In 1994, DTP made the 127,000 structures that had CAS Registry Numbers available via FTP, after SANSS connection tables had been converted to molfiles, and CORINA had been used to generate 3D coordinates. Since 2000, molfiles have been extracted from a newer internal system, and structures are released about once a year on a Web page. In June 2016 there were 284,176 open NCI structures, but there are many versions of “NCI structures” around, including multiple depositions in PubChem.

DTP compound submissions are now performed online. The submitter must register as a user and the submission must include structures, which are subjected to consistency checks (with the Chemistry Development Kit, CDK), and stereochemistry consistency checks (with InChI). A material transfer and screening agreement is signed electronically, and, nowadays, the confidentiality period is limited to three years. Submitters are given access to screening results and to COMPARE analysis. Researchers can request samples or plated sets from a collection of about 100,000 compounds, if they submit a material transfer agreement electronically, and pay for shipping.

There is no science without communication, including communication with a more general audience, as well as with those immediately involved. Despite the barriers to widespread communication, it is important to do something. Note also that good communication of data is hard work, and attention to detail is critical.

The earliest plans8 for PubChem recognized the need for significant resources to store and disseminate data. NLM was a natural choice for this function, and Steve Bryant was brought in early in the implementation process. Evan Bolton came in when the nuts-and-bolts implementation started. When PubChem went live, about a third of the structures and all of the biological data were from DTP. In less than 20 years the world of open chemical structures has gone from about 100,000 compounds in a single file to millions of structures being freely available in a searchable database.

In the future, more applications will be built based on PubChem data. “Chemical awareness” should be integrated into the publication process, especially peer review. In future, data consistency will be improved, and we will be more able to know the context for structures and data, and to find out which similar structures are known and which assays have been run on them. Researchers will use predictive tools more as a measure of surprise than as a substitute for measurements.

Using InChI to manage data

Peter Linstrom

To explain the usefulness of InChI, Peter Linstrom of NIST started by defining a problem as follows. “I have data about a substance and my colleague has data about a substance. Are these substances the same so that we can combine the data about them? Are we talking about well-defined molecular species?” The term “well-defined” can mean different things to different people. A well-drawn structure can precisely identify a molecule, but there are issues with formats and drawing conventions. Drawing a structure from a name by itself does not improve identification because additional information is often required to improve specificity. Moreover, sometimes we do not have a “well-defined” molecular structure. This is a general problem which cannot be solved for a significant portion of historical data.

InChI can help because it identifies a molecule based on its structure, and it allows us to ask whether two “well-defined” structures are the same. Also, InChI has a layered design allowing matches to related compounds such as stereoisomers, geometric isomers, and “isotopologues” (compounds that differ only in isotopic composition). In addition, with a little string manipulation we can ask even more questions.

An InChI is hierarchically layered. There are several InChI layer types, each representing a different class of structural information. These include formula, connectivity, geometric and stereo isomerization, isotopic composition, charge, and protonation state layers. Layers are separated by a forward slash. Consider the two enantiomers of carvone, whose InChIs differ only in the stereochemical layer (/m0 versus /m1 in the following). One isomer smells of spearmint and has

InChI=1S/C10H14O/c1-7(2)9-5-4-8(3)10(11)6-9/h4,9H,1,5-6H2,2-3H3/t9-/m0/s1

The other smells of caraway and has

InChI=1S/C10H14O/c1-7(2)9-5-4-8(3)10(11)6-9/h4,9H,1,5-6H2,2-3H3/t9-/m1/s1

(The “1S” at the beginning of each string indicates a standard InChI.)
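The “little string manipulation” mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not part of the InChI software: the helper names (`split_layers`, `differing_layers`, `drop_stereo_and_isotopes`) are hypothetical, and the code assumes simple standard InChIs in which each layer prefix appears once, so isotopic sublayers that repeat prefixes are not modelled separately. Applied to the two carvone strings, it locates the single differing /m layer and shows that the strings agree once stereo and isotopic layers are dropped.

```python
# Minimal sketch: treat a standard InChI as "/"-separated layers.
# Assumption: simple standard InChIs only; isotopic sublayers that repeat
# prefixes (e.g. a second /t after /i) are not modelled separately.
STEREO_OR_ISOTOPE = {"b", "t", "m", "s", "i"}


def split_layers(inchi: str) -> list[tuple[str, str]]:
    """Return (prefix, value) pairs; version and formula get named keys."""
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string")
    parts = inchi[len("InChI="):].split("/")
    layers = [("version", parts[0])]
    if len(parts) > 1:
        layers.append(("formula", parts[1]))
    for part in parts[2:]:
        if part:  # skip empty fragments such as a trailing "/"
            layers.append((part[0], part[1:]))
    return layers


def join_layers(layers: list[tuple[str, str]]) -> str:
    """Rebuild an InChI string from (prefix, value) pairs."""
    parts = [value if prefix in ("version", "formula") else prefix + value
             for prefix, value in layers]
    return "InChI=" + "/".join(parts)


def differing_layers(a: str, b: str) -> list[tuple]:
    """Compare two InChIs layer by layer, by position."""
    la, lb = split_layers(a), split_layers(b)
    diffs = []
    for i in range(max(len(la), len(lb))):
        x = la[i] if i < len(la) else None
        y = lb[i] if i < len(lb) else None
        if x != y:
            diffs.append((x, y))
    return diffs


def drop_stereo_and_isotopes(inchi: str) -> str:
    """Truncate at the first stereo (/b, /t, /m, /s) or isotopic (/i) layer."""
    kept = []
    for prefix, value in split_layers(inchi):
        if prefix in STEREO_OR_ISOTOPE:
            break
        kept.append((prefix, value))
    return join_layers(kept)


# The two carvone InChIs quoted above.
SPEARMINT = "InChI=1S/C10H14O/c1-7(2)9-5-4-8(3)10(11)6-9/h4,9H,1,5-6H2,2-3H3/t9-/m0/s1"
CARAWAY = "InChI=1S/C10H14O/c1-7(2)9-5-4-8(3)10(11)6-9/h4,9H,1,5-6H2,2-3H3/t9-/m1/s1"

print(differing_layers(SPEARMINT, CARAWAY))  # -> [(('m', '0'), ('m', '1'))]
print(drop_stereo_and_isotopes(SPEARMINT) == drop_stereo_and_isotopes(CARAWAY))  # -> True
```

Truncating at the first stereo or isotopic layer is a crude way of asking the stereoisomer question; for production work the InChI library itself, or the InChIKey blocks used by PubChem, are more robust choices.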

The NIST Chemistry WebBook provides an example of the use of InChI. It combines data from many sources. It is over 19 years old and there are many problems with identifiers from older datasets. Historically, CAS Registry Numbers and other accession numbers were used in matching species, but there were many problems (in one case out of ten, even the check digits of the CAS Registry Numbers were wrong). Newer data often come with structures, and InChI can be used. Moreover, drawing structures can force additional analysis. Nevertheless, there are still legacy data with incomplete identifiers (e.g., for stereoisomers and isoanalogues). An example is the species labeled as “gamma-elemene,” where 81 chromatographic retention values in the literature were analyzed and found to correspond to five different chemical species (with similar mass spectra).9
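Check-digit errors of this kind can be caught mechanically. A CAS Registry Number ends in a check digit equal to the sum of the preceding digits, each multiplied by its position counted from the right (1, 2, 3, ...), taken modulo 10. The sketch below is an illustrative validator, not NIST or CAS code, and the function names are hypothetical. It is exercised on water (7732-18-5) and ethanol (64-17-5). A passing check only shows that a number is well formed, not that it identifies the intended substance.

```python
import re

# CAS RN layout: 2 to 7 digits, hyphen, 2 digits, hyphen, 1 check digit.
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def cas_check_digit(body: str) -> int:
    """Weight the digits right to left by 1, 2, 3, ... and take the sum modulo 10."""
    return sum(weight * int(digit)
               for weight, digit in enumerate(reversed(body), start=1)) % 10


def is_valid_cas(rn: str) -> bool:
    """True if rn is well formed and its final digit matches the computed check digit."""
    match = CAS_PATTERN.match(rn.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    return cas_check_digit(body) == int(match.group(3))


for rn in ["7732-18-5", "7732-18-4", "64-17-5"]:
    print(rn, is_valid_cas(rn))
# -> 7732-18-5 True / 7732-18-4 False / 64-17-5 True
```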

PubChem is a great resource. Apart from the features that we all know and love, there are lesser known features that help disambiguate species. The substance database, separate from the compound database, records the mapping of names to structures by the various people who submitted the data. Partial InChIKey search allows compounds with the same composition and connectivity, but different information in further InChI layers, to be retrieved.
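A partial InChIKey search works because, as described in the InChI technical documentation, the first 14-character block of a standard InChIKey is derived from the main (connectivity) layer, while stereochemical, isotopic, and most other layer information goes into the second block. A minimal local analogue, assuming standard InChIKeys and using hypothetical helper names and placeholder keys rather than real compounds, groups records that share a first block:

```python
import re
from collections import defaultdict

# Standard InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_PATTERN = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def first_block(inchikey: str) -> str:
    """Return the 14-letter first block, which hashes the main (connectivity) layer."""
    if not INCHIKEY_PATTERN.match(inchikey):
        raise ValueError(f"not a standard InChIKey: {inchikey!r}")
    return inchikey[:14]


def group_by_first_block(records: dict[str, str]) -> dict[str, list[str]]:
    """Group record IDs whose InChIKeys share a first block (same skeleton)."""
    groups: dict[str, list[str]] = defaultdict(list)
    for record_id, key in records.items():
        groups[first_block(key)].append(record_id)
    return dict(groups)


# Hypothetical placeholder keys, built to the right shape; not real compounds.
records = {
    "rec-1": "A" * 14 + "-" + "B" * 8 + "SA-N",
    "rec-2": "A" * 14 + "-" + "C" * 8 + "SA-N",
    "rec-3": "D" * 14 + "-" + "E" * 8 + "SA-N",
}
print(group_by_first_block(records))
# -> {'AAAAAAAAAAAAAA': ['rec-1', 'rec-2'], 'DDDDDDDDDDDDDD': ['rec-3']}
```

Records rec-1 and rec-2 would be candidates for stereoisomers or isotopologues of one skeleton, which is exactly the disambiguation question the partial search helps answer.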

Voltaire said that the perfect is the enemy of the good. We cannot fix all chemical structure errors without abandoning valuable historical data, and newer data also are not immune to identification problems, but we can make progress where resources permit. There are tools such as InChI and PubChem that can help, but they cannot solve the entire problem. “Zero Defects” was an industrial quality management approach championed in the 1960s and 1970s, which was criticized as an exhortation to do something that may not be possible. Total Quality Management, the approach championed by W. Edwards Deming, is based on continuous improvement of systems, driven by measurement. It has been dramatically successful and has succeeded where “Zero Defects” failed. The transition from “econoboxes” in the early 1970s to modern, reliable compact cars did not happen overnight. Similarly, our chemical structure tools are getting better, but we still have a long way to go.

Open chemistry resources provided by the NCI computer-aided drug design (CADD) group

Marc Nicklaus

NCI has a 60-year history of cheminformatics, starting with the drug development program authorized by Congress in 1955, said Marc Nicklaus, the leader of the NCI CADD group. By 1963, “it became clear the system must track not just individual chemical compounds, but distinct samples of chemical compounds…magnifying the data management problem considerably”.10 This was a direct antecedent of the concept of separate PubChem Substance and Compound databases. The open NCI structure database was made publicly available in 1994 (see the talk by Daniel Zaharevitz, summarized above). The NCI Database Browser was, in 1998, the first public Web GUI for a large, small-molecule database, with advanced capabilities such as full substructure search. It arose from a collaboration between NCI and Wolf-Dietrich Ihlenfeldt at the University of Erlangen-Nürnberg. The Enhanced NCI Database Browser has 250,250 structure records and about 60 million data points: mostly Prediction of Activity Spectra for Substances (PASS)11 predictions. Sophisticated search and output options are available.

The CACTUS Web Server offers many services, tools, and downloadable datasets centered on small molecules. Apart from the database browser, Marc singled out the Chemical Structure Lookup Service (CSLS, pronounced “sizzles”), the Optical Structure Recognition Application (OSRA), and the Chemical Identifier Resolver (CIR). Developed by Igor Filippov in 2006, CSLS is a “phone book for chemical structures”, linking 74 million indexed structures (46 million unique structures) to over 100 databases. OSRA, developed by Igor Filippov in 2007, converts graphical representations of chemical structures in journal articles, patents, or other text into SMILES. CIR, developed by Markus Sitzmann in 2009, converts one structure identifier or representation into another. Its workflow involves lookups in the CADD group’s chemical structure database (CSDB). CSDB contains about 121 million structure records for 85 million unique structures, in 140 databases, including PubChem and the Sigma-Aldrich iResearch Library.
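CIR is queried through simple URLs. The sketch below only builds a request URL; it assumes the commonly documented pattern `https://cactus.nci.nih.gov/chemical/structure/<identifier>/<representation>` and representation names such as `smiles` or `stdinchikey`, both of which should be checked against the CADD group’s current documentation. The function name `cir_url` is hypothetical, and no network request is made.

```python
from urllib.parse import quote

# Assumed CIR URL pattern: <base>/<identifier>/<representation>
CIR_BASE = "https://cactus.nci.nih.gov/chemical/structure"


def cir_url(identifier: str, representation: str) -> str:
    """Build a CIR request URL, percent-encoding the identifier so that
    SMILES characters such as '=', '#', or '/' are not taken as URL syntax."""
    return f"{CIR_BASE}/{quote(identifier, safe='')}/{representation}"


print(cir_url("carvone", "stdinchikey"))
print(cir_url("C=O", "stdinchi"))
```

The resulting URL can be fetched with any HTTP client. Percent-encoding the identifier is a precaution for SMILES input; how the service treats encoded characters is also worth confirming in its documentation.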

It might be thought that the many large databases now available for CADD are enough, but perhaps we need a new approach. Perhaps we should not design a new molecule, and then ask how it can be made. Instead, we could look into what can be made reliably and cheaply, and then search only among those molecules for new, potentially bioactive compounds, using the usual CADD approaches.

Therefore, Marc’s team has begun building the Synthetically Accessible Virtual Inventory (SAVI), using a set of predictive and richly annotated rules (transforms) from Lhasa Limited and Lhasa LLC, a set of reliably available and inexpensive starting materials from MilliporeSigma, and the cheminformatics engine CACTVS from Xemistry GmbH.

A parser has been implemented in CACTVS for the CHMTRN/PATRAN retrosynthetic transforms (of which there are more than 2,300), and it has been adapted for the forward-synthetic SAVI approach. Fourteen transforms have been implemented and used in production runs so far. Among the 3.3 million building blocks in sets from Sigma-Aldrich and other catalogs, 377,484 compounds were identified as highly available, and most of them are annotated with pricing and availability data.

Using 11 “productive” transforms in one-step reactions, a sample subset of about 610,000 compounds was generated in summer 2015, and made available for download. It is annotated with (but not yet filtered by) 54 compound, reaction, and typical drug design properties. As of August 2016, 238 million products have been generated; it is estimated that there might be 280 million when the runs are completed. Overlap with PubChem is minimal: more than 99% of the compounds appear to be novel.

Eleven new transforms are being added, and in future, products will be steered toward interesting novel rings and scaffolds. The product files will be offered for download. Multi-step reactions will be investigated in future, and a Web GUI with extensive search capabilities will be developed. Topics of the ongoing work are how the predicted synthetic routes will work in actual syntheses, what filter rate will be needed for truly “interesting” compounds, and how the editing and adding of transforms can be made as easy as possible.

Evolution of open chemical information

Valery Tkachenko

Valery Tkachenko of the Royal Society of Chemistry (RSC) continued the theme of open data in chemistry. Everything changed in 1992 with the arrival of the World Wide Web. Later, PubChem changed the world of chemical information. ChemSpider, a structure-centric hub for Web searching, now contains 57 million chemicals from over 500 different sources, and deposition of data is ongoing. It differs from PubChem in that curation and annotation are crowdsourced. ChemSpider has analytical data, text and literature references, and data on compounds and reactions. NextMove Software’s text mining software has been used to analyze reactions from the RSC archive of journal articles, output CML, and break down each procedure summary into steps.

We are moving into the world of the Internet of Things and phones with modular, replaceable parts. Gartner has identified the Top 10 Strategic Technology Trends for 2016. Our world is hyperconnected, and connections require standards. The IUPAC “color books” took years to write, and thus data quality issues arose. Evan Bolton has referred to the proliferation of errors in public and private databases as “robochemistry”. Manual curation of huge databases is not feasible, but automatic quality control systems such as RSC’s Chemistry Validation and Standardization Platform (CVSP) can be developed. CVSP allows users to upload chemical structure files, which are then validated, and optionally standardized, in preparation for publication or submission to a chemical database. About 200 rules have been encoded and expressed as XML, to check for errors in, for example, the depiction of stereochemistry. The community can amend these rules. The structure’s relationship to names, SMILES, and other identifiers also needs checking.

Knowledge from the past is used to derive wisdom. The Open PHACTS discovery platform has been developed to reduce barriers to drug discovery in businesses and academia. It contains multiple data sources, integrated and linked together so that users can easily see the relationships between compounds, targets, pathways, diseases, and tissues. The platform has been used to answer complex questions in drug discovery. It was built in collaboration with a large consortium of organizations involved in drug discovery, and is founded on Semantic Web and linked data principles. RSC developed the chemical data handling software for Open PHACTS.

A high percentage of raw data is lost in the science data publishing workflow. Horizon 2020 is a very large EU research and innovation program. It already mandates open access to all scientific publications; from 2017, research data are open by default, with possibilities to opt out. In the era of Uber, transportation is now a commodity. Will scientific data become a commodity by 2020? How will publishers cope? Authorities have moved from centralized to decentralized to distributed, as we have moved into the hyperconnected world. We are on the verge of a new technical revolution; RSC is excited, and is ready to ride high on the wave of data science developments.

Open chemical information at the European Bioinformatics Institute

Christoph Steinbeck

Christoph Steinbeck of EMBL-EBI looked back to his early years as a natural products chemist, and recounted what has happened since the old days of access to Beilstein and CAS in 1992. There were no open source software libraries for cheminformatics in those days, but there were computer-assisted structure elucidation (CASE) systems.12,13 Christoph sold his CASE software to Bruker and it got buried. He learned that successful science requires data and software to be free and open.

So in 2000 he and his co-workers began work on an open source library for bioinformatics, cheminformatics, and computational chemistry written in Java: the Chemistry Development Kit (CDK).14,15,16 Sixteen years later, it is a well-established, mature code base (564,171 lines of code), maintained by a large development team; 16,521 commits have been made by 115 contributors.

Christoph’s database years really began when he moved to EMBL-EBI, although his open database NMRShiftDB17,18 was written earlier. It contains 50,000 compounds and their spectra. Christoph’s current research interest is documenting the metabolomes of all species on the planet. To borrow Donald Rumsfeld’s phraseology, “known knowns” can be found in databases, “known unknowns” can be found using NMRShiftDB, but “unknown unknowns” are dark matter. Too many metabolomes are not known.

EMBL-EBI has many important databases, Chemical Entities of Biological Interest (ChEBI) and ChEMBL being just two of them. ChEBI is a freely available dictionary of molecular entities focused on small chemical compounds. The molecular entities are either products of nature or synthetic products used to intervene in the processes of living organisms. ChEBI incorporates an ontological classification, whereby the relationships between molecular entities or classes of entities and their parents or children are specified. ChEMBL is an open data resource of binding, functional, and ADMET bioactivity data for a large number of druglike compounds.19 The types of data reported in PubChem and ChEMBL are distinct and complementary. To maximize the utility of the two datasets EMBL-EBI has worked with the PubChem group to develop a data exchange mechanism.

It is estimated that there are about 8.7 million eukaryotic species on earth, of which 1.2 million have been identified and classified. Three or four thousand complete species genomes have been sequenced. What about completed metabolomes? Steinbeck’s team has argued that the time is now right to focus intensively on model organism metabolomes.20 They have proposed a grand challenge to identify and map all metabolites onto metabolic pathways, to develop quantitative metabolic models for model organisms, and to relate organism metabolic pathways within the context of evolutionary metabolomics.

Species metabolomes are now being assembled through data sharing in metabolomics. MetaboLights21,22,23 is an EMBL-EBI database for metabolomics experiments and derived information. It is cross-species and cross-technique, and covers metabolite structures and their reference spectra as well as their biological roles, locations and concentrations, and experimental data from metabolic experiments. Christoph’s team has reported one dataset24 in the data publication Scientific Data.

History and the future of tools and software components for working with public chemistry data

Wolf-Dietrich Ihlenfeldt

Wolf-Dietrich Ihlenfeldt’s CACTVS software suite has been an integral component of the PubChem software since the beginning. It handles structure searching, 2D structure layout and image rendering, submission checking, property computation, hashcodes, and a sketcher application. Nor is CACTVS used only in PubChem: the CACTVS scripting toolkit (scriptable in Python or Tcl) is free for academia, and can be used in database cartridges and in KNIME nodes. It can give access to more than 50 Internet chemistry data sources.

One of the reasons CACTVS works particularly well with PubChem is PubChem’s forward-looking design, including the PUG, Entrez E-utilities and REST interfaces which make it possible to access structured data by software without resorting to HTML page scraping. Additionally, CACTVS has some inherent advantages in performing these tasks: much of the PubChem engine is based on CACTVS, and CACTVS understands the native PubChem ASN.1 data formats for structures and assays, so it can process the original data content of PubChem, without format conversion losses. It is also possible to send native toolkit structure encodings directly to the PubChem query engine, which opens up query functionality which cannot be expressed by any standard structure query exchange formats, such as SMARTS or Query molfiles (which are, of course, supported by the query interface). An example of such advanced query functionality which will be made accessible on the PubChem side in the near future is querying for ring attributes which are not atom attributes, such as the overall ring atom formula, substituent counts and classes, and similarly also for ring systems, and even user-defined atom groups.
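Because PUG-REST exposes PubChem data through plain URLs, software needs no HTML scraping. The following is a minimal sketch that only builds such a URL; fetching it requires network access (for example with `urllib.request.urlopen`). The input/operation/output path layout follows the PUG-REST documentation, while the helper name `pug_rest_property_url` is our own.

```python
from urllib.parse import quote

PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def pug_rest_property_url(cid: int, properties: list[str], fmt: str = "JSON") -> str:
    """Build a PUG-REST URL requesting computed properties for one CID.

    Path layout: /compound/cid/<CID>/property/<comma-separated names>/<format>
    """
    if cid <= 0:
        raise ValueError("CIDs are positive integers")
    prop_list = ",".join(quote(p) for p in properties)
    return f"{PUG_REST_BASE}/compound/cid/{cid}/property/{prop_list}/{fmt}"


# Ethanol is CID 702 in PubChem.
url = pug_rest_property_url(702, ["MolecularFormula", "InChIKey"])
print(url)
```

The URL identifies the data being requested, so the same string works from a browser, a shell script, or a toolkit such as CACTVS.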

PubChem uses CACTVS hashcoding as a primary key (one-to-one mapping of hashcode to the PubChem compound identifier, called a CID), for mapping between CID and PubChem substance identifier (SID), for related compound links, and for a similarity boost scheme. The hashcodes are currently 64-bit pseudo-random numbers, but soon will be 128-bit. Computation is based on configuration-dependent atom seeds, and neighbor-coupled, atom-centric xor-feedback shift registers. The hashcodes are fast to compute: faster than SMILES and much faster than InChI. They are of constant length, and are independent of ring set, aromaticity system, and formal charge localization. Database performance is outstanding: identity is looked up on a fully indexed database field. The PubChem variants of the codes are computed with or without stereochemistry and with or without isotope labels, on the submitted structure, the standardized structure, or the canonical tautomer, but many more seed variants are possible that PubChem does not use.
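The general idea of an atom-seeded, neighbor-coupled structure hash can be illustrated with a toy. This is not the CACTVS algorithm. It swaps in a simpler Morgan-style scheme: a deterministic seed per element, a few rounds in which each atom's code is mixed with the sum of its neighbors' codes, and an order-independent combination into one 64-bit number. Bond orders, charges, isotopes, and stereochemistry are ignored, and the mixing constants come from the splitmix64 finalizer.

```python
import hashlib

MASK64 = (1 << 64) - 1


def _seed(label: str) -> int:
    # Deterministic 64-bit seed per atom label (built-in hash() of str is salted per process).
    digest = hashlib.blake2b(label.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def _mix(x: int) -> int:
    # splitmix64-style finalizer: spreads bits so small input changes flip many output bits.
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & MASK64
    return x ^ (x >> 31)


def toy_structure_hash(elements, bonds, rounds=4):
    """Order-independent 64-bit hash of a hydrogen-suppressed molecular graph.

    Toy illustration only: ignores bond orders, charges, isotopes and stereo.
    """
    neighbors = [[] for _ in elements]
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)
    codes = [_seed(el) for el in elements]
    for _ in range(rounds):
        # Couple each atom's code with the sum of its neighbors' codes; a sum
        # does not depend on the order in which neighbors are listed.
        codes = [
            _mix(codes[i] ^ _mix(sum(codes[j] for j in neighbors[i]) & MASK64))
            for i in range(len(elements))
        ]
    # Summing the final atom codes makes the hash independent of atom numbering.
    return _mix(sum(codes) & MASK64)


ethanol = toy_structure_hash(["C", "C", "O"], [(0, 1), (1, 2)])
ethanol_renumbered = toy_structure_hash(["O", "C", "C"], [(0, 1), (1, 2)])
dimethyl_ether = toy_structure_hash(["C", "O", "C"], [(0, 1), (1, 2)])
print(ethanol == ethanol_renumbered, ethanol == dimethyl_ether)
```

Renumbering the atoms of ethanol leaves the hash unchanged, while the isomeric dimethyl ether gets a different value (barring a 64-bit collision). Identity lookup therefore reduces to an indexed equality test on one integer field.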

Hashcodes link structures to closely related compounds which agree at least in fragment connectivity. Wolf-Dietrich is exploring more advanced options, hashing structure relationships relevant to medicinal chemistry, for example, linking structures with similar ring systems and substituent fragments at sites of interest, and using various fragment and generalized hashes. He calls this PogoChem and a proof-of-concept is available. Users simply click on a structure and query results appear instantaneously.

In one option, ring system variants are produced by generalizing ring system atoms. There is one hashcode per ring system, and ring system size and heteroatom count are stored for the similarity score. In another option, ring systems or bridges are resized by excising unsubstituted atoms between substitution or fusion points, individually or in combination; this time there are from one to ten hashcodes per ring system. It is also possible to cut bonds and compute a hash for the fragments, which are stored with bond information and basic fragment statistics. This leads to about 50 topology-filtered hashcodes per compound. Storing five billion records at 56 bytes per record is no problem.
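The storage claim is easy to check. Five billion records at about 50 hashcodes per compound implies roughly 100 million compounds, which is the assumption used below.

```python
compounds = 100_000_000        # assumption: roughly the size of PubChem Compound
hashes_per_compound = 50       # "about 50 topology-filtered hashcodes per compound"
bytes_per_record = 56

records = compounds * hashes_per_compound
total_bytes = records * bytes_per_record
print(f"{records:,} records, {total_bytes / 1e9:.0f} GB ({total_bytes / 2**30:.0f} GiB)")
# 5,000,000,000 records, 280 GB (261 GiB)
```

A table of about 280 GB fits comfortably on a single modern server, which is why the record count itself is not a concern.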

Wolf-Dietrich concluded by saying that PubChem is a great resource, in the hands of a capable team. It is still evolving at a fast pace, and it continues to inspire new ideas of how to access and analyze its contents.

PubChem: a resource for cognitive computing

Stephen Boyer

Stephen Boyer of the IBM Almaden Research Center has collaborated with OntoChem, the University of Alberta, NIH, EMBL-EBI, and others on a chemical ontology approach to drug discovery. Their work with chemical ontologies identifies a family of molecular attributes that define a molecule and explores how those attributes might be used for identifying functional attributes based on molecules with similar structure activity. An example of their use of molecular attributes can be seen below, illustrated by assignments within the target molecule (Azulfidine) of benzoic acid, carboxylic acid, carbonyl compound, phenol, azobenzene, azo compound, sulfone, sulfonamide, pyridine, benzene, and hydroxyl groups:

Azulfidine

In this example of Azulfidine, assignments are also made for functional attributes, for example, “it is used for” the treatment of Crohn’s disease, rheumatoid arthritis, and ulcerative colitis.

The process begins by converting a compound name to SMILES. From the SMILES, molecular attributes (also known as molecular descriptors or chemical labels) such as “hydroxy” or “benzoic” or “phenyl” are generated. Steve’s team submitted about 1.4 million SMILES strings from ChEMBL to two different auto-classification systems to make a ChEMBL ontology database with two computer-generated chemical ontologies: ClassyFire (written by David Wishart of the University of Alberta and Ph.D. student Yannick Djoumbou Feunang) and OntoChem (Lutz Weber).

Steve then used this database in a multi-step process. He queried it for a gene or target of interest (“XYZ”); created a set of candidate compounds with reported activity for XYZ; refined the candidate set to create a training set of compounds (e.g., with EC50 <30); scored and ranked the molecular attributes; and then used those results to query the ChEMBL database minus the candidate set and the training set. He thus identified 100 compounds with potential activity, exclusive of the candidate or training sets.
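The multi-step process can be sketched in a few lines of Python. This is a hypothetical reconstruction, not Steve's code. Each compound is reduced to a set of ontology labels. Each label is then scored with a 2x2 Pearson chi-squared statistic comparing the training set with the rest of the corpus; the MDM2 experiment described next mentions chi-squared scoring, but the table layout here is our assumption. Finally, compounds outside the candidate and training sets are ranked by the summed scores of the top labels they carry. The compound IDs and labels are invented, and `top_k`/`top_n` stand in for "top 10 labels" and "100 compounds".

```python
def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def rank_labels(corpus, training, top_k=10):
    """Score labels over-represented in the training set versus the rest."""
    labels = set().union(*corpus.values())
    rest = [cid for cid in corpus if cid not in training]
    scores = {}
    for label in labels:
        a = sum(label in corpus[cid] for cid in training)  # actives with label
        b = len(training) - a                               # actives without
        c = sum(label in corpus[cid] for cid in rest)       # others with label
        d = len(rest) - c                                   # others without
        if a * (c + d) > c * (a + b):  # keep enriched labels only
            scores[label] = chi_squared_2x2(a, b, c, d)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


def predict(corpus, candidates, training, top_k=10, top_n=100):
    """Rank compounds outside the candidate and training sets by label scores."""
    top_labels = dict(rank_labels(corpus, training, top_k))
    excluded = candidates | training
    ranked = []
    for cid, labels in corpus.items():
        if cid in excluded:
            continue
        score = sum(top_labels.get(label, 0.0) for label in labels)
        if score > 0:
            ranked.append((score, cid))
    ranked.sort(key=lambda sc: (-sc[0], sc[1]))
    return [cid for _, cid in ranked[:top_n]]


# Invented toy corpus: compound ID -> ontology labels.
corpus = {
    "c1": {"benzoic acid", "sulfonamide"},
    "c2": {"benzoic acid", "pyridine"},
    "c3": {"phenol"},
    "c4": {"benzoic acid", "sulfonamide", "phenol"},
    "c5": {"pyridine"},
    "c6": {"sulfonamide"},
    "c7": {"alkane"},
}
candidates = {"c1", "c2", "c3"}   # reported activity for target "XYZ"
training = {"c1", "c2"}           # refined subset, e.g. potency cut-off
print(predict(corpus, candidates, training))  # ['c4', 'c5', 'c6']
```

Running the same ranking with labels from two different ontologies and intersecting the two top-N lists gives the kind of consensus set reported for MDM2.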

Steve reported two experiments. The first concerned MDM2 (mouse double minute 2 homologue), a protein that in humans is encoded by the MDM2 gene. The key target of MDM2 is the p53 tumor suppressor. Steve carried out a sample analysis, using the two chemical ontologies, to predict compounds that may have MDM2 activity, scored with a chi-squared test. In ChEMBL, 20,558 molecules have activity for MDM2, but only 27 of these have IC50 < 30 nM. He compared the top 100 compounds identified by ClassyFire with the top 100 compounds identified by OntoChem, generated with the parameters of the top 10 labels, assay minimum = 30, and corpus count cut off = 300,000. He found 57 predicted compounds in common between the two ontologies. Not having a laboratory, he was unable to test any of these compounds, but he did find structure-activity data in numerous patents reporting MDM2 assay data for 26 compounds, some of which matched compounds in his set of 57 potential actives.

Steve’s second example concerned SGLT2 (sodium/glucose cotransporter 2) inhibitors that reduce blood glucose levels and have potential use in the treatment of type II diabetes. Thirty compounds with assay data for SGLT2 were derived from the ChEMBL database, but only 12 had EC50 < 10 nM. Using these 12 molecules as a training set, the team identified several new molecules as possibly having SGLT2 activity. A search of patents and the scientific literature confirmed that several of the identified compounds had reported significant activity as SGLT2 inhibitors.

Steve closed with some final thoughts on innovation. Steven Johnson25 coined the term “hummingbird effect” to describe how an innovation in one field ends up triggering changes that seem to belong to a different domain altogether. Innovations arise from the “adjacent possible” (a term Johnson borrows from the theoretical biologist Stuart Kauffman): you get railroads when it is railroading time, and not before, even if some prescient inventor sketches them out far in advance, and they open up all kinds of new possibilities.

SPL and openFDA resources of open substance data

Yulia Borodina

Yulia Borodina is in the Office of Health Informatics at the U.S. Food and Drug Administration (FDA/OHI). Her talk concerned “bulk” open data. Data are extracted from text or legacy databases, harmonized, and coded in a machine-readable format. To provide data interoperability you need a data standard; you then harmonize the data according to the standard, and ensure that the standard is publicly available (and, ideally, freely available). Unfortunately, you may have to wait 50 years until the community adopts the standard. To support data reuse you can provide direct downloads and Application Programming Interfaces (APIs), and let the user decide how to select and analyze the data.

Structured Product Labeling (SPL) is a document markup standard approved by Health Level Seven (HL7) and adopted by FDA as a mechanism for exchanging product and facility information. It covers health informatics, cheminformatics, and bioinformatics. It has many applications: Yulia concentrated on substances. SPL is a universal (not data-specific) exchange standard, with reusable data types, coded data elements, and data-specific validation procedures. Drug manufacturers and distributors submit SPL to FDA, and FDA makes a product SPL file with substance, pharm class, billing unit, and product concept index files. Data are output to the FDA Online Label Repository, the National Library of Medicine’s DailyMed website, and the public data warehouse, openFDA.

Substances in products can be small molecules, proteins, nucleic acids, polymers, organisms, parts of organisms, or mixtures. Definitions of non-confidential substances from the FDA Substance Registration System are available in SPL format, with unique ingredient identifiers (UNII). The data for over 50,000 chemical substances, and over 5,000 biological ones, are compliant with the ISO 11238 Identification of Medicinal Products (IDMP) standard, and are available from DailyMed and openFDA. The IDMP standard defines “what” (e.g., proteins are to be defined by sequence) and the SPL standard defines “how” (e.g., UNII, molfile, InChI, and InChIKey for small molecules). Yulia showed the content of some SPL Substance Index Files for various types of substance. SPL data have been integrated into PubChem.

The concept of openFDA is to index high-value, high-priority, and scalable public datasets (e.g., medical device reports, drug adverse events, and food recall enforcement reports), to format and document the data in developer- and consumer-friendly standards, and to make those data available via a public-access portal that enables developers to use them in applications quickly and easily. openFDA allows direct downloads and APIs. Substance and Pharm Class SPL index files can be downloaded, and some substance SPL fields associated with a product label are available in JavaScript Object Notation (JSON) format via API. openFDA allows users to carry out statistical applications around adverse events, such as the likelihood ratio test-based method for signal detection in drug classes. Interactive open-source applications available on https://open.fda.gov/analytics/ demonstrate how openFDA APIs can be used for epidemiological research, combined with powerful statistical tools built by the openFDA community.
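openFDA queries are ordinary HTTPS GET requests. The following is a minimal sketch that builds them. The `drug/label` and `drug/event` endpoints and the `search`, `count`, and `limit` parameters follow the public openFDA documentation. Field names such as `openfda.substance_name` should be checked against the current openFDA field reference, and the helper `openfda_url` is hypothetical.

```python
from urllib.parse import urlencode

OPENFDA_BASE = "https://api.fda.gov"


def openfda_url(endpoint, search, count=None, limit=None):
    """Build an openFDA GET URL such as https://api.fda.gov/drug/label.json?search=...

    endpoint: e.g. "drug/label" or "drug/event" (".json" is appended).
    search:   openFDA query string in field:term syntax.
    count:    optional field to aggregate on instead of returning records.
    limit:    optional maximum number of records to return.
    """
    params = {"search": search}
    if count is not None:
        params["count"] = count
    if limit is not None:
        params["limit"] = limit
    # urlencode percent-encodes quotes and colons; spaces become '+'.
    return f"{OPENFDA_BASE}/{endpoint}.json?{urlencode(params)}"


label_url = openfda_url("drug/label", 'openfda.substance_name:"ASPIRIN"', limit=1)
event_url = openfda_url(
    "drug/event",
    'patient.drug.openfda.substance_name:"ASPIRIN"',
    count="patient.reaction.reactionmeddrapt.exact",
)
print(label_url)
print(event_url)
```

The second URL asks openFDA to count adverse-event reports by reaction term rather than return individual reports. Aggregated counts of this kind are the raw input to signal-detection statistics such as the likelihood ratio test mentioned above.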

Building a network of interoperable and independently produced linked and open biomedical data

Michel Dumontier

Michel Dumontier of Stanford University, and his co-workers, develop tools and methods to represent, store, publish, integrate, query, and reuse biomedical data, software, and ontologies, with an emphasis on reproducible discovery, which necessitates data science tools and methods, and community standards. Data need to be “FAIR”,26 that is, findable, accessible, interoperable, and reusable.

The Semantic Web is the new global web of knowledge: it has standards for publishing, sharing and querying facts, expert knowledge and services, and a scalable approach for the discovery of independently formulated and distributed knowledge. Linked Data offers a solid foundation for FAIR data: entities are identified using globally unique identifiers (URIs); entity descriptions are represented with a standardized language (resource description framework, RDF); data can be retrieved using a universal protocol (HTTP); and entities can be linked together to increase interoperability.

Bio2RDF is an open source project to unify the representation and interlinking of biological data using RDF: it transforms silos of life science data into a globally distributed network of linked data for biological knowledge discovery. It shows how datasets are connected together. Queries can be federated across private and public SPARQL (SPARQL Protocol and RDF Query Language) databases. A graph-like representation is amenable to finding mismatches and discovering new links.27 EbolaKB28 is an example using linked data and software.
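The way shared URIs let independently produced datasets be queried together can be shown with a toy in-memory analogue of a SPARQL basic graph pattern. Real deployments use SPARQL endpoints, with federation via the SPARQL 1.1 `SERVICE` keyword. Here the `example.org` URIs and the `role` predicate are hypothetical, and only `owl:sameAs` and `rdfs:label` are standard vocabulary.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
ROLE = "http://example.org/bio/vocab/role"  # hypothetical predicate

# Two independently produced datasets (hypothetical example.org URIs).
chemistry_source = [
    ("http://example.org/chem/compound/1", RDFS_LABEL, "ethanol"),
    ("http://example.org/chem/compound/1", OWL_SAME_AS, "http://example.org/bio/substance/42"),
]
biology_source = [
    ("http://example.org/bio/substance/42", ROLE, "solvent"),
]


def match(pattern, triple, binding):
    """Extend a variable binding so that pattern matches triple, or return None."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):  # variable: bind it, or check an existing binding
            if new.get(term, value) != value:
                return None
            new[term] = value
        elif term != value:       # constant: must match exactly
            return None
    return new


def query(triples, patterns):
    """Evaluate a basic graph pattern (a conjunction of triple patterns)."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for b in bindings
            for t in triples
            if (extended := match(pattern, t, b)) is not None
        ]
    return bindings


merged = chemistry_source + biology_source
results = query(merged, [
    ("?c", RDFS_LABEL, "ethanol"),
    ("?c", OWL_SAME_AS, "?s"),
    ("?s", ROLE, "?role"),
])
print(results)
```

Neither dataset alone can answer "what role does ethanol play?". The answer emerges only because both sides use the same URI for substance 42, which is the interoperability argument for Linked Data in miniature.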

In current, unpublished research on network analysis and discovery, Michel’s team is examining whether they can implement an open version of PREDICT29 using linked data. HyQue,30,31 for hypothesis validation, is a platform for knowledge discovery that uses data retrieval coupled with automated reasoning to validate scientific hypotheses. It builds on semantic technologies to provide access to linked data, ontologies, and Semantic Web services, uses positive and negative findings, captures provenance, and weighs evidence according to context. It has been used to find aging genes in nematodes, and to assess cardiotoxicity of tyrosine kinase inhibitors.

The network of linked data goes beyond biology. Michel displayed a network from about 2007, and the linking open data cloud diagram as of August 2014, to show how rapidly it has expanded across domains:

Linked Data RDF; Linked Data Cloud

EMBL-EBI have been producing RDF for two years, PubChemRDF was released more than two years ago, and NLM has released a beta version of Medical Subject Headings (MeSH) RDF linked data, but lack of coordination makes Linked Open Data chaotic and unwieldy. There is no shortage of vocabularies, ontologies and community-based standards. The National Center for Biomedical Ontology (NCBO) manages a repository of all publicly available biomedical ontologies and terminologies. The NCBO BioPortal resource makes these ontologies and terminologies available via a Web browser and Web Services. The NCBO Annotator service takes as input natural-language text and returns as output ontology terms to which the text refers. The Center for Extended Data Annotation and Retrieval (CEDAR) project relies on the BioPortal ontology repository and the NCBO Annotator. CEDAR is making data submission smarter and faster, so biomedical researchers and analysts create and use better metadata. Through better interfaces, terminology, metadata practices, and analytics, CEDAR optimizes the metadata pathway from provider to end user.

PubChem engaged the community to reuse and extend existing vocabularies. The Semanticscience Integrated Ontology (SIO) is an effective upper-level ontology, with over 1,500 classes and 207 object properties. The Chemical Information Ontology (CHEMINF)32 is a collaborative ontology that distinguishes algorithmic, or procedural, information from declarative, or factual, information, and places particular importance on annotating the provenance of calculated data.

Large-scale publishing on the Web across biomedical datatypes is possible. Hubs such as NCBI and EMBL-EBI now integrate data, but there is need for global coordination on all data types. Standard vocabularies must be open, freely accessible, and demonstrably reused. Worldwide data integration formats such as RDF can improve linking of data, and toolkits that are easier to deploy will provide standards-compliant linked data. The development and use of standards by PubChem, and others, brings us closer to an interoperability ideal, but much more work is needed to support computational discovery in a reproducible manner.

Chemical structure representation in PubChem

Roger Sayle

A unique and invaluable feature of the architecture of PubChem is the distinction between the deposited structures (substances) and the normalized structures (compounds), and the retention of both. This feature allowed PubChem to avoid the early mistakes of CAS, said Roger Sayle of NextMove Software. PubChem Substance contains about 209.6 million structures; PubChem Compound contains about 91.7 million structures. The PubChem standardization service aims to determine when two chemical structures are the same.

Consider, for example, implicit and explicit hydrogens. Ethanol (PubChem CID 702) has been deposited 1569 times with six different explicit atom counts, and thus, six different SIDs. All have the same SMILES and InChI. Nitrobenzene (PubChem CID 7416) has been deposited as 164 distinct substance depositions, with five SIDs, two with molecular formula C6H5NO2, and the others with extra hydrogens: C6H6NO2+, C6H6NO2-, and C6H7NO2. To complicate matters, BIOVIA 2017 changed the interpretation of CTfiles (the default valences of some neutral main group elements have changed); this affects 342,689 SIDs and 213,097 CIDs. PubChem is inconsistent on protonation, but generally protonation state is preserved.
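Why implicit and explicit hydrogens must be reconciled can be shown with a small sketch. It folds explicit hydrogen atoms and valence-derived implicit hydrogens into one molecular formula. It is a simplification under stated assumptions: the default-valence table is ours, and charges and aromatic bonds are not handled. As the BIOVIA change illustrates, altering such defaults changes how existing records are read.

```python
from collections import Counter

# Assumed default valences for neutral, uncharged atoms (toolkits differ, and
# changes to such defaults alter how existing CTfiles are interpreted).
DEFAULT_VALENCE = {"C": 4, "N": 3, "O": 2, "S": 2}


def hill_formula(counts):
    """Hill order: C, then H, then the rest alphabetically (all alphabetical if no C)."""
    if "C" in counts:
        rest = sorted(el for el in counts if el not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        order = sorted(counts)
    return "".join(el + (str(counts[el]) if counts[el] > 1 else "") for el in order)


def molecular_formula(atoms, bonds):
    """atoms: element symbols, explicit H allowed; bonds: (i, j, order) tuples."""
    used = [0] * len(atoms)
    for i, j, order in bonds:
        used[i] += order
        used[j] += order
    counts = Counter(atoms)
    # Fill each heavy atom's remaining default valence with implicit hydrogens.
    implicit_h = sum(max(DEFAULT_VALENCE[el] - u, 0)
                     for el, u in zip(atoms, used) if el != "H")
    counts["H"] += implicit_h
    return hill_formula(+counts)  # unary + drops zero counts


# Three depositions of ethanol with different numbers of explicit atoms.
implicit = molecular_formula(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
hydroxyl_h = molecular_formula(["C", "C", "O", "H"], [(0, 1, 1), (1, 2, 1), (2, 3, 1)])
all_explicit = molecular_formula(
    ["C", "C", "O", "H", "H", "H", "H", "H", "H"],
    [(0, 1, 1), (1, 2, 1), (0, 3, 1), (0, 4, 1), (0, 5, 1), (1, 6, 1), (1, 7, 1), (2, 8, 1)],
)
print(implicit, hydroxyl_h, all_explicit)  # C2H6O C2H6O C2H6O
```

All three records normalize to the same structure. PubChem can therefore keep them as separate substances while mapping them to one compound.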

A major challenge in chemical databases is aromaticity: two compounds that differ in Kekulé forms are the same molecule. A significant innovation in cheminformatics was Evan Bolton’s development of a “canonical” Kekulé SMILES form of a molecule. This enabled PubChem to avoid the early mistakes of Daylight Chemical Information Systems. Different chemistry toolkits (and chemists) differ in opinion on which ring systems are aromatic and which are not, hence PubChem’s wish to remain “neutral” by only providing non-aromatic SMILES. Unfortunately, Evan’s algorithm aromatizes all conjugated cycles, and not just those associated with the smallest set of smallest rings, a computationally demanding requirement. PubChem does not restrict aromaticity to 4n+2 Hückel aromaticity; thus conjugated ring systems such as pentalene are deemed aromatic.
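Why a canonical Kekulé form is needed becomes clear once Kekulé structures are enumerated directly. This sketch is not Evan Bolton's algorithm. It assumes that atoms are already canonically numbered and that every atom in the conjugated graph must carry exactly one double bond. It then enumerates all perfect matchings by backtracking and picks the lexicographically smallest one. Because the enumeration is exponential in the worst case, it is for illustration only.

```python
def kekule_structures(n_atoms, bonds):
    """Enumerate every perfect matching (set of double bonds) of a conjugated graph."""
    adj = {i: set() for i in range(n_atoms)}
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    found = []

    def extend(unmatched, chosen):
        if not unmatched:
            found.append(tuple(sorted(chosen)))
            return
        a = min(unmatched)  # lowest-numbered unmatched atom must get a double bond
        for b in sorted(adj[a] & unmatched):  # b > a, since all lower atoms are matched
            extend(unmatched - {a, b}, chosen + [(a, b)])

    extend(frozenset(range(n_atoms)), [])
    return found


def canonical_kekule(n_atoms, bonds):
    """Pick the lexicographically smallest Kekulé form, given canonically numbered atoms."""
    structures = kekule_structures(n_atoms, bonds)
    if not structures:
        raise ValueError("no Kekulé structure exists for this graph")
    return min(structures)


BENZENE = [(i, (i + 1) % 6) for i in range(6)]
NAPHTHALENE = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 9), (9, 0),
               (4, 5), (5, 6), (6, 7), (7, 8), (8, 9)]
print(len(kekule_structures(6, BENZENE)), canonical_kekule(6, BENZENE))
# 2 ((0, 1), (2, 3), (4, 5))
print(len(kekule_structures(10, NAPHTHALENE)))
# 3
```

Whichever Kekulé form a depositor drew, the same double-bond set comes out. That is the property that makes a non-aromatic SMILES usable as an identity key.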

Tautomers are normalized. Thus 4-(phenylazo)-1-naphthalenol (CAS RN 3651-02-3), a case of classic tautomerism, has only one CID (5355205), but there are two InChIs, one for each tautomer. Unfortunately, not all tautomers are handled so well: four tautomers of this molecule are recorded:

[Figure: the four recorded tautomers]

PubChem follows InChI in breaking bonds to metals. It currently handles 109 of the 118 elements in the periodic table. PubChem registration confirms that any specified isotope has been observed experimentally. Hence ⁷CH₄ is rejected, but ⁸CH₄ (which has an exceptionally short half-life) is allowed. Another quirk is that PubChem does not normalize mononuclidic isotopes. Hence fluoromethane has CID 11638, while fluoromethane with ¹⁹F has CID 58338844. PubChem rejects chlorine dioxide and carbide anions, but it accepts disulfur dioxide (O=S=S=O), which is stable for only a few seconds.
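Both isotope rules can be sketched in Python: the registration check and the mononuclidic normalization that PubChem skips. The nuclide tables are deliberately incomplete illustrative subsets, not PubChem's data. A real check needs a full table of observed nuclides.

```python
# Illustrative, deliberately incomplete tables; a real check needs a full nuclide table.
OBSERVED_MASS_NUMBERS = {
    "C": set(range(8, 15)),  # 8C ... 14C (subset of the known carbon isotopes)
    "F": {18, 19},           # subset of the known fluorine isotopes
}
MONONUCLIDIC = {"F": 19}     # elements with a single natural isotope (subset)


def isotope_observed(element, mass_number):
    """Accept an isotope label only if that nuclide appears in the observed table."""
    return mass_number in OBSERVED_MASS_NUMBERS.get(element, set())


def drop_redundant_label(element, mass_number):
    """Remove a label that merely restates the sole natural isotope.

    PubChem, as described above, does NOT apply this step, which is why
    fluoromethane and its 19F-labeled form get different CIDs.
    """
    if mass_number is not None and MONONUCLIDIC.get(element) == mass_number:
        return None
    return mass_number


print(isotope_observed("C", 7), isotope_observed("C", 8))          # False True
print(drop_redundant_label("F", 19), drop_redundant_label("F", 18))  # None 18
```

Applying `drop_redundant_label` before hashing would merge the two fluoromethane records. Omitting it, as PubChem does, preserves exactly what the depositor specified.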

It is one of the innovations of PubChem that it explicitly stores relationships (such as having similar 3D shape) in the database. Given a CID, you can find all similar CIDs based on Tanimoto similarity, for example, but you can also find all the tautomeric forms provided by depositors by following the links from CID to SID. Likewise, there are internal links (backwards and forwards) between mixtures and their components, and between isotopes of a compound, and between enantiomers of a compound.
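Similarity links of this kind rest on fingerprint comparison. The following is a minimal sketch of the Tanimoto coefficient on fingerprints stored as sets of "on" bit positions. PubChem's own 2D neighboring uses its 881-bit substructure-key fingerprint. The bit sets, the IDs, and the 0.9 threshold below are illustrative assumptions.

```python
def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto coefficient: shared bits / bits set in either fingerprint."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def similar_ids(query_fp, library, threshold=0.9):
    """IDs in library whose fingerprint meets the threshold (illustrative default)."""
    return sorted(cid for cid, fp in library.items() if tanimoto(query_fp, fp) >= threshold)


library = {
    101: set(range(1, 11)),  # identical to the query
    102: set(range(1, 10)),  # 9 of the query's 10 bits
    103: {1, 2},
}
print(tanimoto({1, 2, 3}, {2, 3, 4}))           # 0.5
print(similar_ids(set(range(1, 11)), library))  # [101, 102]
```

Precomputing such neighbor lists once, and storing them as links, is what lets PubChem answer "similar compounds" instantly instead of scanning the collection at query time.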

PubChem allows depositors to specify advanced representations of molecular structures such as inorganics and organometallics via SD tags. Quadruple, dative, complex, and ionic bonds can be specified with the non-standard bond option; hydrogen, resonance, bold, and Fischer bonds, and close contacts can be specified with the bond annotations option. Relatively few depositors make use of these options.

Roger concluded by saying that PubChem represents the current state of the art in chemical structure representation.33,34,35 Under the surface, unseen by most users, are many technical and scientific innovations that have enabled PubChem to scale to contain nearly 100 million compounds. From simple design decisions such as the substance versus compound distinction, to breakthroughs such as canonical Kekulé SMILES, the architecture of PubChem contains a treasure trove of cheminformatics innovations, covering normalization, tautomers, mixtures, 2D fingerprints and similarity, substructure search, biopolymers, text mining, and much more.

iRAMP and PubChem: of the people, for the people

Leah McEwen

Leah McEwen of Cornell University gave a talk on synergies between chemical safety and information literacy skills. In 2015, the ACS Committee on Professional Training (CPT) released an updated version of Undergraduate Professional Education in Chemistry: ACS Guidelines and Evaluation Procedures for Bachelor’s Degree Programs. These guidelines include a description of six skill sets that undergraduate chemistry majors should develop, two of them being chemical literature and information management skills, and laboratory safety skills. Laboratory safety skills can be viewed as a specific “use case” of information literacy skills.36 The CPT safety guidelines describe a RAMP model37 to organize safety information in a consistent way that is transferable, scalable, and sustainable as laboratory work evolves. RAMP is an acronym for the initial letters of the four core principles of safety: Recognizing hazards, Assessing risks of hazards, Minimizing hazards, and Preparing for emergencies. The iRAMP project was begun in 2014 by the ACS Divisions of Chemical Information (CINF) and Chemical Health and Safety (CHAS).36,38 The “i” of iRAMP signifies the iterative nature of the chemical safety decision cycle:

Safety Data Sheets

The research laboratory environment is complex, involving chemicals, biological agents, and radioactive materials, with five levels of Occupational Safety and Health Administration (OSHA) controls. The information environment is also very complex. Questions that safety professionals need to ask have been listed by the ACS Committee on Chemical Safety, Safety Advisory Panel. Data supporting chemical risk assessment are detailed in a National Research Council (NRC) work39, but there are many challenges for the information community. Many chemicals lack critical data. The diversity of substance forms that impact chemical reactivity is broad. Data are scattered across many sources. Reporting standards are variable and most data are not machine-readable.

The research practices described by the Association of College and Research Libraries, a division of the American Library Association, in Framework for Information Literacy for Higher Education reflect a process of iterative critical inquiry that can be used to address these questions about the chemical information available to be used in risk assessment, and the most effective process for identifying, compiling, analyzing, and applying it.36

A PubChem Laboratory Chemical Safety Summary (LCSS) for a compound is based on the format described by the National Research Council (NRC).39 LCSS provides a convenient consolidated view of an open Internet search on chemical hazard information, with non-authoritative sources filtered out and available documentation on the context of each data point. It became clear that PubChem could help chemists fill out an NRC safety form. The University of California has produced a pilot mobile app, UC Chemicals, a cloud-based chemical inventory management tool, which allows tracking of containers using a barcoding system. Chemical and safety information, such as hazard codes and first aid, are automatically populated from PubChem and other sources.

There are, however, some key gaps that iRAMP must address. These include resolvable identifiers for mixtures; associating the Globally Harmonized System (GHS) with supporting data (a sort of “Rule of Five” for hazards would be good to have, as most compounds have not been classified); mapping chemical concepts to process conditions; mapping procedures to chemical, equipment, and process hazards; and empirical data from incidents.

iRAMP aims to build a “flexibly structured ecosystem of data, workflow tools, and domain expertise, mapped to the essential commonalities of the use cases and content, connected by good information management practices”.38 PubChem enables reuse of data in applied contexts, based on open data, open mission, open process and open collaboration, for the public good. Together, iRAMP and PubChem can build an ecosystem, of the people, by the people, for the people.

Open chemical information: where now and how?

Evan Bolton

Evan Bolton gave the award address on behalf of both awardees. Many people think that cheminformatics is a solved problem. “Open” is now a popular adjective: open learning, open access, open data, open government, open source, and so on. “Open” was much less of an “in” word when PubChem was conceived. There is still little openness when it comes to scientific data. There is still a lot to be done in the open space. For example, openness is not widespread in drug discovery. We have to empower researchers with ready access to information so that they do not repeat work that has already been done.

PubChem is an open archive; the data are free, accessible, and downloadable. Information is uploaded by depositors, it is normalized and displayed, and it can then be downloaded by other researchers. Algorithms carry out the normalization, but sometimes they go wrong and can introduce ambiguity; later processing of this ambiguous data can result in data corruption or error. For example, chemical file format interconversion can be “lossy”, such as when converting from SDF to SMILES, where the coordinates are lost and stereo must be perceived by algorithms. Different software packages may “normalize” or convert a chemical structure in different ways. This variation produces tens of different representations of nitro groups and azides in PubChem.
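One way to stop nitro-group variants from proliferating is a standardization rule that maps them to a single form. The sketch below handles only one variant, the uncharged pentavalent N(=O)=O drawing, which it rewrites into the charge-separated [N+](=O)[O-] form often chosen as the standard. The data layout and rule are our illustration, not PubChem's standardizer, which must cover many more patterns.

```python
def normalize_nitro(atoms, bonds):
    """Rewrite uncharged pentavalent N(=O)=O as charge-separated [N+](=O)[O-].

    atoms: list of {"el": symbol, "charge": int}; bonds: {(i, j): order} with i < j.
    Modifies both in place and returns them.
    """
    for n, atom in enumerate(atoms):
        if atom["el"] != "N" or atom["charge"] != 0:
            continue
        double_bonded_o = []
        for (i, j), order in bonds.items():
            if order == 2 and n in (i, j):
                other = j if i == n else i
                if atoms[other]["el"] == "O" and atoms[other]["charge"] == 0:
                    double_bonded_o.append(other)
        if len(double_bonded_o) == 2:
            o_minus = max(double_bonded_o)  # deterministic choice of the O that carries -1
            atom["charge"] = 1
            atoms[o_minus]["charge"] = -1
            bonds[(min(n, o_minus), max(n, o_minus))] = 1
    return atoms, bonds


# Nitro fragment attached to one carbon: C-N(=O)=O (hydrogens omitted)
atoms = [{"el": "C", "charge": 0}, {"el": "N", "charge": 0},
         {"el": "O", "charge": 0}, {"el": "O", "charge": 0}]
bonds = {(0, 1): 1, (1, 2): 2, (1, 3): 2}
normalize_nitro(atoms, bonds)
print([a["charge"] for a in atoms], bonds)
# [0, 1, 0, -1] {(0, 1): 1, (1, 2): 2, (1, 3): 1}
```

The rule conserves net charge and is idempotent: a second pass finds a charged nitrogen and leaves it alone, so re-running standardization cannot introduce new variants.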

Atom environments have to be standardized. Data clean-up approaches include structure standardization; consistency filtering (name-structure matching, and use of authoritative sources, and hand-curated black, gray, and white lists); chemical concepts (groupings of chemical names, setting a preferred concept for a given structure, and a preferred structure for a given concept); and cross-validation via text mining (to gather evidence to support the reported association of a chemical to other entities). A chemical structure may be represented in many different ways (tautomer and salt-form drawing variations are common, for example), and the chemical meaning of a substance may change with context (e.g., the solid form may involve a hydrate, which affects molecular weight when weighing out a substance to make a solution). The boiling point of benzene is both 176.2°F and 200-500°F in PubChem Compound; the first record is that for benzene, but the second is for coal tar oil (a crude form of benzene). There are many-to-many relationships between chemical concepts and chemical structures.

PubChem is successful because it is inclusive, free, robust, innovative, and helpful. If a chemical exists, you often find it. Evan singled out a few features of PubChem for particular mention. Substances are converted to compounds, but the original information is kept. There is clear provenance, so users can trace from whom the data came. Information is downloadable, and there are extensive programmatic interfaces. PubChem is constantly improved, can handle a lot of abuse, and is sustainable. The PubChem synonym classification was available first in RDF. It indicates the chemical name type, allows grouping of names, and can involve guess work. More authoritative name sources have been added. Most non-classified names are unhelpful (perhaps because of chemical name corruption, or chemical name fragments).

As more data are added, the scalability of PubChem is difficult to maintain. It is not uncommon to reach the limits of technology. For example, PubChem could no longer use SQL databases for some queries because of performance bottlenecks. After examining NoSQL technologies such as Solr/Lucene, the team determined better approaches. One example is PubChem’s structured data query (SDQ), which uses the Sphinx search engine to perform the query, but then fetches data from an SQL database. SDQ is a concise query language with clear logic, in which queries are expressed as JSON objects. It offers powerful search capabilities, a URL-accessible Common Gateway Interface (CGI), and easy application integration.

PubChem faces many challenges. One is growth: 50% of the resources of the project are needed just to keep scaling the system. Government mandates (like the current HTTPS-only edict) necessitate regular migrations. Data clean-up and error proliferation prevention require constant vigilance: the team uses existing technology where possible, but solutions do not always exist. They must be developed for PubChem to remain scalable.

Chemical structure databases have come a long way since the origins of computerization in the 1960s, and the rise of databases such as CAS REGISTRY and Beilstein in the 1970s. The 2010s are the era of large, open chemical databases of aggregated content, with RESTful programmatic access. These large open collections of tens of millions of chemical structures need methods to lock down the data, because lack of curation combined with open exchange of data leads to error proliferation. Digital standards are needed to improve chemical data exchange and chemical data clean-up methods to prevent error proliferation. Close attention to provenance, and a set of clear definitions for chemical concepts, are also needed.

ACS CINF had a data summit at the spring 2016 meeting in San Diego. Ten half-day symposia were held over five days, with over 70 speakers, including experts from different related domains. The summit helped to identify informatics “pain points” for which we need to find solutions. The Research Data Alliance and IUPAC had a follow-up workshop in July at EPA, where a number of projects were discussed. One on chemical structure standardization education and outreach aims to help chemists and other stakeholders to understand the issues of chemical structure standardization. Another, updating IUPAC’s graphical representation guidelines, seeks to help chemists to understand the issues of chemical structure standardization, often apparent in chemical depiction. Other recommendations concern open chemical structure file formats, and best practices in normalizing chemical structures. There are plans to develop a small-scale ontology of chemical terms, based on terms in the IUPAC Orange Book as a case study. A project on the IUPAC Gold Book data structure is related to a current effort to extract the content and term identifiers, and convert them into a more accessible and machine-digestible format for increased usability. Finally, a scoping project on use cases for semantic chemical terminology applications will focus on researching the current chemical data transfer and communication landscape for potential applications of semantic terminology.

We are entering a new era: in the 2020s we will have large, extensively machine-curated, open collections, with clear provenance, and standard approaches to file formats and normalization, where errors do not proliferate, and links are cross-validated. Open knowledge bases will emerge that contain all open scientific knowledge that is computable (i.e., inferences can be drawn using natural language questions). By the 2030s machine-based inference will drive the majority of scientific questions, and efficiency of research will grow exponentially by harnessing “full” scientific knowledge.

In all, accurate computer interpretation of scientific information content is paramount. It needs to be at or above the level of the human scientist for this vision of the future to occur. It will be the great achievement of our generation to make this leap forward. Improved chemical information standards and uniform approaches will be critical for it to occur.

References

  1. Heller, S. R.; Fales, H. M.; Milne, G. W. A.; Heller, R. S.; McCormick, A.; Maxwell, D.C. Mass spectral search system. Biomed. Mass Spectrom. 1974, 1 (3), 207-8. DOI: 10.1002/bms.1200010313
  2. Heller, S. R.; Feldmann, R. J.; Fales, H. M.; Milne, G. W. A. Conversational mass spectral search system. IV. Evolution of a system for the retrieval of mass spectral information. J. Chem. Doc. 1973, 13 (3), 130-3. DOI: 10.1021/c160050a009
  3. Heller, S. R. The chemical information system and spectral databases. J. Chem. Inf. Comput. Sci. 1985, 25 (3), 224-31. DOI: 10.1021/ci00047a017
  4. Heller, S. R.; Milne, G. W. A.; Feldmann, R. J. Quality control of chemical data bases. J. Chem. Inf. Comput. Sci. 1976, 16 (4), 232-3. DOI: 10.1021/ci60008a010
  5. Heller, S. R.; McNaught, A.; Pletnev, I.; Stein, S.; Tchekhovskoi, D. InChI, the IUPAC International Chemical Identifier. J. Cheminf. 2015, 7, 1-63. DOI: 10.1186/s13321-015-0068-4
  6. Warr, W. A. Many InChIs and quite some feat. J. Comput.-Aided Mol. Des. 2015, 29 (8), 681-694. DOI: 10.1007/s10822-015-9854-3
  7. Endicott, K. M. The National Chemotherapy Program. J. Chron. Dis. 1958, 8 (1), 171. DOI: 10.1016/0021-9681(58)90047-X
  8. Austin, C. P.; Brady, L. S.; Insel, T. R.; Collins, F. S. NIH Molecular Libraries Initiative. Science 2004, 306 (5699), 1138. DOI: 10.1126/science.1105511
  9. Zenkevich, I. G.; Babushok, V. I.; Linstrom, P. J.; White, E., V; Stein, S. E. Application of histograms in evaluation of large collections of gas chromatographic retention indices. J. Chromatogr. A 2009, 1216 (38), 6651-6661. DOI: 10.1016/j.chroma.2009.07.065
  10. Milne, G. W. A.; Miller, J. A. The NCI Drug Information System. 1. System overview. J. Chem. Inf. Comput. Sci. 1986, 26 (4), 154-9. DOI: 10.1021/ci00052a002
  11. Poroikov, V. V.; Filimonov, D. A.; Ihlenfeldt, W.-D.; Gloriozova, T. A.; Lagunin, A. A.; Borodina, Y. V.; Stepanchikova, A. V.; Nicklaus, M. C. PASS Biological Activity Spectrum Predictions in the Enhanced Open NCI Database Browser. J. Chem. Inf. Comput. Sci. 2003, 43 (1), 228-236. DOI: 10.1021/ci020048r
  12. Steinbeck, C. LUCY - A program for structure elucidation from NMR correlation experiments. Angew. Chem., Int. Ed. Engl. 1996, 35 (17), 1984-1986. DOI: 10.1002/anie.199619841
  13. Steinbeck, C. SENECA: A platform-independent, distributed, and parallel system for computer-assisted structure elucidation in organic chemistry. J. Chem. Inf. Comput. Sci. 2001, 41 (6), 1500-1507. DOI: 10.1021/ci000407n
  14. Guha, R.; Howard, M. T.; Hutchison, G. R.; Murray-Rust, P.; Rzepa, H.; Steinbeck, C.; Wegner, J.; Willighagen, E. L. The Blue Obelisk - Interoperability in Chemical Informatics. J. Chem. Inf. Model. 2006, 46 (3), 991-998. DOI: 10.1021/ci050400b
  15. Steinbeck, C.; Han, Y.; Kuhn, S.; Horlacher, O.; Luttmann, E.; Willighagen, E. The Chemistry Development Kit (CDK): An Open-Source Java Library for Chemo- and Bioinformatics. J. Chem. Inf. Comput. Sci. 2003, 43 (2), 493-500. DOI: 10.1021/ci025584y
  16. Steinbeck, C.; Hoppe, C.; Kuhn, S.; Floris, M.; Guha, R.; Willighagen, E. L. Recent developments of the Chemistry Development Kit (CDK) - an open-source Java library for chemo- and bioinformatics. Curr. Pharm. Des. 2006, 12 (17), 2111-2120. DOI: 10.2174/138161206777585274
  17. Steinbeck, C.; Krause, S.; Kuhn, S. NMRShiftDB - constructing a free chemical information system with open-source components. J. Chem. Inf. Comput. Sci. 2003, 43 (6), 1733-1739. DOI: 10.1021/ci0341363
  18. Steinbeck, C.; Kuhn, S. NMRShiftDB - compound identification and structure elucidation support through a free community-built Web database. Phytochemistry 2004, 65 (19), 2711-2717. DOI: 10.1016/j.phytochem.2004.08.027
  19. Gaulton, A.; Bellis, L. J.; Bento, A. P.; Chambers, J.; Davies, M.; Hersey, A.; Light, Y.; McGlinchey, S.; Michalovich, D.; Al-Lazikani, B.; Overington, J. P. ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 2012, 40 (D1), D1100-D1107. DOI: 10.1093/nar/gkr777
  20. Edison, A. S.; Hall, R. D.; Junot, C.; Karp, P. D.; Kurland, I. J.; Mistrik, R.; Reed, L. K.; Saito, K.; Salek, R. M.; Steinbeck, C.; Sumner, L. W.; Viant, M. R. The time is right to focus on model organism metabolomes. Metabolites 2016, 6 (1), 8/1-8/7. DOI: 10.3390/metabo6010008
  21. Haug, K.; Salek, R. M.; Conesa, P.; Hastings, J.; de Matos, P.; Rijnbeek, M.; Mahendraker, T.; Williams, M.; Neumann, S.; Rocca-Serra, P.; Maguire, E.; Gonzalez-Beltran, A.; Sansone, S.-A.; Griffin, J. L.; Steinbeck, C. MetaboLights - an open-access general-purpose repository for metabolomics studies and associated meta-data. Nucleic Acids Res. 2013, 41 (D1), D781-D786. DOI: 10.1093/nar/gks1004
  22. Kale, N. S.; Haug, K.; Conesa, P.; Jayseelan, K.; Moreno, P.; Nainala, V. C.; Spicer, R. A.; Williams, M.; Salek, R. M.; Steinbeck, C.; Rocca-Serra, P.; Li, X.; Griffin, J. L. MetaboLights: An Open-Access Database Repository for Metabolomics Data. Curr. Protoc. Bioinformatics 2016, 53, 14.13.1-18. DOI: 10.1002/0471250953.bi1413s53
  23. Salek, R. M.; Haug, K.; Conesa, P.; Hastings, J.; Williams, M.; Mahendraker, T.; Maguire, E.; Gonzalez-Beltran, A. N.; Rocca-Serra, P.; Sansone, S.-A.; Steinbeck, C. The MetaboLights repository: curation challenges in metabolomics. Database 2013, 2013, bat029. DOI: 10.1093/database/bat029
  24. Beisken, S.; Earll, M.; Baxter, C.; Portwood, D.; Ament, Z.; Kende, A.; Hodgman, C.; Seymour, G.; Smith, R.; Fraser, P.; Seymour, M.; Salek, R. M.; Steinbeck, C. Metabolic differences in ripening of Solanum lycopersicum 'Ailsa Craig' and three monogenic mutants. Sci. Data 2014, 1, 140029. DOI: 10.1038/sdata.2014.29
  25. Johnson, S. How We Got to Now: Six Innovations That Made the Modern World; Riverhead Books: New York, NY, 2014. ISBN: 978-1-59-463393-5
  26. Wilkinson, M. D.; Dumontier, M.; Aalbersberg, I. J. J.; Appleton, G.; Dumon, O.; Groth, P.; Strawn, G.; Axton, M.; Baak, A.; Blomberg, N.; Boiten, J.-W.; da Silva Santos, L. B.; Bourne, P. E.; Bouwman, J.; Brookes, A. J.; Clark, T.; Crosas, M.; Dillo, I.; Edmunds, S.; Evelo, C. T.; Finkers, R.; Gonzalez-Beltran, A.; Rocca-Serra, P.; Sansone, S.-A.; Gray, A. J. G.; Goble, C.; Grethe, J. S.; Heringa, J.; Kok, R.; 't Hoen, P. A. C.; Hooft, R.; Kuhn, T.; Kok, J.; Lusher, S. J.; Martone, M. E.; Mons, A.; Packer, A. L.; Persson, B.; Roos, M.; Thompson, M.; van Schaik, R.; Schultes, E.; Sengstag, T.; Slater, T.; Swertz, M. A.; van der Lei, J.; van Mulligen, E.; Velterop, J.; Waagmeester, A.; Wittenburg, P.; Wolstencroft, K.; Zhao, J.; Mons, B. The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 2016, 3, 160018. DOI: 10.1038/sdata.2016.18
  27. Hu, W.; Qiu, H.; Dumontier, M. Link Analysis of Life Science Linked Data. In The Semantic Web - ISWC 2015: 14th International Semantic Web Conference, Bethlehem, PA, USA, October 11-15, 2015, Proceedings, Part II; Arenas, M.; Corcho, O.; Simperl, E.; Strohmaier, M.; d'Aquin, M.; Srinivas, K.; Groth, P.; Dumontier, M.; Heflin, J.; Thirunarayan, K.; Staab, S., Eds.; Springer International Publishing: Cham, 2015; pp 446-462. DOI: 10.1007/978-3-319-25010-6_29
  28. Kamdar, M. R.; Dumontier, M. An Ebola virus-centered knowledge base. Database 2015, 2015, bav049. DOI: 10.1093/database/bav049
  29. Gottlieb, A.; Stein, G. Y.; Ruppin, E.; Sharan, R. PREDICT: a method for inferring novel drug indications with application to personalized medicine. Mol. Syst. Biol. 2011, 7, 496-504. DOI: 10.1038/msb.2011.26
  30. Callahan, A.; Dumontier, M. Evaluating Scientific Hypotheses Using the SPARQL Inferencing Notation. In The Semantic Web: Research and Applications: 9th Extended Semantic Web Conference, ESWC 2012, Heraklion, Crete, Greece, May 27-31, 2012. Proceedings; Simperl, E.; Cimiano, P.; Polleres, A.; Corcho, O.; Presutti, V., Eds.; Springer: Berlin, Heidelberg, 2012; pp 647-658. DOI: 10.1007/978-3-642-30284-8_50
  31. Callahan, A.; Dumontier, M.; Shah, N. H. HyQue: evaluating hypotheses using Semantic Web technologies. J. Biomed. Semantics 2011, 2 (2), 1-17. DOI: 10.1186/2041-1480-2-S2-S3
  32. Hastings, J.; Chepelev, L.; Willighagen, E.; Adams, N.; Steinbeck, C.; Dumontier, M. The Chemical Information Ontology: provenance and disambiguation for chemical data on the biological Semantic Web. PLoS One 2011, 6 (10), e25513. DOI: 10.1371/journal.pone.0025513
  33. Bolton, E. E.; Wang, Y.; Thiessen, P. A.; Bryant, S. H. PubChem: integrated platform of small molecules and biological activities. In Annual Reports in Computational Chemistry; Wheeler, R. A.; Spellmeyer, D. C., Eds.; Elsevier: Amsterdam, 2008; Vol. 4, pp 217-241. DOI: 10.1016/S1574-1400(08)00012-1
  34. Hahnke, V. D.; Bolton, E. E.; Bryant, S. H. PubChem atom environments. J. Cheminf. 2015, 7, 41/1-41/37. DOI: 10.1186/s13321-015-0076-4
  35. Kim, S.; Thiessen, P. A.; Bolton, E. E.; Chen, J.; Fu, G.; Gindulyte, A.; Han, L.; He, J.; He, S.; Shoemaker, B. A.; Wang, J.; Yu, B.; Zhang, J.; Bryant, S. H. PubChem Substance and Compound databases. Nucleic Acids Res. 2016, 44 (D1), D1202-13. DOI: 10.1093/nar/gkv951
  36. Stuart, R. B.; McEwen, L. R. The Safety "Use Case": Co-Developing Chemical Information Management and Laboratory Safety Skills. J. Chem. Educ. 2016, 93 (3), 516-526. DOI: 10.1021/acs.jchemed.5b00511
  37. Hill, R. H.; Finster, D. C. Laboratory Safety for Chemistry Students; Wiley: Hoboken, NJ, 2010. ISBN: 978-0-470-34428-6 (See also 2016 2nd Edition)
  38. McEwen, L. R.; Stuart, R. B. Meeting the Google Expectation for Chemical Safety Information. Chemical Risk Assessment in Academic Research and Teaching. Chem. Int. 2015, 37 (5-6), 12-16. DOI: 10.1515/ci-2015-0505
  39. National Research Council. Prudent Practices in the Laboratory; National Academies Press: Washington, DC, 2011. DOI: 10.17226/12654

ANYL: New Directions in Chemometrics: Making Sense of Big & Small Chemical Data Sets

This session was originally proposed as a CINF session dealing with computational methods for spectral analysis and spectral databases. Unfortunately, there were insufficient submissions for CINF to have a standalone session, so we cosponsored the session and merged our submissions with those of the Analytical Chemistry Division. Although the session was scheduled on Thursday and ran into Thursday afternoon, it was very well attended (thanks to the University of Delaware graduate students).

The use of the KNIME analytics platform to build pipelines for processing metabolomics MS/MS data, performing noise reduction, signal normalization, peak picking, retention time correction, isotope merging, and annotation, was presented. Part of the symposium was concerned with developing modeling strategies, based on adaptive regression and multivariate calibration, that are robust to spectral interferents. Dr. J. Johnson from the Hilmar Cheese Company in California discussed the chemometric models used for real-time spectral monitoring of cheese processing for the FDA. Dr. Curtis Mowry of Sandia National Labs discussed principal component analysis of X-ray fluorescence (XRF) iron and chromium peaks to obtain a signature for identifying adulterated materials. Dr. Douglas Vander Griend discussed obtaining binding constants from spectrophotometric data and methods for modeling spectrophotometric titration data. Dr. Joseph Smith discussed Raman microspectroscopic data mapping. Dr. Barry Lavine presented principal component analysis of MALDI-MS/MS profiles of altered glycans as a cancer diagnostic. Dr. Kevin Moore of the U.S. Pharmacopeia (USP) discussed the analysis of FTIR data for excipients (the “inert” drug ingredients). Dr. Greg Banik discussed Bio-Rad’s spectral optimization and correction methods and spectral identification databases. Dr. Tony Williams presented the new EPA ToxCast Interactive Chemical Safety for Sustainability (iCSS) dashboard, which is available for public use, data access, and analysis. Dr. Pillhum Son of CAS closed the session by presenting spectral data available through the CAS REGISTRY databases.
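The PCA approach mentioned for XRF iron and chromium peaks can be illustrated with a minimal sketch. It assumes each sample has already been reduced to a pair of (Fe, Cr) peak intensities; the function names are hypothetical, and this is generic two-variable PCA, not the method presented in the talk. Projecting a sample onto the first principal component gives a one-number signature, so material whose score falls outside the range seen for authentic reference samples can be flagged for closer inspection.

```python
import math

def pca_2d(points):
    """Generic PCA for 2-D data, e.g. hypothetical (Fe, Cr) XRF peak intensities.

    Returns (mean, (lam1, lam2), pc1): lam1 >= lam2 are the eigenvalues of the
    sample covariance matrix and pc1 is the unit eigenvector for lam1.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix
    half_trace = (sxx + syy) / 2
    det = sxx * syy - sxy ** 2
    disc = math.sqrt(max(half_trace ** 2 - det, 0.0))
    lam1, lam2 = half_trace + disc, half_trace - disc
    # Eigenvector for lam1 satisfies sxy*v1 + (syy - lam1)*v2 = 0
    if abs(sxy) > 1e-12:
        v = (lam1 - syy, sxy)
    else:  # variables uncorrelated: take the axis with the larger variance
        v = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = math.hypot(*v)
    return (mx, my), (lam1, lam2), (v[0] / norm, v[1] / norm)

def pc1_score(point, mean, pc1):
    """Project one sample onto the first principal component."""
    return (point[0] - mean[0]) * pc1[0] + (point[1] - mean[1]) * pc1[1]
```

With only two peaks the eigenproblem has a closed form, which keeps the sketch dependency-free; with full spectra one would use an SVD from a numerical library instead.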

All in all, many useful methods and algorithms for spectral analysis and spectral identification through database comparison were presented, as well as the application of spectral data for identification of sample quality, defects, and alteration.

Rachelle Bienstock

Shedding Light on the Dark Genome: Methods, Tools & Case Studies

In 2014 the NIH initiated a program entitled “Illuminating the Druggable Genome” (IDG) with the goal of improving our understanding of the properties and functions of proteins that are currently unannotated within the four most commonly drug-targeted protein families: GPCRs, ion channels, nuclear receptors, and kinases. The symposium “Shedding Light on the Dark Genome: Methods, Tools & Case Studies” was organized by Rajarshi Guha (NIH National Center for Advancing Translational Sciences (NCATS)) and Tudor Oprea (U. New Mexico) to highlight recent work on data resources and methods that can provide insight into dark targets: protein targets that are unstudied or understudied in the public literature. These dark targets represent a knowledge deficit; together with the fact that only a small fraction of the proteome is currently targeted therapeutically, they highlight the need for resources that can shed light on them.

Tudor Oprea (U. New Mexico) was the first speaker and presented an overview of the IDG Knowledge Management Center (KMC), highlighting the diverse data sources and data types that have been integrated to construct the Target Central Resource Database (TCRD). In addition to collating data sources, the TCRD includes the results of text mining of drug labels, patents, and medical literature. Oprea went on to describe the Target Development Level (TDL), which classifies protein targets according to the level of knowledge available about them. He also described the front-end to the KMC, Pharos, a Web portal that gives users access to the TCRD data. Based on analysis of TCRD data, Oprea concluded that only 38% of the human proteome is currently functionally annotated, less than 3% of the proteome is therapeutically targeted, and only a quarter of all diseases are targeted by therapeutic agents.
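The Target Development Level places each protein in one of four tiers, Tclin, Tchem, Tbio, and Tdark, in order of decreasing knowledge. The sketch below shows the general shape of such a rule-based classifier. The evidence fields and numeric cut-offs are illustrative assumptions, not a reproduction of the TCRD implementation, which draws on more evidence types than are modeled here.

```python
from dataclasses import dataclass

@dataclass
class TargetEvidence:
    # Hypothetical, simplified evidence fields for one protein
    has_approved_drug: bool          # target of an approved drug
    has_potent_small_molecule: bool  # small-molecule activity above some threshold
    pubmed_score: float              # text-mining publication score
    generif_count: int               # number of GeneRIF annotations
    antibody_count: int              # commercially available antibodies

def target_development_level(ev: TargetEvidence) -> str:
    """Assign a TDL tier; cut-offs below are illustrative assumptions."""
    if ev.has_approved_drug:
        return "Tclin"
    if ev.has_potent_small_molecule:
        return "Tchem"
    # Count "little is known" signals; two or more marks the target as dark
    dark_signals = sum([
        ev.pubmed_score < 5,
        ev.generif_count <= 3,
        ev.antibody_count <= 50,
    ])
    return "Tdark" if dark_signals >= 2 else "Tbio"
```

Checking tiers in order means a protein is labeled by the strongest evidence available for it, which matches the idea of a knowledge ladder from clinical targets down to dark ones.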

The next speaker, Prudence Mutowo (European Bioinformatics Institute (EBI)), spoke about the use of the ChEMBL and SureChEMBL resources to track drug targets. The EBI is a collaborator in the IDG program, and these resources are key components of the TCRD data source developed at the University of New Mexico. ChEMBL provides curated small-molecule bioactivity data mined from the medicinal chemistry literature, and SureChEMBL provides small molecules extracted from the patent literature. Mutowo highlighted the challenges involved in the curation process, especially for understudied targets, which are the most relevant to the IDG. She concluded with specific examples of how the curation has proven useful in shedding light on dark targets.

Stephan Schürer (U. Miami) was the next speaker. His talk addressed the development of the BioAssay Ontology (BAO) and the Drug Target Ontology (DTO) and their role in providing a framework to support data integration, clarification, and mining. He described how these ontologies have been employed in multiple large-scale projects, such as the BioAssay Research Database (BARD) and, most recently, the Library of Integrated Network-Based Cellular Signatures (LINCS). He emphasized the importance of standards in data integration pipelines and gave specific examples of challenges faced in the LINCS and IDG projects. He concluded with examples of how the DTO was used in the IDG project and pointed out that these ontologies provide a robust framework to represent, integrate, model, and query diverse drug discovery data generated in different projects.

The next speaker was Anders Dohlman (Mt. Sinai School of Medicine), who described the development of classification models to predict adverse cardiovascular events caused by tyrosine kinase inhibitors (TKIs). After an overview of the cardiovascular effects of TKIs, he described the FDA FAERS and Drug Labels resources as sources of adverse event information on known TKIs. He then described a random forest model developed to predict adverse events based on structural features of the TKIs, and showed how c-Kit mutants and their distinct binding patterns correlated with the occurrence of hypertension. Finally, he described the use of their recently developed method, the characteristic direction, applied to the LINCS L1000 dataset to generate biomarker sets for identifying cardiac adverse events.

Following Dohlman, Rajarshi Guha (NIH NCATS) talked about Pharos (https://pharos.nih.gov/idg/index), the front-end for the KMC. Building on Oprea’s presentation, Guha described the architecture of the application, pointing out the design decisions taken to serve specific classes of users: biologists, computational scientists, and funders. He described the underlying REST API, which provides direct, programmatic access to the TCRD data and also serves as the basis for the graphical interface. He also highlighted features such as the target dossier, which enables users to collect information on targets, diseases, and compounds as they browse and to store it for later analysis. Given the diverse data sources and types, he then presented the various visualization methods implemented in Pharos to enable efficient summaries and drill-down when required.

Next, Meir Glick (Merck) described Merck’s strategy for target identification and validation, based on integrated screening, synthesis, and informatics. He highlighted the role of informatics in bridging multiple data types, whose integration is vital for linking small molecules to phenotypes. He then described examples of harmonizing small molecules, targets, and activities. He concluded by describing the concept of dark chemical matter, small molecules that are inactive in multiple assays, and how they could represent possible tool compounds against the right system.
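Dark chemical matter is defined operationally from screening records. A minimal sketch, assuming results arrive as hypothetical (compound, assay, active) tuples: keep compounds that were tested in at least a minimum number of assays and never scored as active. The default of 100 assays is an assumed cut-off for illustration; an appropriate threshold depends on the screening collection.

```python
from collections import defaultdict

def dark_chemical_matter(results, min_assays=100):
    """Return sorted IDs of compounds tested in >= min_assays assays and never active.

    results: iterable of (compound_id, assay_id, is_active) tuples
    (a hypothetical record layout). min_assays is an assumed threshold.
    """
    tested = defaultdict(set)  # compound -> set of distinct assays it was run in
    active = set()             # compounds active in at least one assay
    for compound, assay, is_active in results:
        tested[compound].add(assay)
        if is_active:
            active.add(compound)
    return sorted(
        c for c, assays in tested.items()
        if len(assays) >= min_assays and c not in active
    )
```

Counting distinct assays (rather than records) keeps replicate measurements from inflating how thoroughly a compound appears to have been screened.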

The penultimate speaker was Olexandr Isayev (U. North Carolina), who described an approach to predicting kinase activity profiles using deep convolutional neural networks. He pointed out that traditional activity-profile models are built separately for individual kinases and then concatenated. In contrast, the work he presented involved a multi-task learning model that uses data on multiple kinases simultaneously during training. He reported that his model exhibits very good training statistics compared with a random forest model.
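A practical issue in multi-task training on kinase panels is that the compound-by-kinase activity matrix is sparse: most compounds are measured against only a few kinases. One common way to train a single shared model anyway is to compute the loss only over measured pairs. The sketch below shows such a masked loss in plain Python; it is an assumption about how a multi-task model can be trained, not a description of the speaker's network, and it omits the convolutional model itself.

```python
import math

def masked_multitask_bce(probs, labels):
    """Mean binary cross-entropy over measured (compound, kinase) pairs only.

    probs:  one row per compound, one predicted probability per kinase task.
    labels: same shape; 1 = active, 0 = inactive, None = not measured.
    """
    eps = 1e-12  # clip probabilities to avoid log(0)
    total, count = 0.0, 0
    for p_row, y_row in zip(probs, labels):
        for p, y in zip(p_row, y_row):
            if y is None:  # unmeasured pair: contributes nothing to the loss
                continue
            p = min(max(p, eps), 1.0 - eps)
            total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
            count += 1
    return total / count if count else 0.0
```

Because every task shares one set of model parameters, a measurement on any kinase updates the shared representation, which is the property that separately trained single-kinase models lack.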

The final speaker was Haobo Gu (U. Tennessee, Knoxville), who described a study of intrinsically disordered regions of proteins, which he designated the dark matter of the proteome. He described an approach that uses sequence length and intrinsic disorder to cluster sequences, and went on to show how this clustering distinguished eukaryotes from prokaryotes, as well as other groupings. He concluded that the proposed method can clearly identify the evolutionary status of organisms.
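The clustering idea can be sketched with plain k-means on two features per protein: log sequence length and the fraction of residues predicted to be disordered. The report does not specify the clustering algorithm or feature scaling used, so k-means, the log scaling, and the helper names here are assumptions; the disorder counts are assumed to come from an external predictor.

```python
import math

def disorder_features(seq_length, disordered_residues):
    """Feature pair for one protein: (log10 length, fraction disordered).

    disordered_residues is assumed to come from an external disorder predictor;
    the log scale keeps raw length from dominating the distance.
    """
    return (math.log10(seq_length), disordered_residues / seq_length)

def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means with deterministic initialization from the first k points."""
    centers = [list(p) for p in points[:k]]
    assign = None
    for _ in range(iters):
        # Assignment step: nearest center by Euclidean distance
        new_assign = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                      for p in points]
        if new_assign == assign:  # no change: converged
            break
        assign = new_assign
        # Update step: move each center to the mean of its members
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return assign, centers
```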

Rajarshi Guha, National Center for Advancing Translational Sciences

Bringing Cheminformatics into the College Chemistry Classroom

This report covers selected presentations from a symposium organized by Robert Belford (University of Arkansas at Little Rock (UALR)) and Sunghwan Kim (PubChem), which sought papers on teaching cheminformatics to chemistry majors, both in dedicated cheminformatics classes and through the use of cheminformatics in other classes across the chemistry curriculum. The objective of the symposium was to give cheminformaticians and chemical educators a chance to share resources and experiences. The symposium was attended by both faculty and students, and afterwards we went to lunch at JOY TSIN LAU to continue our discussions on bringing cheminformatics into the college chemistry classroom.

Teaching cheminformatics symposium participants

Participants of the symposium who continued the discussions over a lunch of dim sum

Thibault Géoui of Elsevier started the symposium with “Learning to find the right information: A survey of chemistry information literacy in the undergraduate classroom”, which was based on a survey of 138 educators. The survey found that students expect information immediacy, expect information to be easy to find, want direct answers, and assume search engines understand them, all of which point to a mismatch between the structure of scientific information and student behavioral expectations. These behaviors could result from the use of Internet search engines like Google, and they indicate a need to introduce search engines like Reaxys into the classroom in order to improve student learning outcomes.

The second paper was presented by Ralph Stuart (Keene State) and coauthored by Leah McEwen (Cornell) on “Co-developing chemical information management and laboratory safety skills”. “Chemical Information” and “Laboratory Safety” are two skill sets that the ACS Committee on Professional Training (CPT) identifies as ones undergraduate programs should impart to their students. Stuart and McEwen found that the two skill sets reinforce each other when taught as connected topics, and that this can be done by framing the RAMP hazard assessment method described in the CPT guidelines as a chemical information research challenge. This approach involves determining the relevant question to search, identifying information sources, assessing source quality, making decisions based on the information collected, and documenting the basis for those decisions. It is described in an open access Journal of Chemical Education article that the authors coauthored last fall (https://dx.doi.org/10.1021/acs.jchemed.5b00511).

Douglas A. Vander Griend of Calvin College followed this with a presentation on “Introducing SIVVU.org, a Web-based program for modeling spectrophotometric titration data”. SIVVU (UVVIS spelled backwards) can analyze spectrophotometric titration data used to thermodynamically characterize multicomponent systems. Designed from a chemist’s perspective, it can use factor analysis to analyze the mathematical structure of pertinent datasets, and model the data according to user-provided chemical reactions to determine spectroscopic signatures and binding constants for the system. The site SIVVU.org is free to use, making it ideal for implementation in undergraduate chemistry laboratories.
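For the simplest equilibrium a titration-modeling program might handle, 1:1 host-guest binding H + G ⇌ HG with K = [HG]/([H][G]), the complex concentration at each titration point follows from a quadratic mass balance, and the absorbance at each wavelength is a Beer-Lambert sum over the species. The sketch below illustrates that model under those assumptions; it is not SIVVU's code, and the function names and molar absorptivity parameters are hypothetical. Fitting K then amounts to adjusting it, together with the molar absorptivities, until modeled absorbances match the measured spectra.

```python
import math

def complex_concentration(h_total, g_total, k):
    """[HG] for H + G <=> HG with K = [HG]/([H][G]).

    Mass balance gives x^2 - (H0 + G0 + 1/K) x + H0*G0 = 0; the smaller root
    is the physically meaningful one (it never exceeds H0 or G0).
    """
    b = h_total + g_total + 1.0 / k
    return (b - math.sqrt(b * b - 4.0 * h_total * g_total)) / 2.0

def absorbance(h_total, g_total, k, eps_h, eps_g, eps_hg, path=1.0):
    """Beer-Lambert absorbance at one wavelength: A = l * sum(eps_i * c_i)."""
    hg = complex_concentration(h_total, g_total, k)
    h, g = h_total - hg, g_total - hg  # free host and free guest
    return path * (eps_h * h + eps_g * g + eps_hg * hg)
```

Sweeping g_total over the titration series and evaluating absorbance at every wavelength reproduces the shape of a titration data matrix, which is the kind of dataset the talk described modeling.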

Following a short break, Bob Belford (UALR) presented a paper coauthored with Delmar Larsen (UC-Davis) and Andrew Cornell (UALR) on “Integration of cheminformatics material in the STEMWiki hyperlibrary”. One of the objectives of the Cheminformatics OLCC, a multi-campus course taught in the fall of 2015, was to create educational material that could be used in the traditional courses of the undergraduate curriculum. By moving material created in the OLCC to the LibreText Hyperlibrary (formerly STEMWiki; the ChemWiki alone had 55 million page views last year), the cheminformatics material became available to other classes across the curriculum. Currently 37 schools have LibreTexts within the Chemistry LibreText hyperlibrary, covering the spectrum of core courses taught in the undergraduate curriculum. So now, if a student in organic chemistry or any other class at one of those schools searches for InChI or SMILES in their textbook, the student is directly connected to educational material, including videos, that was used in the Fall 2015 Cheminformatics OLCC.

Hao Zhu of Rutgers-Camden then presented on “Cheminformatics education and research at home: the best way to teach graduate chemistry in the professional community”. Fifty percent of Rutgers-Camden graduate students are part-time students, most with full-time jobs, and Hao presented a cheminformatics program that leveraged the flexibility of cheminformatics, in the sense that “lab work” can be done off-site and at home. He showcased several students who participated in this program and whose interest became so great that they went on to earn PhDs; one of them was an elementary school teacher. The take-home message of his presentation was threefold: first, as a relatively new field, cheminformatics is open for innovation; second, it is a topic of great interest to today’s youth; and third, cheminformatics research offers students with full-time jobs a way to continue their education when work obligations prevent them from working in a traditional academic chemistry laboratory.

Following a second break, Bob Belford stood in for Brian Murphy, a student in the Fall 2015 Cheminformatics OLCC, and presented Brian’s paper “Fall 2015 Cheminformatics OLCC project based learning: Validation of Wikipedia Chembox hazard information,” broadening the talk to cover multiple student projects. The presentation discussed aspects of the Drupal-based Cheminformatics OLCC course management system, which was designed to facilitate multi-campus collaborative projects, and how students generated screen-capture videos showing how they solved their projects. Specific projects included the use of Google Sheets to web-scrape data from Wikipedia Chemboxes and connecting academic laboratory inventories to PubChem Laboratory Chemical Safety Summaries (LCSS). The presentation ended with an overview of the upcoming spring 2017 Cheminformatics OLCC.
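The first step in connecting an inventory to PubChem can be sketched with PubChem's PUG REST service, which resolves a chemical name to compound identifiers (CIDs). The function names below are hypothetical and only the URL pattern follows PUG REST; linking each CID onward to its Laboratory Chemical Safety Summary is not shown. A name can match zero or several CIDs, so ambiguous inventory entries would still need manual review.

```python
from urllib.parse import quote
from urllib.request import urlopen

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

def cid_lookup_url(chemical_name):
    """PUG REST URL that returns the CID(s) for a name as plain text, one per line."""
    # safe="" also percent-encodes "/" so the name stays a single URL path segment
    return f"{PUG_REST}/compound/name/{quote(chemical_name, safe='')}/cids/TXT"

def fetch_cids(chemical_name, timeout=10):
    """Network call: list of integer CIDs (urlopen raises HTTPError if no match)."""
    with urlopen(cid_lookup_url(chemical_name), timeout=timeout) as resp:
        text = resp.read().decode("utf-8")
    return [int(token) for token in text.split()]

def inventory_lookup_urls(inventory):
    """Map each hypothetical inventory entry (a name string) to its lookup URL."""
    return {name: cid_lookup_url(name) for name in inventory}
```

Building the URLs separately from fetching them makes it easy to review or batch requests before sending them to PubChem.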

The final presentation was by Chase Smith (Massachusetts College of Pharmacy and Health Sciences (MCPHS) University) and Tamsin Mansley (Optibrium) on “Modern cheminformatics tools in the teaching laboratory: A practical exercise simulating a drug discovery project”. It reported on a five-week laboratory exercise incorporated into the Pharmaceutical Sciences graduate program at MCPHS University to simulate early-stage hit-to-lead and lead optimization in a drug discovery program. Students used the StarDrop cheminformatics software package from Optibrium Ltd. to guide the selection and design of compounds with an optimal balance of properties, together with publicly available datasets downloaded from the European Molecular Biology Laboratory (EMBL) Neglected Tropical Disease website. The exercise provided much-needed hands-on experience with complex topics normally discussed only in theory, including mining primary screening data, predictive modeling, and drug metabolism, and gave students practical experience using modern cheminformatics software.
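Balancing several compound properties at once is often expressed as a multi-parameter optimization score. A minimal sketch of one generic form, the geometric mean of per-property desirability functions, follows. The property names and ranges are hypothetical, and this is not StarDrop's scoring method.

```python
import math

def desirability(value, low, high, tolerance):
    """1.0 inside [low, high], falling linearly to 0 over `tolerance` (> 0) outside it."""
    if low <= value <= high:
        return 1.0
    gap = low - value if value < low else value - high
    return max(0.0, 1.0 - gap / tolerance)

def mpo_score(props, profile):
    """Geometric mean of per-property desirabilities (0 = fails a criterion, 1 = ideal).

    profile maps property name -> (low, high, tolerance); both dicts are hypothetical.
    """
    ds = [desirability(props[name], *rule) for name, rule in profile.items()]
    if any(d == 0.0 for d in ds):
        return 0.0  # any completely unacceptable property rules the compound out
    return math.exp(sum(math.log(d) for d in ds) / len(ds))

# Hypothetical profile: (acceptable low, acceptable high, tolerance outside range)
PROFILE = {"logP": (1, 3, 2), "MW": (200, 450, 100), "pIC50": (6, 12, 2)}
```

A geometric mean is used rather than an arithmetic one so that a single very poor property pulls the overall score down sharply instead of being averaged away.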

Dr. Robert E. Belford, UALR Department of Chemistry

Committee Reports

ACS Council Meeting

The Council of the American Chemical Society met in Philadelphia, PA on Wednesday, August 24, 2016 from approximately 8:20 a.m. until 12:15 p.m. in the Grand Ballroom Salon E-H of the Philadelphia Marriott Hotel. There were seven items for council action and they are summarized below.

Elected Committees of Council

Council Policy Committee: Council voted to fill seven slots on the Council Policy Committee from fourteen nominees as follows: the four candidates receiving the highest numbers of votes, Harmon B. Abrahamson, Lissa A. Dulany, Sally B. Peters, and Andrea B. Twiss-Brooks were elected for the 2017-2019 term; one candidate, Martin D. Rudd, was elected for a 2017-2018 term; and two candidates, Karl S. Booksh and Ella L. Davis, were elected for a one-year term in 2017.

Committee on Nominations and Elections: Council voted to fill six slots on the Committee on Nominations and Elections from twelve nominees as follows: the five candidates receiving the highest numbers of votes, Lisa M. Balbes, Alan M. Ehrlich, Alan A. Hazari, Amber S. Hinkle, and Thomas H. Lane, were elected for the 2017-2019 term; and one candidate, Neil D. Jespersen, was elected for a one-year term in 2017.

Committee on Committees: Council voted to fill five slots on the Committee on Committees from ten nominees as follows: Dee Ann Casteel, D. Richard Cobb, Emilio X. Esposito, Wayne E. Jones, Jr., and Stephanie J. Watson were elected for the 2017-2019 term.

Continuation of Select Committees

On the recommendation of the Committee on Committees, the council voted to approve the continuation of the ACS Committee on Analytical Reagents and the Committee on Chemical Abstracts Service, subject to confirmation by the Board of Directors.

Change in Local Section Territory

On the recommendation of the Committee on Local Section Activities, the Council voted to approve a petition from the Permian Basin Local Section to annex the Texas counties of Pecos and Brewster, and a petition from the Upper Peninsula Local Section to annex seven unassigned counties in Michigan and to reassign one Michigan county (Menominee) currently assigned to the Northwest Wisconsin Local Section.

Unemployed Members’ Dues Waiver

On the recommendation of the Committee on Membership Affairs, the council voted to approve a petition to amend the ACS Bylaws to extend the unemployed members’ dues waiver from the current period of two years to up to three years, allowing Society members to remain members without paying dues for that period (Bylaw XIII, Sec. 3, k), subject to confirmation by the Board of Directors.

Chemical Professional’s Code of Conduct

On the recommendation of the Committee on Economics and Professional Affairs, the council approved the proposed revisions of the Chemical Professional’s Code of Conduct, subject to confirmation by the Board of Directors.

Charter Bylaws

On the recommendation of the Committee on Constitution and Bylaws (C&B), the council approved two proposed revisions to the Charter Bylaws templates: one for divisions in probationary status and one for new local sections. C&B also proposed two amendments to the ACS Bylaws for consideration: a petition for the removal of ACS officers and councilors, and a petition on the rights of Local Section and Division Affiliates. The first petition was referred to the Council Policy Committee and the second to the Committee on Membership Affairs and several other committees for their evaluation.

International Chemical Sciences Chapters

On the recommendation of the International Activities Committee, the council voted to approve petitions to establish three new International Chemical Sciences Chapters in Greater Beijing (China National Capital Area), South Western China, and Iraq, subject to confirmation by the Board of Directors.

Recognition of Service

Council members were recognized for their 15, 20, 25, 30, 35, 40, and 45 years of service on the ACS Council. Bonnie Lawlor, Councilor for the Division of Chemical Information (1992-2016), was recognized for her 25 years of service on the ACS Council.

Special Discussion

Councilors were invited to share their thoughts on the proposed recommendations from the ACS Presidential Task Force on the U.S. Employment of Chemists. The task force has been examining all known influences that can impact employment in the chemical sciences in preparation for the report’s expected release later this year.

Reports of Society Committees and Committee on Science (highlights)

Budget and Finance (B&F)

The Society’s 2016 probable year-end budget projects a Net from Operations of $17.3 million. This is $3.9 million higher than the approved budget, but only $723,000 higher than 2015. Total revenues are projected to be $528.8 million, essentially on budget, and 3.3% higher than the prior year. Total expenses are projected at $511.5 million, which is 0.6% favorable to budget, and 3.3% higher than 2015. B&F considered several program funding requests for 2017, and the board subsequently approved funding for the Atlantic Basin Conference on Chemistry, the ChemIDP Program (see: https://chemidp.acs.org/node/532), and the International Student Chapter Program.

Education (SOCED)

SOCED is working with the Division of Chemical Education on a workshop proposal for the development of general chemistry performance expectations for submission to the National Science Foundation. The American Association of Chemistry Teachers (AACT) includes more than 3,600 members, 88% of whom are K-12 teachers. High school programming at the Biennial Conference on Chemical Education will be co-listed as an AACT track. The USA team won one gold medal, two silver medals, and one bronze medal (the highest score overall for the USA team) at the 2016 International Chemistry Olympiad in Tbilisi, Georgia.

Science (ComSci)

ComSci is working to inform ACS members and policymakers on strengthening forensic science. The 2013 ACS policy statement on this issue is being revised for board consideration later this year. A symposium, “Forensics: the crossroads of science, policy, and justice,” was organized for the ACS national meeting in Philadelphia. Success stories of chemistry-related cross-sector collaboration, highlighting the important role of university-industry partnerships in accelerating innovation, will inform a pilot session at the ACS Southwest Regional Meeting in November. ComSci submitted an ACS nomination for the Presidential National Medal of Science in April.

Reports of Council Standing Committees (highlights)

Meetings and Expositions (M&E)

Total attendance at the Philadelphia Meeting as of Tuesday evening, August 23, was 12,800, with the breakdown as follows:

Attendees 7,437
Students 3,249
Exhibitors 1,181
Expo only 613
Guests 320
Total 12,800

Attendance at the Fall National Meetings since 2004 is as follows:

2004: Philadelphia, PA 14,025
2005: Washington, DC 13,148
2006: San Francisco, CA 15,714
2007: Boston, MA 15,554
2008: Philadelphia, PA 13,805
2009: Washington, DC 14,129
2010: Boston, MA 14,151
2011: Denver, CO 10,076
2012: Philadelphia, PA 13,251
2013: Indianapolis, IN 10,840
2014: San Francisco, CA 15,761
2015: Boston, MA 13,888
2016: Philadelphia, PA 12,800

As part of M&E’s sustainability plan, the number of print copies of the program book for Philadelphia was significantly reduced: 1,274 print copies were sold, the mobile app received 7,004 transfers, and the online program received 2,653 transfers. For its Greener Meeting Program, ACS was named a co-winner of the 2016 UFI Sustainable Development Award (see: http://www.ufi.org/awards/sustainable-development-award/) and one of the finalists for the 2016 RISE Award sponsored by Meeting Professionals International. The Fall 2024 ACS National Meeting, originally planned for Philadelphia, PA, will be held in Denver, CO.

Divisional Activities (DAC)

DAC worked with selected divisions (Organic, Polymer, Polymeric Materials: Science and Engineering, and Environmental) to produce Presentations on Demand (POD) Shorts, two- to three-minute video abstracts of presentations delivered at a national meeting. Collectively, these divisions generated POD Shorts from 264 presenters in Philadelphia, which will be posted along with the full-length PODs in October. DAC will present a formula for allocating funding to the divisions for council approval at the 2017 ACS National Meeting in San Francisco.

Membership Affairs (MAC)

The Society continues to attract large numbers of new members: nearly 24,000 in 2015. However, as of December 31, 2015, ACS membership stood at 156,876, a 0.96% decrease from the prior year. The overall membership retention rate for 2015 was 84%. MAC piloted a 66% dues discount ($52 for full membership) for members residing in India; as of November 30, 2015, 888 new members from India had joined at the reduced rate. MAC approved a similar test of discounts, ranked according to the World Bank model, for countries of interest where ACS has an International Chapter.

Economic and Professional Affairs (CEPA)

The report of the 2015 Survey of New Graduates in Chemistry and Related Fields will be released soon. Initial results show that the overall unemployment rate for new graduates increased slightly, from 12.4% in 2014 to 13.0% in 2015. Although below its 2013 peak of 14.9%, the rate remains high. Mass layoffs continue to be a concern. Recent responses in 2016 included a career day in collaboration with the Delaware Local Section in response to the DuPont layoffs, and work with the Portland Local Section to provide access to the Green Chemistry & Engineering Conference for any member affected by the Intel layoffs. There will be a pilot series of online career events from September through November.

Constitution and Bylaws (C&B)

Since January 2016, C&B has certified eight sets of bylaws for local sections and one set of bylaws for a division. (The bylaws for the Division of Chemical Information were certified on August 4, 2016.) C&B sent options for updating bylaws to 30 local sections, most of which have not updated their bylaws since 1977. The unit bylaws and the July 1, 2016 edition of the ACS Governing Documents (Bulletin 5) are posted at http://www.acs.org/bulletin5. There are two petitions for consideration: the Petition on the Rights of Affiliates and the Petition for Removal of Officers and Councilors. New petitions must be received by the Executive Director (bylaws@acs.org) by December 14, 2016 to be included on the Council agenda for the spring 2017 meeting in San Francisco.

Reports of Other Committees (highlights)

Younger Chemists (YCC)

YCC recognized three local sections (Northeastern, Nashville, and San Diego) with ChemLuminary Awards at the fall national meeting in Philadelphia. Its most recent initiative, Catalyze the Vote, will be piloted this election season: both ACS Presidential candidates will participate in a virtual town hall specifically for younger chemists. With the support of N&E, YCC hopes this event will become an annual campaign.

Women Chemists (WCC)

WCC organized several events at the fall national meeting in Philadelphia: 1) the WCC Merck Research Award Symposium, 2) the WCC/Eli Lilly Travel Award poster session, 3) a working breakfast with the ACS Board Committee on Grants and Awards and the ACS Diversity and Inclusion Advisory Board to increase the number and quality of nominations for ACS National Awards from underrepresented groups, 4) the Women in the Chemical Enterprise breakfast, 5) the WCC Luncheon, and 6) the WCC Open Meeting/Just Cocktails reception. In 2017, WCC will celebrate its 90th anniversary and revise its strategic plan. Suggestions can be sent to wcc@acs.org.

Senior Chemists (SCC)

SCC is encouraging the establishment of senior chemists groups within local sections. Ten mini-grants were provided in 2015 and the program was extended in 2016. SCC is planning a symposium “Golden age of industrial chemistry” in collaboration with the Division of History of Chemistry and YCC for the spring 2017 national meeting in San Francisco.

Public Relations and Communications (CPRC)

CPRC co-sponsored a symposium, “Chemists and the Public: What Research Shows about Engagement and Communication,” held the second Wikipedia Edit-a-thon workshop, and presented three ChemLuminary Awards at the fall national meeting in Philadelphia to: 1) Howard and Sally Peters, 2) the Northeastern Local Section, and 3) the Lehigh Valley Local Section.

Project SEED

Project SEED offers summer research opportunities for high school students from economically disadvantaged families. This summer, more than 425 high school students were placed in over 100 academic, governmental, and industrial laboratories to engage in hands-on research. SEED reviewed 16 scholarship applications and awarded three Ciba Specialty Chemicals Scholarships, each renewable for three years, for the 2016, 2017, and 2018 academic years.

Minority Affairs (CMA)

CMA sponsored or co-sponsored a record five symposia on diversity topics for the meeting theme “Of the People, By the People, For the People.” In collaboration with ComSci, CMA held a luncheon with keynote speaker Dr. Cato Laurencin, who spoke on “Success is what you leave behind: innovation and leadership focused on humanity.” CMA recognized the Richland Local Section with a ChemLuminary Award, and asked councilors to consider a donation to the ACS Scholars Program to support minority undergraduates pursuing degrees in the chemical sciences.

Ethics (ETHX)

The Division of Chemical Information will co-sponsor a symposium “The Write Thing to Do: Ethical Considerations in Authorship & the Assignment of Credit” in San Francisco.

Environmental Improvement (CEI)

CEI co-sponsored a session, “The debate: What role should we play in the biotechnology era?”, within a two-day symposium intended to explore whether to propose a policy statement in this area. CEI also recognized the Dallas-Fort Worth and Midland Local Sections with ChemLuminary Awards.

Community Activities (CCA)

CCA facilitated hands-on activities at The Franklin Institute in Philadelphia. In 2017 the theme for the Chemists Celebrate Earth Day will be “Chemistry Helps Feed the World” and the theme for the National Chemistry Week will be “Chemistry Rocks!”

Chemical Safety (CCS)

About 18 months ago, CCS established a task force on Safety Education Guidelines, which has developed two sets of guidelines: one for high school teachers, entitled “The ACS Guidelines for Chemical Laboratory Safety in Secondary Education,” and one for faculty and staff, entitled “ACS Guidelines for Chemical Safety Education in Academic Institutions.” The guidelines are organized around the concept of R.A.M.P. (Recognize the hazard, Assess the risk of the hazard, Minimize the risk of the hazard, and Prepare for emergencies). They also include student learning competencies in the area of chemical safety. CCS has also posted “Identifying and Evaluating Hazards in Research Laboratories.” New resources are posted on the CCS website.

Chemical Abstracts (CCAS)

CCAS is organizing a strategic planning meeting to re-establish the committee’s mission and set clear strategy and specific goals. Input from multiple stakeholders to identify ways in which CCAS can further serve Society members is invited by CCAS Chair, Wendy Cornell (cornell@us.ibm.com).

Analytical Reagents (CAR)

CAR released the eleventh edition of its book, Reagent Chemicals: Specifications and Procedures for Reagents and Standard-Grade Reference Materials, in June 2016 (available in print through Oxford University Press; an electronic version is expected in early 2017).

Board of Directors Actions

On the recommendation of the Committee on Grants and Awards, the board voted to approve the Society’s nominees for the 2017 Perkin Medal, the 2017 National Science Board Public Service Award, and the 2017 Alan T. Waterman Award; and to approve the revised rescission procedures for the national awards and ACS Fellows designation.

On the recommendation of the Joint Board-Council Committee on Publications, the board voted to approve the reappointment of editors-in-chief for two ACS journals.

On the recommendation of the Society Committee on Budget and Finance, the board voted to approve the advance member registration fee of $445 for national meetings in 2017 and also authorized several program funding requests.

Board Discussions

The Board of Directors received and discussed reports from the Governance Agility Task Force, the Committee on Grants and Awards, the Committee on Planning, the Society Committee on Budget and Finance, and the ACS Governing Board for Publishing.

The board and the Council Policy Committee are creating a new taskforce that will look at the Society’s future governance needs. Per the board’s discussion, the chair’s report to Council provided additional details on a proposed Society-wide initiative to ensure an agile, efficient, and effective ACS. Additionally, the board is considering proposed changes to certain board committee duties and roles.

The board discussed the history, role, and contributions of its Standing Committee on Corporation Associates, and the responsibility of ACS to effectively capture the needs of industrial members and their corporations.

The presidents of Chemical Abstracts Service and the ACS Publications Division shared details on their financial performance, and editorial and new product development highlights.

Executive Director and CEO, Tom Connelly, and his direct reports updated the board on issues relating to human resources, ACS finances, and education.

The board heard reports from members of the Presidential Succession on their key priorities and activities as they relate to those of the board and for the purposes of coordinating their ongoing activities on behalf of the Society.

The Board’s Regular (Open) Session

The board held an open session which featured a discussion on “ACS National Meetings of the Future”. Board chair Pat Confalone opened the discussion by stating ACS national meetings are a key way that ACS fulfills its obligation to deliver scientific information, but they need to anticipate members’ future needs. Almost two dozen audience members came forward to offer comments, observations, and suggestions, prompted by a list of questions for the discussion.

Respectfully submitted, September 15, 2016
Svetlana Korolev, Councilor 2016
Charles Huber, Alternate Councilor 2015-17
Bonnie Lawlor, Councilor 2016-18

Joint Board-Council Committee on Publications

The progress report for the Journal of the American Chemical Society and the monitoring reports for Chemical Research in Toxicology, Inorganic Chemistry, ACS Sustainable Chemistry & Engineering, ACS Macro Letters, and Macromolecules were presented, discussed thoroughly, and accepted with thanks. Editor reappointments were reviewed and recommendations were made. ACS Photonics, Bioconjugate Chemistry, and ACS Applied Materials & Interfaces will be monitored next.

C&EN is taking advantage of social media, as well as products such as its weekly newsletters, to engage with readers and increase Web usage. As part of its digital strategy, C&EN has made its website mobile-friendly and refreshed the C&EN mobile app. Also announced was the development of selected C&EN content translated into Chinese.

The ACS Publications Division President announced that the Society will organize a chemistry preprint server to promote early research sharing for the global chemistry community. This pursuit will be organized jointly with Chemical Abstracts Service and will enlist other external stakeholders and potential co-sponsors.

In October, ACS Publications is holding a conference in conjunction with the 60th anniversary of the Institute of Chemistry, Chinese Academy of Sciences (ICCAS), with nearly 300 delegates already registered. ACS Omega, led by co-editors located in the Americas, Europe, China, and India, represents a strong effort by ACS Publications to attract authors worldwide to its fully open access journal. Other outreach activities include the development of an Asian editorial advisory board.

Nicole S. Sampson, Chair, Committee on Publications

Joint Board-Council Committee on CAS Report to C&EN

The CCAS Committee met in Executive Session on August 19, 2016, where CAS management reported on highlights from the first half of 2016, and the committee reviewed its past contributions and explored potential future objectives in the context of CAS and Society needs.

CAS President Manuel Guzman reported that the management team continues to make progress on the Strategic Plan for Growth & Optimization and that new product launches are all performing well. The most recent product launch, ChemZent, has been well received in the market since its June introduction. PatentPak, introduced earlier this year, has received industry recognition with the CODiE Award and the Stevie Award. The committee was pleased to learn that the next chapter in SciFinder has been unveiled: SciFinder-n, which will provide exponential power for researchers around the world, was announced at the fall national meeting.

The committee discussed plans for an upcoming CCAS Strategic Planning Meeting to be held in early 2017 in order to update the committee’s mission and set strategy and specific goals. As a prelude, the committee reviewed past and current interactions between CAS and other parts of the Society such as the Membership and Scientific Advancement, Education, and Publications business divisions as well as various technical divisions. Input from multiple Society stakeholders will be invited to help identify ways in which CCAS can serve Society members as a unique conduit and help CAS to fulfill the ACS Mission “to advance the broader chemistry enterprise and its practitioners for the benefit of Earth and its people”.

ACS members having questions or suggestions related to the committee’s mission or strategy are encouraged to reach out directly to committee members.

Wendy Cornell, CCAS Chair

CINF Education Committee

Saturday, August 20, 2016, from 1-3 pm
Philadelphia Convention Center

Attendees: Grace Baysinger (Chair), Judith Currano, Chuck Huber, Jeremy Garritano, Martin Walker, and Donna Wrublewski
Regrets: Ye Li, Suzanne Redalje, Teri Vogel

The committee needs to reactivate efforts to review existing content and determine what new content is needed in the Chemical Information Wikibook. Action: complete this review in six months and create a plan for topics and authors. When recruiting authors, offer the option of a one-time or an ongoing (three-year) commitment.

At the Spring 2017 ACS National Meeting, the committee plans to have a symposium on open access that will be organized by Grace, Ye, and Erja Kajosalo. Judith and Donna are interested in organizing a symposium for the Fall 2017 National Meeting on ethics of authorship.

SOCED Report: Because Jeremy is a member of the Society Committee on Education, which met the previous day, he gave us an update. There is a record number of student chapters. Submitting articles (e.g., tips, resources) to “In Chemistry” or the “Graduate Student and Postdoc Newsletter” would be a good way to reach this population.

The Committee on Professional Training (CPT) is working on a survey about virtual classes and labs, which it plans to finalize in January. A key issue associated with this survey is whether online classes should be granted credit or experience commensurate with in-person classes. CPT is also dealing with requests to grant accreditation to schools outside the United States.

The Safety Committee has developed guidelines and will be publishing "Hazard Assessment in Research Laboratories" (https://www.acs.org/content/acs/en/about/governance/committees/chemicalsafety/hazard-assessment.html). ACS just published two sets of safety guidelines, one for secondary schools and another for academic institutions. See the links on this page: https://www.acs.org/content/acs/en/about/governance/committees/chemicalsafety.html

CAS has been beta testing a new product called "Chemistry Class Advantage," designed to give students an authentic research experience. For more details, please see this press release: http://www.cas.org:8104/news/media-releases/chemistry-class-advantage

Grace Baysinger, Chair, CINF Education Committee

Sponsor Announcements

Announcement from CAS, a division of the American Chemical Society

CAS continues to provide unparalleled discoverability of chemistry with the recent launch of ChemZent. ChemZent is a new solution, available for purchase in SciFinder, that delivers the complete collection of approximately three million abstracts from Chemisches Zentralblatt, the oldest compendium of chemistry abstracts, covering 1830 to 1969.

If you are a chemistry faculty member teaching organic chemistry this fall, CAS invites you to participate in our voluntary beta of a new education solution that leverages SciFinder to enhance the teaching and learning of organic chemistry. Organized by the topics taught in the classroom, Chemistry Class Advantage harnesses the power of SciFinder through carefully designed problems aimed at enhancing undergraduates' ability to use the original literature to improve their overall comprehension of organic chemistry fundamentals. The goal is to help students think more like researchers. Preview our short video, and contact CAS today to learn more.

About CAS

Dedicated to the ACS vision of improving people’s lives through the transforming power of chemistry, the CAS team of highly trained scientists finds, collects, and organizes all publicly disclosed substance information, creating the world’s most valuable collection of content that is vital to innovation worldwide. Scientific researchers, patent professionals, and business leaders around the world rely on a suite of research solutions from CAS that enable discovery and facilitate workflows to fuel tomorrow’s innovation.

JCIM Launches Two New Manuscript Types

Journal of Chemical Information and Modeling has launched two new manuscript types: Application Note and Review. Application Note articles are informative peer-reviewed reports on novel software packages, databases, and Web servers. Review articles are peer-reviewed topical overviews of general interest to the JCIM community. Please refer to the journal’s author guidelines for more details. For pre-submission inquiries, please email the journal at eic@jcim.acs.org.

Learn more about the ACS Publications Website

ACS Publications

In 1879, the American Chemical Society began the publication of chemical research with the first issue of the Journal of the American Chemical Society. Today, ACS publishes 50 peer-reviewed journals filled with cutting-edge articles across a broad spectrum of scientific disciplines.

ACS Publications delivers more than 1 million research articles through its award-winning Web platform. In 2015, these articles were accessed more than 94 million times.

To discover more of the free features and tools across the ACS Publications website, please check out the other demonstrations and tutorials now available:

  • Learn about ACS2Go, our mobile optimized platform
  • Find out how to search and browse our high-quality content
  • Discover our feature-rich articles

Visit http://pubs.acs.org/demo to enhance your online experience!

CINF Officers and Functionaries

Chair
Rachelle Bienstock,
RJB Computational Modeling LLC
rachelleb1@gmail.com

Chair-Elect
Erin Davis,
Cambridge Crystallographic Data Centre
erinsdavis@gmail.com

Past-Chair
Rachelle Bienstock,
RJB Computational Modeling LLC
rachelleb1@gmail.com

Secretary
Tina Qin,
Michigan State University
ginna@mail.lib.msu.edu

Treasurer
Rob McFarland,
Washington University
rmcfarland@wustl.edu

CINF Councilors
Bonnie Lawlor,
chescot@aol.com
Andrea Twiss-Brooks,
University of Chicago
atbrooks@uchicago.edu
Svetlana N. Korolev,
University of Wisconsin, Milwaukee
skorolev@uwm.edu

CINF Alternate Councilors
Carmen Nitsche,
carmen@cinformaconsulting.com
Charles Huber,
University of California, Santa Barbara
huber@library.ucsb.edu
Jeremy Ross Garritano,
University of Virginia
jg9jh@virginia.edu

Archivist/Historian
Bonnie Lawlor,
chescot@aol.com

Audit Committee Chair
TBD

Awards Committee Chair
David Evans,
RELX
david.evans@relx.ch

Careers Committee Co-Chairs
Pamela Scott,
Pfizer
pamela.j.scott@pfizer.com
Sue Cardinal,
University of Rochester
scardinal@library.rochester.edu

Communications and Publications Committee Chair
Graham Douglas,
communications at acscinf.org

Procedures Chair
Bonnie Lawlor,
chescot@aol.com

Education Committee Chair
Grace Baysinger,
Stanford University
graceb@stanford.edu

Finance Committee Chair
Rob McFarland,
Washington University
rmcfarland@wustl.edu

Fundraising Interim Committee Chair
Graham Douglas,
communications at acscinf.org

Membership Committee Chair
Donna Wrublewski,
Caltech Library
dtwrub@caltech.edu

Nominating Committee Chair
Rachelle Bienstock,
RJB Computational Modeling LLC
rachelleb1@gmail.com

2016–2017 Program Committee Chair
Elsa Alvaro,
Northwestern University
elsa.alvaro@northwestern.edu

2015–2016 Program Committee Chair
Erin Davis,
Cambridge Crystallographic Data Centre
erindavis@gmail.com

Tellers Committee Chair
Sue Cardinal,
University of Rochester
scardinal@library.rochester.edu

Chemical Information Bulletin Editor Spring
Vincent F. Scalfani,
The University of Alabama
vfscalfani@ua.edu

Chemical Information Bulletin Editor Summer
Judith Currano,
University of Pennsylvania
currano@pobox.upenn.edu

Chemical Information Bulletin Editor Fall
Teri Vogel,
UC San Diego Library
tmvogel@ucsd.edu

Chemical Information Bulletin Editor Winter
David Shobe,
Patent Information Agent
avidshobe@yahoo.com

Webmaster
Stuart Chalk,
University of North Florida
schalk@unf.edu

Contributors to This Issue

Grace Baysinger,
Stanford University
graceb@stanford.edu

Robert Belford,
University of Arkansas at Little Rock
rebelford@ualr.edu

Rachelle Bienstock,
RJB Computational Modeling LLC
rachelleb1@gmail.com

Stuart Chalk,
University of North Florida
schalk@unf.edu

Wendy Cornell,
IBM
wdcornell@yahoo.com

David Evans,
RELX
david.evans@relx.ch

Rajarshi Guha,
National Institutes of Health
guhar@mail.nih.gov

Charles Huber,
University of California, Santa Barbara
huber@library.ucsb.edu

Svetlana N. Korolev,
University of Wisconsin, Milwaukee
skorolev@uwm.edu

Bonnie Lawlor,
NFAIS
chescot@aol.com

Marge Matthews,
Computer Software Professional
marge.matthews@outlook.com

Nicole S. Sampson,
Stony Brook University,
nicole.sampson@stonybrook.edu

David Shobe,
Patent Information Agent
avidshobe@yahoo.com

Wendy Warr,
Wendy Warr & Associates
wendy@warr.com