Future of the History of Chemical Information

Yes, you read it correctly; we are wondering where the venerable story of chemical information is headed. Consider the impact on chemical research of machine-readable documentation over the past 50+ years, and of systematic chemical nomenclature over the 100+ years before that. Consider the generations of chemists who built this discipline by exchanging scholarship and navigating the politics of their time. Bend those lenses around to look forward and consider what of the current day will most influence the progress of the chemical enterprise and its information in 50 years. What can we learn from our history to help us focus our endeavors to make future history? As we chart our way forward, what are the important principles for chemistry, and chemical information in particular, that all of us in the information profession need to keep clear, front and center? These questions were the drivers of a CINF symposium at the recent ACS Meeting in Philadelphia.

A diverse panel of knowledgeable information professionals told us where the landscape of today could lead, drawing on what we have learned from various perspectives over 100+ years about chemistry, information, and most importantly, the people involved in it all. Twelve speakers gave reflective analyses based on their respective areas of expertise, tying them to essential issues for CINF, with implications for the fellow Divisions of Chemical Education (CHED) and History of Chemistry (HIST) as well. Links to the presentation slides for most talks are included in this report and are also available at: http://bulletin.acscinf.org/node/347 (abstract numbers 47-51 & 59-65). My impressions and reflections on the impact of the future of the history of our chemical information are represented below.

Peter Rusch, currently Chair of, and the CINF Liaison to, the ACS Committee on Nomenclature, Symbols and Terminology, set the tone of the day by “cantilevering history.” He aptly illustrated how cantilevering, much as in bridge building, is a critical aspect of the work of information professionals, and is never really done. His on-point, prescriptive “prospective retrospection” suggested that those practicing the unrecognized “central science” with its “unobvious” information will need to remain vigilant about the integrity of the science. The important principles to consider are seemingly self-evident, but not to be overlooked in any scenario: price/performance, chemical integrity, personal contact and conversation, and of course, good information habits.

Delving into the long history of chemical nomenclature and structure representation were two talks based on a symposium held at the Royal Society of Chemistry in London in November of 2010 (http://www.rsc.org/Membership/Networking/InterestGroups/CICAG/meetings.asp, scroll down to “Celebrating the History of Chemical Information”).

Bill Town gave a thoughtful walk through the history of confusing nomenclature and, eventually, more specific compound classification. Early alchemical history was fraught with persecution, resulting in layers of confusion between the competing desires for useful classification and for secrecy. It took several hundred years to work through multiple systems until atomic theory and more accurate analysis brought understanding together. As the need for granularity increased, different nomenclatures and classifications appeared, each appropriate for organic compounds, inorganic compounds, or the elements. Scientists finally started grappling with standardization in the 19th century.

Phil McHale delivered an entertaining evolution of structure representations, from early recognition of atoms and aromatics, through complexities of stereochemistry and delocalized bonds, to implications of Markush generics. Computerized systems depend on clear notation to support robust compound RSVP (register, search, view, print/publish) and have served up a variety of coding schemas based on fragments for substructure searching or linear notation for unambiguous identification. Current structure representation techniques focus on informatics applications, including calculation, prediction, analysis, and leveraging the networked environment through enhancing traditional information formats, linking diverse information streams, and pushing molecular manipulation potential into a variety of social communication venues.

Steve Heller picked up the story of structure representation with a primer on the emerging InChI standard, IUPAC’s algorithm-based, open source International Chemical Identifier system. The idea of the InChI is to enable linking across the very diverse landscape of chemical notation. It definitely gives a twist to future thinking, pushing information publishers and vendors to think beyond their current systems and focus on transferable deliverables. This approach is compatible with any registry or indexing system, but the challenge for InChI will be encouraging support and cooperation across the information industry to implement and develop further specifications as the chemical and computational landscapes continue to evolve.

Guenter Grethe traced the evolution of chemical reaction information from early alchemy, which focused heavily on methodology. The desire for control brought on more scientific approaches to experimentation and the need for more systematic explanation. Printed sources were characterized by complex indexes and vetted methodology. The diversity of information related to reactions lends itself to endless creativity in computational approaches, including synthesis design, which predated reaction information retrieval. Early synthesis design programs used a variety of algebraic, knowledge-based or numeric approaches; later algorithms relied on reaction information. The real challenge with any reaction tool is interacting with the chemists using the systems, and classification remains an important mental indexing tool for chemists. RInChI is currently under development and may help navigate some of the many wrinkles that still persist across systems. Guenter’s call to honor “the intelligence and creativity of…chemists” is a good aspiration as we hurtle into the future.

The afternoon session started off with two information services that have long histories of innovation in chemical searching, Web of Science and Chemical Abstracts. Vijay Bhatia and Roger Schenck both focused on the future of evaluation and analysis in information systems at the chemical level. Current trends indicate an increasing abundance of chemical information of diverse types and sources, and chemically robust systems will need to enable scientists across disciplines to sift through the cornucopia more actively and intellectually, and to reach decisions. Search and delivery have vastly improved in quality and efficiency over the decades, and scientists now need sophisticated tools supporting various informatics techniques. Not all information is equivalent in content or quality, nor equally so in every context, especially in such intertwining, cross-disciplinary areas as chemical biology.

The next two talks considered the role of chemical information in incorporating basic knowledge into learning. Through a historical tour of chemical information education, Adrienne Kozlowski made a strong case for reviving the focus on information skills in education, reminding us that CINF originated in CHED. Bruce Lewenstein focused on the central role of textbooks in chemical education. With this form in particular there are warring factors under the hood that influence what is presented to students, including considerations of economy, education as industry, adoption-rejection, and different takes on basic subjects by different types of scientists. A lively audience discussion considered Internet-based tools and data flows for chemical education, trending towards increased availability of materials, a divergence between large one-stop tools and many specialized approaches, and the mobile environment that lends itself to smaller discrete steps, or “apps.” A general concern emerged throughout the day: with less tedious activity required to search, find and work with chemical information, students in effect get less practice and less reinforcement in this important aspect of chemistry research.

Engelbert Zass delivered a rigorous retrospective of the interaction of chemists and their information in tandem with the technical developments of access and use over time. We are at a unique point in this history where career information specialists have directly experienced many approaches to stitching together the pieces necessary for robust chemical searching. Some interesting patterns emerge when considering the long view: there are many core fundamental steps that the tools of any day need to address, and the data sources need to be well-structured to support this retrieval; chemists themselves need to weigh in scientifically at many of these steps, since the searching process is as unique and critical to chemical research as the individual scientists; and this intellectual engagement has ironically been most often accomplished through usually tedious “work-arounds.” Engelbert gave a passionate call that the vigilance of information professionals today needs to be no less; there are as many dangers in today’s searching systems, which demand complicated multi-step “work-arounds,” and the primary responsibility for searching has again shifted back to chemists themselves, as in the previous era of printed sources.

A unique and thought-provoking contribution to the consideration of the future of the history of chemical information was provided by Jeff Seeman’s focus on chemists’ information. As a chemist-historian interested in the unfolding of chemistry through the people who practice and produce it, Jeff seeks information from archival sources as well as the published literature and searching tools. A series of powerful stories around some of the classic discoveries in chemistry gleaned from “primary data” sources illustrated the ongoing importance of considering the past in light of the present and future, for practicing chemists and historians alike. The past is a moving target depending on the vagaries of technology, economics, politics and how researchers choose to build on it; continued access to this past is a concern for all involved. Chemists themselves should be aware of and engage in thoughtful record keeping of their correspondence, data and other aspects of their research process, especially as the daily interactions around research become increasingly ephemeral in the digital environment.

Robert Buntrock brought the symposium together, completing the bridge analogy connecting seekers and information. His whirlwind tour of a wide variety of information sources and a dizzying array of print and early machine “interfaces” showed that the core principles of good information seeking remain the same, from keeping current to experimental design to comprehensive literature reviews and competitive analysis. With the advent of greater access and options for searching online, it is more critical than ever before for information professionals to support chemists. While the construction techniques need updating to meet the technologies, information professionals continue to bridge the same abyss between practicing chemists and the information they need.

Overall it was a great team perspective on how we’ve arrived at the present day, and it left me feeling less well prepared than ever before...but inspired. I don’t have any answers. I am still deep in the middle of it all, not in the field quite long enough to fully appreciate where we have been with the intersection of computers, and not quite naive enough to jump into every idea that washes through. I am especially interested in the players: amid international and government players, how much of a role will the industry continue to have in shaping information? Is there really a future for the academic side, and is this best focused through computer science and information theory approaches, or do we need to bring in an ethnographic approach, or just more chemists? With enhanced data access, linking, parsing and re-mixing just on the horizon, what new complexities and abilities will chemists and their science encounter? The impression is a perfect storm of centripetal forces, and I am looking forward to pushing this momentum into the murky landscape rich in potential for high-value information.

Leah McEwen, Symposium Co-Organizer