Impact of IUPAC InChI on Finding and Linking Information on Chemicals

With the advent of the internet and the proliferation of vast, scattered chemical information resources on the web, it became evident that the community needed an open identifier to make this body of data freely and meaningfully discoverable and exchangeable. In 2000 IUPAC began a project to establish just such an identifier, releasing the first version in 2005. This fall’s symposium on InChI, taking place more than 10 years after that effort began, clearly demonstrated that the original goals of the IUPAC effort have been met: most chemical database producers support or use InChI in one way or another, and the exchange of chemical compound information across web resources is now commonplace, thanks to InChI.

The symposium covered many aspects, from ideas for improvement of the existing implementation, to ideas and actions to expand coverage to include new compound types, to examples of how InChI is being used at various organizations today to improve and foster data exchange, and how it could be used in the future.

InChI Expansion

We heard from the InChI Trust (Richard Kidd, Treasurer) and several working groups, whose mission is to maintain and further develop the InChI algorithm. Polymers are next on the list to be handled by InChI. The specifications have been completed and the standard will be programmed by the end of this year. The organometallics proposal is out for comment. For now, funding is still being sought for inorganics and Markush structures.

A fair amount of work has also been completed on RInChI, a chemical identifier to handle reactions (update given by Guenter Grethe). Efforts in the biomolecules project are moving forward, with an InChI working group holding a requirements-gathering meeting at NIH later this fall (update provided by Keith Taylor). Don Burgess of NIST described a project underway to use the existing InChI structure and expand the tiers to capture conformer, electronic state, and a quantum enumeration layer to create a representation for elementary reactions (InChI-ER).

InChI Improvements

The NCI/CADD group has worked with InChI since the beginning as part of their free web services. Marc Nicklaus presented an interesting analysis of the current state of tautomer handling within the InChI 1.x algorithm. Using these data, the group is putting together a proposal to improve the algorithm so InChI can achieve its original design goal of being a “tautomer-invariant” identifier.

Community Use of InChI

Major database producers updated us on their uses of InChI. Tony Williams of the Royal Society of Chemistry discussed how InChI has allowed them to integrate disparate compound databases, and how it supports the pursuit of their open source drug discovery platform. He held out the hope that more chemists outside of the CINF Division would become aware of InChI and related topics. Evan Bolton of the National Center for Biotechnology Information discussed how InChI is fundamental for their cross-resource correlation within PubChem and how they have started offering programmatic services that include InChI. Users can import, export, and compute InChIs, in addition to searching by them. He also discussed the PubChemRDF project, which allows users to download slices of related PubChem data. Ian Bruno of the Cambridge Crystallographic Data Centre discussed how they were using InChI to identify the overlap between the Protein Data Bank (PDB) and the Cambridge Structural Database (CSD).

Future Uses of InChI

One interesting application was the inclusion of InChI in the information encoded in a QR code, discussed by Don Cruickshank of the University of Southampton. This could be a good way to deliver emergency information as well as speed up inventory management. Even labels with up to 30% damage are still readable.

InChIKeys also lend themselves to text mining, and Tom Griffin of IBM reported on processing full-text documents and assigning InChIs and InChIKeys (“entity insertion”) to make the originally text-based chemical information indexable and retrievable.
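
To illustrate why InChIKeys suit text mining, here is a minimal Python sketch that pulls InChIKey-shaped tokens out of free text. It is not the IBM pipeline described in the talk, only an illustration that relies on the fixed layout of a standard InChIKey (14 uppercase letters, a hyphen, 10 uppercase letters, a hyphen, and one final letter). The function name find_inchikeys and the sample sentence are hypothetical, and the pattern checks only the shape of a token, not whether it corresponds to a real compound.

```python
import re

# Fixed InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter
# (27 characters in total). This matches shape only, not chemical validity.
INCHIKEY_PATTERN = re.compile(r"\b[A-Z]{14}-[A-Z]{10}-[A-Z]\b")


def find_inchikeys(text):
    """Return InChIKey-shaped tokens in order of first appearance, without duplicates."""
    found = []
    for match in INCHIKEY_PATTERN.finditer(text):
        key = match.group(0)
        if key not in found:
            found.append(key)
    return found


# Hypothetical sample sentence; the key shown is the standard InChIKey of ethanol.
sample = "Ethanol (LFQSCWFLJHTTHZ-UHFFFAOYSA-N) was used as the solvent."
print(find_inchikeys(sample))
```

Because the key has a fixed length and alphabet, a simple pattern like this can index documents that already contain InChIKeys; assigning keys to compounds mentioned only by name requires name-to-structure conversion, which is outside the scope of this sketch.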

One question that kept coming up in various guises at the Q&A sessions was why the “same compound” often seemed to have “different” InChIs. This tended to have more to do with the way a molecule was normalized than with a failure of the algorithm. While the vendors have always understood the challenges of normalization and representation rules, I believe this forum allowed the wider audience to appreciate the rich and nuanced intricacies behind our taken-for-granted chemical drawing tools.

Please visit http://bulletin.acscinf.org/node/621 for the full program with abstracts and slides, where available. 

Carmen Nitsche, Symposium Reporter