50 Years of the Cambridge Structural Database: Some Personal Perspectives
In 1965, Olga Kennard, with the support of J. D. Bernal, obtained some government funding and assembled a small group of crystallographers and clerical staff to begin work on the Cambridge Structural Database (CSD). I first remember hearing about the project as a crystallography PhD student at Lancaster University and had little idea, at the time, that I would become involved in the project within a few years.
My connection with the Cambridge Crystallographic Data Centre (CCDC) covers two distinct eras. From 1969 to 1974, I worked as a Senior Assistant in Research at Cambridge’s Chemistry Laboratory in Lensfield Road, with the main responsibility for computerizing the database. Then, from 2003 to 2011 (after I had retired from ChemWeb), I served on the CCDC Board of Governors, eventually as Chair.
The early years
By the mid-60s, it had become clear that the number of crystal structures being determined was growing rapidly. Jack Dunitz later told me that, up to this time, a good crystallographer could hold the details of any known crystal structure in memory and have them at his or her fingertips. However, when I joined in 1969, the total number of known organic and organometallic structures had already reached 5,000, and that number was to double again in the next five years. Today, the total number of published structures exceeds 750,000.
In 1969, the CSD consisted of many sets of 80-column punched cards, assembled and maintained in card trays in a filing cabinet in our common workspace in Lensfield Road. As a first task, we loaded the card images onto magnetic tape and, to preserve as much of the existing workflow as possible, developed a suite of programs to check and maintain these card sets without changing formats. Our priority at the time was to begin producing books through which we could share the database with colleagues. We were fortunate to obtain the use of some computer typesetting software developed for Science Abstracts by the INSPEC group at the IEE (Institution of Electrical Engineers). As well as adapting the software to produce the pages we had designed for the initial typeset bibliographies, we also had to create tapes on an IBM computer that would be readable on an ICL 1900 computer (which used a totally different character representation). The challenge was met and the first two-volume bibliography was published in 1970 by the International Union of Crystallography (IUCr). Supplementary volumes of the bibliography were published in 1971, 1972 and 1974, and then annually. In 1977, a set of comprehensive retrospective indexes for 1936-76, including a KWOC (Key Word Out of Context) index, was produced.
In parallel with the production of bibliographies, we began the task of checking the datasets for self-consistency. At a time when computer typesetting of journals wasn’t common, many errors were introduced at the typesetting stage. Digits were often transposed in tables of atomic coordinates and other errors were commonplace. We estimated that some 12% of the unchecked crystallographic literature contained errors and we initiated a large program of error checking. Where problems were found, we tried to suggest possible corrections, but we always sought confirmation from the original authors. The first data publication (volume A1) was published in 1972 by IUCr as a continuation of the Chemical Society’s Interatomic Distances special series and covered the literature from 1960 to 1965. It contained some 1,300 entries and, for each entry, calculated bond lengths, bond angles and torsion angles were presented. We also generated stereoscopic diagrams which enabled most people to visualize the three-dimensional structure.
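For readers who have not met this kind of check, the idea can be sketched in a few lines of modern Python. This is purely an illustration and not a reconstruction of the programs we wrote at the time: the function names, the tolerance and the sample numbers are all assumptions. A bond length is recomputed from the published fractional coordinates and unit-cell parameters, then compared with the bond length printed in the paper; a transposed digit in a coordinate usually shows up as a clear disagreement.

```python
import math


def orthogonalization_matrix(a, b, c, alpha, beta, gamma):
    """Matrix converting fractional to Cartesian coordinates.

    Cell lengths are in angstroms, cell angles in degrees.
    """
    al, be, ga = (math.radians(x) for x in (alpha, beta, gamma))
    cos_al, cos_be, cos_ga = math.cos(al), math.cos(be), math.cos(ga)
    sin_ga = math.sin(ga)
    # Unit-cell volume divided by a*b*c.
    v = math.sqrt(
        1 - cos_al**2 - cos_be**2 - cos_ga**2 + 2 * cos_al * cos_be * cos_ga
    )
    return [
        [a, b * cos_ga, c * cos_be],
        [0.0, b * sin_ga, c * (cos_al - cos_be * cos_ga) / sin_ga],
        [0.0, 0.0, c * v / sin_ga],
    ]


def to_cartesian(matrix, frac):
    """Apply the orthogonalization matrix to one fractional coordinate."""
    return [sum(matrix[i][j] * frac[j] for j in range(3)) for i in range(3)]


def bond_length(matrix, frac1, frac2):
    """Distance in angstroms between two atoms given in fractional coordinates."""
    return math.dist(to_cartesian(matrix, frac1), to_cartesian(matrix, frac2))


def is_consistent(matrix, frac1, frac2, reported, tolerance=0.02):
    """True if the recomputed bond length agrees with the reported value.

    The 0.02 angstrom tolerance is an arbitrary value chosen for illustration.
    """
    return abs(bond_length(matrix, frac1, frac2) - reported) <= tolerance


# Hypothetical example: a 10 angstrom cubic cell and a reported bond of 1.54.
cell = orthogonalization_matrix(10.0, 10.0, 10.0, 90.0, 90.0, 90.0)
print(is_consistent(cell, (0.0, 0.0, 0.0), (0.154, 0.0, 0.0), 1.54))
# Same atom with two digits transposed in the x coordinate (0.154 -> 0.145).
print(is_consistent(cell, (0.0, 0.0, 0.0), (0.145, 0.0, 0.0), 1.54))
```

The sketch ignores symmetry operations and bonds that cross cell boundaries, both of which any real checking program would have to handle.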
In December 1972, a magnetic tape copy of the database was installed by Richard Feldman at the US National Institutes of Health, which became the first national data center, acting as a data distribution point for crystallographers in the United States. As more national data centers became established, the format of the database became more important, and thought was given to a total database restructure and to adding chemical connectivity information to each record.
CCDC in the 21st century
Fast forward 30 years and the picture has changed dramatically. In the intervening period, CCDC has become a charity (technically a company limited by guarantee) with a commercial subsidiary that handles the thriving software business. The bulk of CCDC’s income now derives from services provided to the chemical and pharmaceutical industries. A new building, constructed adjacent to the University Chemical Laboratory in the 1990s, houses all aspects of the UK operations. Some US sales operations are co-located with the PDB (Protein Data Bank) facility in New Jersey. Database maintenance (which is now as automated as possible), software development and basic research are all located in the Cambridge, UK office. As technology has become more powerful, new distribution methods have become possible, and much of the database distribution now occurs over the internet rather than through the physical mailing of DVDs or magnetic tapes.
Most of the original CCDC personnel from the 1960s have reached retirement age. Robin Taylor (who led the software development group), W. D. S. (Sam) Motherwell (who led the research group), Frank Allen (Executive Director), and Steve Salisbury (Administrative Director) all left CCDC within a relatively short period, but Frank Allen became an Honorary Fellow and continued his research interests for a further six years until his untimely death in 2014. Colin Groom was appointed Executive Director in 2008 following Frank’s retirement and benefited from Frank’s advice. Colin has managed the CCDC well in the following years.
CCDC is a recognized teaching institution at Cambridge and has co-supervised PhD students with staff of the University Chemistry Department in Cambridge and of departments elsewhere. In a typical year, two or three fellowships are awarded to PhD students. One of the highlights of the spring meeting of the Board of Governors was an extra day which gave an opportunity to hear from all the current students and to meet their supervisors.
I have decided to make this a personal recollection rather than a slick commercial presentation on the CCDC and its products. The website http://www.ccdc.cam.ac.uk contains an up-to-date overview.
William Town. Former Chair, CCDC Board of Governors