Book Reviews

Data Management for Researchers: Organize, Maintain and Share your Data for Research Success

Briney, K. Data Management for Researchers: Organize, Maintain and Share your Data for Research Success; Pelagic Publishing: Exeter, UK, 2015.
191 p. + x.  ISBN 978-1-78427-012-4 Hardcover, £ 49.99.
ISBN 978-1-78427-011-7 Paper, £ 24.99.

An excellent practical treatise on the art and practice of data management, this book is essential to any researcher, regardless of subject or discipline. Each of the eleven chapters begins by recounting a real-life encounter with data management, some favorable, some disastrous. Data are defined broadly as anything one performs analysis upon, and specific examples are discussed. Data management is described in detail as those practices necessary for efficient use of data before, during, and after the research is performed. Each chapter has a concluding summary and references, and the text concludes with an index.

Chapter 1 covers the importance of data management in modern research. Funding agencies now require data management plans and data sharing, reproducibility concerns highlight data management issues, and researchers cannot manage their increasing amounts of digital data the same way as physical samples. The difference between doing data management and writing a data management plan is also discussed.

Chapter 2 describes the “new” circular lifecycle of data (as opposed to the “old” lifecycle that was linear): see the figure below. 

[Figure: the circular data lifecycle]

Figure reprinted with permission by Kristin Briney

This lifecycle defines the organization of chapters 3-6 and 10-11 while chapters 7-9 come under the category of storage, covering data security, storage and backups, and long-term preservation.

Chapter 3 covers data management plans and data policies. These policies come from granting agencies, government, and institutions and cover issues such as data retention (including policies), ownership, and copyright. Notebooks, electronic and paper, are covered in depth in chapter 4, and the advantages and disadvantages of each are discussed. The chapter also reviews other types of documentation such as methods, metadata, and standards from publishers and professional societies. File organization, including naming, documentation, and databases, is described in chapter 5. Data analysis is discussed in detail in chapter 6, including the retention of both raw and analyzed data, and analysis methods.

Chapters 7-9 digress from the roadmap outlined in chapter 2 and treat the topics of data security and storage in depth. Managing sensitive data is an important aspect of data security; chapter 7 describes the associated responsibilities, ethics, and methods (including encryption), as well as cloud versus local storage issues. Storage and backup methods (chapter 8) are essential aspects, including long-term versus short-term storage, hardware and software, and storage of non-digital data. Long-term storage is discussed in detail (chapter 9), including retention times (regulated or not), selection of data to be retained or culled, and more on hardware and software, including obsolescence. Data ownership, personal copies, and outsourcing to repositories are also essential considerations.

Chapter 10 covers data sharing (including sharing with a research group), organization, publication, and public access. The last includes Open Access. A brief description of intellectual property (IP), that is, copyright, trade secrets, and patents, is included, although for patents additional sources should be consulted. Licensing is recommended for all data sharing, including collaboration and copyrightable material. Citations and altmetrics are discussed, as well as repositories and their locations. Librarians are cited as resources for data management support.

Chapter 11 covers data reuse and restarts the data lifecycle. Sources of data include libraries and published articles. Reuse rights vary and some exclude use for commercial research. Error treatment and citation practices are discussed with examples.

I noticed that Table 4.3, “Different Representations of the Molecule Acetone,” (p. 60) has the InChI code, but not the InChIKey. Only the CAS Registry Number is listed for CAS, but CAS also has systematic names. (CAS systematic nomenclature is a dialect of IUPAC nomenclature.) Also, to turn to another issue, I have often wondered about the extent of the embargo on reuse of data for “commercial purposes.” MEDLINE had such an embargo, but did that cover the contents of literature searches performed for commercial enterprises, or by consultants to commercial enterprises? Does that embargo also apply to the use of PubMed information? It would seem to be even harder to enforce (if it ever were enforceable).

Readers of this Bulletin will see a continuation of a theme on information management (1) covering issues essential to the effective performance of any kind of scientific research. Although it’s been decades since this reviewer generated any laboratory data, he does continue to perform literary research for publication and he is prompted to improve his data management.

(1) Baykoucheva, S., Managing Scientific Information and Research Data, Chandos Publishing, Amsterdam, Boston, 2015. Reviewed in Chemical Information Bulletin 2015, 67 (3), p. 20-22.

Robert Buntrock, Member, CINF Communications and Publications Committee

The Merck Index

Many readers will be aware that the Royal Society of Chemistry acquired The Merck Index* from Merck & Co. in 2012. The Merck Index is an incredibly useful resource that has gained legendary status among chemists, and so we were delighted to take on its stewardship and future development.

We published the 15th print edition in 2013, but our main ambition was to create a modern, user-friendly online home for the same content. The Merck Index Online enables users to search by chemical structure, physical properties and text, or a combination of these. Free from the space restrictions of a single printed volume, we have also reintroduced about 1,500 entries that were previously cut from print editions.

The Merck Index Online team here in Cambridge, UK comprises the Editor, Serin Dabb, who provides editorial oversight, and two Data Content Editors, Michael Townsend and Mark Archibald, who investigate new scientific areas and research and write the content. Our scientific backgrounds are spread across organometallic chemistry and catalysis (Serin), physical organic chemistry (Michael) and synthetic organic chemistry (Mark). We’re always keen to hear comments, suggestions, and corrections from readers, either regarding content or features of the website; you can reach us at rscindex@rsc.org.

One of the things we love about The Merck Index is the quirky nature of some of the older content. Perhaps the most famous example is caproic acid’s eponymous “goat-like odor.” This gives a flavour, often quite literally, of a former age of chemistry when smell and taste were routine methods of product characterization. When we bring old records up to date, we try to add the relevant modern science without removing this link to the past.

In keeping with The Merck Index’s tradition, our primary focus is on substances of pharmaceutical interest, with the aim of including every newly approved drug. That said, the scope of the existing content spans all of chemistry and we have already created new entries in the fields of materials chemistry, agrochemicals, and synthetic chemistry. We plan to carry on in this vein: if a chemical substance is of significant interest and importance, for medicine or technology or anything else, then it belongs in The Merck Index.

This continued survey of the broad scope of chemistry goes hand-in-hand with continued technical development of the online platform. We recently added a browsing feature to complement the usual search-based approach, which restores the serendipitous discovery that was always possible with the printed book. Further development is planned to present important information as clearly as possible, and to provide clear links to related external content. The combination of modern database technology with expert curation of the overwhelming mass of chemical data offers readers easy access to the relevant, authoritative information they need.

Mark Archibald, Royal Society of Chemistry

*The name THE MERCK INDEX is owned by Merck Sharp & Dohme Corp., a subsidiary of Merck & Co., Inc., Whitehouse Station, N.J., U.S.A., and is licensed to The Royal Society of Chemistry for use in the U.S.A. and Canada.