Data Management for Researchers: Organize, Maintain and Share your Data for Research Success

Briney, K. Data Management for Researchers: Organize, Maintain and Share your Data for Research Success; Pelagic Publishing: Exeter, UK, 2015.
191 p. + x.  ISBN 978-1-78427-012-4 Hardcover, £ 49.99.
ISBN 978-1-78427-011-7 Paper, £ 24.99.

An excellent practical treatise on the art and practice of data management, this book is essential to any researcher, regardless of subject or discipline. Each of the eleven chapters begins with an account of a real-life encounter with data management, some favorable, some disastrous. Data are defined broadly as anything one performs analysis upon, and specific examples are discussed. Data management is described in detail as those practices necessary for efficient use of data before, during, and after the research is performed. Each chapter has a concluding summary and references, and the text concludes with an index.

Chapter 1 covers the importance of data management in modern research. Funding agencies now require data management plans and data sharing, reproducibility concerns highlight data management issues, and researchers cannot manage their increasing amounts of digital data the same way as physical samples. The difference between doing data management and writing a data management plan is also discussed.

Chapter 2 describes the “new” circular lifecycle of data (as opposed to the “old” lifecycle that was linear): see the figure below. 

[Figure: the circular data lifecycle]

Figure reprinted with permission by Kristin Briney

This lifecycle defines the organization of chapters 3-6 and 10-11 while chapters 7-9 come under the category of storage, covering data security, storage and backups, and long-term preservation.

Chapter 3 covers data management plans and data policies. These policies come from granting agencies, government, and institutions and cover issues such as data retention (including policies), ownership, and copyright. Notebooks, electronic and paper, are covered in depth in chapter 4, and the advantages and disadvantages of each are discussed. The chapter also reviews other types of documentation such as methods, metadata, and standards from publishers and professional societies. File organization, including naming, documentation, and databases, is described in chapter 5. Data analysis is discussed in detail in chapter 6, including the retention of both raw and analyzed data, and analysis methods.

Chapters 7-9 digress from the roadmap outlined in chapter 2 and treat the topics of data security and storage in depth. Managing sensitive data is an important aspect of data security; chapter 7 describes responsibility, ethics, and methods (including encryption), as well as cloud versus local storage issues. Storage and backup methods (chapter 8) are essential aspects, including long-term versus short-term storage, hardware and software, and storage of non-digital data. Long-term storage is discussed in detail (chapter 9), including retention times (regulated or not), selection of data to be retained or culled, and more on hardware and software, including obsolescence. Data ownership, personal copies, and outsourcing to repositories are also essential considerations.

Chapter 10 covers data sharing (including sharing with a research group), organization, publication, and public access. The last includes Open Access. A brief description of intellectual property (IP), that is, copyright, trade secrets, and patents, is included, although for patents additional sources should be consulted. Licensing is recommended for all data sharing, including collaboration and copyrightable material. Citations and altmetrics are discussed, as well as repositories and their locations. Librarians are cited as resources for data management support.

Chapter 11 covers data reuse and restarts the data lifecycle. Sources of data include libraries and published articles. Reuse rights vary and some exclude use for commercial research. Error treatment and citation practices are discussed with examples.

I noticed that Table 4.3, “Different Representations of the Molecule Acetone,” (p. 60) has the InChI code, but not the InChIKey. Only the CAS Registry Number is listed for CAS, but CAS also has systematic names. (CAS systematic nomenclature is a dialect of IUPAC nomenclature.) Also, to turn to another issue, I have often wondered about the extent of the embargo on reuse of data for “commercial purposes.” MEDLINE had such an embargo, but did that cover the contents of literature searches performed for commercial enterprises, or by consultants to commercial enterprises? Does that embargo also apply to the use of PubMed information? It would seem to be even harder to enforce (if it ever were enforceable).

Readers of this Bulletin will see a continuation of a theme on information management (1) covering issues essential to the effective performance of any kind of scientific research. Although it has been decades since this reviewer generated any laboratory data, he continues to perform literature research for publication, and he is prompted to improve his data management.

(1) Baykoucheva, S., Managing Scientific Information and Research Data, Chandos Publishing, Amsterdam, Boston, 2015. Reviewed in Chemical Information Bulletin 2015, 67 (3), pp. 20-22.

Robert Buntrock, Member, CINF Communications and Publications Committee