Research Results: Reproducibility, Reporting, Sharing & Plagiarism

Technical and methodological advances in almost all natural scientific disciplines have led to enormous amounts of experimental and calculated data becoming available in the written literature and in databases. Both the quality and the reproducibility of data are important factors when generating hypotheses, designing experiments, and creating new knowledge. However, the path to the nuggets in this scientific goldmine is paved with unknown error levels. Difficulties arise when comparing one’s own experimental findings with those published in the literature, or when confronted with incomprehensible experimental results and conclusions drawn by authors. The inability to reproduce such findings, not to mention the challenges of constructing computer models on the basis of published experimental data, highlights the seriousness of the problem. Furthermore, the quality of data in databases is highly dependent on the reliability and depth of literature reports, which in turn can only provide high-quality information if the experimental context in which the data were generated is comprehensively reported. However, this so-called metadata is often inadvertently or deliberately incomplete. In the Beilstein ESCEC Proceedings, 2007, Nicolas Le Novère expressed the consequences more drastically: “There is no point to exchanging quantitative data or models if nobody understands the meaning of the data and the content of the models beside their initial generators.” [1]

The aim of the ACS CINF symposium “Research Results: Reproducibility, Reporting, Sharing and Plagiarism” was to discuss the reproducibility of research results in various fields of chemistry, to consider how data validation can be improved, for example, through better reporting and sharing, and to examine the effects of the “publish or perish” paradigm, in particular in terms of “salami slicing” and plagiarism. In general, the goal was to raise awareness of the potential benefits of changing some current research and publishing practices. The symposium was organized by Martin Hicks (Beilstein-Institut) and chaired by Martin Hicks and Carsten Kettner (Beilstein-Institut) over three half-day sessions. The first session set the stage for an analysis of the current situation by addressing questions about the motives, and the practical and technical reasons, for publishing irreproducible, inconsistent, and even fabricated or plagiarized data. In the second session the speakers proposed a variety of potential solutions for increasing the value of information in journals and databases, and the third session addressed research communication models aimed at supporting assessment and quality control of data by the community.

The first session was opened by Sara Bowman (Center for Open Science), who shed light on researchers’ incentives to publish their data. She concluded that the community is often more interested in being visible through publications than in disseminating correct data. Shortcuts taken within the cycle of hypothesis-driven research, and dubious practices such as eliminating undesirable results or the poor application of statistics, can lead to low-quality and often barely reproducible data. It is not unusual for researchers to make the hypothesis fit the data instead of using the data to confirm (or disprove) a hypothesis, while claiming otherwise. [2, 3]

Paul Weiss (California NanoSystems Institute, UCLA), Editor-in-Chief of ACS Nano, reported on questionable publication practices that are considered commonplace even among experienced researchers, ranging from slightly manipulated data in diagrams to the often observed “copy & paste” of previously written texts by the same author. In particular, with regard to self-plagiarism, many authors have not yet developed the sensibility that, once sold, texts cannot be reused in subsequent papers. For the future, this means that journal editors will need to invest increased effort, not only in implementing tools for detecting plagiarism, but also in convincing authors to change this behavior. Interestingly, PIs rarely seem to be good role models for their students. [4, 5, 6]

A different view of some scientific practices was presented by Kenneth Busch (Office of Inspector General, National Science Foundation), who talked about research misconduct investigations. As a major US-based funding agency, the NSF is interested in both the accuracy of research and the general availability of research results. It acts as a trustee of taxpayers’ money (as do most funding agencies) and wants to see this money spent efficiently. Correspondingly, the NSF has developed a set of rules, which are monitored to make sure that processes for archiving and sharing data are implemented for the proper conduct of research. Unfortunately, these rules are often neglected, not only by the researchers who are awarded the financial support, but also by the organizations for which they work. Some real-life examples gave insight into the monitoring and investigating activities of the NSF.

Robert Bergman (Department of Chemistry, University of California, Berkeley) talked about his observations of scientific misconduct in organic and inorganic chemistry research. Through several examples he showed that both journals and academic institutions are actively seeking ways to uncover manipulation of data and scientific fraud, and clearly welcome the support of the Office of Research Integrity, but the process from formal allegation to the implementation of actions against researchers usually takes a long time. In the case of doubtful research results that cannot be ascribed to weak methodological approaches, some journals are trying to introduce a post-peer-review system that includes the repetition of experiments by independent researchers. Despite the success of this procedure, for example, for homeopathic studies, it is questionable whether this method of peer review could be implemented generally, owing to the potentially high costs of the experiments and the low benefit for the researchers conducting them.

The same topic, reproducibility as a means of establishing the reliability of research results, was addressed by Paul Clemons (Computational Chemical Biology Research, Broad Institute of Harvard and MIT). By contrasting the three terms reproducibility, resilience, and consilience, he showed that the meaning of experimental results is relative and depends on the question asked prior to designing the experiment, with the hypothesis being generated after the results are obtained. Using the example of screening small molecules as potential drugs in cancer therapy, he demonstrated a high degree of reproducibility and consilience in his data when a significant amount of basic data is taken from several unrelated sources. The strength of this approach depends on the availability of a broad data pool, resulting in a high degree of resilience, which derives from the statistical distribution of strong (true) data and background noise, that is, weak data. [7, 8]
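The statistical intuition behind this resilience can be illustrated with a toy simulation (all names and numbers are invented, not taken from the talk): when noisy measurements of the same quantity are pooled from many independent sources, the true signal is reinforced while the uncorrelated errors tend to cancel.

```python
import random
import statistics

rng = random.Random(1)
true_value = 10.0  # the hypothetical quantity all sources are measuring

def source_measurement(bias_sd=1.0, noise_sd=2.0):
    # each independent source has its own systematic bias plus random noise
    bias = rng.gauss(0, bias_sd)
    return true_value + bias + rng.gauss(0, noise_sd)

# pooling over more unrelated sources drives the estimate toward the true value
for n_sources in (1, 10, 100):
    pooled = statistics.mean(source_measurement() for _ in range(n_sources))
    print(f"{n_sources:3d} sources -> pooled estimate {pooled:.2f}")
```

With a single source the estimate can be far off; with a hundred, the pooled value sits close to the true one, which is the sense in which a broad data pool is resilient against individual weak data points.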

Rick Danheiser (Chemistry Department, Massachusetts Institute of Technology) asked the question: “Why are the procedures in synthetic organic chemistry often not reproducible?” In principle, he demanded that papers that are not reproducible should be retracted, but this needs to be tracked by the journals. His observation was that in the period between 1982 and 2005, the results in 12% of the papers submitted to Organic Syntheses were not reproducible. In the subsequent period, until 2015, this number dropped to about 7%. As the main causes of irreproducibility, he identified errors such as the identity and purity of the reactants not being determined accurately, temperature-control issues, and, in some cases, unidentified artifacts, which led to problems with repeating the results. In particular, with regard to the reactants used in the assays, a number of factors can affect the successful reproduction of one lab’s findings by another. Often, failures can be traced to the use of impure or aged compounds, or to compounds that were incorrectly identified. Sometimes experimental data result from contaminants in mixed solutions rather than from single compounds. In the case of aged compounds, an unidentified contaminant contributed to the result while the desired compound was absent. In conclusion, many factors can lead to irreproducible results if experiments are not carried out with strict accuracy.

The usefulness of published data in models is an often-discussed issue and was addressed by Tim Clark (Computer Chemistry Center, University of Erlangen-Nürnberg) in his talk. He emphasized that modeled data can never be better than the experimental data, but often, when obtaining Quantitative Structure–Property Relationships (QSPR), models appear to generate better data by applying interpolation techniques that may over-fit the data. Interestingly, research also goes through fashionable phases in terms of preferring specific algorithms to describe the properties of datasets. By looking at the performance of models one can also assess the experimental data and gain insight into their quality. Using real examples of simulations and the modeling of training data enriched with artificial noise, Tim presented the results of his investigation of the effect of noise on the performance of a state-of-the-art regression procedure (bagging multiple linear regression).
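The technique named in the talk, bagging (bootstrap-aggregated) linear regression, can be sketched in a few lines. The following is a minimal illustration, not Tim Clark's actual demo: it fits ordinary least squares on bootstrap resamples of artificially noised training data and averages the coefficients.

```python
import random

def fit_line(xs, ys):
    # ordinary least squares for y = a + b*x
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

def bagged_fit(xs, ys, n_models=50, rng=random.Random(0)):
    # bagging: fit each model on a bootstrap resample, then average coefficients
    n = len(xs)
    coefs = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        coefs.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return (sum(c[0] for c in coefs) / n_models,
            sum(c[1] for c in coefs) / n_models)

rng = random.Random(42)
xs = [i / 10 for i in range(100)]
# "experimental" data for a known relation y = 2.0 + 0.5*x,
# enriched with increasing levels of artificial Gaussian noise
for noise in (0.1, 1.0, 3.0):
    ys = [2.0 + 0.5 * x + rng.gauss(0, noise) for x in xs]
    a, b = bagged_fit(xs, ys)
    print(f"noise sd={noise}: intercept~{a:.2f}, slope~{b:.2f}")
```

Running such a sketch shows the recovered coefficients drifting away from the true values as the injected noise grows, which is the general point of the talk: a model's apparent performance is bounded by the quality of the experimental data behind it.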

The poor characterization of inhibitors used in pharmacological studies against proteins of signal transduction cascades was addressed by Aled Edwards (Structural Genomics Consortium). He noted that commercial inhibitors used in biomedicine appear to suffer from over-effective marketing: the reason they are used may be linked to the sales pitches of the vendors. Many available probes are non-selective and poorly characterized kinase inhibitors that generate irreproducible results, yet they appear in an extremely high number of publications. However, well-characterized chemical probes can help identify new targets and thus push research into new areas. The Structural Genomics Consortium, together with its network of pharmaceutical companies, decided to be highly systematic in developing structure-guided methods to produce inhibitors of high quality, that is, structurally and functionally well studied. With this approach the consortium was able to develop, within only a few years and at low cost, specific inhibitors against protein kinases involved in epigenetic signaling.

The scientific community is often unable to differentiate between reliable and unreliable data because of reporting practices characterized by obscurity. Carsten Kettner (Beilstein-Institut) opened the second session with his presentation of a community-driven initiative concerned with the reporting of enzymology data. The STRENDA (Standards for Reporting Enzymology Data) Commission has developed reporting guidelines, which are already recommended by more than 30 biochemistry journals. The aim of the guidelines is to help authors include all the relevant information required for the interpretation and reproduction of experimental data. However, the STRENDA guidelines, those of other standardization initiatives, and the rules of funding agencies all suffer from the same fate, namely, neglect of use. In order to make the guidelines applicable to the entire community, the Commission has developed a web-based software tool that assesses the data entered from a manuscript for compliance with the STRENDA guidelines. In addition, after publication of the manuscript these data will be made public in a database. It is hoped that after its release this system will gain wide acceptance within the scientific community. A similar path is being followed by the MIRAGE (Minimum Information Required for A Glycomics Experiment) project, which proposes reporting guidelines for data on the identification and analysis of glycans.
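The core idea of such a compliance-checking tool can be sketched very simply. The following is a hypothetical, much-simplified illustration, not the actual STRENDA software: a reported assay is checked against a minimal set of required metadata fields (the field names here are invented for the example).

```python
# hypothetical minimal field set in the spirit of reporting guidelines;
# the real STRENDA guidelines are far more detailed
REQUIRED_FIELDS = {"enzyme_name", "ec_number", "temperature", "pH",
                   "buffer", "substrate"}

def check_compliance(report: dict) -> list:
    """Return the required fields that are missing from a reported assay."""
    return sorted(REQUIRED_FIELDS - report.keys())

# an example report that omits the pH and buffer composition
assay = {
    "enzyme_name": "hexokinase",
    "ec_number": "2.7.1.1",
    "temperature": "25 C",
    "substrate": "D-glucose",
}
print("missing:", check_compliance(assay))
```

A tool built on this principle can flag incomplete reports before publication, which is exactly the gap, neglect of use, that the Commission's web-based system is meant to close.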

Will York (Complex Carbohydrate Research Center, University of Georgia) first reported that glycans are inherently more complex than proteins and nucleic acids; they are not encoded by genes, but their synthesis is catalyzed by specific enzymes, which are themselves gene products. This means that for oligosaccharides, tens of thousands of genes need to be expressed in an extremely coordinated way. Oligosaccharides provide numerous combinations of their building blocks, the monosaccharides, and, in addition, are often branched. Thus, predictions are not possible and the validation of published data is very difficult unless specific information about the methods used to generate and interpret the data is given fully and comprehensively. The MIRAGE Commission is addressing a number of glycoanalytic methods in developing guidelines for reporting glycomics data, including mass spectrometry, diverse chromatography approaches, and glycan arrays. In order to make these guidelines widely applicable to the community, the group is also developing data exchange formats and specifications for computer-accessible data.

The collection of high-quality data in a database is the central project of the Cambridge Crystallographic Data Centre (CCDC). Ian Bruno informed us that every year CCDC stores about 100,000 new crystal structure records for small molecules. The CCDC provides computational methods that support iterative modeling (2D and 3D) as well as methods for simplification and abstraction. To ensure that raw data are reliable and that reasonable models can be made, it is essential that the experimental conditions are included in the dataset. This metadata, which includes the temperature and pressure of the study, type of radiation, and the model of the instrument, is captured in the Crystallographic Information Framework (CIF), which is a standard format for crystallographic data. Beyond the experimental conditions, CIF data also contain information about the raw, processed, analyzed, and interpreted data. This framework is a suitable prerequisite for providing services for publication, deposition, and validation purposes.
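To give a flavor of the metadata CIF captures, a minimal fragment might look like the following. The data names are taken from the CIF core dictionary, but the values and the structure name are invented for illustration:

```cif
data_hypothetical_structure
_chemical_formula_sum            'C6 H6'
_cell_length_a                   7.44(2)
_diffrn_ambient_temperature      295
_diffrn_radiation_type           'Mo Kalpha'
_diffrn_measurement_device_type  'hypothetical diffractometer model'
```

Because every item is a machine-readable key-value pair drawn from a shared dictionary, validation software can check automatically that the experimental conditions needed to assess a deposited structure are actually present.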

Sharing, reproducibility, and replication were the headlines of the talk by Philip Bourne of the National Institutes of Health. He delineated the challenges of ensuring rigor and transparency in data reporting. A combination of many pressures, such as scientists’ need to become visible within the community, to receive incentives and grants, and to create innovative science with novelty and positive, rather than negative, results, leads to insufficient reporting, poor experimental design, or both, which in turn results in a lack of reproducibility. Therefore, the NIH set up principles to raise awareness within the community, enhance formal training, increase the quality of published data through the adoption of a more systematic review process, and provide sufficient financial support for investigators. [9] These principles have been endorsed by over 130 journals, paving the way to more detailed measures with regard to both the development of alternative methods for judging the quality of grant proposals and the implementation of infrastructure for the deposition of machine-readable data.

Open data, that is, research data, published and unpublished, raw and analyzed, stored in publicly accessible databases, is considered the gold standard for maintaining high quality. The hope is that scientific fraud and data manipulation can be prevented or, at least, more quickly uncovered and investigated, since post-publication reviewing can be carried out by the entire scientific community. In addition, open data can create a very broad data basis which, after successful integration, may increase the value of the collective scientific data by amplifying significant findings and reducing the impact of poor data, as is expected in a Gaussian distribution. [10, 11] Thus, the third session focused on data sharing and exchange as potential means for the improvement of data in the literature and in databases.

After John Overington (European Bioinformatics Institute) unfortunately had to cancel, Antony Williams (Cheminformatics, Royal Society of Chemistry) stepped in to talk about the need to make data standards compulsory. The emphasis of these standards is on preparing all data, including presentations, images, diagrams, and schemes, so that they can be made publicly accessible. Many resources, such as publications, pictures, and drawings, are already available for download and printing, but they are not machine-readable and thus cannot be re-analyzed with suitable software tools. Therefore, the goal is to deposit this material in appropriate databases in compliance with widely accepted standards. The talk concluded with the statement that current standards are rarely in practical use and thus should be made mandatory. Even though mandatory standards may be burdensome for scientists, they are considered advantageous for science.

In his talk, Evan Bolton (National Center for Biotechnology Information) focused on the access to, and integration of, data distributed across various databases. In the last decade the amount of publicly accessible data has been growing at nearly exponential rates. However, the databases are usually not interconnected, and thus scientists need to query a number of databases to obtain a dataset that meets their needs. Besides data quality issues and the need for manual curation, there is a strong push towards the ability to access all the databases simultaneously with a single mouse click. In contrast to the title of the talk (“Globalization of Big Data”), the presentation showed ways toward a centralization of distributed data on the local computer by implementing an integrative query tool that includes the required ontologies and terminologies. The tool suggested here is RDF, the Resource Description Framework, which is designed as a metadata data model but can also be used to describe information implemented in databases. Examples from chemistry showed that RDF could be applied to integrate and harmonize data, provided that more open data become available.
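The appeal of RDF for data integration comes from its uniform shape: every fact is a subject-predicate-object triple, so merging two databases is just a set union, and one query mechanism works over the combined graph. The following toy sketch (plain Python, with invented `ex:` identifiers; a real system would use an RDF library and SPARQL) shows the principle:

```python
# two hypothetical sources describing chemical compounds as RDF-style triples
source_a = {
    ("ex:aspirin", "ex:formula", "C9H8O4"),
    ("ex:aspirin", "ex:meltingPointC", "135"),
}
source_b = {
    ("ex:aspirin", "ex:inchikey", "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"),
    ("ex:caffeine", "ex:formula", "C8H10N4O2"),
}
graph = source_a | source_b  # integration is simply set union

def query(graph, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return {t for t in graph
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)}

# everything known about aspirin after merging both sources
print(sorted(query(graph, s="ex:aspirin")))
```

Because both sources use shared identifiers, the merged graph answers questions neither source could answer alone, which is the harmonization the talk argued for, provided enough open data are published in such a form.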

The quality of data in databases was the topic of the talk given by Denis Fourches (Chemistry Department, Bioinformatics Research Center, North Carolina State University). He stated that the application of high-throughput screening in chemistry generates huge amounts of data on chemical compounds, which are stored in an increasing number of databases; the problem is that much of this data is considered irreproducible, being incomplete, inconsistent, irrelevant, inaccurate, or incorrect. Thus, there is a need for multidisciplinary approaches for identifying, characterizing, and potentially curating problematic data entries, as well as a demand for guidelines and workflows to better ensure the reproducibility of chemical and biological data. In order to increase the value of these data for virtual screening, cheminformatics approaches were developed to detect erroneous records in large chemogenomics datasets and to formally correct and curate these data. The suitability of these methods was demonstrated using examples of the integration of data on CYP450 inhibition profiles (SuperCYP), the detection of false positives and false negatives, and the normalization of experimental variability in multi-run HTS (high-throughput screening) campaigns. [12]

The last talk of the session was given by Courtney Soderberg (Center for Open Science), who presented a project for collaborative research to support researchers in implementing better research practices and to increase the openness, integrity, transparency, and reproducibility of scientific research (Reproducibility Project: Cancer Biology). The core of this project is an open-source, web-based framework intended to act not only as a portal for the deposition and sharing of raw data, metadata, and analyzed and interpreted data, but also for the deposition of project plans and the documentation of scientists working on the same project in geographically separated labs. In principle, these data can either be kept private within a defined group of scientists or shared with the entire world, if the researchers wish. The idea is to accumulate the entire knowledge of a research project, providing a high degree of transparency of workflows and data, in order to be able to generate reproducible information. The major difficulty might be convincing the scientific community to use this framework as a kind of LIMS (Laboratory Information Management System) that stores all data, including confidential, unpublished data, in the cloud.

“Trust but verify” became the slogan of the symposium. Scientific fraud and the deliberate manipulation of data can be attributed to a very small minority of the scientific community. But, as discussed by many speakers, there are many mechanisms that can lead to inaccurate and incomplete research results that fall short of the high standards of science. The organizer obviously struck the right note with this symposium. Throughout the sessions, between 40 and 60 participants contributed to constructive discussions, creating a vibrant atmosphere that lasted long after the last talk. Given the many comments and opinions from the community that clearly demand an improvement of data quality and integrity in the written literature, this debate will be an ongoing process. Hence, this symposium can be considered a follow-up to the discussion at last year’s symposium “Global Challenges in the Communication of Scientific Research” in San Francisco, organized by David Martinsen and Norah Xiao, and will bridge to the symposium entitled “Scientific Integrity: Can We Rely on the Published Scientific Literature?” organized by Judith Currano and Bill Town at the next fall national meeting in Boston.


  1. Le Novère, N.; Courtot, M.; Laibe, C. Adding Semantics in Kinetics Models of Biochemical Pathways. In Proceedings of the Beilstein ESCEC Symposium 2006; Hicks, M.G., Kettner, C., Eds.; Logos-Verlag, 2007; pp 137–153.
  2. Nuzzo, R. Scientific method: Statistical errors. Nature, 2014, 506 (7487), 150-152.
  3. Kerr, N. HARKing: Hypothesizing After the Results are Known. Pers Soc Psychol Rev. 1998, 2 (3), 196-217.
  4. Committee on Publication Ethics (COPE).
  5. ACS Publications Ethical Guidelines to Publication of Chemical Research.
  6. Ensuring the Integrity, Accessibility, and Stewardship of Research Data in the Digital Age.
  7. Cancer Cell Line Encyclopedia (CCLE) project.
  8. Wawer, M.; Clemons, P. et al. Automated Structure-Activity Relationship Mining: Connecting Chemical Structure to Biological Profiles. J. Biomol. Screen. 2014, 19 (5), 738-748.
  9. Principles and Guidelines for Reporting Preclinical Research, National Institutes of Health.
  10. Big Data to Knowledge initiative, National Institutes of Health.
  11. Collins, F.; Tabak, L. Policy: NIH plans to enhance reproducibility, Nature, 2014, 505 (7485), 612-613.
  12. Fourches, D.; Sassano, M.; Roth, B.; Tropsha, A. HTS navigator: freely accessible cheminformatics software for analyzing high-throughput screening data. Bioinformatics, 2014, 30 (4), 588–589.

Carsten Kettner, Symposium Presider and Presenter