Book Review

Book Review: Bibliometrics and Research Evaluation: Uses and Abuses

Robert E. (Bob) Buntrock, Buntrock Associates

Bibliometrics and Research Evaluation: Uses and Abuses; Gingras, Yves; MIT Press, Cambridge, MA, 2016. xxii + 119 pp. ISBN 978-0-262-03512-5. Hardcover $25.99.

The title of this book is a mini-abstract. The author is Professor of the History and Sociology of Science at the University of Quebec, Montreal. In his words, the book is “an opinionated essay, not a survey of the field,” and “rankings have no scientific validity.” It is an updated version, translated by the author, of the original French publication. Reviews of that version (1) and of this English version (2) have been published. The book concludes with chapter notes and an index; several of the references in the notes are in French.

The introduction begins with “Since the first decade of the new millennium, the words ranking, evaluation, metrics, h-index, and Impact Factors have wreaked havoc in the world of higher education and research” (the footnote cites several of the many books on the subject). The introduction also outlines the rest of the book and recounts its history. The book is aimed at researchers and research managers; bibliometric experts will not encounter much technical detail beyond the author’s criteria for evaluating indicator validity.

However, along with definitions, Chapter 1, Origins, presents a concise history of citations, citation indexes, and bibliometrics. The value of citations precedes the developments of Eugene Garfield 50 to 60 years ago, but of course Garfield’s Science Citation Index solidified the field as not only a useful searching tool but also a field of study. The journal Scientometrics, devoted to journal evaluation, appeared in 1978; the journal Research Evaluation began in 1991; and extension to individual researchers began in the early 21st century.

Chapter 2 begins with more history, tracing the development and attributes of the Web of Science (WOS; née SCI), Elsevier’s Scopus in 2004, and eventually Google Scholar (GS). Although fee-based, the first two databases are superior to GS in providing author addresses and countries, the bibliographies of papers cited in an article, and subfield classification. Citation searching has been extended to include patents, although use in that area remains controversial. Deficiencies and myths are discussed, including the impact of self-citation, bylines indexed only for the first-named author, Impact Factors (IFs) calculated only for journals and not books (which penalizes the social sciences and humanities), and myths like “only papers in the last five years are cited”. Types of citations are discussed (affirmative, negative, and perfunctory), and the differing value of each for evaluation is noted. Use as a value indicator for possible commercialization of research is not necessarily appropriate.

Eugene Garfield was on record that extending Impact Factors and the Science Citation Index (SCI) beyond the evaluation of journals was not advisable, that journal editors should be encouraged to require complete citation records for publication of manuscripts, and that good citation analysis requires more data than retrieval alone.

Chapter 3, Proliferation of Research Evaluation, intensifies the critique of misuse of Impact Factors and other bibliometrics. Although researchers have been evaluated for about 350 years, the extension of bibliometrics, as a supplement to peer review, to the evaluation of scholarly publications and communications, grant applications, teaching, promotions, departments and research centers, graduate programs, and universities is a feature of the last few decades. Peer review for hiring researchers goes back two centuries, but bibliometrics began to be used in the 1970s. Citation counts are not always objective. Garfield also recommended that Nobel Prizes not be awarded on citation counts alone; rather, citation searching should be used to locate articles, followed by evaluation of the relevance of each citation to the assessment of the researcher. Lysenko, the discredited, fraudulent Soviet biologist, is presented as an example: he was highly cited, but mostly negatively. The development of the h-index in 2005 is described, as are its worth and deficiencies. It does not necessarily measure both production and the quality of that production. Index developer Hirsch maintains that the index is more democratic than others, but Gingras (and others) say not so. Normalization is difficult, and the numerical value can never decrease.
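As a brief illustration (mine, not the book’s): a researcher has h-index h if h of their papers have each been cited at least h times. A minimal Python sketch, using hypothetical citation counts, shows how two very different publication records can yield the same value, which is exactly the conflation of production and quality noted above.

def h_index(citation_counts):
    # Largest h such that h papers have at least h citations each.
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Hypothetical records: a few highly cited papers vs. uniformly modest ones.
print(h_index([50, 40, 30, 3, 3]))  # 3
print(h_index([3, 3, 3, 1, 0]))     # 3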

The Impact Factor also comes under critical scrutiny. For 70 years, it has been touted as the measurement not only of the quality of the journal but also of the papers published within. The main source of data since 1975 is the Journal Citation Reports (JCR), now based on WOS data. However, the two-year “window” of data is too short; the “half-life” of publications varies by discipline, and a longer period is needed for validity. The basic version includes self-citations, so IFs are also published with self-citations deleted; not all self-citations are bad, however, and they are often necessary. JCR does “blacklist” some journals for manipulation of their data. Some individuals also manipulate their data via peer review by friends. False precision is also generated by listing IFs to three decimal places, since that precision is not warranted. The Nature Index ranks countries and organizations on the papers they publish in “high-quality journals”. However, the number of journals is only 68, and these rankings put pressure on organizations to publish in those journals. Once again, individual articles are not evaluated, just the journal.
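For context (again my illustration, not the book’s): the standard two-year Impact Factor for year Y is the number of citations received in Y by items a journal published in Y-1 and Y-2, divided by the number of citable items it published in those two years. A small Python sketch with hypothetical numbers also shows how reporting three decimal places suggests unwarranted precision.

def impact_factor(citations_to_prior_two_years, citable_items_prior_two_years):
    # Two-year IF: citations in year Y to items from Y-1 and Y-2,
    # divided by citable items published in Y-1 and Y-2.
    return citations_to_prior_two_years / citable_items_prior_two_years

# Hypothetical journal: 421 citations in 2016 to its 2014-2015 items,
# 150 citable items published in 2014-2015.
print(round(impact_factor(421, 150), 3))  # 2.807 -- three decimals imply false precision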

Chapter 4, Evaluation of Research Evaluation, covers just that. Metric indicators by themselves do not make evaluations that determine real value. For example, academic and non-academic organizations have different criteria. WOS and Scopus cover only peer-reviewed publications. GS has no such limit, but is coverage of Web sources really “democratization”? The h-index can be manipulated, as demonstrated when 100 articles submitted under a fictitious name yielded an h-index of 94. The evaluation market is growing, and at least two private organizations have appeared, typically with contracts to universities. Such an arrangement generated controversy at Rutgers and elsewhere. Market forces have increased competition between WOS and Scopus, and coverage data are given for both. Neither indexes books, but of course books are cited in articles in both resources. There is an English-language bias, but English is becoming the international language of science.

There has been much criticism published on the unintended consequences of misapplication of metric indicators, but there is little interest in improving the meaning and accuracy of the measurements. Even the Berlin Principles demonstrate that evaluation is not equivalent to valid ranking; both require valid indicators. Gingras lists three criteria for valid indicators: (1) adequacy of the indicator for the property or object measured, (2) sensitivity to the “inertia” or lifetime of the object, and (3) homogeneity of the dimensions of the indicator. Production is more easily evaluated than the “quality” and “impact” of research, and the latter are better analyzed by surveys. The number of Nobel Prize winners associated with a university is not a good indicator, nor is presence on the Web. A good indicator varies in concert with the variability of the object being measured. Annual rankings of organizations that exhibit large variance are meaningless, and longer intervals are recommended; such rankings tend to be useful only in marketing strategies. Combining heterogeneous indicators, as is done in generating the h-index, makes it difficult to assign reasons for any changes. If the value of the concept being measured increases, a valid indicator must also increase; for example, there is a limit to the benefit of increasing the number of foreign students and professors. These criteria were used to determine the validity of both the Shanghai ranking and the h-index. For the latter, mixing the number of publications with the number of citations leads to invalidity, and a lower index often obscures better researchers.

So, why are invalid indicators used? Mainly for marketing, but political reasons tied to funding are also involved. Such abuse is not limited to administrators, since scientists have also embraced the use of indicators, especially the h-index. Examples are given of universities boosting their rankings with misleading figures in relative rankings. Other manipulation is also common, including “dummy” affiliations with other organizations to demonstrate international cooperation. These and other shady actions can lead to fraud.

The emphasis of the book is on abuse of the metrics by universities and administrations, but such abuses also occur at the personal and individual level. The conclusion compares abuse by using invalid indicators to the famous story of The Emperor’s New Clothes. Invalid metrics are often used in ranking of institutions and research, as well as for promotions and hiring decisions.

I have some criticisms of the book. Metrics other than the h-index are alluded to but not addressed, including the g-index, the h(2)-index, and the w-index, which admittedly exhibit similarly anomalous behavior and are more complicated to score (3). For comprehensive searching, citation searching should not be the only method used, especially in chemistry, which has excellent indexed databases. Also, when citations are used to determine value, the citation is often to a concept not related to the subject of the paper in question. However, these are just quibbles, and the book is a valuable summation of the problems of misuse of bibliometrics, of interest to researchers, librarians, and administrators.

References

  (1) Zitt, M. Book Review of Les dérives de l’évaluation de la recherche. J. Am. Soc. Inf. Sci. Technol. 2015, 66, 2171-2176.
  (2) Bar-Ilan, J. Book Review of Bibliometrics and Research Evaluation: Uses and Abuses. J. Am. Soc. Inf. Sci. Technol. 2017, 68, 2290-2292.
  (3) Cronin, B., Sugimoto, C. R., Eds. Scholarly Metrics Under the Microscope: From Citation Analysis to Academic Auditing; ASIS&T/Information Today: Medford, NJ, 2015; pp 522-524. Reviewed in CIB 2016, 68 (2).