Hum(ai)n Peer Review

Issue 83 · November/December 2025

Join the conversation on Twitter @Briefer_Yet and LinkedIn.

Did a colleague forward The Brief to you? Sign up via the button below to get your very own copy.

C&E is Hiring!

C&E is recruiting for three positions at the firm. If someone you know (or perhaps you yourself) might be interested in working with a great team, please send them to our website for more information on these roles:

Marketing Manager: Lead C&E marketing — from brand positioning and content strategy to marketing operations, campaigns, and digital optimization

Marketing Consulting Associate: Support client projects across marketing technology, digital strategy, AI enablement, modern marketing operations, and product strategy

Research Analyst / Senior Research Analyst: Blend rigorous research, financial and bibliometric analysis, and strong writing skills to illuminate trends in scholarly communication, publishing business models, and market dynamics

The Physiological Society and Wiley Renew Partnership

C&E supported The Physiological Society in a competitive RFP process that resulted in a strengthened publishing agreement with Wiley for its field-leading journals, The Journal of Physiology and Experimental Physiology. We congratulate The Society and Wiley on their renewed partnership.

New Recording Available: Strategic Data Harmonization in Publishing

Colleen recently spoke on a Wiley Partner Services webinar exploring how publishers can unlock hidden value through strategic data harmonization across their portfolios.

The discussion covered practical approaches to transforming submission data into actionable insights, including centralized portfolio dashboards, trend tracking, and data-driven editorial decision-making. Panelists included Colleen Scollans (Clarke & Esposito), Natalia Ortuzar (Wiley), and Amanda French (Research Organization Registry, ROR) — moderated by Joe Mackenzie (Wiley). Listen to the Discussion.

Hum(ai)n Peer Review

Peer review has become a target for those developing AI research tools, and with good reason. As Cambridge University Press & Assessment’s Mandy Hill points out in the Financial Times, the rapid growth of the literature, fueled by publish-or-perish imperatives, is overwhelming our existing and perhaps out-of-date peer review systems. But simply throwing AI at the problem seems unwise (what could go wrong?!), both for peer review of grants and for individual papers. Northwestern University’s Mohammad Hosseini persuasively argues (in an article by Dalmeet Singh Chawla in Chemistry World) that AI is poorly suited for evaluating grants that propose novel ideas: “If AI cannot create really novel ideas, it also is unlikely to detect really creative ideas because it is being trained on existing data.”

Giorgio Gilestro from Imperial College London, writing in Nature, points out concerns with AI paper review tools such as q.e.d Science (that missing final period is enough to make our inner copy editors question not only the judgment of everyone in this company but also the series of historical events that has enabled this kind of decision to be made without any kind of intervention). Human peer review consists of an editor sampling three specialist opinions to build a consensus, whereas AI peer review “collapses this process: rather than sampling, it outputs the average reviewer’s assessment.” In other words, the noise in the system is a feature, not a bug to be eliminated. Gilestro also suggests that q.e.d may be engaging in “algorithmic perfectionism,” where “an LLM rewarded for its ability to identify weaknesses will have an incentive to flag every potential issue and become insatiable.” To be fair, this also happens with humans, as reflected in the Reviewer 2 meme.

And of course, beyond these concerns lies the “black box” nature of AI algorithms and how readily their outputs may be altered to serve aims that have little to do with the user’s present purpose. As has been demonstrated for other attempts to outsource research evaluation (e.g., the Impact Factor), Campbell’s Law suggests that any AI system that gains traction will see a similar gaming effect, in which researchers fine-tune their papers to match the biases of the AI tool to win its approval (human algorithmic perfection).

As it seems both unlikely and undesirable that humans will be entirely cut out of the peer review process, the more fruitful focus may be on AI-driven tools that support the efforts of those humans.

Two important pilots in the rapidly developing area of AI-powered Peer Review Assistants (PRAs) have recently published updates. One was conducted by the Purpose-Led Publishing (PLP) coalition (AIP Publishing, the American Physical Society, and IOP Publishing); the other by NEJM AI. Together they offer early evidence of how AI might support, not replace, human reviewers.

Purpose-Led Publishing Pilot 

In April 2025, the PLP coalition began piloting an AI tool to assist peer reviewers. PLP’s goal was to explore how AI might support human review by taking on some of the routine, time-consuming parts of the process. The pilot has now concluded, and the PLP coalition has released its findings. The pilot used Alchemist Review, the PRA developed by Hum in collaboration with GroundedAI.

PLP piloted Alchemist Review with one journal editorial team from each of the coalition’s three publishers. Review capabilities included: AI-generated summaries and key claims, citation checks for accuracy and relevance, and a “chat with the manuscript” feature.

Editors who participated in the PLP coalition trial agreed that tools such as the PRA can make a difference, especially when they focus on repetitive administrative work. Citation checking, metadata validation, and ethics compliance review stood out as areas in which the PRA added genuine value. Freeing up time spent on “busy work” to enable reviewers and editors to focus on the science aligns with the mission of the PLP coalition. 

Significantly, however, PLP found that the manuscript digest generated by its PRA “didn’t always capture a paper’s nuance.” The pilot suggested that PRAs can be helpful with structured checks but are still limited in tasks that require interpretive judgment.

NEJM AI Pilot 

Meanwhile, the NEJM AI journal has introduced an AI-supported invitation-only Fast Track “Human + AI review” process that offers an initial decision within seven days. The NEJM AI pilot goes beyond that of the PLP coalition, with an ambition to accelerate the review process by introducing AI-generated peer review as a “voice” in the human editors’ assessment of manuscripts. The first two NEJM AI manuscripts under this Fast Track review process have been published alongside an editorial describing the process in detail, with extensive supplementary material providing the human and AI reviews that informed the final editorial decision.

The NEJM AI Fast Track pilot might be more ambitious — introducing the AI as a peer reviewer rather than simply as a review assistant — than that of the PLP coalition, but it is also starting with deliberate caution. Invitations to participate are extended only to “articles judged initially by multiple editors to have a high likelihood of eventual acceptance.” 

NEJM AI employs general-purpose large language models (LLMs) rather than an off-the-shelf PRA tool such as the one used in the PLP experiment. The Fast Track review begins conventionally: a human editor writes a complete review of the manuscript independent of AI input. Following this, the human editor uses two separate LLMs (secure versions of OpenAI’s Generative Pretrained Transformer [GPT]-5 with Thinking and Google’s Gemini 2.5 Pro) to generate a structured review containing a summary, major comments, minor comments, and an overall assessment. Separately, a statistical editor interacts with the LLMs using an “iterative, conversational approach” to generate a statistical review.
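
For readers who want a concrete picture of that structured-review step, the minimal Python sketch below illustrates it. NEJM AI has not published its prompts or tooling, so the prompt wording, the call_llm placeholder, and the model identifiers here are assumptions for illustration, not the journal's actual implementation.

```python
# Hypothetical sketch only; NEJM AI's actual prompts and tooling are not public.
# It mirrors the structure described above: two separate models, each asked for
# a summary, major comments, minor comments, and an overall assessment, with the
# human editor's independent review written beforehand and never shown to the models.

from dataclasses import dataclass

REVIEW_PROMPT = """You are assisting a human editor with peer review.
Read the manuscript below and return a structured review with four sections:
1. Summary
2. Major comments
3. Minor comments
4. Overall assessment
Do not recommend acceptance or rejection; that decision rests with the human editors.

MANUSCRIPT:
{manuscript}
"""


@dataclass
class StructuredReview:
    model: str
    text: str


def call_llm(model: str, prompt: str) -> str:
    """Placeholder for a call to a secure LLM endpoint (an assumption, not a real API)."""
    raise NotImplementedError("Connect this to your own model access.")


def generate_ai_reviews(manuscript: str,
                        models=("gpt-5-thinking", "gemini-2.5-pro")) -> list[StructuredReview]:
    """Collect one structured review per model for the editor to weigh
    alongside the independently written human review."""
    prompt = REVIEW_PROMPT.format(manuscript=manuscript)
    return [StructuredReview(model=m, text=call_llm(m, prompt)) for m in models]
```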

NEJM AI found that the human and AI reviews of manuscripts “contained both overlapping and complementary critiques” and that the published articles were revised substantively based on both human and AI comments. They also found that the AI review proved highly effective in assessing whether articles followed NEJM AI reporting guidelines and whether reported analyses aligned with the Statistical Analysis Plan, a time-consuming but critical component of statistical reviews.

An important finding in NEJM AI’s pilot was that, in its initial attempt at an AI-assisted statistical review, an unstructured engagement with the LLM “generated multiple unsolicited critiques and suggestions that […] were sometimes off topic.” A six-step process was subsequently introduced to provide structure for the human–AI interaction, and “to establish GPT-5 as an assistant rather than a reviewer.”

NEJM AI’s pilot shows that AI can contribute meaningfully to review, but only when held within a carefully structured, human-led process.

Across Both Pilots 

Both pilots make the same two points:

  1. AI can strengthen peer review, but only when it is applied to structured tasks and kept away from the parts of review that rely on human judgment. Since LLMs are language models, not knowledge models, this is not altogether surprising. 
  2. These two trials confirm that manuscript compliance and specific research integrity checks, which are becoming more common, are mature and reliable use cases for PRAs.           

The peer review crisis and reviewer fatigue have been much documented. If PRAs can remove some of the drudgery, it may transpire that there are plenty of reviewers in the system who will engage in a better-supported review process. 

The line between “assistant” and “participant” matters. Clear parameters are required for roles and responsibilities, including those that extend to AI assistants. In these two pilots, and in the wider community, there is agreement that a decision to accept or reject a submission can ultimately only be made by a human editor, not by AI. NEJM AI stays within that boundary, yet its framing suggests something new: the AI isn’t just supplying background material, it is sitting at the proverbial review table alongside the human reviewer, with authors expected to respond to its critiques.

The participating PLP publishers and the NEJM Group are to be applauded for sharing their findings. Transparency at this stage is essential, as publishers weigh the genuine value of new AI capabilities against the guardrails and policies needed to protect the quality and integrity of the peer review system and its output. As Gilestro points out in the Nature article, “We have the chance to build a system in which algorithms handle the syntax so that humans can handle the significance — but to get there, we must review the peer reviewer.”

AI Everywhere Now

Gemini 3 is Google’s latest and greatest LLM. By many measures, Gemini 3 outperforms recent models by OpenAI and Anthropic, prompting a mad scramble and some soul searching at OpenAI. For observers who saw Google as hopelessly behind in the AI race, this was a plot twist. 

Beyond the standings in the AI horse race (undoubtedly a competitor will release a superior model in the near future), AI development at Google, more so than at other companies, has implications for professional publishers. Gemini, unlike ChatGPT or Anthropic’s Claude, which are thankfully harder to stumble across, is being rolled out across Google’s ecosystem, including in Search (where it delivers “AI Overviews” and powers “AI Mode”).

“We are the engine room of Google, and we’re plugging in AI everywhere now,” Demis Hassabis, CEO of Google DeepMind, reportedly told WIRED.

A better Gemini with more AI summarizing means even less linking to publisher websites from Google Search than before. In other words, Gemini 3 inches us that much closer to Google Zero.

For readers of The Brief, the ongoing “AI-ification” of Google’s main search product was perhaps the second most notable announcement out of Mountain View this past month. A more consequential development may prove to be the AI-ification of Google Scholar.

On November 18, Google unveiled Google Scholar Labs, an “AI powered Scholar search that is designed to help you answer detailed research questions.” According to Google, the AI-powered version of Scholar lets users pose complex research questions and receive a curated set of relevant scholarly papers, paired with an AI-generated explanation (a structured summary) of how each paper addresses the query. While Google Scholar Labs might share the same AI “engine room” as Search, its results are very different. Scholar Labs still returns a list of papers that link directly to publisher websites. There is no attempt to provide an overall answer or to summarize answers to the research question. It is simply a list with a small, structured summary after each paper. 

However, the list is much shorter than a typical search result in regular Google Scholar. Google provides some example search queries for Scholar Labs. One positively riveting question is, “Has anyone used single molecule footprinting to examine transcription factor binding in human cells?” The result is a list of 10 articles. A similar query in regular Google Scholar (“single molecule footprinting and transcription factor binding in human cells”) returns a list of 39,000 papers. Obviously, a smaller list with only the best results would be better from a user perspective — no one is going to wade through 39,000 papers. Nonetheless, it is a huge difference. The Google Scholar Labs user has to trust the AI quite a bit to winnow those 39,000 papers down to just 10. Might there be another 10 or 20 or 100 papers that are also relevant? While it is hard to imagine that many researchers will download more than 10 papers from any given query, perhaps overall traffic to publishers from Google Scholar Labs will be similar to that from regular Scholar. But it seems likely that the specific papers downloaded will be different when using the AI version of Scholar as compared with the non-AI version. What this means for publishers (or scientific progress) is anyone’s guess. It may all come out in the wash, or some publishers may end up with missing laundry. 

The S Is for OuroboroS

While librarian A.J. Boston has characterized open access (OA) discourse as “just a global community passing ‘wait, no, not like that’ back and forth forever,” our preferred metaphor is that of Ouroboros, the ancient symbolic snake eating its own tail. Each new entrant into the conversation seems to discover the concepts anew, propose the same solutions, and then go through the process of discovering all the same unintended (but not unpredicted) consequences of those solutions. The end result is usually the same: a recognition that one-size-fits-all approaches are not ideal, that the author-pays article processing charge (APC) model merely shifts inequity from readers to authors, and that more experimentation and study are needed (perhaps this might be termed an “OAwakening”).

The basic tenets of OA seem obvious and achievable (“there is enough money in the system”) when pitched to policymakers or funders without an extensive grounding in the history of the movement or the complicated nature of the academic career advancement system. And so, a “simple” problem begets a simple answer. Wash, rinse, repeat. 

On the plus side, this cycle is getting faster. cOAlition S seems to have, with the release of its new strategic plan, compressed the last 25 or so years of OA into a mere seven-year journey. cOAlition S now makes it clear that pluralistic solutions are needed to address a fragmented community rather than shoehorning the diverse needs of all researchers into one EU-centric model. After spending the last seven years resolutely arguing exactly the opposite position, the coalition now concludes that “no single model can meet all needs.” 

The organization’s new strategic plan calls for two phases, the first a wildly optimistic two-year timeline meant to study and understand alternative publishing models (e.g., preprints, publish–review–curate, and Diamond OA), the potential role of AI in scholarly communication, OA infrastructure needs, and how to build long-term sustainable publishing systems. All of these are worthy goals, but all are also questions with no easy answers. Phase two will be putting what has been learned in phase one into action, although as reported in Science, “the group will decide later whether to help fund the cost of operating such alternative venues.” On the one hand, this seems fair; on the other, given that funding is the primary function of the coalition’s members, it seems strikingly non-committal.

The Fragmented Social Media Landscape

A new De Gruyter Brill report and a companion Scholarly Kitchen article show that academic social media is splintering. The collapse of “academic Twitter” as a trusted, cross-disciplinary hub has removed a central channel for collaboration, sharing research, and circulating calls for papers. In its place, activity is fragmenting across platforms — with some researchers experimenting with Bluesky, Mastodon, LinkedIn, and Substack, where engagement can be high, but reach remains limited. 

Bluesky currently offers the most promise as a space for informal scientific exchange and real-time community dialogue. Its reach remains modest compared with Twitter’s peak, however, and engagement is uneven. That said, some research communities are beginning to rebuild there and some organizations are already seeing strong engagement as a result. 

Meanwhile, LinkedIn has quietly become the most important work-oriented platform — now the third-most-used overall, according to the survey — even if it is still not part of many researchers’ daily routines. While other professionals have long used it, more researchers are now finding their footing there. This is why many marketing and editorial teams are increasingly prioritizing it as a core channel. 

It is worth briefly mentioning that Instagram and TikTok are also being used selectively, as organizations seek to reach younger researchers and professionals and to tap into the amplification offered by science and medical influencers.

At the same time, more traditional marketing and communication channels are making a quiet resurgence. As the report notes, “perhaps the most striking finding is the quiet resurgence of older, more controlled, and perhaps more trusted forms of communication.” Email, newsletters, peer networks, webinars, podcasts, videos, and in-person events are reclaiming space once ceded to social media.

Together, these shifts signal the end of a unified academic commons and the rise of a more uneven, decentralized ecosystem for scholarly communication. This places new pressure on marketing and editorial teams that are already stretched thin. In response, organizations are rethinking how their systems and processes work, where to apply automation, and how to align editorial and marketing efforts more intentionally. 

Briefly Noted

The author of a preprint posted to arXiv alleges that an API (application programming interface) response error at Springer Nature has resulted in misdirected citations to hundreds of thousands of papers, affecting well over a million authors. According to the preprint author’s analysis, the problem affects online-only publications and stems from differences in how online-only articles are typically handled in metadata (with reference to a volume and article number as opposed to a volume and page range). Reportedly, the error results in the first article of a given volume being cited instead of the paper intended from that volume. If you happen to be one of the lucky authors of the first paper listed in a volume in one of these journals, you may be receiving a bonkers number of citations. Conversely, other authors are missing out. The impacted journals include Nature Communications, Scientific Reports, and many of the BioMed Central journals, among others. According to reporting by Retraction Watch, the citation errors not only impact the journals’ own websites and open citation databases such as Crossref and Google Scholar, but also appear to spill over into curated databases including Scopus and Web of Science. 
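
To make the reported failure mode concrete, here is a hypothetical sketch of how a reference matcher keyed on volume and starting page can collapse citations onto a volume's first record when online-only articles carry an article number instead of a page range. This illustrates the mechanism described in the preprint; it is not Springer Nature's actual code or API, and the record fields are invented for the example.

```python
# Hypothetical illustration only; this is not Springer Nature's code or API.
# Online-only articles carry an article number rather than a page range, so a
# matcher that falls back on "first record in the volume" when no page is
# available credits every such citation to the same paper.

records = [
    # Simplified, invented metadata: volume, first_page, article_number, title
    {"volume": 12, "first_page": None, "article_number": "1", "title": "First article of volume 12"},
    {"volume": 12, "first_page": None, "article_number": "4321", "title": "The paper actually being cited"},
]


def match_reference(volume, first_page, records):
    """Naive matcher: looks up (volume, first_page) and falls back to the first
    record in the volume when no page range is available."""
    candidates = [r for r in records if r["volume"] == volume]
    if first_page is not None:
        exact = [r for r in candidates if r["first_page"] == first_page]
        if exact:
            return exact[0]
    return candidates[0]  # every online-only citation in the volume lands here


# A citation to the second paper (article number 4321, no page range) is
# credited to the first article of the volume instead.
print(match_reference(12, None, records)["title"])  # -> "First article of volume 12"
```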

Further on the impact of the “Google Zero” effect of LLMs on traffic to websites (discussed in AI Everywhere Now and in the July issue of The Brief), a new M+R report examining 17 nonprofits shows a 13% year-on-year decline in website traffic even as search queries are rising. The data indicate that people are increasingly getting answers without visiting the nonprofits’ sites. Counterintuitively, during the same period, donations rose 3% year-on-year. While this raises more questions than we can cover here, it underscores the need for robust analytics to understand not just traffic, but also true business impact.

A group of biomedical societies has announced the establishment of the BioCore consortium. The initiative is described as a “collaborative framework designed to strengthen society-led publishing.” Founding members include the Federation of American Societies for Experimental Biology (FASEB), the Society for the Study of Reproduction (SSR), the Society for Experimental Biology and Medicine (SEBM), and the Shock Society. The stated aims of BioCore are to “uphold publishing integrity, expand the reach and influence of society journals, ensure long-term financial sustainability, and enhance the collective presence of scientific societies and their research communities.”

Last month we asked the question — is there a size limit for journals, a point where they grow too large to be governable? We suggested that Scientific Reports may be an exception to these problems, but perhaps we spoke too soon. This month, one of the more egregious AI-slop images we’ve seen was published and then rapidly retracted by the journal. From “Factor Fexcectorn” to the seeming body horror of disconnected limbs, this one should have been pretty obvious. The number of people who see images before they are published in a journal is typically large: authors, editors, reviewers, copy editors, production staff, and so on. It is hard to imagine no one noticing the image from submission all the way to publication, but here we are. 

Slop images aren’t the only area where the community is being overrun by AI-generated content. arXiv has announced that it will no longer post review articles and position papers as preprints in its computer science category, and will only host them after they’ve been reviewed and accepted by a journal. This is in response to an overwhelming number of AI-generated submissions of these article types, which are easier to generate than research results.

GigaScience is the latest journal to face a mass editorial resignation after BGI, the genomics company that owns it, relocated the publication’s operations from Hong Kong to Shenzhen, China, and let go much of the journal’s staff. 

The American Medical Association has sunset the AMA Journal of Ethics.

Elsevier announced the launch of LeapSpace, a new AI-assisted research workspace that purports to help researchers brainstorm, review literature, compare studies, and find funding by using trusted, curated academic content. It combines generative AI with Elsevier’s research database and adds transparency features (“Trust Cards”) to show sources and confidence levels. LeapSpace, in other words, appears to be a research-oriented AI assistant designed around researcher use cases and vetted scholarly content.

Lyrasis, along with the Big Ten Academic Alliance’s Center for Library Programs and the California Digital Library, has received a $207k grant from the Gates Foundation to support the Mapping U.S. Diamond Open Access Journals project. The project’s aim is to “illuminate the decentralized U.S. Diamond OA landscape, identify key challenges, and produce actionable recommendations to support sustainable, community-governed scholarly publishing.”

STM has developed “Publishing Decoded,” an excellent “explainer” on scholarly publishing. It provides an overview of key terms (e.g., “open access,” “version of record”), a walkthrough of the publishing process (“A Manuscript’s Journey”), and a summary of what the industry does and its value proposition, aimed at people who may be visiting from elsewhere (such as board members, funders, policy professionals, and politicians). 

Oxford University Press is preparing for layoffs. According to The Bookseller, the planned restructuring is expected to impact 113 people, primarily in the organization’s Education and English Language Teaching divisions. 

Just in time for your holiday reading list, the National Institutes of Health has released a compilation of comments on its proposed publication fee price caps. If you don’t have time to read the whole thing, Science has helpfully provided a synopsis (TL;DR: mostly negative).  

And finally, for those who find the drift toward Google Zero to be a step backward for the Internet, McSweeney’s has a list of revised definitions of the verb “to Google” for you.

***
One of the things I worry about publishing … is that many of us are in publishing without clearly wanting to win. – Y.S. Chi, Chairman of Elsevier and Director of RELX, in an interview with Publishers Weekly