Tag: Peer review

  • Plagiarism in peer-review reports could be the ‘tip of the iceberg’

    Mikołaj Piniewski is a researcher to whom PhD students and collaborators turn when they need to revise or refine a manuscript. The hydrologist, based at the Warsaw University of Life Sciences, has a keen eye for problems in text — a skill that came in handy last year when he encountered some suspicious writing in peer-review reports of his own paper.

    Last May, when Piniewski was reading the peer-review feedback that he and his co-authors had received for a manuscript they’d submitted to an environmental-science journal, alarm bells started ringing in his head. Comments by two of the three reviewers were vague and lacked substance, so Piniewski decided to run a Google search, looking at specific phrases and quotes the reviewers had used.

    To his surprise, he found the comments were identical to those that were already available on the Internet, in multiple open-access review reports from publishers such as MDPI and PLOS. “I was speechless,” says Piniewski. The revelation caused him to go back to another manuscript that he had submitted a few months earlier, and dig out the peer-review reports he received for that. He found more plagiarized text. After e-mailing several collaborators, he assembled a team to dig deeper.

    The team published the results of its investigation in Scientometrics in February1. The researchers examined dozens of cases of apparent plagiarism in peer-review reports, identifying identical phrases across reports prepared for 19 journals and exact quotes duplicated across 50 publications. They say that the findings are just “the tip of the iceberg” when it comes to misconduct in the peer-review system.

    Dorothy Bishop, a former neuroscientist at the University of Oxford, UK, who has turned her attention to investigating research misconduct, was “favourably impressed” by the team’s analysis. “I felt the way they approached it was quite useful and might be a guide for other people trying to pin this stuff down,” she says.

    Peer review under review

    Piniewski and his colleagues conducted three analyses. First, they took the five peer-review reports from the two manuscripts that his laboratory had submitted and uploaded them to a rudimentary online plagiarism-detection tool. The reports showed 44–100% similarity to previously published online content, and the tool provided links to the sources in which the duplications were found.

    The researchers drilled down further. They broke one of the suspicious peer-review reports down to fragments of one to three sentences each and searched for them on Google. In seconds, the search engine returned a number of hits: the exact phrases appeared in 22 open peer-review reports, published between 2021 and 2023.
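
    The fragment-and-search step is simple enough to automate when a corpus of open review reports is available locally. Below is a minimal sketch of that idea in Python; the corpus structure and fragment length are assumptions for illustration, and the team itself pasted fragments into Google by hand rather than running any such script.

    ```python
    # Minimal sketch of the fragment-and-search idea, assuming a local corpus of
    # open peer-review reports (mapping a report ID, such as a DOI, to its text).
    # The check described in the article was done by pasting fragments into Google.
    import re


    def split_into_fragments(report: str, sentences_per_fragment: int = 2) -> list[str]:
        """Split a review into consecutive fragments of a few sentences each."""
        sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", report) if s.strip()]
        return [
            " ".join(sentences[i:i + sentences_per_fragment])
            for i in range(0, len(sentences), sentences_per_fragment)
        ]


    def find_duplicated_fragments(report: str, corpus: dict[str, str]) -> list[tuple[str, str]]:
        """Return (fragment, source_id) pairs where a fragment appears verbatim in the corpus."""
        return [
            (fragment, source_id)
            for fragment in split_into_fragments(report)
            for source_id, text in corpus.items()
            if fragment and fragment in text
        ]


    # Usage with hypothetical data:
    # corpus = {"10.1234/open-review-2021-001": "...full text of an open review..."}
    # print(find_duplicated_fragments(suspicious_review_text, corpus))
    ```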

    The final analysis provided the most worrying results. They took a single quote — 43 words long and featuring multiple language errors, including incorrect capitalization — and pasted it into Google. The search revealed that the quote, or variants of it, had been used in 50 peer-review reports.

    Predominantly, these reports were from journals published by MDPI, PLOS and Elsevier, and the team found that the amount of duplication increased year-on-year between 2021 and 2023. Whether this is because of an increase in the number of open-access peer-review reports during this time or an indication of a growing problem is unclear — but Piniewski thinks that it could be a little bit of both.

    Why would a peer reviewer use plagiarized text in their report? The team says that some might be attempting to save time, whereas others could be motivated by a lack of confidence in their writing ability, for example, if they aren’t fluent in English.

    The team notes that there are instances that might not represent misconduct. “A tolerable rephrasing of your own words from a different review? I think that’s fine,” says Piniewski. “But I imagine that most of these cases we found are actually something else.”

    The source of the problem

    Duplication and manipulation of peer-review reports is not a new phenomenon. “I think it’s now increasingly recognized that the manipulation of the peer-review process, which was recognized around 2010, was probably an indication of paper mills operating at that point,” says Jennifer Byrne, director of biobanking at New South Wales Health in Sydney, Australia, who also studies research integrity in scientific literature.

    Paper mills — organizations that churn out fake research papers and sell authorships to turn a profit — have been known to tamper with reviews to push manuscripts through to publication, says Byrne.

    However, when Bishop looked at Piniewski’s case, she could not find any overt evidence of paper-mill activity. Rather, she suspects that journal editors might be involved in cases of peer-review-report duplication and suggests studying the track records of those who’ve allowed inadequate or plagiarized reports to proliferate.

    Piniewski’s team is also concerned about the rise of duplications as generative artificial intelligence (AI) becomes easier to access. Although his team didn’t look for signs of AI use, the technology’s ability to quickly ingest and rephrase large swathes of text is seen as an emerging issue.

    A preprint posted in March2 showed evidence of researchers using AI chatbots to assist with peer review, identifying specific adjectives that could be hallmarks of AI-written text in peer-review reports.

    Bishop isn’t as concerned as Piniewski about AI-generated reports, saying that it’s easy to distinguish between AI-generated text and legitimate reviewer commentary. “The beautiful thing about peer review,” she says, is that it is “one thing you couldn’t do a credible job with AI”.

    Preventing plagiarism

    Publishers seem to be taking action. Bethany Baker, a media-relations manager at PLOS, who is based in Cambridge, UK, told Nature Index that the PLOS Publication Ethics team “is investigating the concerns raised in the Scientometrics article about potential plagiarism in peer reviews”.

    An Elsevier representative told Nature Index that the publisher “can confirm that this matter has been brought to our attention and we are conducting an investigation”.

    In a statement, the MDPI Research Integrity and Publication Ethics Team said that it has been made aware of potential misconduct by reviewers in its journals and is “actively addressing and investigating this issue”. It did not confirm whether this was related to the Scientometrics article.

    One proposed solution to the problem is ensuring that all submitted reviews are checked using plagiarism-detection software. In 2022, exploratory work by Adam Day, a data scientist at Sage Publications, based in Thousand Oaks, California, identified duplicated text in peer-review reports that might be suggestive of paper-mill activity. Day offered a similar solution of using anti-plagiarism software, such as Turnitin.

    Piniewski expects the problem to get worse in the coming years, but he hasn’t received any unusual peer-review reports since those that originally sparked his research. Still, he says that he’s now even more vigilant. “If something unusual occurs, I will spot it.”

  • Algorithm ranks peer reviewers by reputation — but critics warn of bias

    An algorithm ranks the reputation of peer reviewers on the basis of how many citations the studies they have reviewed attracted.

    The tool, outlined in a study published in February1, could help to identify during peer review which papers are likely to become high impact, its creators say. They add that, during peer review, authors should give the most weight to the recommendations and feedback from reviewers whose previously reviewed papers have been highly cited.

    The study authors extracted citation data from 308,243 papers published by journals of the American Physical Society (APS) between 1990 and 2010 that had each accumulated more than 5 citations. Information about the referees of these papers was not available, so the authors created imaginary reviewers, whose ratings were generated by an algorithm trained on citation data from the APS data set. Using the review scores that these papers received in real life (a score of 1 being poor and 5 being outstanding), the study authors compared how closely the imaginary reviewers’ scores correlated with the actual scores the papers received.

    To rank the imaginary reviewers, the study authors tracked the citations accumulated by the papers published between 1990 and 2000 and checked the review scores they were given. Imaginary reviewers that gave high review scores to papers that went on to attract a high number of citations were given a high ranking.

    The authors then tested how effective these reputation rankings were in predicting citation numbers of papers refereed by the same imaginary reviewers in the second decade of the data. The study found that the imaginary reviewers’ recommendations on the 2000–10 papers were in line with the actual citation counts of these papers over that time span, says study co-author An Zeng, an environmental scientist at Beijing Normal University. This suggests that the algorithm is good at predicting high-impact papers, he adds.
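
    To make the ranking idea concrete, here is a minimal sketch in Python of one way to score a reviewer along these lines: credit a reviewer whenever the score they gave a paper lines up with how heavily that paper was later cited. The thresholds and the scoring rule are illustrative assumptions, not the algorithm described in the study.

    ```python
    # Illustrative simplification of the reputation idea described above:
    # reviewers who gave high scores to papers that later attracted many
    # citations rank highly. This is not the study's own algorithm.

    def reviewer_reputation(reviews: list[tuple[int, int]],
                            high_score: int = 4,
                            high_citations: int = 50) -> float:
        """
        reviews: (score, citations) pairs for the papers a reviewer assessed,
        where score is the 1-5 recommendation and citations is the count the
        paper accumulated later. Returns the fraction of judgements that line
        up: high scores given to highly cited papers, or low scores given to
        poorly cited ones.
        """
        correct = sum(
            (score >= high_score) == (citations >= high_citations)
            for score, citations in reviews
        )
        return correct / len(reviews)


    # Example with made-up numbers: three of four judgements match the papers'
    # eventual citation counts, so the reputation value printed is 0.75.
    print(reviewer_reputation([(5, 120), (4, 60), (2, 80), (1, 3)]))
    ```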

    More eyes on peer reviewers

    Previous attempts to quantify and predict the reach of studies have been widely criticized for relying too heavily on citation-based metrics, which, critics say, exacerbate existing biases in academia. A 2021 study2 found that non-replicable papers are cited more than replicable studies, possibly because they have more ‘interesting’ results.

    Zeng acknowledges the limitations of focusing on citation metrics, but says that it’s important to evaluate the work of peer reviewers. Solid studies are sometimes rejected because of one negative review, he notes, but there’s little attention given to how professional or reliable that reviewer is. “If this algorithm can identify reliable reviewers, it will give less weight to the reviewers who are not so reliable,” says Zeng.

    Journal editors often use search tools to identify candidates to peer review papers, but they have to manually decide who to contact. If referee activities were ranked and quantified, this would make it easier for journal editors to choose, Zeng points out.

    However, ranking reviewers on their reputation is likely to exacerbate the inequities and biases that exist in peer review, says Anita Bandrowski, an information scientist at the University of California, San Diego.

    As previous data have shown, most of the responsibility of the peer-review process in science falls to a small subset of peer reviewers — typically men in senior positions in high-income nations that are geographically closer to most journal editors.

    Bandrowski notes that the algorithm might favour those with a long history of reviewing, because they’ve had more time to accumulate citations on their refereed papers. “The oldest reviewers by this metric would be the best reviewers and yet the oldest reviewers are going to be retired or dead,” she says.

    Zeng disagrees that his approach will make the selection of peer reviewers more inequitable than it is now. After implementing the reputation ranking, editors might find that some reviewers who are not frequently invited have high reputation scores — in some cases better than those who are inundated with referee requests, he says.

    Capturing the nuance

    Laura Feetham-Walker, a reviewer-engagement manager at the Institute of Physics Publishing in Bristol, UK, worries that the algorithm might not account for incremental studies, negative findings and replications of previous studies, all of which are crucial for science, albeit often not highly cited.

    “Under their system, a reviewer who gave a favourable recommendation on an incremental study — for example, for a journal that does not have novelty as an editorial criterion — would go down in the reviewer reputation ranking, simply because that manuscript would be unlikely to accrue large numbers of citations when published,” she says.

    Neither does the ranking account for researchers who have never reviewed before, Feetham-Walker adds, or at least those who have never reviewed for a particular publisher.

    “We know that a reviewer’s ability to provide a helpful review is dependent not just on their expertise, but also their availability and interest in the subject matter. We also know that reviewers are human, and their reviewing behaviour can change over time depending on various factors,” Feetham-Walker says. “A nuanced algorithm that took all of this into account, as well as adding new reviewers to enrich the pool, would be of genuine value to publishers.”

  • Structure peer review to make it more robust

    In February, I received two peer-review reports for a manuscript I’d submitted to a journal. One report contained 3 comments, the other 11. Apart from one point, all the feedback was different. It focused on expanding the discussion and some methodological details — there were no remarks about the study’s objectives, analyses or limitations.

    My co-authors and I duly replied, working under two assumptions that are common in scholarly publishing: first, that anything the reviewers didn’t comment on they had found acceptable for publication; second, that they had the expertise to assess all aspects of our manuscript. But, as history has shown, those assumptions are not always accurate (see Lancet 396, 1056; 2020). And through the cracks, inaccurate, sloppy and falsified research can slip.

    As co-editor-in-chief of the journal Research Integrity and Peer Review (an open-access journal published by BMC, which is part of Springer Nature), I’m invested in ensuring that the scholarly peer-review system is as trustworthy as possible. And I think that to be robust, peer review needs to be more structured. By that, I mean that journals should provide reviewers with a transparent set of questions to answer that focus on methodological, analytical and interpretative aspects of a paper.

    For example, editors might ask peer reviewers to consider whether the methods are described in sufficient detail to allow another researcher to reproduce the work, whether extra statistical analyses are needed, and whether the authors’ interpretation of the results is supported by the data and the study methods. Should a reviewer find anything unsatisfactory, they should provide constructive criticism to the authors. And if reviewers lack the expertise to assess any part of the manuscript, they should be asked to declare this.
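
    As an illustration only, a submission system could encode such a structured form as data, so that every reviewer answers the same questions and must flag anything outside their expertise. The field names and wording below are hypothetical paraphrases of the examples above, not the authors' published template.

    ```python
    # Hypothetical encoding of a structured peer-review form. The wording
    # paraphrases the examples in the text; it is not the published template.
    STRUCTURED_REVIEW_QUESTIONS = [
        {
            "id": "methods_reproducible",
            "question": ("Are the methods described in sufficient detail to allow "
                         "another researcher to reproduce the work?"),
            "answers": ["yes", "no", "outside my expertise"],
            "comment_required_if": ["no"],
        },
        {
            "id": "statistics_adequate",
            "question": "Are extra statistical analyses needed?",
            "answers": ["no", "yes", "outside my expertise"],
            "comment_required_if": ["yes"],
        },
        {
            "id": "interpretation_supported",
            "question": ("Is the authors' interpretation of the results supported "
                         "by the data and the study methods?"),
            "answers": ["yes", "no", "outside my expertise"],
            "comment_required_if": ["no"],
        },
    ]
    ```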

    Other aspects of a study, such as novelty, potential impact, language and formatting, should be handled by editors, journal staff or even machines, reducing the workload for reviewers.

    The list of questions reviewers will be asked should be published on the journal’s website, allowing authors to prepare their manuscripts with this process in mind. And, as others have argued before, review reports should be published in full. This would allow readers to judge for themselves how a paper was assessed, and would enable researchers to study peer-review practices.

    To see how this works in practice, since 2022 I’ve been working with the publisher Elsevier on a pilot study of structured peer review in 23 of its journals, covering the health, life, physical and social sciences. The preliminary results indicate that, when guided by the same questions, reviewers made the same initial recommendation about whether to accept, revise or reject a paper 41% of the time, compared with 31% before these journals implemented structured peer review. Moreover, reviewers’ comments were in agreement about specific parts of a manuscript up to 72% of the time (M. Malički and B. Mehmani Preprint at bioRxiv https://doi.org/mrdv; 2024). In my opinion, reaching such agreement is important for science, which proceeds mainly through consensus.

    I invite editors and publishers to follow in our footsteps and experiment with structured peer reviews. Anyone can trial our template questions (see go.nature.com/4ab2ppc), or tailor them to suit specific fields or study types. For instance, mathematics journals might also ask whether referees agree with the logic or completeness of a proof. Some journals might ask reviewers if they have checked the raw data or the study code. Publications that employ editors who are less embedded in the research they handle than are academics might need to include questions about a paper’s novelty or impact.

    Scientists can also use these questions, either as a checklist when writing papers or when they are reviewing for journals that don’t apply structured peer review.

    Some journals — including Proceedings of the National Academy of Sciences, the PLOS family of journals, F1000 journals and some Springer Nature journals — already have their own sets of structured questions for peer reviewers. But, in general, these journals do not disclose the questions they ask, and do not make their questions consistent. This means that core peer-review checks are still not standardized, and reviewers are tasked with different questions when working for different journals.

    Some might argue that, because different journals have different thresholds for publication, they should adhere to different standards of quality control. I disagree. Not every study is groundbreaking, but scientists should view quality control of the scientific literature in the same way as quality control in other sectors: as a way to ensure that a product is safe for use by the public. People should be able to see what types of check were done, and when, before an aeroplane was approved as safe for flying. We should apply the same rigour to scientific research.

    Ultimately, I hope for a future in which all journals use the same core set of questions for specific study types and make all of their review reports public. I fear that a lack of standard practice in this area is delaying the progress of science.

    Competing Interests

    M.M. is co-editor-in-chief of Research Integrity and Peer Review, a journal that publishes signed peer-review reports alongside published articles. He is also the chair of the European Association of Science Editors Peer Review Committee.

  • Is ChatGPT corrupting peer review? Telltale words hint at AI use

    A study suggests that researchers are using chatbots to assist with peer review.Credit: Rmedia7/Shutterstock

    A study that identified buzzword adjectives that could be hallmarks of AI-written text in peer-review reports suggests that researchers are turning to ChatGPT and other artificial intelligence (AI) tools to evaluate others’ work.

    The authors of the study1, posted on the arXiv preprint server on 11 March, examined the extent to which AI chatbots could have modified the peer reviews of conference proceedings submitted to four major computer-science meetings since the release of ChatGPT.

    Their analysis suggests that up to 17% of the peer-review reports have been substantially modified by chatbots — although it’s unclear whether researchers used the tools to construct reviews from scratch or just to edit and improve written drafts.

    The idea of chatbots writing referee reports for unpublished work is “very shocking” given that the tools often generate misleading or fabricated information, says Debora Weber-Wulff, a computer scientist at the HTW Berlin–University of Applied Sciences in Germany. “It’s the expectation that a human researcher looks at it,” she adds. “AI systems ‘hallucinate’, and we can’t know when they’re hallucinating and when they’re not.”

    The meetings included in the study are the Twelfth International Conference on Learning Representations, due to be held in Vienna next month; 2023’s Annual Conference on Neural Information Processing Systems, held in New Orleans, Louisiana; the 2023 Conference on Robot Learning, held in Atlanta, Georgia; and the 2023 Conference on Empirical Methods in Natural Language Processing, held in Singapore.

    Nature reached out to the organizers of all four conferences for comment, but none responded.

    Buzzword search

    Since its release in November 2022, ChatGPT has been used to write a number of scientific papers, in some cases even being listed as an author. Out of more than 1,600 scientists who responded to a 2023 Nature survey, nearly 30% said they had used generative AI to write papers and around 15% said they had used it for their own literature reviews and to write grant applications.

    In the arXiv study, a team led by Weixin Liang, a computer scientist at Stanford University in California, developed a technique to search for AI-written text by identifying adjectives that are used more often by AI than by humans.

    By comparing the use of adjectives in a total of more than 146,000 peer reviews submitted to the same conferences before and after the release of ChatGPT, the analysis found that the frequency of certain positive adjectives, such as ‘commendable’, ‘innovative’, ‘meticulous’, ‘intricate’, ‘notable’ and ‘versatile’, had increased significantly since the chatbot’s use became mainstream. The study flagged the 100 most disproportionately used adjectives.
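
    A crude version of that comparison is easy to run on any corpus of review text. The sketch below counts how often a handful of the flagged adjectives appear per 1,000 words in a set of reviews; the word list is only the subset quoted above, not the study’s full list of 100 adjectives, and the corpus variables in the usage note are placeholders.

    ```python
    # Rough sketch of the buzzword comparison: rate of flagged adjectives per
    # 1,000 words in a set of reviews. The word list is only the subset quoted
    # in the text, not the study's full set of 100 adjectives.
    import re
    from collections import Counter

    FLAGGED = {"commendable", "innovative", "meticulous",
               "intricate", "notable", "versatile"}


    def adjective_rate(reviews: list[str]) -> dict[str, float]:
        """Occurrences of each flagged adjective per 1,000 words of review text."""
        counts, total_words = Counter(), 0
        for review in reviews:
            words = re.findall(r"[a-z]+", review.lower())
            total_words += len(words)
            counts.update(w for w in words if w in FLAGGED)
        return {w: 1000 * counts[w] / max(total_words, 1) for w in sorted(FLAGGED)}


    # Usage with placeholder corpora: compare pre- and post-ChatGPT reviews.
    # before = adjective_rate(reviews_pre_chatgpt)
    # after = adjective_rate(reviews_post_chatgpt)
    # for word in sorted(FLAGGED):
    #     print(f"{word}: {before[word]:.3f} -> {after[word]:.3f} per 1,000 words")
    ```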

    Reviews that gave a lower rating to conference submissions or were submitted close to the deadline, and those whose writers were least likely to respond to rebuttals from the manuscripts’ authors, were most likely to contain these adjectives, and therefore most likely to have been written by chatbots at least to some extent, the study found.

    “It seems like when people have a lack of time, they tend to use ChatGPT,” says Liang.

    The study also examined more than 25,000 peer reviews associated with around 10,000 manuscripts that had been accepted for publication across 15 Nature Portfolio journals between 2019 and 2023, but didn’t find a spike in usage of the same adjectives since the release of ChatGPT.

    A spokesperson for Springer Nature said the publisher asks peer reviewers not to upload manuscripts into generative AI tools, noting that these still have “considerable limitations” and that reviews might include sensitive or proprietary information. (Nature’s news team is independent of its publisher.)

    Springer Nature is exploring the idea of providing peer reviewers with safe AI tools to guide their evaluation, the spokesperson said.

    Transparency issue

    The increased prevalence of the buzzwords Liang’s study identified in post-ChatGPT reviews is “really striking”, says Andrew Gray, a bibliometrics support officer at University College London. The work inspired him to analyse the extent to which some of the same adjectives, as well as a selection of adverbs, crop up in peer-reviewed studies published between 2015 and 2023. His findings, described in an arXiv preprint published on 25 March, show a significant increase in the use of certain terms, including ‘commendable’, ‘meticulous’ and ‘intricate’, since ChatGPT surfaced2. The study estimates that the authors of at least 60,000 papers published in 2023 — just over 1% of all scholarly studies published that year — used chatbots to some extent.

    Gray says it’s possible peer reviewers are using chatbots only for copyediting or translation, but that a lack of transparency from authors makes it difficult to tell. “We have the signs that these things are being used,” he says, “but we don’t really understand how they’re being used.”

    “We do not wish to pass a value judgement or claim that the use of AI tools for reviewing papers is necessarily bad or good,” Liang says. “But we do think that for transparency and accountability, it’s important to estimate how much of that final text might be generated or modified by AI.”

    Weber-Wulff doesn’t think tools such as ChatGPT should be used to any extent during peer review, and worries that the use of chatbots might be even higher in cases in which referee reports are not published. (The reviews of papers published by Nature Portfolio journals used in Liang’s study were available online as part of a transparent peer-review scheme.) “Peer review has been corrupted by AI systems,” she says.

    Using chatbots for peer review could also have copyright implications, Weber-Wulff adds, because it could involve giving the tools access to confidential, unpublished material. She notes that the approach of using telltale adjectives to detect potential AI activity might work well in English, but could be less effective for other languages.

  • Three ways ChatGPT helps me in my academic writing

    For Dritjon Gruda, artificial-intelligence chatbots have been a huge help in scientific writing and peer review.Credit: Vladimira Stavreva-Gruda

    Confession time: I use generative artificial intelligence (AI). Despite the debate over whether chatbots are positive or negative forces in academia, I use these tools almost daily to refine the phrasing in papers that I’ve written, and to seek an alternative assessment of work I’ve been asked to evaluate, as either a reviewer or an editor. AI even helped me to refine this article.

    I study personality and leadership at Católica Porto Business School in Portugal and am an associate editor at Personality and Individual Differences and Psychology of Leaders and Leadership. The value that I derive from generative AI is not from the technology itself blindly churning out text, but from engaging with the tool and using my own expertise to refine what it produces. The dialogue between me and the chatbot both enhances the coherence of my work and, over time, teaches me how to describe complex topics in a simpler way.

    Whether you’re using AI in writing, editing or peer review, here’s how it can do the same for you.

    Polishing academic writing

    Ever heard the property mantra, ‘location, location, location’? In the world of generative AI, it’s ‘context, context, context’.

    Context is king. You can’t expect generative AI — or anything or anyone, for that matter — to provide a meaningful response to a question without it. When you’re using a chatbot to refine a section of your paper for clarity, start by outlining the context. What is your paper about, and what is your main argument? Jot down your ideas in any format — even bullet points will work. Then, present this information to the generative AI of your choice. I typically use ChatGPT, made by OpenAI in San Francisco, California, but for tasks that demand a deep understanding of language nuances, such as analysing search queries or text, I find Gemini, developed by researchers at Google, to be particularly effective. The open-source Mixtral large language models, made by Mistral AI in Paris, are ideal when you’re working offline but still need assistance from a chatbot.

    Regardless of which generative-AI tool you choose, the key to success lies in providing precise instructions. The clearer you are, the better. For example, you might write: “I’m writing a paper on [topic] for a leading [discipline] academic journal. What I tried to say in the following section is [specific point]. Please rephrase it for clarity, coherence and conciseness, ensuring each paragraph flows into the next. Remove jargon. Use a professional tone.” You can use the same technique again later on, to clarify your responses to reviewer comments.
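
    I work in the chat interface, but the same prompt pattern can also be scripted. Below is a minimal sketch using the openai Python package (its version 1.x interface); the model name and the bracketed placeholders are illustrative assumptions, and an API key must be available in the OPENAI_API_KEY environment variable.

    ```python
    # Minimal sketch of the prompt pattern above, via the openai Python package
    # (v1.x interface). The model name and placeholders are illustrative; set
    # the OPENAI_API_KEY environment variable before running.
    from openai import OpenAI

    client = OpenAI()

    prompt = (
        "I'm writing a paper on [topic] for a leading [discipline] academic "
        "journal. What I tried to say in the following section is [specific "
        "point]. Please rephrase it for clarity, coherence and conciseness, "
        "ensuring each paragraph flows into the next. Remove jargon. "
        "Use a professional tone.\n\n[paste the section to be refined here]"
    )

    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any chat-capable model can be substituted here
        messages=[{"role": "user", "content": prompt}],
    )

    print(response.choices[0].message.content)
    ```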

    Remember, the chatbot’s first reply might not be perfect — it’s a collaborative and iterative process. You might need to refine your instructions or add more information, much as you would when discussing a concept with a colleague. It’s the interaction that improves the results. If something doesn’t quite hit the mark, don’t hesitate to say, “This isn’t quite what I meant. Let’s adjust this part.” Or you can commend its improvements: “This is much clearer, but let’s tweak the ending for a stronger transition to the next section.”

    This approach can transform a challenging task into a manageable one, filling the page with insights you might not have fully gleaned on your own. It’s like having a conversation that opens new perspectives, making generative AI a collaborative partner in the creative process of developing and refining ideas. But importantly, you are using the AI as a sounding board: it is not writing your document for you; nor is it reviewing manuscripts.

    Elevating peer review

    Generative AI can be a valuable tool in the peer-review process. After thoroughly reading a manuscript, summarize key points and areas for review. Then, use the AI to help organize and articulate your feedback (without directly inputting or uploading the manuscript’s text, thus avoiding privacy concerns). For example, you might instruct the AI: “Assume you’re an expert and seasoned scholar with 20+ years of academic experience in [field]. On the basis of my summary of a paper in [field], where the main focus is on [general topic], provide a detailed review of this paper, in the following order: 1) briefly discuss its core content; 2) identify its limitations; and 3) explain the significance of each limitation in order of importance. Maintain a concise and professional tone throughout.”

    I’ve found that AI partnerships can be incredibly enriching; the tools often offer perspectives I hadn’t considered. For instance, ChatGPT excels at explaining and justifying the reasons behind specific limitations that I had identified in my review, which helps me to grasp the broader implications of the study’s contribution. If I identify methodological limitations, ChatGPT can elaborate on these in detail and suggest ways to overcome them in a revision. This feedback often helps me to connect the dots between the limitations and their collective impact on the paper’s overall contribution. Occasionally, however, its suggestions are off-base, far-fetched, irrelevant or simply wrong. And that is why the final responsibility for the review always remains with you. A reviewer must be able to distinguish between what is factual and what is not, and no chatbot can reliably do that.

    Optimizing editorial feedback

    The final area in which I benefit from using chatbots is in my role as a journal editor. Providing constructive editorial feedback to authors can be challenging, especially when you oversee several manuscripts every week. Having personally received countless pieces of unhelpful, non-specific feedback — such as, “After careful consideration, we have decided not to proceed with your manuscript” — I recognize the importance of clear and constructive communication. ChatGPT has become indispensable in this process, helping me to craft precise, empathetic and actionable feedback without replacing human editorial decisions.

    For instance, after evaluating a paper and noting its pros and cons, I might feed these into ChatGPT and get it to draft a suitable letter: “On the basis of these notes, draft a letter to the author. Highlight the manuscript’s key issues and clearly explain why the manuscript, despite its interesting topic, might not provide a substantial enough advancement to merit publication. Avoid jargon. Be direct. Maintain a professional and respectful tone throughout.” Again, it might take a few iterations to get the tone and content just right.

    I’ve found that this approach both enhances the quality of my feedback and helps to guarantee that I convey my thoughts supportively. The result is a more positive and productive dialogue between editors and authors.

    There is no doubt that generative AI presents challenges to the scientific community. But it can also enhance the quality of our work. These tools can bolster our capabilities in writing, reviewing and editing. They preserve the essence of scientific inquiry — curiosity, critical thinking and innovation — while improving how we communicate our research.

    Considering the benefits, what are you waiting for?

  • Is AI ready to mass-produce lay summaries of research articles?

    Generative AI might be a powerful tool in making research more accessible for scientists and the broader public alike.Credit: Getty

    Thinking back to the early days of her PhD programme, Esther Osarfo-Mensah recalls struggling to keep up with the literature. “Sometimes, the wording or the way the information is presented actually makes it quite a task to get through a paper,” says the biophysicist at University College London. Lay summaries could be a time-saving solution. Short synopses of research articles written in plain language could help readers to decide which papers to focus on, but they aren’t common in scientific publishing. Now, the buzz around artificial intelligence (AI) has pushed software engineers to develop platforms that can mass-produce these synopses.

    Scientists are drawn to AI tools because they excel at crafting text in accessible language, and they might even produce clearer lay summaries than those written by people. A study1 released last year looked at lay summaries published in one journal and found that those created by people were less readable than the original abstracts, potentially because some researchers struggle to replace jargon with plain language or to decide which facts to include when condensing the information into a few lines.

    AI lay-summary platforms come in a variety of forms (see ‘AI lay-summary tools’). Some allow researchers to import a paper and generate a summary; others are built into web servers, such as the bioRxiv preprint database.

    AI lay-summary tools

    Several AI resources have been developed to help readers glean information about research articles quickly. They offer different perks. Here are a few examples and how they work:

    – SciSummary: This tool parses the sections of a paper to extract the key points and then runs those through the general-purpose large language model GPT-3.5 to transform them into a short summary written in plain language. Max Heckel, the tool’s founder, says it incorporates multimedia into the summary, too: “If it determines that a particular section of the summary is relevant to a figure or table, it will actually show that table or figure in line.”

    – Scholarcy: This technology takes a different approach. Its founder, Phil Gooch, based in London, says the tool was trained on 25,000 papers to identify sentences containing verb phrases such as “has been shown to” that often carry key information about the study (a rough sketch of this idea appears after this list). It then uses a mixture of custom and open-source large language models to paraphrase those sentences in plain text. “You can actually create ten different types of summaries,” he adds, including one that lays out how the paper is related to previous publications.

    – SciSpace: This tool was trained on a repository of more than 280 million data sets, including papers that people had manually annotated, to extract key information from articles. It uses a mixture of proprietary fine-tuned models and GPT-3.5 to craft the summary, says the company’s chief executive, Saikiran Chandha, based in San Francisco, California. “A user can ask questions on top of these summaries to further dig into the paper,” he notes, adding that the company plans to develop audio summaries that people can tune into on the go.
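
    As a rough illustration of the verb-phrase heuristic mentioned for Scholarcy above, the sketch below pulls out sentences containing phrases that often signal key findings; a language model would then paraphrase them into plain language. The phrase list is an assumption for illustration, not Scholarcy’s trained model.

    ```python
    # Toy sketch of the verb-phrase heuristic: extract sentences containing
    # phrases that often signal key findings. The phrase list is illustrative;
    # Scholarcy uses models trained on thousands of papers, not a fixed list.
    import re

    SIGNAL_PHRASES = ("has been shown to", "we found that", "our results suggest",
                      "demonstrates that", "we conclude that")


    def extract_key_sentences(paper_text: str) -> list[str]:
        """Return sentences containing any of the signal phrases."""
        sentences = re.split(r"(?<=[.!?])\s+", paper_text)
        return [s for s in sentences
                if any(phrase in s.lower() for phrase in SIGNAL_PHRASES)]


    # key_sentences = extract_key_sentences(full_paper_text)
    # These sentences would then be paraphrased into plain language, for
    # example by prompting a large language model with them.
    ```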

    Benefits and drawbacks

    Mass-produced lay summaries could yield a trove of benefits. Beyond helping scientists to speed-read the literature, the synopses can be disseminated to people with different levels of expertise, including members of the public. Osarfo-Mensah adds that AI summaries might also aid people who struggle with English. “Some people hide behind jargon because they don’t necessarily feel comfortable trying to explain it,” she says, but AI could help them to rework technical phrases. Heckel, whose company SciSummary is based in Columbus, Ohio, says its tool, which lets users import a paper to be summarized, can also translate summaries into other languages and is gaining popularity in Indonesia and Turkey. He argues that it could topple language barriers and make science more accessible.

    Despite these strides, some scientists feel that improvements are needed before we can rely on AI to describe studies accurately.

    Will Ratcliff, an evolutionary biologist at the Georgia Institute of Technology in Atlanta, argues that no tool can produce better text than professional writers can. Although researchers have different writing abilities, he invariably prefers reading scientific material produced by study authors to text generated by AI. “I like to see what the authors wrote. They put craft into it, and I find their abstract to be more informative,” he says.

    Nana Mensah, a PhD student in computational biology at the Francis Crick Institute in London, adds that, unlike AI, people tend to craft a narrative when writing lay summaries, helping readers to understand the motivations behind each step of the study. One advantage of AI platforms, he says, is that they can write summaries at different reading levels, potentially broadening the audience. In his experience, however, these synopses might still include jargon that can confuse readers without specialist knowledge.

    AI tools might even struggle to turn technical language into lay versions at all. Osarfo-Mensah works in biophysics, a field with many intricate parameters and equations. She found that an AI summary of one of her research articles excluded information from a whole section. If researchers were looking for a paper with those details and consulted the AI summary, they might abandon her paper and look for other work.

    Andy Shepherd, scientific director at global technology company Envision Pharma Group in Horsham, UK, has in his spare time compared the performances of several AI tools to see how often they introduce blunders. He used eight text generators, including general ones and some that had been optimized to produce lay summaries. He then asked people with different backgrounds, such as health-care professionals and the public, to assess how clear, readable and useful lay summaries were for two papers.

    “All of the platforms produced something that was coherent and read like a reasonable study, but a few of them introduced errors, and two of them actively reversed the conclusion of the paper,” he says. It’s easy for AI tools to make this mistake by, for instance, omitting the word ‘not’ in a sentence, he explains. Ratcliff cautions that AI summaries should be viewed as a tool’s “best guess” of what a paper is about, stressing that it can’t check facts.

    Broader readership

    The risk of AI summaries introducing errors is one concern among many. Another is that one benefit of such summaries — that they can help to share research more widely among the public — could also have drawbacks. The AI summaries posted alongside bioRxiv preprints, research articles that have yet to undergo peer review, are tailored to different levels of reader expertise, including that of the public. Osarfo-Mensah supports the effort to widen the reach of these works. “The public should feel more involved in science and feel like they have a stake in it, because at the end of the day, science isn’t done in a vacuum,” she says.

    But others point out that this comes with the risk of making unreviewed and inaccurate research more accessible. Mensah says that academics “will be able to treat the article with the sort of caution that’s required”, but he isn’t sure that members of the public will always understand when a summary refers to unreviewed work. Lay summaries of preprints should come with a “hazard warning” informing the reader upfront that the material has yet to be reviewed, says Shepherd.

    “We agree entirely that preprints must be understood as not peer-reviewed when posted,” says John Inglis, co-founder of bioRxiv, who is based at Cold Spring Harbor Laboratory in New York. He notes that such a disclaimer can be found on the homepage of each preprint, and if a member of the public navigates to a preprint through a web search, they are first directed to the homepage displaying this disclaimer before they can access the summary. But the warning labels are not integrated into the summaries, so there is a risk that these could be shared on social media without the disclaimer. Inglis says bioRxiv is working with its partner ScienceCast, whose technology produces the synopses, on adding a note to each summary to negate this risk.

    As is the case for many other nascent generative-AI technologies, humans are still working out the messaging that might be needed to ensure users are given adequate context. But if AI lay-summary tools can successfully mitigate these and other challenges, they might become a staple of scientific publishing.

  • Peer-replication model aims to address science’s ‘reproducibility crisis’

    An independent team could replicate select experiments in a paper before publication, to help catch errors and poor methodology.Credit: SolStock/Getty

    Could the replication crisis in scientific literature be addressed by having scientists independently attempt to reproduce their peers’ key experiments during the publication process? And would teams be incentivized to do so by having the opportunity to report their findings in a citable paper, to be published alongside the original study?

    These are questions being asked by two researchers who say that a formal peer-replication model could greatly benefit the scientific community.

    Anders Rehfeld, a researcher in human sperm physiology at Copenhagen University Hospital, began considering alternatives to standard peer review after encountering a published study that could not be replicated in his laboratory. Rehfeld’s experiments1 revealed that the original paper was flawed, but he found it very difficult to publish the findings and correct the scientific record.

    “I sent my data to the original journal, and they didn’t care at all,” Rehfeld says. “It was very hard to get it published somewhere where you thought the reader of the original paper would find it.”

    The issues that Rehfeld encountered could have been avoided if the original work had been replicated by others before publication, he argues. “If a reviewer had tried one simple experiment in their own lab, they could have seen that the core hypothesis of the paper was wrong.”

    Rehfeld collaborated with Samuel Lord, a fluorescence-microscopy specialist at the University of California, San Francisco, to devise a new peer-replication model.

    In a white paper detailing the process2, Rehfeld, Lord and their colleagues describe how journal editors could invite peers to attempt to replicate select experiments of submitted or accepted papers by authors who have opted in. In the field of cell biology, for example, that might involve replicating a western blot, a technique used to detect proteins, or an RNA-interference experiment that tests the function of a certain gene. “Things that would take days or weeks, but not months, to do” would be replicated, Lord says.

    The model is designed to incentivize all parties to participate. Peer replicators — unlike peer reviewers — would gain a citable publication, and the authors of the original paper would benefit from having their findings confirmed. Early-career faculty members at mainly undergraduate universities could be a good source of replicators: in addition to gaining citable replication reports to list on their CVs, they would get experience in performing new techniques in consultation with the original research team.

    Rehfeld and Lord are discussing their idea with potential funders and journal editors, with the goal of running a pilot programme this year.

    “I think most scientists would agree that some sort of certification process to indicate that a paper’s results are reproducible would benefit the scientific literature,” says Eric Sawey, executive editor of the journal Life Science Alliance, who plans to bring the idea to the publisher of his journal. “I think it would be a good look for any journal that would participate.”

    Who pays?

    Sawey says there are two key questions about the peer-replication model: who will pay for it, and who will find the labs to do the reproducibility tests? “It’s hard enough to find referees for peer review, so I can’t imagine cold e-mailing people, asking them to repeat the paper,” he says. Independent peer-review organizations, such as ASAPbio and Review Commons, might curate a list of interested labs, and could even decide which experiments will be replicated.

    Lord says that having a third party organize the replication efforts would be great, and adds that funding “is a huge challenge”. According to the model, funding agencies and research foundations would ideally establish a new category of small grants devoted to peer replication. “It could also be covered by scientific societies, or publication fees,” Rehfeld says.

    It’s also important for journals to consider what happens when findings can’t be replicated. “If authors opt in, you’d like to think they’re quite confident that the work is reproducible,” says Sawey. “Ideally, what would come out of the process is an improved methods or protocols section, which ultimately allows the replicating lab to reproduce the work.”

    Most important, says Rehfeld, is ensuring that the peer-replication reports are published, irrespective of the outcome. If replication fails, then the journal and original authors would choose what to do with the paper. If an editor were to decide that the original manuscript was seriously undermined, for example, they could stop it from being published, or retract it. Alternatively, they could publish the two reports together, and leave the readers to judge. “I could imagine peer replication not necessarily as an additional ‘gatekeeper’ used to reject manuscripts, but as additional context for readers alongside the original paper,” says Lord.

    A difficult but worthwhile pursuit

    Attempting to replicate others’ work can be a challenging, contentious undertaking, says Rick Danheiser, editor-in-chief of Organic Syntheses, an open-access chemistry journal in which all papers are checked for replicability by a member of the editorial board before publication. Even for research from a well-resourced, highly esteemed lab, serious problems can be uncovered during reproducibility checks, Danheiser says.

    Replicability in a field such as synthetic organic chemistry — in which the identity and purity of every component in a reaction flask should already be known — is already challenging enough, so the variables at play in some areas of biology and other fields could pose a whole new level of difficulty, says Richard Sever, assistant director of Cold Spring Harbor Laboratory Press in New York, and co-founder of the bioRxiv and medRxiv preprint servers. “But just because it’s hard, doesn’t mean there might not be cases where peer replication would be helpful.”

    The growing use of preprints, which decouple research dissemination from evaluation, allows some freedom to rethink peer evaluation, Sever adds. “I don’t think it could be universal, but the idea of replication being a formal part of evaluating at least some work seems like a good idea to me.”

    An experiment to test a different peer-replication model in the social sciences is currently under way, says Anna Dreber Almenberg, who studies behavioural and experimental economics at the Stockholm School of Economics. Dreber is a board member of the Institute for Replication (I4R), an organization led by Abel Brodeur at the University of Ottawa that works to systematically reproduce and replicate research findings published in leading journals. In January, I4R entered an ongoing partnership with Nature Human Behaviour to attempt computational reproduction of the data and findings of as many studies published from 2023 onwards as possible. Replication attempts from the first 18 months of the project will be gathered into a ‘meta-paper’ that will go through peer review and be considered for publication in the journal.

    “It’s exciting to see how people from completely different research fields are working on related things, testing different policies to find out what works,” says Dreber. “That’s how I think we will solve this problem.”
